Contribute

GluonNLP community welcomes contributions from anyone! Latest documentation can be found here.

There are lots of opportunities for you to become our contributors:

For a list of open starter tasks, check good first issues.

How-to

Make changes

Our package uses continuous integration and code coverage tools for verifying pull requests. Before submitting, contributor should perform the following checks:

Contribute Scripts

The scripts in GluonNLP are typically for reproducing state-of-the-art (SOTA) results, or for a simple and interesting application. They are intended for practitioners who are familiar with the libraries to tweak and hack. For SOTA scripts, we usually request training scripts to be uploaded here, and then linked to in the example documentation.

See existing examples.

Contribute Examples

Our examples are intended for people who are interested in NLP and want to get better familiarized on different parts in NLP. In order for people to easily understand the content, the code needs to be clean and readable, accompanied by good quality writing.

See existing examples.

We suggest you start the example with Jupyter notebook. When the content is ready, please

  • clear the output cells in the jupyter notebook,
  • install notedown, run
  • notedown input.ipynb –to markdown > output.md

and submit the .md file for submission.

Contribute new API

There are several different types of APIs, such as model definition APIs, public dataset APIs, and building block APIs.

Model definition APIs facilitate the sharing of pre-trained models. If you’d like to contribute models with pre-trained weights, you can open an issue and ping committers first, we will help with things such as hosting the model weights while you propose the patch.

Public dataset APIs facilitate the sharing of public datasets. Like model definition APIs, if you’d like to contribute new public datasets, you can open an issue and ping committers and review the dataset needs. If you’re unsure, feel free to open an issue anyway.

Finally, our data and model building block APIs come from repeated patterns in examples. It has the highest quality bar and should always starts from a good design. If you have an idea on proposing a new API, we encourage you to draft a design proposal first, so that the community can help iterate. Once the design is finalized, everyone who are interested in making it happen can help by submitting patches. For designs that require larger scopes, we can help set up GitHub project to make it easier for others to join.

Contribute Docs

Documentation is no less important than code. Good documentation delivers the correct message clearly and concisely. If you see any issue in the existing documentation, a patch to fix is most welcome! To locate the code responsible for the doc, you may use “View page source” in the top right corner, or the “[source]” links after each API. Also, git grep works nicely if there’s unique string.

Git Workflow Howtos

How to submit pull request

  • Before submit, please rebase your code on the most recent version of master, you can do it by
git remote add upstream https://github.com/dmlc/gluon-nlp
git fetch upstream
git rebase upstream/master
  • If you have multiple small commits, it might be good to merge them together(use git rebase then squash) into more meaningful groups.
  • Send the pull request!
    • Fix the problems reported by automatic checks
    • If you are contributing a new module or new function, add a test.

How to resolve conflict with master

  • First rebase to most recent master
# The first two steps can be skipped after you do it once.
git remote add upstream https://github.com/dmlc/gluon-nlp
git fetch upstream
git rebase upstream/master
  • The git may show some conflicts it cannot merge, say conflicted.py.

    • Manually modify the file to resolve the conflict.
    • After you resolved the conflict, mark it as resolved by
    git add conflicted.py
    
  • Then you can continue rebase by

git rebase --continue
  • Finally push to your fork, you may need to force push here.
git push --force

How to combine multiple commits into one

Sometimes we want to combine multiple commits, especially when later commits are only fixes to previous ones, to create a PR with set of meaningful commits. You can do it by following steps. - Before doing so, configure the default editor of git if you haven’t done so before.

git config core.editor the-editor-you-like
  • Assume we want to merge last 3 commits, type the following commands
git rebase -i HEAD~3
  • It will pop up an text editor. Set the first commit as pick, and change later ones to squash.
  • After you saved the file, it will pop up another text editor to ask you modify the combined commit message.
  • Push the changes to your fork, you need to force push.
git push --force

Reset to the most recent master

You can always use git reset to reset your version to the most recent master. Note that all your *local changes will get lost*. So only do it when you do not have local changes or when your pull request just get merged.

git reset --hard [hash tag of master]
git push --force

What is the consequence of force push

The previous two tips requires force push, this is because we altered the path of the commits. It is fine to force push to your own fork, as long as the commits changed are only yours.