Adding New Upstream#

Discuss#

Please make sure that you have already gone through the General Guideline. Again, we might not always want new contributions, so please make sure we have reached consensus on the new feature before you start. The best and most transparent way to do this is to submit a feature request.

Copy from the template#

To add a new upstream, you can start from the example. Suppose your new upstream is called my_awesome_upstream; the simplest way to start is the following:

cd ${S3PRL_ROOT}
cp -r s3prl/upstream/example/ s3prl/upstream/my_awesome_upstream
  1. In s3prl/upstream/my_awesome_upstream/hubconf.py, change customized_upstream to my_entry_1

  2. In s3prl/hub.py, add from s3prl.upstream.my_awesome_upstream.hubconf import *

  3. Check that the new entry works by extracting its hidden states:

python3 utility/extract_feat.py my_entry_1 sample_hidden_states
# this script extracts hidden states from an upstream entry into the "sample_hidden_states" folder

This extracts the hidden states from the my_entry_1 entry. The default content copied from s3prl/upstream/example/ always works, so you can start from a working state and edit the files inside the new s3prl/upstream/my_awesome_upstream folder to implement your new upstream.
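The copy-and-rename steps above can also be scripted. The following is a minimal sketch, not part of S3PRL: the helper name scaffold_upstream is hypothetical, and the demo runs on a throwaway directory that only mimics the layout described above (s3prl/upstream/example/ and s3prl/hub.py), so it does not touch a real checkout.

```python
import shutil
import tempfile
from pathlib import Path


def scaffold_upstream(s3prl_root: Path, name: str, entry: str) -> Path:
    """Hypothetical helper: copy the example upstream and register a new entry."""
    upstream_dir = s3prl_root / "s3prl" / "upstream"
    new_dir = upstream_dir / name
    shutil.copytree(upstream_dir / "example", new_dir)

    # Step 1: rename the entry function in the copied hubconf.py.
    hubconf = new_dir / "hubconf.py"
    hubconf.write_text(hubconf.read_text().replace("customized_upstream", entry))

    # Step 2: expose the new entries through s3prl/hub.py.
    hub = s3prl_root / "s3prl" / "hub.py"
    with hub.open("a") as f:
        f.write(f"\nfrom s3prl.upstream.{name}.hubconf import *\n")
    return new_dir


# Demo on a temporary tree standing in for ${S3PRL_ROOT} (file contents are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    example = root / "s3prl" / "upstream" / "example"
    example.mkdir(parents=True)
    (example / "hubconf.py").write_text(
        "def customized_upstream(*args, **kwargs):\n    pass\n"
    )
    (example / "expert.py").write_text("# model wrapper goes here\n")
    (root / "s3prl" / "hub.py").write_text("")

    new_dir = scaffold_upstream(root, "my_awesome_upstream", "my_entry_1")
    hubconf_text = (new_dir / "hubconf.py").read_text()
    hub_text = (root / "s3prl" / "hub.py").read_text()
    print(hubconf_text)
    print(hub_text)
```

On a real checkout you would pass the repository root instead of the temporary directory, and review the resulting diff before committing, since a blanket text replacement also rewrites any docstring that mentions customized_upstream.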

Implement#

The folder has the following structure:

my_awesome_upstream
|
 ---- expert.py
|
 ---- hubconf.py

In principle, hubconf.py serves as the URL registry, where each callable function is an entry specifying the source of a checkpoint, while expert.py wraps your model definition so that it fits our upstream API.
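To make this division of labor concrete, here is a toy sketch in plain Python. It is not S3PRL's actual code: ToyExpert, the scale parameter, the placeholder checkpoint string, and the "hidden_states" output layout are all illustrative assumptions, and a real expert would wrap a neural network rather than rescale lists. Refer to the files copied from s3prl/upstream/example/ for the real interface.

```python
from typing import Dict, List, Optional


class ToyExpert:
    """Stand-in for the wrapper class that would live in expert.py."""

    def __init__(self, ckpt: Optional[str] = None, scale: float = 1.0):
        self.ckpt = ckpt    # where a real expert would load its weights from
        self.scale = scale  # toy "model parameter" (assumption for illustration)

    def __call__(self, wavs: List[List[float]]) -> Dict[str, List]:
        # Each toy "layer" just rescales the samples of every waveform.
        layer1 = [[s * self.scale for s in wav] for wav in wavs]
        layer2 = [[s * self.scale for s in wav] for wav in layer1]
        return {"hidden_states": [layer1, layer2]}


def my_entry_1(ckpt: str = "path/or/url/to/checkpoint", **kwargs) -> ToyExpert:
    """Stand-in for an entry in hubconf.py: it pins the checkpoint source."""
    return ToyExpert(ckpt=ckpt, **kwargs)


# An entry is called by name and returns a ready-to-use wrapped model.
model = my_entry_1(scale=2.0)
output = model([[0.1, 0.2], [0.3]])
print(model.ckpt)
print(len(output["hidden_states"]))  # number of toy layers
```

Keeping the checkpoint source in the entry function and the model logic in the wrapper means several entries (for example, different checkpoints of the same architecture) can share one wrapper class.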

During your implementation, please try to remove as many package dependencies as possible, since the upstream functionality is our core feature, and should have minimal dependencies to be maintainable.

Tests#

After your implementation, please make sure all your entries pass the tests. The test_upstream_with_extracted test case requires you to pre-extract the expected hidden states via:

python3 utility/extract_feat.py my_awesome_upstream ./sample_hidden_states

That is, the test case expects there will be a my_awesome_upstream.pt in the sample_hidden_states folder.
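Before opening a pull request, it can help to check that every entry you added has its extracted file in place. The sketch below assumes that the `<entry name>.pt` naming described above holds for each entry; missing_samples and my_other_entry are hypothetical names used only for illustration, and the demo uses a temporary folder instead of a real sample_hidden_states clone.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_samples(entries: Iterable[str], folder: str) -> List[str]:
    """Return the entry names that have no `<entry>.pt` file in `folder`."""
    root = Path(folder)
    return [name for name in entries if not (root / f"{name}.pt").is_file()]


# Demo: only one of the two entries has an extracted file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "my_awesome_upstream.pt").touch()
    missing = missing_samples(["my_awesome_upstream", "my_other_entry"], tmp)

print(missing)  # ['my_other_entry']
```

This only checks that the files exist; whether their contents match what the test case expects is still determined by running the tests themselves.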

All the existing sampled hidden states are hosted in a Hugging Face Dataset repo, and we expect you to clone this sample_hidden_states repo (with git lfs) and add the sampled hidden states for your new entries.

To make changes to this hidden states repo, please follow the steps here to create a pull request, so that our core maintainers can sync the hidden states you extracted.

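In conclusion, adding a new upstream takes two pull requests: one to the S3PRL repository with your upstream implementation, and one to the sample_hidden_states repo with the hidden states extracted for your new entries.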

Note

In fact, due to the huge time cost, most of the upstreams in S3PRL are not tested in GitHub Actions CI (otherwise it would take several hours to download all the checkpoints for every PR). However, our core maintainers will still clone the repository and run tox locally to make sure everything works, and there is a tox environment that tests all the upstreams.

Documentation#

Once the implementation is complete, make sure users know about your work by documenting your entries on the S3PRL Upstream Collection tutorial page. You can also add your name at the bottom of the tutorial page if you like.