Adding New Upstream
Discuss
Please make sure that you have already gone through the General Guideline. As a reminder, please make sure there is consensus on the new feature request; the best and most transparent way to reach it is to submit your feature request.
Copy from the template
To add a new upstream, you can start from an example. Suppose your new upstream is called my_awesome_upstream; the simplest way to start is the following:
cd ${S3PRL_ROOT}
cp -r s3prl/upstream/example/ s3prl/upstream/my_awesome_upstream
In s3prl/upstream/my_awesome_upstream/hubconf.py, change customized_upstream to my_entry_1.
In s3prl/hub.py, add the following line:
from s3prl.upstream.my_awesome_upstream.hubconf import *
Then run:
python3 utility/extract_feat.py my_entry_1 sample_hidden_states
# this script extracts the hidden states of an upstream entry into the "sample_hidden_states" folder
This extracts the hidden states of the my_entry_1 entry.
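The hub.py step relies on Python's wildcard import: when a module defines no __all__, `from module import *` binds every module-level name that does not start with an underscore. Anything public in hubconf.py, including names it imports, therefore ends up in the s3prl.hub namespace, so helpers should be underscore-prefixed. The sketch below loads a hypothetical hubconf.py from a string so it runs stand-alone, and lists the callables such an import would expose; `_UpstreamExpert`'s body, `_download`, and the checkpoint URL are made up for illustration.

```python
import types
from typing import Callable, Dict

# Hypothetical hubconf.py content; the real file wraps the model from expert.py.
HUBCONF_SOURCE = '''
class _UpstreamExpert:
    """Stand-in for the wrapper defined in expert.py."""

    def __init__(self, ckpt=None, **kwargs):
        self.ckpt = ckpt


def _download(url):
    # Hypothetical helper: real code would download and cache the checkpoint.
    return "/tmp/ckpt_cache/" + url.rsplit("/", 1)[-1]


def my_entry_1(*args, **kwargs):
    # Public name, so it becomes an entry; it pins its own checkpoint source.
    return _UpstreamExpert(_download("https://example.com/my_entry_1.ckpt"), **kwargs)
'''


def load_module(name: str, source: str) -> types.ModuleType:
    """Build a module object from source text, as if it were imported."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def wildcard_exports(module: types.ModuleType) -> Dict[str, Callable]:
    """Callables that `from module import *` would bind, assuming no __all__."""
    return {
        name: obj
        for name, obj in vars(module).items()
        if not name.startswith("_") and callable(obj)
    }


hub = wildcard_exports(load_module("hubconf", HUBCONF_SOURCE))
print(sorted(hub))  # only the public entry, not the underscore-prefixed helpers
```

Because `_UpstreamExpert` and `_download` start with an underscore, only `my_entry_1` would become visible through s3prl.hub.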
The default content in s3prl/upstream/example/ always works, so you can simply edit the files inside the new s3prl/upstream/my_awesome_upstream folder to enable your new upstream.
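The copy, rename, and import steps above can also be scripted. The following is a minimal sketch, assuming the layout described in this section (s3prl/upstream/example/ with a hubconf.py that registers customized_upstream, and s3prl/hub.py); scaffold_upstream is a hypothetical helper, not part of S3PRL, and you still run utility/extract_feat.py yourself afterwards.

```python
import shutil
from pathlib import Path


def scaffold_upstream(root: Path, upstream: str, entry: str) -> Path:
    """Hypothetical helper automating the template steps (not part of S3PRL)."""
    root = Path(root)
    src = root / "s3prl" / "upstream" / "example"
    dst = root / "s3prl" / "upstream" / upstream

    # Step 1: copy the example upstream; copytree raises if dst already exists.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("__pycache__"))

    # Step 2: rename the registered entry in the new hubconf.py.
    hubconf = dst / "hubconf.py"
    hubconf.write_text(hubconf.read_text().replace("customized_upstream", entry))

    # Step 3: let s3prl/hub.py import the new entries, adding the line only once.
    hub = root / "s3prl" / "hub.py"
    line = f"from s3prl.upstream.{upstream}.hubconf import *\n"
    text = hub.read_text()
    if line not in text:
        if text and not text.endswith("\n"):
            text += "\n"
        hub.write_text(text + line)
    return dst


# Usage (from the repository root):
# scaffold_upstream(Path("."), "my_awesome_upstream", "my_entry_1")
```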
Implement
The folder has the following structure:
my_awesome_upstream
|
---- expert.py
|
---- hubconf.py
In principle, hubconf.py serves as the URL registry, where each callable function is an entry specifying the source of its checkpoint, while expert.py wraps your model definition so that it fits our upstream API.
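To make the wrapper's role concrete, here is a plain-Python sketch of the contract, assumed to mirror the example template (verify against s3prl/upstream/example/expert.py): forward receives a list of variable-length 1-D waveforms and returns a dict whose "hidden_states" value is a list of same-shape per-layer outputs, and get_downsample_rates(key) reports how many waveform samples map to one output frame. The real wrapper is a PyTorch nn.Module; this stand-in uses nested lists so it runs without PyTorch, and ToyUpstreamExpert, STRIDE, HIDDEN_DIM, and the toy layers are hypothetical.

```python
from typing import Dict, List

HIDDEN_DIM = 4  # hypothetical feature dimension
STRIDE = 160    # hypothetical hop: one frame per 160 samples (10 ms at 16 kHz)

Frames = List[List[float]]  # (num_frames, HIDDEN_DIM) for one utterance


class ToyUpstreamExpert:
    """Plain-Python stand-in for the nn.Module wrapper in expert.py."""

    def get_downsample_rates(self, key: str) -> int:
        # Every representation returned below advances one frame per STRIDE samples.
        return STRIDE

    def _encode(self, wav: List[float]) -> Frames:
        # Toy "model": average each STRIDE-sample chunk, repeated HIDDEN_DIM times.
        frames = []
        for start in range(0, len(wav), STRIDE):
            chunk = wav[start:start + STRIDE]
            frames.append([sum(chunk) / len(chunk)] * HIDDEN_DIM)
        return frames

    def forward(self, wavs: List[List[float]]) -> Dict[str, List[List[Frames]]]:
        encoded = [self._encode(wav) for wav in wavs]
        max_len = max(len(frames) for frames in encoded)
        # Zero-pad every utterance to the longest one (what pad_sequence does in torch).
        layer1 = [
            frames + [[0.0] * HIDDEN_DIM for _ in range(max_len - len(frames))]
            for frames in encoded
        ]
        # A second toy "layer" so that hidden_states holds more than one output.
        layer2 = [[[2.0 * x for x in vec] for vec in utt] for utt in layer1]
        # Both layers share the shape (batch, max_len, HIDDEN_DIM), which lets a
        # downstream model take a weighted sum over them.
        return {"hidden_states": [layer1, layer2]}
```

Keeping every layer the same shape is the design point worth preserving in a real wrapper, since downstream tasks may combine the layers.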
During implementation, please remove as many package dependencies as possible: the upstream functionality is our core feature, so it should have minimal dependencies to stay maintainable.
Tests
After your implementation, please make sure all your entries pass the tests.
The test_upstream_with_extracted test case requires you to pre-extract the expected hidden states via:
python3 utility/extract_feat.py my_entry_1 ./sample_hidden_states
That is, the test case expects a my_entry_1.pt file in the sample_hidden_states folder. Repeat this for every entry you add.
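The stored file presumably serves as a reference: re-running the entry must reproduce it, so a later change that silently alters the outputs gets caught. A minimal sketch of such a comparison follows, assuming the hidden states are a list of per-layer (frames, dim) arrays; hidden_states_match and its tolerances are hypothetical, and the real test, which works on PyTorch tensors loaded from the .pt file, may compare differently.

```python
import math
from typing import List, Sequence

Layer = Sequence[Sequence[float]]  # (frames, dim) for one layer of one utterance


def hidden_states_match(expected: List[Layer], actual: List[Layer],
                        rel_tol: float = 1e-5, abs_tol: float = 1e-6) -> bool:
    """Hypothetical check in the spirit of test_upstream_with_extracted."""
    if len(expected) != len(actual):
        return False  # different number of layers
    for exp_layer, act_layer in zip(expected, actual):
        if len(exp_layer) != len(act_layer):
            return False  # different number of frames
        for exp_vec, act_vec in zip(exp_layer, act_layer):
            if len(exp_vec) != len(act_vec):
                return False  # different feature dimension
            if not all(math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
                       for e, a in zip(exp_vec, act_vec)):
                return False  # values drifted beyond the tolerance
    return True
```

Shape checks come first so that a changed layer count or frame rate is reported as a mismatch rather than being hidden by zip truncating the longer sequence.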
All the existing sampled hidden states are hosted in a Hugging Face dataset repo, and we expect you to clone this sample_hidden_states repo (with git lfs) and add the sampled hidden states of your new entries.
To make changes to this hidden states repo, please follow the steps here to create a pull request, so that our core maintainers can sync the hidden states you extracted.
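In conclusion, adding a new upstream takes two pull requests:
1. one to the S3PRL repository with the new upstream implementation, and
2. one to the sample_hidden_states dataset repo with the extracted hidden states of your new entries.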
Note
Because of the huge time cost, most upstreams in S3PRL are not tested in GitHub Actions CI (otherwise downloading all the checkpoints would take several hours for every PR). However, our core maintainers still clone the repository and run tox locally to make sure everything works, using a tox environment that tests all the upstreams.
Documentation
Once the implementation is done, make your work known to users by documenting your entries on the S3PRL Upstream Collection tutorial page. You are also welcome to add your name at the bottom of that page.