problem#
(s3prl.problem)
Pre-defined python recipes with customizable methods
- Speech-to-text based recipes
- Speaker Verification recipes
- The shared backbone of the common ML train/test procedure for all problems
- The most common and simple train/valid/test recipes
- Speaker Diarization recipes
SuperbASR#
- class s3prl.problem.SuperbASR[source]#
Bases: ASR
- default_config() → dict[source]#
The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method, so by changing these inner values you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, prepare_tokenizer_data, build_tokenizer, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train

```yaml
start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  train_sets:
    - train-clean-100
  valid_sets:
    - dev-clean
  test_sets:
    - test-clean
prepare_tokenizer_data: {}
build_tokenizer:
  vocab_type: character
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 32
    max_length: 2000
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  model_conf:
    module: LSTM
    proj_size: 1024
    hidden_size:
      - 1024
      - 1024
    dropout:
      - 0.2
      - 0.2
    layer_norm:
      - false
      - false
    proj:
      - false
      - false
    sample_rate:
      - 1
      - 1
    sample_style: concat
    bidirectional: true
  specaug_conf:
    freq_mask_width_range: !!python/tuple
      - 0
      - 50
    num_freq_mask: 4
    time_mask_width_range: !!python/tuple
      - 0
      - 40
    num_time_mask: 2
build_model:
  upstream_trainable: false
build_task:
  log_metrics:
    - cer
    - wer
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model:
  extra_conf:
    build_downstream_conf: ${build_downstream}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 2000
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: wer
  valid_higher_better: false
  auto_resume: true
  resume_ckpt_dir: null
```
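For illustration, a minimal sketch of overriding these inner values from Python, assuming default_config returns a plain nested dict; the directory, corpus root, and upstream name below are hypothetical placeholders:

```python
from s3prl.problem import SuperbASR

config = SuperbASR().default_config()

# Each top-level key maps to a method of the same name; the nested values are
# passed into that method as keyword arguments.
config["build_optimizer"]["conf"]["lr"] = 3.0e-5   # forwarded into build_optimizer
config["train"]["total_steps"] = 100000            # forwarded into train

# Fields marked with ??? are required and must be filled before calling run
# (the values below are hypothetical).
config["target_dir"] = "result/superb_asr"
config["prepare_data"]["dataset_root"] = "/data/LibriSpeech"
config["build_upstream"]["name"] = "fbank"
```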
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default this calls prepare_librispeech with **prepare_data
- Parameters:
  - prepare_data (dict) – same as in default_config, supports the arguments of prepare_librispeech
  - target_dir (str) – Parse your corpus and save the csv files into this directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
tuple

1. train_path (str)
2. valid_path (str)
3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

- id (str) - the unique id for this data point
- wav_path (str) - the absolute path of the waveform file
- transcription (str) - a text string
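As an illustration only (the file names and paths below are hypothetical; in practice prepare_librispeech parses the corpus for you), a metadata csv in this format could be written like so:

```python
import pandas as pd

# Hypothetical rows following the documented columns: id, wav_path, transcription.
rows = [
    {
        "id": "103-1240-0000",
        "wav_path": "/data/LibriSpeech/train-clean-100/103/1240/103-1240-0000.flac",
        "transcription": "chapter one",
    },
]
pd.DataFrame(rows).to_csv("train.csv", index=False)
```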
- prepare_tokenizer_data(prepare_tokenizer_data: dict, target_dir: str, cache_dir: str, train_csv: str, valid_csv: str, test_csvs: List[str], get_path_only: bool = False)[source]#
Prepare the text file used for training the tokenizer. By default, only the transcriptions in the train_csv returned from prepare_data are used. The default prepare_tokenizer_data prepares the data for a character-based tokenizer.
- Parameters:
  - prepare_tokenizer_data (dict) – same as in default_config, no supported argument for now
  - target_dir (str) – Save the text file into this directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - train_csv (str) – The train data given by prepare_data
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
str

The text file path. The text file should be in the format:

This is the first line
This is the second line
These are all text used for training tokenizer
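A minimal sketch of producing such a text file from the (hypothetical) train.csv of the previous step, dumping one transcription per line:

```python
import pandas as pd

# Dump the transcription column into one sentence per line, the format
# expected by build_tokenizer. The csv path is a hypothetical example.
df = pd.read_csv("train.csv")
with open("tokenizer_data.txt", "w") as f:
    for text in df["transcription"]:
        f.write(f"{text}\n")
```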
- build_tokenizer(build_tokenizer: dict, target_dir: str, cache_dir: str, tokenizer_data_path: str, get_path_only: bool = False)[source]#
Build the tokenizer from the data prepared by prepare_tokenizer_data. By default this calls prepare_common_tokenizer with **build_tokenizer
- Parameters:
  - build_tokenizer (dict) – same as in default_config, arguments for prepare_common_tokenizer
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - tokenizer_data_path (str) – The text file from prepare_tokenizer_data
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
str

The filepath of the pickled s3prl.dataio.encoder.tokenizer.Tokenizer
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, tokenizer_path: str)[source]#
Build the dataset for train/valid/test.
- Parameters:
  - build_dataset (dict) – same as in default_config, not used
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - mode (str) – train/valid/test
  - data_csv (str) – The metadata csv file for the specific mode
  - tokenizer_path (str) – The pickled tokenizer path for encoding the transcription
- Returns:
torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

- x (torch.FloatTensor) - the waveform in (seq_len, 1)
- x_len (int) - the waveform length seq_len
- class_ids (torch.LongTensor) - the encoded class ids of a transcription (sentence)
- labels (str) - the text transcription
- unique_name (str) - the unique id for this datapoint
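A minimal sketch of a custom dataset satisfying this item contract; it assumes the csv follows the prepare_data columns above and that the tokenizer object exposes an encode(str) method (an assumption for illustration):

```python
import pandas as pd
import torch
import torchaudio
from torch.utils.data import Dataset


class SimpleAsrDataset(Dataset):
    """Sketch of a dataset returning the keys expected by the ASR task."""

    def __init__(self, data_csv: str, tokenizer):
        self.df = pd.read_csv(data_csv)
        self.tokenizer = tokenizer  # assumed: encode(str) -> List[int]

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sr = torchaudio.load(row["wav_path"])  # (channel, seq_len)
        wav = wav.view(-1, 1).float()               # -> (seq_len, 1)
        class_ids = torch.LongTensor(self.tokenizer.encode(row["transcription"]))
        return {
            "x": wav,
            "x_len": len(wav),
            "class_ids": class_ids,
            "labels": row["transcription"],
            "unique_name": row["id"],
        }
```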
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset: Dataset)[source]#
Return the batch sampler for the torch DataLoader.
- Parameters:
  - build_batch_sampler (dict) – same as in default_config, with the following keys:
    - train (dict) - arguments for SortedBucketingSampler
    - valid (dict) - arguments for FixedBatchSizeBatchSampler
    - test (dict) - arguments for FixedBatchSizeBatchSampler
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - mode (str) – train/valid/test
  - data_csv (str) – the mode-specific csv from prepare_data
  - dataset – the dataset from build_dataset
- Returns:
batch sampler for torch DataLoader
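A simplified sketch of how run wires the dataset, batch sampler, and collate_fn into a torch DataLoader; the paths are hypothetical and assume the earlier preparation stages already produced them:

```python
from torch.utils.data import DataLoader

from s3prl.problem import SuperbASR

problem = SuperbASR()
config = problem.default_config()

target_dir = "result/superb_asr"              # hypothetical experiment directory
cache_dir = None
train_csv = f"{target_dir}/train.csv"         # hypothetical output of prepare_data
tokenizer_path = f"{target_dir}/tokenizer.pkl"  # hypothetical output of build_tokenizer

dataset = problem.build_dataset(
    config["build_dataset"], target_dir, cache_dir, "train", train_csv, tokenizer_path
)
sampler = problem.build_batch_sampler(
    config["build_batch_sampler"], target_dir, cache_dir, "train", train_csv, dataset
)
collate_fn = problem.build_collate_fn({}, "train")
loader = DataLoader(dataset, batch_sampler=sampler, collate_fn=collate_fn)
```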
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the RNNEncoder model wrapped with ModelWithSpecaug
- Parameters:
  - build_downstream (dict) – same as in default_config, has two keys: model_conf holds the arguments for RNNEncoder; specaug_conf holds the arguments for ModelWithSpecaug
  - downstream_input_size (int) – the required input size of the model
  - downstream_output_size (int) – the required output size of the model
  - downstream_input_stride (int) – the input feature's stride (from 16 kHz)
- Returns:
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns s3prl.dataset.base.default_collate_fn
- Parameters:
  - build_collate_fn (dict) – same as in default_config, no argument supported for now
  - mode (str) – train, valid, or test
- Returns:
callable

The collate_fn for the torch DataLoader in train/valid/test mode
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with s3prl.nn.Featurizer
- Parameters:
  - build_featurizer (dict) – same as in default_config, arguments for s3prl.nn.Featurizer
  - upstream (AbsUpstream) – the upstream model built by build_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer

The featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build the model with s3prl.nn.upstream.UpstreamDownstreamModel
- Parameters:
  - build_model (dict) – same as in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
  - model_output_size (int) – the required output hidden size of the model
  - build_upstream (dict) – same as in default_config, refer to build_upstream
  - build_featurizer (dict) – same as in default_config, refer to build_featurizer
  - build_downstream (dict) – same as in default_config, refer to build_downstream
- Returns:
torch.nn.Module

The entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer and build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the task-specific model.
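A conceptual sketch of this composition (not the actual UpstreamDownstreamModel implementation; the call signatures and returned length tensors are simplified assumptions):

```python
import torch


class ComposedModel(torch.nn.Module):
    """Conceptual composition: upstream -> featurizer -> downstream."""

    def __init__(self, upstream, featurizer, downstream):
        super().__init__()
        self.upstream = upstream
        self.featurizer = featurizer
        self.downstream = downstream

    def forward(self, wav, wav_len):
        # upstream: waveform -> multiple hidden states (one per layer)
        hidden_states, hs_len = self.upstream(wav, wav_len)
        # featurizer: reduce the layers into a single feature sequence
        feature, feat_len = self.featurizer(hidden_states, hs_len)
        # downstream: task-specific prediction on top of the single feature
        return self.downstream(feature, feat_len)
```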
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
  - build_optimizer (dict) – same as in default_config, with the following keys:
    - name (str) - the optimizer class name in torch.optim
    - conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}
  - parameters (iterable) – the standard params accepted by torch.optim.Optimizer
- Returns:
torch.optim.Optimizer

An optimizer following standard torch usage
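The name/conf pair maps onto the standard torch.optim constructors. A minimal sketch of what this amounts to (the lookup-by-name helper below is an illustration, not the library's internal code):

```python
import torch


def make_optimizer(build_optimizer: dict, parameters):
    # e.g. build_optimizer = {"name": "Adam", "conf": {"lr": 1.0e-4}}
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))


# Usage sketch with a throwaway parameter list:
params = [torch.nn.Parameter(torch.zeros(3))]
optimizer = make_optimizer({"name": "Adam", "conf": {"lr": 1.0e-4}}, params)
```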
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
  - build_scheduler (dict) – same as in default_config, with the following keys:
    - name (str) - the scheduler class name in torch.optim.lr_scheduler
    - conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR
  - optimizer – the standard torch optimizer accepted by the schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler

A scheduler following standard torch usage
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with s3prl.nn.upstream.S3PRLUpstream
- Parameters:
  - build_upstream (dict) – same as in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream
- Returns:
s3prl.nn.interface.AbsUpstream

An upstream model whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by train (during the validation phase) and run (during the testing phase).
- Parameters:
  - evaluate (dict) – same as in default_config, no argument supported for now
  - **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence the documentation is skipped for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
  - name (str) – the __name__ of the problem class
- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
  - model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.
- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method combining load_model and load_task to directly load both the model and the task. It assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple

1. model (torch.nn.Module)
2. task (s3prl.task.Task)
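A usage sketch; the checkpoint directory is a hypothetical path that is assumed to contain the 'model' and 'task' subdirectories:

```python
from s3prl.problem import SuperbASR

# Load a previously trained ASR model and its task from an experiment checkpoint
# directory (hypothetical path).
model, task = SuperbASR().load_model_and_task("result/superb_asr/checkpoint")
model.eval()
```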
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
  - task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.
  - model (torch.nn.Module) – the model for the task, which is saved separately and is required by build_task.
  - task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. change the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, prepare_tokenizer_data: dict = None, build_tokenizer: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
The stages of this script are:

- 0: Parse the corpus and save the metadata file for ASR (waveform path, label…)
- 1: Prepare the metadata file for training the tokenizer
- 2: Train the tokenizer
- 3: Train the ASR model
- 4: Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (see test_ckpt_steps)
- Parameters:
  - target_dir (str) – The directory that stores the script result.
  - cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
  - remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
  - start (int) – The starting stage of the problem script. Default: 0
  - stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
  - num_workers (int) – num_workers for all the torch DataLoaders
  - eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
  - device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
  - world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
  - rank (int) – Used in distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
  - test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.
  - **others – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.
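A minimal end-to-end invocation might look like the sketch below; the experiment directory, dataset root, and upstream name are hypothetical placeholders:

```python
from s3prl.problem import SuperbASR

problem = SuperbASR()
config = problem.default_config()
config["target_dir"] = "result/superb_asr"                    # hypothetical
config["prepare_data"]["dataset_root"] = "/data/LibriSpeech"  # hypothetical
config["build_upstream"]["name"] = "fbank"                    # hypothetical upstream name

# Run the full pipeline (stages 0-4).
problem.run(**config)

# Later, re-run only the training and evaluation stages, reusing the already
# prepared data and tokenizer.
problem.run(**{**config, "start": 3})
```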
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, you most likely also need to override load_model
- Parameters:
  - save_model (dict) – same as in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
  - model_ckpt_dir (str) – save the model into this directory.
  - build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
  - model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task's state, task.get_state(), and the initialization arguments into the given directory. If you override this method, you most likely also need to override load_task.
- Parameters:
  - save_task (dict) – same as in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
  - task_ckpt_dir (str) – save the task into this directory.
  - build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
  - task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
  - train (dict) – same as in default_config, with the following keys:
    - total_steps (int) - the total optimization steps
    - log_step (int) - logging frequency; log every log_step steps
    - eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate, to speed up development, with the eval_batch argument of run
    - save_step (int) - save a checkpoint every save_step steps
    - gradient_clipping (float) - clip the gradient; important for RNNs
    - gradient_accumulate (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    - valid_metric (str) - the metric used to select the best valid checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    - valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
    - auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    - resume_ckpt_dir (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    - seed (int) - fix the seed before the training starts
    - keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    - use_scheduler (bool) - whether to use the scheduler
  - **others – only meaningful when you want to override this train method, which is not the common case. Hence the documentation is skipped for now.
SuperbPR#
- class s3prl.problem.SuperbPR[source]#
Bases: SuperbASR
- default_config() → dict[source]#
The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method, so by changing these inner values you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, prepare_tokenizer_data, build_tokenizer, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

```yaml
start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  train_sets:
    - train-clean-100
  valid_sets:
    - dev-clean
  test_sets:
    - test-clean
prepare_tokenizer_data: {}
build_tokenizer:
  vocab_type: phoneme
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 16
    max_length: 300000
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task:
  log_metrics:
    - per
build_optimizer:
  name: Adam
  conf:
    lr: 0.01
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model:
  extra_conf:
    build_downstream_conf: ${build_downstream}
save_task: {}
train:
  total_steps: 100000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 2
  valid_metric: per
  valid_higher_better: false
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
```
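Compared with SuperbASR, the main differences are the phoneme vocabulary, the linear frame-level downstream model, and the per metric. A short sketch of adjusting the PR-specific fields; all concrete values are placeholders:

```python
from s3prl.problem import SuperbPR

config = SuperbPR().default_config()
config["target_dir"] = "result/superb_pr"                     # hypothetical
config["prepare_data"]["dataset_root"] = "/data/LibriSpeech"  # hypothetical
config["build_upstream"]["name"] = "fbank"                    # hypothetical upstream name

# PR-specific knobs: phoneme tokenizer and the linear probe's hidden size.
config["build_tokenizer"]["vocab_type"] = "phoneme"
config["build_downstream"]["hidden_size"] = 256

SuperbPR().run(**config)
```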
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default this calls prepare_librispeech with **prepare_data
- Parameters:
  - prepare_data (dict) – same as in default_config, supports the arguments of prepare_librispeech
  - target_dir (str) – Parse your corpus and save the csv files into this directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
tuple

1. train_path (str)
2. valid_path (str)
3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

- id (str) - the unique id for this data point
- wav_path (str) - the absolute path of the waveform file
- transcription (str) - a text string
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for the torch DataLoader.
- Parameters:
  - build_batch_sampler (dict) – same as in default_config, with the following keys:
    - train (dict) - arguments for SortedSliceSampler
    - valid (dict) - arguments for FixedBatchSizeBatchSampler
    - test (dict) - arguments for FixedBatchSizeBatchSampler
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - mode (str) – train/valid/test
  - data_csv (str) – the mode-specific csv from prepare_data
  - dataset – the dataset from build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the FrameLevelLinear
- Parameters:
  - build_downstream (dict) – same as in default_config, supports the arguments of FrameLevelLinear
  - downstream_input_size (int) – the required input size of the model
  - downstream_output_size (int) – the required output size of the model
  - downstream_input_stride (int) – the input feature's stride (from 16 kHz)
- Returns:
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns s3prl.dataset.base.default_collate_fn
- Parameters:
  - build_collate_fn (dict) – same as in default_config, no argument supported for now
  - mode (str) – train, valid, or test
- Returns:
callable

The collate_fn for the torch DataLoader in train/valid/test mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, tokenizer_path: str)[source]#
Build the dataset for train/valid/test.
- Parameters:
  - build_dataset (dict) – same as in default_config, not used
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - mode (str) – train/valid/test
  - data_csv (str) – The metadata csv file for the specific mode
  - tokenizer_path (str) – The pickled tokenizer path for encoding the transcription
- Returns:
torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

- x (torch.FloatTensor) - the waveform in (seq_len, 1)
- x_len (int) - the waveform length seq_len
- class_ids (torch.LongTensor) - the encoded class ids of a transcription (sentence)
- labels (str) - the text transcription
- unique_name (str) - the unique id for this datapoint
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with s3prl.nn.Featurizer
- Parameters:
  - build_featurizer (dict) – same as in default_config, arguments for s3prl.nn.Featurizer
  - upstream (AbsUpstream) – the upstream model built by build_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer

The featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build the model with s3prl.nn.upstream.UpstreamDownstreamModel
- Parameters:
  - build_model (dict) – same as in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
  - model_output_size (int) – the required output hidden size of the model
  - build_upstream (dict) – same as in default_config, refer to build_upstream
  - build_featurizer (dict) – same as in default_config, refer to build_featurizer
  - build_downstream (dict) – same as in default_config, refer to build_downstream
- Returns:
torch.nn.Module

The entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer and build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the task-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
  - build_optimizer (dict) – same as in default_config, with the following keys:
    - name (str) - the optimizer class name in torch.optim
    - conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}
  - parameters (iterable) – the standard params accepted by torch.optim.Optimizer
- Returns:
torch.optim.Optimizer

An optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
  - build_scheduler (dict) – same as in default_config, with the following keys:
    - name (str) - the scheduler class name in torch.optim.lr_scheduler
    - conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR
  - optimizer – the standard torch optimizer accepted by the schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler

A scheduler following standard torch usage
- build_tokenizer(build_tokenizer: dict, target_dir: str, cache_dir: str, tokenizer_data_path: str, get_path_only: bool = False)[source]#
Build the tokenizer from the data prepared by prepare_tokenizer_data. By default this calls prepare_common_tokenizer with **build_tokenizer
- Parameters:
  - build_tokenizer (dict) – same as in default_config, arguments for prepare_common_tokenizer
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - tokenizer_data_path (str) – The text file from prepare_tokenizer_data
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
str

The filepath of the pickled s3prl.dataio.encoder.tokenizer.Tokenizer
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with s3prl.nn.upstream.S3PRLUpstream
- Parameters:
  - build_upstream (dict) – same as in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream
- Returns:
s3prl.nn.interface.AbsUpstream

An upstream model whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by train (during the validation phase) and run (during the testing phase).
- Parameters:
  - evaluate (dict) – same as in default_config, no argument supported for now
  - **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence the documentation is skipped for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
  - name (str) – the __name__ of the problem class
- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
  - model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.
- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method combining load_model and load_task to directly load both the model and the task. It assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple

1. model (torch.nn.Module)
2. task (s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
  - task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.
  - model (torch.nn.Module) – the model for the task, which is saved separately and is required by build_task.
  - task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. change the decoding hyperparameters.
- Returns:
- prepare_tokenizer_data(prepare_tokenizer_data: dict, target_dir: str, cache_dir: str, train_csv: str, valid_csv: str, test_csvs: List[str], get_path_only: bool = False)[source]#
Prepare the text file used for training the tokenizer. By default, only the transcriptions in the train_csv returned from prepare_data are used. The default prepare_tokenizer_data prepares the data for a character-based tokenizer.
- Parameters:
  - prepare_tokenizer_data (dict) – same as in default_config, no supported argument for now
  - target_dir (str) – Save the text file into this directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - train_csv (str) – The train data given by prepare_data
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
str

The text file path. The text file should be in the format:

This is the first line
This is the second line
These are all text used for training tokenizer
- run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, prepare_tokenizer_data: dict = None, build_tokenizer: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
The stages of this script are:

- 0: Parse the corpus and save the metadata file for ASR (waveform path, label…)
- 1: Prepare the metadata file for training the tokenizer
- 2: Train the tokenizer
- 3: Train the ASR model
- 4: Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (see test_ckpt_steps)
- Parameters:
  - target_dir (str) – The directory that stores the script result.
  - cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
  - remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
  - start (int) – The starting stage of the problem script. Default: 0
  - stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
  - num_workers (int) – num_workers for all the torch DataLoaders
  - eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
  - device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
  - world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
  - rank (int) – Used in distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
  - test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.
  - **others – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, you most likely also need to override load_model
- Parameters:
  - save_model (dict) – same as in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
  - model_ckpt_dir (str) – save the model into this directory.
  - build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
  - model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task's state, task.get_state(), and the initialization arguments into the given directory. If you override this method, you most likely also need to override load_task.
- Parameters:
  - save_task (dict) – same as in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
  - task_ckpt_dir (str) – save the task into this directory.
  - build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
  - task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
  - train (dict) – same as in default_config, with the following keys:
    - total_steps (int) - the total optimization steps
    - log_step (int) - logging frequency; log every log_step steps
    - eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate, to speed up development, with the eval_batch argument of run
    - save_step (int) - save a checkpoint every save_step steps
    - gradient_clipping (float) - clip the gradient; important for RNNs
    - gradient_accumulate (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    - valid_metric (str) - the metric used to select the best valid checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    - valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
    - auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    - resume_ckpt_dir (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    - seed (int) - fix the seed before the training starts
    - keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    - use_scheduler (bool) - whether to use the scheduler
  - **others – only meaningful when you want to override this train method, which is not the common case. Hence the documentation is skipped for now.
SuperbSF#
- class s3prl.problem.SuperbSF[source]#
Bases: SuperbASR
- default_config() → dict[source]#
The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method, so by changing these inner values you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, prepare_tokenizer_data, build_tokenizer, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train

```yaml
start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  train_speakers:
    - Ivy
    - Joanna
    - Joey
    - Justin
    - Kendra
    - Kimberly
    - Matthew
    - Salli
  valid_speakers:
    - Aditi
    - Amy
    - Geraint
    - Nicole
  test_speakers:
    - Brian
    - Emma
    - Raveena
    - Russell
prepare_tokenizer_data: {}
build_tokenizer:
  vocab_type: character
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 32
    max_length: 300000
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  model_conf:
    module: LSTM
    proj_size: 1024
    hidden_size:
      - 1024
      - 1024
    dropout:
      - 0.2
      - 0.2
    layer_norm:
      - false
      - false
    proj:
      - false
      - false
    sample_rate:
      - 1
      - 1
    sample_style: concat
    bidirectional: true
  specaug_conf:
    freq_mask_width_range: !!python/tuple
      - 0
      - 50
    num_freq_mask: 4
    time_mask_width_range: !!python/tuple
      - 0
      - 40
    num_time_mask: 2
build_model:
  upstream_trainable: false
build_task:
  log_metrics:
    - wer
    - cer
    - slot_type_f1
    - slot_value_cer
    - slot_value_wer
    - slot_edit_f1_full
    - slot_edit_f1_part
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 2000
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: slot_type_f1
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
```
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default this calls audio_snips_for_slot_filling with **prepare_data
- Parameters:
  - prepare_data (dict) – same as in default_config, supports the arguments of audio_snips_for_slot_filling
  - target_dir (str) – Parse your corpus and save the csv files into this directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
tuple

1. train_path (str)
2. valid_path (str)
3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

- id (str) - the unique id for this data point
- wav_path (str) - the absolute path of the waveform file
- transcription (str) - a text string where words are separated by a space, e.g. "I want to fly from Taipei to New York"
- iob (str) - IOB tags, one tag per word separated by a space, using "O" when a word carries no tag, e.g. "O O O O O from_location O to_location to_location"
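Since every word must carry exactly one IOB tag, a quick sanity check of the alignment could look like this sketch (the csv path is a hypothetical example):

```python
import pandas as pd

# Verify that every row has one IOB tag per word in the transcription.
df = pd.read_csv("train.csv")  # hypothetical metadata csv in the format above
for _, row in df.iterrows():
    words = row["transcription"].split()
    tags = row["iob"].split()
    assert len(words) == len(tags), f"misaligned IOB tags for id={row['id']}"
```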
- prepare_tokenizer_data(prepare_tokenizer_data: dict, target_dir: str, cache_dir: str, train_csv: str, valid_csv: str, test_csvs: str, get_path_only: bool = False)[source]#
Prepare the text file used for training the tokenizer. By default, only the transcriptions in the train_csv returned from prepare_data are used. The default prepare_tokenizer_data prepares the data for a character-based tokenizer.
- Parameters:
  - prepare_tokenizer_data (dict) – same as in default_config, no supported argument for now
  - target_dir (str) – Save the text file into this directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - train_csv (str) – The train data given by prepare_data
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
str

The text file path. The text file should be in the format:

This is the first line
This is the second line
These are all text used for training tokenizer
- build_tokenizer(build_tokenizer: dict, target_dir: str, cache_dir: str, tokenizer_data_path: str, get_path_only: bool = False)[source]#
Build the tokenizer from the data prepared by prepare_tokenizer_data. By default this calls prepare_common_tokenizer with **build_tokenizer
- Parameters:
  - build_tokenizer (dict) – same as in default_config, arguments for prepare_common_tokenizer
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - tokenizer_data_path (str) – The text file from prepare_tokenizer_data
  - get_path_only (bool) – Directly return the filepaths whether or not they exist
- Returns:
str

The filepath of the pickled s3prl.dataio.encoder.tokenizer.Tokenizer
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, tokenizer_path: str)[source]#
Build the dataset for train/valid/test.
- Parameters:
  - build_dataset (dict) – same as in default_config, not used
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - mode (str) – train/valid/test
  - data_csv (str) – The metadata csv file for the specific mode
  - tokenizer_path (str) – The pickled tokenizer path for encoding the transcription
- Returns:
torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

- x (torch.FloatTensor) - the waveform in (seq_len, 1)
- x_len (int) - the waveform length seq_len
- class_ids (torch.LongTensor) - the encoded class ids of a transcription (sentence)
- labels (str) - the text transcription
- unique_name (str) - the unique id for this datapoint
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for the torch DataLoader.
- Parameters:
  - build_batch_sampler (dict) – same as in default_config, with the following keys:
    - train (dict) - arguments for SortedSliceSampler
    - valid (dict) - arguments for FixedBatchSizeBatchSampler
    - test (dict) - arguments for FixedBatchSizeBatchSampler
  - target_dir (str) – Current experiment directory
  - cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
  - mode (str) – train/valid/test
  - data_csv (str) – the mode-specific csv from prepare_data
  - dataset – the dataset from build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns s3prl.dataset.base.default_collate_fn
- Parameters:
  - build_collate_fn (dict) – same as in default_config, no argument supported for now
  - mode (str) – train, valid, or test
- Returns:
callable

The collate_fn for the torch DataLoader in train/valid/test mode
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the RNNEncoder model wrapped with ModelWithSpecaug
- Parameters:
  - build_downstream (dict) – same as in default_config, has two keys: model_conf holds the arguments for RNNEncoder; specaug_conf holds the arguments for ModelWithSpecaug
  - downstream_input_size (int) – the required input size of the model
  - downstream_output_size (int) – the required output size of the model
  - downstream_input_stride (int) – the input feature's stride (from 16 kHz)
- Returns:
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with s3prl.nn.Featurizer
- Parameters:
  - build_featurizer (dict) – same as in default_config, arguments for s3prl.nn.Featurizer
  - upstream (AbsUpstream) – the upstream model built by build_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer

The featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build the model with s3prl.nn.upstream.UpstreamDownstreamModel
- Parameters:
  - build_model (dict) – same as in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
  - model_output_size (int) – the required output hidden size of the model
  - build_upstream (dict) – same as in default_config, refer to build_upstream
  - build_featurizer (dict) – same as in default_config, refer to build_featurizer
  - build_downstream (dict) – same as in default_config, refer to build_downstream
- Returns:
torch.nn.Module

The entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer and build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the task-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
  - build_optimizer (dict) – same as in default_config, with the following keys:
    - name (str) - the optimizer class name in torch.optim
    - conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}
  - parameters (iterable) – the standard params accepted by torch.optim.Optimizer
- Returns:
torch.optim.Optimizer

An optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
  - build_scheduler (dict) – same as in default_config, with the following keys:
    - name (str) - the scheduler class name in torch.optim.lr_scheduler
    - conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR
  - optimizer – the standard torch optimizer accepted by the schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler

A scheduler following standard torch usage
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with s3prl.nn.upstream.S3PRLUpstream
- Parameters:
  - build_upstream (dict) – same as in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream
- Returns:
s3prl.nn.interface.AbsUpstream

An upstream model whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by train (during the validation phase) and run (during the testing phase).
- Parameters:
  - evaluate (dict) – same as in default_config, no argument supported for now
  - **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence the documentation is skipped for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
  - name (str) – the __name__ of the problem class
- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
  - model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.
- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method combining load_model and load_task to directly load both the model and the task. It assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple

1. model (torch.nn.Module)
2. task (s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
  - task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.
  - model (torch.nn.Module) – the model for the task, which is saved separately and is required by build_task.
  - task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. change the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, prepare_tokenizer_data: dict = None, build_tokenizer: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
The stages of this script are:

- 0: Parse the corpus and save the metadata file for ASR (waveform path, label…)
- 1: Prepare the metadata file for training the tokenizer
- 2: Train the tokenizer
- 3: Train the ASR model
- 4: Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (see test_ckpt_steps)
- Parameters:
  - target_dir (str) – The directory that stores the script result.
  - cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
  - remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
  - start (int) – The starting stage of the problem script. Default: 0
  - stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
  - num_workers (int) – num_workers for all the torch DataLoaders
  - eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
  - device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
  - world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
  - rank (int) – Used in distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
  - test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.
  - **others – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, you most likely also need to override load_model
- Parameters:
  - save_model (dict) – same as in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
  - model_ckpt_dir (str) – save the model into this directory.
  - build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
  - model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task's state, task.get_state(), and the initialization arguments into the given directory. If you override this method, you most likely also need to override load_task.
- Parameters:
  - save_task (dict) – same as in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
  - task_ckpt_dir (str) – save the task into this directory.
  - build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
  - task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batch to evaluate to speed up the development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient. important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are better when higher, while others are better when lower; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
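For instance, a partial train dictionary passed to run could look like the following sketch; the values are placeholders and keys that are left out are assumed to keep their defaults from default_config:

    # Hypothetical overrides for the train stage of an ASR recipe.
    train_overrides = {
        "total_steps": 50000,        # shorter schedule than the default
        "save_step": 1000,           # checkpoint less frequently
        "gradient_accumulate": 2,    # simulate a larger effective batch
        "valid_metric": "wer",       # WER is lower-better
        "valid_higher_better": False,
    }
    # e.g. SuperbASR().run(target_dir=..., train=train_overrides, ...)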
SuperbASV#
- class s3prl.problem.SuperbASV[source][source]#
Bases:
ASV- default_config()[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_datasetbuild_batch_samplerbuild_upstreambuild_featurizerbuild_modelbuild_taskbuild_optimizerbuild_schedulertraintarget_dir: ??? cache_dir: null test_ckpt_steps: null prepare_data: dataset_root: ??? build_dataset: train: min_secs: 2.0 max_secs: 8.0 build_batch_sampler: train: batch_size: 10 shuffle: true test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_model: upstream_trainable: false build_task: loss_type: amsoftmax loss_conf: margin: 0.4 scale: 30 build_optimizer: name: AdamW conf: lr: 0.0001 build_scheduler: name: ExponentialLR gamma: 0.9 train: total_steps: 200000 log_step: 500 eval_step: 1.0e+20 save_step: 10000 gradient_clipping: 1000.0 gradient_accumulate: 5 valid_metric: null valid_higher_better: null auto_resume: true resume_ckpt_dir: null keep_num_ckpts: null
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
prepare_voxceleb1_for_svwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments inprepare_voxceleb1_for_svtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths no matter whether they exist or not.
- Returns:
tuple
train_path (str)
test_trial_paths (List[str])
The
train_pathshould be a csv file containing the following columns:column
description
id
(str) - the unique id for this utterance
wav_path
(str) - the absolute path of the waveform file
spk
(str) - a string speaker label
Each
test_trial_pathshould be a csv file containing the following columns:column
description
id1
(str) - the unique id of the first utterance
id2
(str) - the unique id of the second utterance
wav_path1
(str) - the absolute path of the first utterance
wav_path2
(str) - the absolute path of the second utterance
label
(int) - 0 when the two utterances are from different speakers, 1 when they are from the same speaker
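A hedged sketch of producing csv files with these schemas using pandas; the ids and paths below are made up purely for illustration:

    import pandas as pd

    train_df = pd.DataFrame({
        "id": ["utt_0001", "utt_0002"],
        "wav_path": ["/data/voxceleb1/spk_a/utt_0001.wav",
                     "/data/voxceleb1/spk_b/utt_0002.wav"],
        "spk": ["spk_a", "spk_b"],
    })
    train_df.to_csv("train.csv", index=False)

    trial_df = pd.DataFrame({
        "id1": ["utt_0001"],
        "id2": ["utt_0002"],
        "wav_path1": ["/data/voxceleb1/spk_a/utt_0001.wav"],
        "wav_path2": ["/data/voxceleb1/spk_b/utt_0002.wav"],
        "label": [0],  # 0: different speakers, 1: same speaker
    })
    trial_df.to_csv("test_trial.csv", index=False)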
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv: str, test_csvs: list, get_path_only: bool)[source][source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of the train csv.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths no matter whether they exist or not
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str)[source][source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config, havetrainandtestkeys, each is a dictionary, fortraindictionary:key
description
min_secs
(float) - Drop a waveform if it is not longer than
min_secsmax_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secsseconds. Default: None, no croppingfor
testdictionary, no argument supported yettarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For train mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(str) - the label class id encoded by
encoder_pathunique_name
(str) - the unique id for this datapoint
For test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenunique_name
(str) - the unique id for this datapoint
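To make the expected item format concrete, here is a minimal sketch of a train-mode dataset; the encoder's encode method and the csv columns are assumptions based on the descriptions above, not the recipe's actual implementation:

    import pandas as pd
    import torchaudio
    from torch.utils.data import Dataset

    class SketchSVTrainDataset(Dataset):
        def __init__(self, data_csv: str, encoder):
            self.df = pd.read_csv(data_csv)
            self.encoder = encoder  # assumed to map a speaker label to a class id

        def __len__(self):
            return len(self.df)

        def __getitem__(self, index):
            row = self.df.iloc[index]
            wav, sr = torchaudio.load(row["wav_path"])  # (channel, seq_len)
            x = wav[0].unsqueeze(-1)                    # (seq_len, 1), first channel
            return {
                "x": x,
                "x_len": x.size(0),
                "class_id": self.encoder.encode(row["spk"]),  # assumed encoder API
                "unique_name": row["id"],
            }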
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplerNote that ASV does not support the valid mode
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#
Return the task-specific downstream model. By default build the
SuperbXvectormodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofSuperbXvectordownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 kHz)
- Returns:
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
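The two builders above follow the same name/conf convention. A minimal sketch of that convention (not necessarily the actual implementation) resolves the class by name and passes conf as keyword arguments:

    import torch

    def build_optimizer_sketch(parameters, name: str, conf: dict):
        # e.g. name="AdamW", conf={"lr": 1.0e-4}
        return getattr(torch.optim, name)(parameters, **conf)

    def build_scheduler_sketch(optimizer, name: str, conf: dict):
        # e.g. name="ExponentialLR", conf={"gamma": 0.9}
        return getattr(torch.optim.lr_scheduler, name)(optimizer, **conf)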
- build_task(build_task: dict, model, encoder, test_trials=None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
SpeakerVerification- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encodertest_trials (List[Tuple[int, str, str]]) – each tuple in the list consists of
(label, enroll_utt_id, test_utt_id). label is either 0 or 1
- Returns:
Task
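For reference, a test_trials list matching the documented tuple layout could look like the following; the utterance ids are made up:

    # (label, enroll_utt_id, test_utt_id); label is 1 for the same speaker, 0 otherwise
    test_trials = [
        (1, "spk_a_utt_0001", "spk_a_utt_0042"),
        (0, "spk_a_utt_0001", "spk_b_utt_0007"),
    ]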
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
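A typical use is restoring a trained checkpoint for inference; the checkpoint path below is a placeholder:

    from s3prl.problem import SuperbASV

    # Assumes ckpts_dir contains the 'model' and 'task' sub-directories
    # produced by save_model and save_task.
    model, task = SuperbASV().load_model_and_task("result/superb_asv/valid_best")
    model.eval()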
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, test_ckpt_steps: List[int] = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder for encoding the speaker labels
2
Train the model
3
Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (see
test_ckpt_steps)4
Report the best result found on each test set
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script, set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If set to -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All the 8 processes have the sameworld_sizebut differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by
test_ckpt_steps.test_ckpt_steps (List[int]) – After training, multiple steps of checkpoints are saved. This option specifies which checkpoints (multiple) will be used for evaluation.
**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings.
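As a sketch of how these arguments fit together, a minimal invocation might look like the following; the dataset root and upstream name are placeholders:

    from s3prl.problem import SuperbASV

    SuperbASV().run(
        target_dir="result/superb_asv",                  # where the stages write results
        prepare_data={"dataset_root": "/path/to/VoxCeleb1"},
        build_upstream={"name": "wav2vec2"},             # any registered upstream name
        device="cuda",
    )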
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly likely that you will also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly likely that you will also need to overrideload_task.
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development via theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are better when higher, while others are better when lower; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
SuperbER#
- class s3prl.problem.SuperbER[source][source]#
Bases:
SuperbSID- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_encoderbuild_datasetbuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: iemocap: ??? test_fold: ??? build_encoder: {} build_dataset: {} build_batch_sampler: train: batch_size: 4 shuffle: true valid: batch_size: 4 test: batch_size: 4 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_size: 256 build_model: upstream_trainable: false build_task: {} build_optimizer: name: Adam conf: lr: 0.0001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 30000 log_step: 500 eval_step: 1000 save_step: 1000 gradient_clipping: 1.0 gradient_accumulate: 8 valid_metric: accuracy valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
iemocap_for_superbwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments iniemocap_for_superbtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths no matter whether they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform up to
end_secseconds. If not present or ismath.nan, load to the end.
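As an illustration of how the optional start_sec / end_sec columns could be interpreted when loading a segment (a hedged sketch, not the recipe's actual loading code):

    import math
    import torchaudio

    def load_segment(wav_path: str, start_sec: float, end_sec: float):
        info = torchaudio.info(wav_path)
        sr = info.sample_rate
        start = 0 if math.isnan(start_sec) else int(start_sec * sr)
        end = info.num_frames if math.isnan(end_sec) else int(end_sec * sr)
        wav, _ = torchaudio.load(wav_path, frame_offset=start, num_frames=end - start)
        return wav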
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config, withtrain,valid,testkeys, each a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 kHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths no matter whether they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script, set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If set to -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All the 8 processes have the sameworld_sizebut differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not specified, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly likely that you will also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly likely that you will also need to overrideload_task.
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development via theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are better when higher, while others are better when lower; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
SuperbIC#
- class s3prl.problem.SuperbIC[source][source]#
Bases:
Common- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_encoderbuild_datasetbuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainstart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? build_encoder: {} build_dataset: {} build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 32 test: batch_size: 32 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_size: 256 build_model: upstream_trainable: false build_task: {} build_optimizer: name: Adam conf: lr: 0.0001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 200000 log_step: 100 eval_step: 5000 save_step: 250 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: accuracy valid_higher_better: true auto_resume: true resume_ckpt_dir: null
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
fsc_for_multi_classificationwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, arguments forfsc_for_multi_classificationtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths no matter whether they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
labels
(str) - the string labels of the waveform, separated by a ‘;’
The number of the label columns can be arbitrary.
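A hedged sketch of such a csv row, with made-up values, where the labels column joins the intent slots (e.g. action/object/location) with ';':

    import pandas as pd

    df = pd.DataFrame({
        "id": ["fsc_0001"],
        "wav_path": ["/data/fluent_speech_commands/wavs/fsc_0001.wav"],
        "labels": ["activate;lights;kitchen"],  # ';'-joined multi-class labels
    })
    df.to_csv("train.csv", index=False)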
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncodersfrom all the columns whose names are prefixed withlabel, from all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths no matter whether they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_ids
(torch.LongTensor) - the encoded class ids. shape: (num_class, )
labels
(List[str]) - the class name. length: num_class
unique_name
(str) - the unique id for this datapoint
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset: Dataset)[source][source]#
Return the batch sampler for torch DataLoader. By default call
superb_sid_batch_samplerwith**build_batch_sampler.- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 kHz)
- Returns:
AbsUtteranceModel
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source][source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceMultiClassClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encodervalid_df (pd.DataFrame) – metadata of the valid set
test_df (pd.DataFrame) – metadata of the test set
- Returns:
Task
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script, set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If set to -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All the 8 processes have the sameworld_sizebut differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not specified, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly likely that you will also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly likely that you will also need to overrideload_task.
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development via theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are better when higher, while others are better when lower; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
SuperbKS#
- class s3prl.problem.SuperbKS[source][source]#
Bases:
SuperbSID- default_config() dict[source][source]#
The default arguments for
run in yaml. Note that for fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. The values inside that field are passed directly into that method, so by changing these inner values you directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_data, build_encoder, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  gsc1: ???
  gsc1_test: ???
build_encoder: {}
build_dataset:
  train:
    sox_effects:
      - - channels
        - '1'
      - - rate
        - '16000'
      - - gain
        - '-3.0'
  valid:
    sox_effects:
      - - channels
        - '1'
      - - rate
        - '16000'
      - - gain
        - '-3.0'
  test:
    sox_effects:
      - - channels
        - '1'
      - - rate
        - '16000'
      - - gain
        - '-3.0'
build_batch_sampler:
  train:
    batch_size: 32
  valid:
    batch_size: 32
  test:
    batch_size: 32
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 5000
  save_step: 1000
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
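For example, the following is a minimal sketch of how these inner values can be overridden from Python; the corpus paths, experiment directory, and upstream name are placeholders that you would replace with your own:

from s3prl.problem import SuperbKS

problem = SuperbKS()
config = problem.default_config()  # the nested dict shown above

# Fill in the required "???" fields and tweak inner values; each top-level
# key is forwarded to the method of the same name.
config["target_dir"] = "exp/superb_ks"                              # placeholder
config["prepare_data"]["gsc1"] = "/data/speech_commands_v1"         # placeholder
config["prepare_data"]["gsc1_test"] = "/data/speech_commands_test"  # placeholder
config["build_upstream"]["name"] = "hubert"                         # any registered S3PRL upstream
config["build_batch_sampler"]["train"]["batch_size"] = 16
config["train"]["total_steps"] = 100000

problem.run(**config)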
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
gsc1_for_classification with **prepare_data
- Parameters:
prepare_data (dict) – same in
default_config, supports arguments in gsc1_for_classification
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
get_path_only (bool) – Directly return the file paths whether they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_sec seconds. If not present or is math.nan, load from the beginning.
end_sec
(float) - optional, load the waveform up to
end_sec seconds. If not present or is math.nan, load to the end.
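Because prepare_data only needs to produce csv files with the columns above, a common customization is to override it in a subclass and point it at your own corpus. A minimal sketch (the corpus-reading helper load_my_corpus_split is hypothetical and not part of s3prl):

import pandas as pd
from s3prl.problem import SuperbKS

class MyKS(SuperbKS):
    def prepare_data(self, prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False):
        train_csv = f"{target_dir}/train.csv"
        valid_csv = f"{target_dir}/valid.csv"
        test_csv = f"{target_dir}/test.csv"
        if get_path_only:
            return train_csv, valid_csv, [test_csv]

        for split, csv_path in [("train", train_csv), ("valid", valid_csv), ("test", test_csv)]:
            # load_my_corpus_split is a hypothetical helper returning a list of
            # dicts with the keys: id, wav_path, label
            rows = load_my_corpus_split(split)
            pd.DataFrame(rows, columns=["id", "wav_path", "label"]).to_csv(csv_path, index=False)
        return train_csv, valid_csv, [test_csv]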
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.
- Parameters:
build_encoder (dict) – same in
default_config, no arguments supported for now
target_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
train_csv_path (str) – the train path from
prepare_data
valid_csv_path (str) – the valid path from
prepare_data
test_csv_paths (List[str]) – the test paths from
prepare_data
get_path_only (bool) – Directly return the file paths whether they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset: Dataset)[source][source]#
Return the batch sampler for torch DataLoader. By default for train and valid, use
BalancedWeightedSampler; for test use FixedBatchSizeBatchSampler
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
BalancedWeightedSamplervalid
(dict) - arguments for
BalancedWeightedSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
mode (str) – train/valid/test
data_csv (str) – the
mode-specific csv from prepare_data
dataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_downsample_rate: int)[source][source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
AbsUtteranceModel
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
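Concretely, each item is a plain dictionary. A sketch with dummy values, just to illustrate the contract above:

import torch

item = {
    "x": torch.randn(16000 * 3, 1),  # 3 seconds of 16 kHz waveform, shape (seq_len, 1)
    "x_len": 16000 * 3,              # the waveform length (seq_len)
    "class_id": 5,                   # the encoded class id
    "label": "yes",                  # the class name
    "unique_name": "utt-0001",       # the unique id for this datapoint
}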
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer
Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
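The name/conf pair maps onto torch.optim roughly as follows; this is a sketch of the behavior, not the exact implementation:

import torch

def build_optimizer_sketch(build_optimizer: dict, parameters):
    # e.g. build_optimizer = {"name": "Adam", "conf": {"lr": 1.0e-4}}
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))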
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
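Similarly, the scheduler config maps onto torch.optim.lr_scheduler roughly as follows; a sketch only, and note that in default_config the scheduler arguments (e.g. gamma) sit next to name rather than under a conf key:

import torch

def build_scheduler_sketch(build_scheduler: dict, optimizer):
    # e.g. build_scheduler = {"name": "ExponentialLR", "gamma": 0.9}
    conf = dict(build_scheduler)
    scheduler_cls = getattr(torch.optim.lr_scheduler, conf.pop("name"))
    return scheduler_cls(optimizer, **conf)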
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).
- Parameters:
evaluate (dict) – same in
default_config, no arguments supported for now
**others – only meaningful when you want to override this evaluate method, which is not the common case, so we skip their documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
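A sketch of loading a finished experiment for inference; the checkpoint directory below is a placeholder and must contain the 'model' and 'task' sub-directories described above:

import torch
from s3prl.problem import SuperbKS

model, task = SuperbKS().load_model_and_task("exp/superb_ks/some_ckpts_dir")  # placeholder path
model.eval()
with torch.no_grad():
    # the exact forward interface depends on the loaded task; see the Task documentation
    ...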
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. changing the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that everything won’t crash. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When doing distributed training, world_size > 1. Take
world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dir directory.
**kwds – The other arguments like
prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meanings.
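Since run is split into the stages above, start and stop let you re-run only part of the pipeline. For example, a sketch of re-running only the evaluation stage against a specific checkpoint (all paths are placeholders):

from s3prl.problem import SuperbKS

problem = SuperbKS()
config = problem.default_config()
config["target_dir"] = "exp/superb_ks"                              # same directory used for training
config["prepare_data"]["gsc1"] = "/data/speech_commands_v1"         # placeholder
config["prepare_data"]["gsc1_test"] = "/data/speech_commands_test"  # placeholder
config["build_upstream"]["name"] = "hubert"

config["start"] = 3                                                 # stage 3: evaluate on the test sets
config["test_ckpt_dir"] = "exp/superb_ks/train/some_checkpoint"     # placeholder checkpoint directory
problem.run(**config)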
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, you will very likely also need to override
load_model.
- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, such as the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, you will very likely also need to override load_task.
- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, such as the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_config

key
description
total_steps
(int) - the total number of optimization steps
log_step
(int) - logging frequency; log every log_step steps
eval_step
(int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument in run
save_step
(int) - save a checkpoint every save_step steps
gradient_clipping
(float) - clip the gradient norm; important for RNNs
gradient_accumulate
(int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
valid_metric
(str) - the metric used to select the best validation checkpoint. Different Tasks support different valid metrics; see build_task for the supported metrics
valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this decides how the best validation checkpoint is selected
auto_resume
(bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or to delete it and start a new training session
resume_ckpt_dir
(str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
seed
(int) - fix the random seed before training starts
keep_num_ckpts
(int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case, so we skip their documentation for now.
SuperbSID#
- class s3prl.problem.SuperbSID[source][source]#
Bases:
CommonThe standard SUPERB SID task
- default_config() dict[source][source]#
The default arguments for
run in yaml. Note that for fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. The values inside that field are passed directly into that method, so by changing these inner values you directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_data, build_encoder, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_encoder: {}
build_dataset:
  train:
    max_secs: 8.0
build_batch_sampler:
  train:
    batch_size: 8
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 500
  eval_step: 5000
  save_step: 1000
  gradient_clipping: 1.0
  gradient_accumulate: 4
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
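A sketch of common SID adjustments for a smaller GPU: crop training utterances more aggressively and trade batch size for gradient accumulation (the paths, upstream name, and values are placeholders, not tuned settings):

from s3prl.problem import SuperbSID

problem = SuperbSID()
config = problem.default_config()
config["target_dir"] = "exp/superb_sid"                     # placeholder
config["prepare_data"]["dataset_root"] = "/data/VoxCeleb1"  # placeholder
config["build_upstream"]["name"] = "wav2vec2"               # any registered S3PRL upstream

config["build_dataset"]["train"]["max_secs"] = 4.0
config["build_batch_sampler"]["train"]["batch_size"] = 4
config["train"]["gradient_accumulate"] = 8

problem.run(**config)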
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sid with **prepare_data
- Parameters:
prepare_data (dict) – same in
default_config, supports arguments in voxceleb1_for_sid
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
get_path_only (bool) – Directly return the file paths whether they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_sec seconds. If not present or is math.nan, load from the beginning.
end_sec
(float) - optional, load the waveform up to
end_sec seconds. If not present or is math.nan, load to the end.
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.
- Parameters:
build_encoder (dict) – same in
default_config, no arguments supported for now
target_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
train_csv_path (str) – the train path from
prepare_data
valid_csv_path (str) – the valid path from
prepare_data
test_csv_paths (List[str]) – the test paths from
prepare_data
get_path_only (bool) – Directly return the file paths whether they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
mode (str) – train/valid/test
data_csv (str) – the
mode-specific csv from prepare_data
dataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer
Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).
- Parameters:
evaluate (dict) – same in
default_config, no arguments supported for now
**others – only meaningful when you want to override this evaluate method, which is not the common case, so we skip their documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. changing the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that everything won’t crash. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When doing distributed training, world_size > 1. Take
world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dir directory.
**kwds – The other arguments like
prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, you will very likely also need to override
load_model.
- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, such as the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, you will very likely also need to override load_task.
- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, such as the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_config

key
description
total_steps
(int) - the total number of optimization steps
log_step
(int) - logging frequency; log every log_step steps
eval_step
(int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument in run
save_step
(int) - save a checkpoint every save_step steps
gradient_clipping
(float) - clip the gradient norm; important for RNNs
gradient_accumulate
(int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
valid_metric
(str) - the metric used to select the best validation checkpoint. Different Tasks support different valid metrics; see build_task for the supported metrics
valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this decides how the best validation checkpoint is selected
auto_resume
(bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or to delete it and start a new training session
resume_ckpt_dir
(str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
seed
(int) - fix the random seed before training starts
keep_num_ckpts
(int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case, so we skip their documentation for now.
SuperbSD#
- class s3prl.problem.SuperbSD[source][source]#
Bases:
Diarization- default_config()[source][source]#
The default arguments for
run in yaml. Note that for fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. The values inside that field are passed directly into that method, so by changing these inner values you directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_data, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_optimizer, build_scheduler, save_model, save_task, train, scoring

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  data_dir: ???
build_dataset:
  chunk_size: 2000
  subsampling: 1
  rate: 16000
  use_last_samples: true
  label_delay: 0
build_batch_sampler:
  train:
    batch_size: 8
    shuffle: true
  valid:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 512
  rnn_layers: 1
build_model:
  upstream_trainable: false
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model:
  extra_conf:
    build_downstream_conf: ${build_downstream}
save_task: {}
train:
  total_steps: 30000
  log_step: 500
  eval_step: 500
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 4
  valid_metric: der
  valid_higher_better: false
  auto_resume: true
  resume_ckpt_dir: null
scoring:
  thresholds:
  - 0.3
  - 0.4
  - 0.5
  - 0.6
  - 0.7
  median_filters:
  - 1
  - 11
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only=False)[source][source]#
Prepare the task-specific data metadata (path, labels…).
- Parameters:
prepare_data (dict) –
same in
default_configkey
description
data_dir
(str) - the standard Kaldi data directory
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
get_path_only (bool) – Directly return the file paths whether they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
record_id
(str) - the id for the recording
duration
(float) - the total seconds of the recording
wav_path
(str) - the absolute path of the recording
utt_id
(str) - the id for the segmented utterance, should be globally unique across all recordings instead of just unique in a recording
speaker
(str) - the speaker label for the segmented utterance
start_sec
(float) - segment start second in the recording
end_sec
(float) - segment end second in the recording
Instead of one waveform file per row, the above file format is one segment per row, and a waveform file can have multiple overlapped segments uttered by different speakers.
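For instance, a sketch of what a few rows of such a segment-level csv could look like (all values are dummies): the same recording appears in multiple rows, one per speaker segment, and segments may overlap in time.

import pandas as pd

segments = pd.DataFrame(
    [
        {"record_id": "rec-001", "duration": 120.0, "wav_path": "/data/rec-001.wav",
         "utt_id": "rec-001_spk1_000", "speaker": "spk1", "start_sec": 0.0, "end_sec": 13.5},
        {"record_id": "rec-001", "duration": 120.0, "wav_path": "/data/rec-001.wav",
         "utt_id": "rec-001_spk2_000", "speaker": "spk2", "start_sec": 10.2, "end_sec": 25.0},
    ]
)
segments.to_csv("train.csv", index=False)  # placeholder output path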
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, data_dir: str, num_speakers: int, frame_shift: int)[source][source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) – same in
default_config, supports arguments for DiarizationDataset
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
mode
data_dir (str) – The converted kaldi data directory from
data_csv
num_speakers (int) – The number of speakers per utterance
frame_shift (int) – The frame shift of the upstream model (downsample rate from 16 KHz)
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenlabel
(torch.LongTensor) - the binary label for each upstream frame, shape:
(upstream_len, 2)label_len
(int) - the upstream feature’s seq length
upstream_lenrecord_id
(str) - the unique id for the recording
chunk_id
(int) - since a recording can be chunked into several segments for efficient training, this field indicates the segment’s original position (order, 0-indexed) in the recording. This field is only useful during the testing stage
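In other words, the diarization target is a frame-level multi-hot matrix with one column per speaker. A sketch with dummy values for a 2-speaker recording:

import torch

upstream_len = 500                     # number of upstream frames (dummy value)
label = torch.zeros(upstream_len, 2, dtype=torch.long)
label[0:200, 0] = 1                    # speaker 0 active in frames 0-199
label[150:350, 1] = 1                  # speaker 1 active in frames 150-349; overlap is allowed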
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, data_dir: str, dataset)[source][source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
GroupSameItemSampler, should always use this batch sampler for the testing stagetarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
mode
data_dir (str) – The converted kaldi data directory from
data_csv
dataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#
Return the task-specific downstream model. By default build the
SuperbDiarizationModelmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofSuperbDiarizationModeldownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer
Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
DiarizationPIT- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_model
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).
- Parameters:
evaluate (dict) – same in
default_config, no arguments supported for now
**others – only meaningful when you want to override this evaluate method, which is not the common case, so we skip their documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. changing the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, num_speaker: int = 2, prepare_data: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None, scoring: dict = None)[source]#
stage
description
0
Parse the corpus and save the Kaldi-style data directory for speaker diarization
1
Train the model
2
Inference the prediction
3
Score the prediction
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that everything won’t crash. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When doing distributed training, world_size > 1. Take
world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by
test_ckpts_steps.
num_speaker (int) – How many speakers per utterance
**others – The other arguments like
prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, you will very likely also need to override
load_model.
- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, such as the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, you will very likely also need to override load_task.
- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, such as the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- scoring(scoring: dict, stage_id: int, test_dirs: List[str], test_rttms: List[str], frame_shift: int)[source]#
Score the prediction
- Parameters:
scoring (dict) –
key
description
thresholds
(List[float]) - Given the 0~1 (float) soft prediction, the threshold decides how to binarize it into the 0/1 hard prediction. This list contains all the thresholds to try.
median_filters
(List[int]) - After getting the hard prediction, use a median filter to smooth it out. This list contains all the median filter sizes to try.
*others – This method is not designed to be overridden
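A sketch of re-running only the scoring stage with a narrower threshold / median-filter grid (paths and upstream name are placeholders):

from s3prl.problem import SuperbSD

problem = SuperbSD()
config = problem.default_config()
config["target_dir"] = "exp/superb_sd"                     # placeholder, same dir used for training
config["prepare_data"]["data_dir"] = "/data/sd_kaldi_dir"  # placeholder Kaldi-style data directory
config["build_upstream"]["name"] = "hubert"

config["start"] = 3                                        # stage 3: score the dumped predictions
config["scoring"] = {"thresholds": [0.5], "median_filters": [11]}
problem.run(**config)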
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_config

key
description
total_steps
(int) - the total number of optimization steps
log_step
(int) - logging frequency; log every log_step steps
eval_step
(int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument in run
save_step
(int) - save a checkpoint every save_step steps
gradient_clipping
(float) - clip the gradient norm; important for RNNs
gradient_accumulate
(int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
valid_metric
(str) - the metric used to select the best validation checkpoint. Different Tasks support different valid metrics; see build_task for the supported metrics
valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this decides how the best validation checkpoint is selected
auto_resume
(bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or to delete it and start a new training session
resume_ckpt_dir
(str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
seed
(int) - fix the random seed before training starts
keep_num_ckpts
(int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case, so we skip their documentation for now.
HearFSD#
- class s3prl.problem.HearFSD[source][source]#
Bases:
SuperbSID- default_config() dict[source][source]#
The default arguments for
run in yaml. Note that for fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. The values inside that field are passed directly into that method, so by changing these inner values you directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_data, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_batch_sampler:
  train:
    batch_size: 10
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multilabel
  scores:
  - mAP
  - top1_acc
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 40000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: mAP
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
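A sketch of adjusting the multilabel task settings above, e.g. tracking fewer scores and selecting checkpoints with a different validation metric (paths and upstream name are placeholders; the supported score names come from build_task):

from s3prl.problem import HearFSD

problem = HearFSD()
config = problem.default_config()
config["target_dir"] = "exp/hear_fsd"                         # placeholder
config["prepare_data"]["dataset_root"] = "/data/hear_fsd50k"  # placeholder
config["build_upstream"]["name"] = "hubert"

config["build_task"]["scores"] = ["mAP", "top1_acc"]
config["train"]["valid_metric"] = "top1_acc"
problem.run(**config)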
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (paths, labels, …). By default, call voxceleb1_for_sid with **prepare_data

- Parameters:
prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid
target_dir (str) – Parse your corpus and save the csv files into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)
get_path_only (bool) – Directly return the filepaths, whether or not they already exist

- Returns:
tuple

1. train_path (str)
2. valid_path (str)
3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

- id (str): the unique id for this data point
- wav_path (str): the absolute path of the waveform file
- label (str): a string label of the waveform
- start_sec (float): optional; load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
- end_sec (float): optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end

(An illustrative sketch of producing a csv in this format follows.)
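The following is a short, illustrative sketch of writing a metadata csv that matches the columns listed above. The file layout, ids, and label names are made up for the example.

import math
import pandas as pd

rows = [
    {
        "id": "utt-0001",
        "wav_path": "/data/audio/utt-0001.wav",   # must be an absolute path
        "label": "dog_bark",                       # any string label
        "start_sec": 0.5,                          # optional; math.nan = from the beginning
        "end_sec": math.nan,                       # optional; math.nan = to the end
    },
    {
        "id": "utt-0002",
        "wav_path": "/data/audio/utt-0002.wav",
        "label": "siren",
        "start_sec": math.nan,
        "end_sec": math.nan,
    },
]

pd.DataFrame(rows).to_csv("target_dir/train.csv", index=False)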
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default, generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

- Parameters:
build_encoder (dict) – same in default_config, no argument supported for now
target_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)
train_csv_path (str) – the train csv path from prepare_data
valid_csv_path (str) – the valid csv path from prepare_data
test_csv_paths (List[str]) – the test csv paths from prepare_data
get_path_only (bool) – Directly return the filepaths, whether or not they already exist
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

- max_secs (float): if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
- sox_effects (List[List[str]]): if not None, apply the given sox effects to the utterance

target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific mode
encoder_path (str) – The pickled encoder path for encoding the labels

- Returns:
torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

- x (torch.FloatTensor): the waveform, with shape (seq_len, 1)
- x_len (int): the waveform length, seq_len
- class_id (int): the encoded class id
- label (str): the class name
- unique_name (str): the unique id of this datapoint

(A minimal dataset sketch conforming to this contract follows.)
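Below is a minimal, illustrative torch Dataset that returns items with the required keys (x, x_len, class_id, label, unique_name). It is a sketch only: the csv columns follow the prepare_data format above, and a plain dict stands in for the pickled label encoder.

import pandas as pd
import torch
import torchaudio
from torch.utils.data import Dataset


class SketchAudioDataset(Dataset):
    def __init__(self, data_csv: str, label_to_id: dict):
        self.df = pd.read_csv(data_csv)
        self.label_to_id = label_to_id  # stand-in for the pickled encoder

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sr = torchaudio.load(row["wav_path"])   # (channels, seq_len)
        wav = wav.mean(dim=0, keepdim=True).t()      # mono, shape (seq_len, 1)
        return {
            "x": wav.float(),
            "x_len": wav.size(0),
            "class_id": self.label_to_id[row["label"]],
            "label": row["label"],
            "unique_name": row["id"],
        }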
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in default_config

- train (dict): arguments for FixedBatchSizeBatchSampler
- valid (dict): arguments for FixedBatchSizeBatchSampler
- test (dict): arguments for FixedBatchSizeBatchSampler

target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)
mode (str) – train/valid/test
data_csv (str) – the mode-specific csv from prepare_data
dataset – the dataset from build_dataset
- Returns:
batch sampler for torch DataLoader
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinear model

- Parameters:
build_downstream (dict) – same in default_config, supports the arguments of MeanPoolingLinear
downstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature's stride (relative to 16 kHz audio)
- Returns:
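As a rough illustration of what a mean-pooling downstream head does with the sizes above, the following is a simplified stand-in, not the actual MeanPoolingLinear implementation; the hidden_size argument and the two-layer structure are assumptions made for the example.

import torch
import torch.nn as nn


class SketchMeanPoolingLinear(nn.Module):
    def __init__(self, input_size: int, output_size: int, hidden_size: int = 256):
        super().__init__()
        self.pre = nn.Linear(input_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, x: torch.Tensor, x_len: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_size); x_len: (batch,) valid lengths per utterance
        mask = torch.arange(x.size(1), device=x.device)[None, :] < x_len[:, None]
        feats = self.pre(x) * mask.unsqueeze(-1)                    # zero out padded frames
        pooled = feats.sum(dim=1) / x_len.unsqueeze(-1).clamp(min=1)  # masked mean over time
        return self.out(pooled)                                     # (batch, output_size) logits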
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source][source]#
Build the task, which defines the logic of every train/valid/test forward step for the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics.

By default build UtteranceClassificationTask

- Parameters:
build_task (dict) – same in default_config, no argument supported for now
model (torch.nn.Module) – the model built by build_model
encoder – the encoder built by build_encoder
- Returns:
Task
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn

- Parameters:
build_collate_fn (dict) – same in default_config, no argument supported for now
mode (str) – train, valid, or test

- Returns:
callable

the collate_fn for torch DataLoader in train/valid/test mode (an illustrative padding collate_fn sketch follows)
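The following is an illustrative collate_fn, a stand-in rather than s3prl.dataset.base.default_collate_fn: it pads the variable-length waveforms in a batch and stacks the remaining fields, keeping the item keys produced by build_dataset above.

import torch
from torch.nn.utils.rnn import pad_sequence


def sketch_collate_fn(items: list) -> dict:
    return {
        "x": pad_sequence([item["x"] for item in items], batch_first=True),  # (B, T_max, 1)
        "x_len": torch.LongTensor([item["x_len"] for item in items]),
        "class_id": torch.LongTensor([item["class_id"] for item in items]),
        "label": [item["label"] for item in items],
        "unique_name": [item["unique_name"] for item in items],
    }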
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer

- Parameters:
build_featurizer (dict) – same in default_config, arguments for s3prl.nn.Featurizer
upstream (AbsUpstream) – the upstream model built by build_upstream

- Returns:
s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model. (A conceptual sketch follows.)
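As a conceptual sketch of what a featurizer does (a stand-in, not the actual s3prl.nn.Featurizer), the module below collapses the upstream's multiple hidden-state layers into a single feature sequence with a learnable weighted sum over layers.

import torch
import torch.nn as nn


class SketchWeightedSumFeaturizer(nn.Module):
    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states: list) -> torch.Tensor:
        # hidden_states: list of (batch, time, hidden_size), one tensor per upstream layer
        stacked = torch.stack(hidden_states, dim=0)             # (layers, B, T, H)
        weights = torch.softmax(self.layer_weights, dim=0)      # (layers,)
        return (weights[:, None, None, None] * stacked).sum(0)  # (B, T, H)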
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel

- Parameters:
build_model (dict) – same in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
model_output_size (int) – the required output hidden size of the model
build_upstream (dict) – same in default_config, refer to build_upstream
build_featurizer (dict) – same in default_config, refer to build_featurizer
build_downstream (dict) – same in default_config, refer to build_downstream

- Returns:
torch.nn.Module

Return the entire model for the task, which takes the items from the DataLoader directly as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in default_config, refer to below

- name (str): the optimizer class name in torch.optim
- conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

parameters (iterable) – the standard params accepted by torch.optim.Optimizer

- Returns:
torch.optim.Optimizer

An optimizer following standard torch usage (a short resolution sketch follows)
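As a hedged sketch of how a name/conf pair like the above can be resolved into a torch optimizer (an illustration of the config contract, not s3prl's internal code):

import torch


def sketch_build_optimizer(build_optimizer: dict, parameters):
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])  # e.g. torch.optim.Adam
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))


# Example usage mirroring the default config:
model = torch.nn.Linear(10, 2)
optimizer = sketch_build_optimizer({"name": "Adam", "conf": {"lr": 1.0e-3}}, model.parameters())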
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in default_config

- name (str): the scheduler class name in torch.optim.lr_scheduler
- conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

optimizer – the standard torch optimizer accepted by the schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler
A scheduler following standard torch usage
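A hedged sketch of resolving the scheduler config into a torch scheduler (an illustration of the config contract, not s3prl's internal code). It accepts either a nested conf dict, as in the table above, or flat extra keys beside name, as in the default_config yaml; treating both the same way is an assumption made here.

import torch


def sketch_build_scheduler(build_scheduler: dict, optimizer):
    conf = dict(build_scheduler.get("conf", {}))
    conf.update({k: v for k, v in build_scheduler.items() if k not in ("name", "conf")})
    scheduler_cls = getattr(torch.optim.lr_scheduler, build_scheduler["name"])
    return scheduler_cls(optimizer, **conf)


# Example usage mirroring the default config (ExponentialLR with gamma 0.9):
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = sketch_build_scheduler({"name": "ExponentialLR", "gamma": 0.9}, optimizer)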
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream

- Parameters:
build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

- Returns:
s3prl.nn.interface.AbsUpstream

Return an upstream model whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).

- Parameters:
evaluate (dict) – same in default_config, no argument supported for now
**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__ of the problem class

- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_model and the checkpoint saved in this directory.

- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'. (A usage sketch follows.)

- Returns:
tuple

1. model (torch.nn.Module)
2. task (s3prl.task.Task)
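A hedged usage sketch: restoring a trained model and task from a checkpoint directory that follows the layout described above (ckpts_dir/'model' and ckpts_dir/'task'). The checkpoint path is a placeholder.

import torch
from s3prl.problem import HearFSD

problem = HearFSD()
model, task = problem.load_model_and_task("exp/hear_fsd-hubert/valid_best")  # placeholder path

model.eval()
with torch.no_grad():
    # The exact batch format is defined by build_dataset / build_collate_fn above;
    # here we only show that the restored model is a regular torch.nn.Module.
    num_params = sum(p.numel() for p in model.parameters())
    print(f"restored model with {num_params} parameters and task {type(task).__name__}")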
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_task and the checkpoint saved in this directory.
model (torch.nn.Module) – the model for the task; the model is saved separately and is required by build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
- stage 0: Parse the corpus and save the metadata file (waveform path, label, …)
- stage 1: Build the encoder to encode the labels
- stage 2: Train the model
- stage 3: Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoaders
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is 1; for distributed training it should be > 1. Default: 1
rank (int) – Used for distributed training, when world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and each process needs to know which of the 8 it is. In this case, rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.
**kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See each method's documentation for its supported arguments and their meanings. (A short sketch of running selected stages appears below.)
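A hedged sketch of splitting the pipeline with start/stop, following the stage table above (0: prepare data, 1: build encoder, 2: train, 3: evaluate). Whether stop is inclusive is an assumption here; the paths and the upstream name are placeholders.

from s3prl.problem import HearFSD

problem = HearFSD()
common = dict(
    target_dir="exp/hear_fsd-hubert",                     # placeholder experiment dir
    prepare_data={"dataset_root": "/data/hear/fsd50k"},   # placeholder corpus path
    build_upstream={"name": "hubert"},                    # assumed upstream identifier
)

# Data preparation and encoder building only (stages 0-1, assuming stop is inclusive).
problem.run(start=0, stop=1, **common)

# Later: training and evaluation (stages 2-3), reusing the same target_dir.
problem.run(start=2, stop=None, **common)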
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model

- Parameters:
save_model (dict) – same in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hyperparameters inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

- Parameters:
save_task (dict) – same in default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hyperparameters inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in default_config

- total_steps (int): the total number of optimization steps
- log_step (int): logging frequency; log every log_step steps
- eval_step (int): evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches are evaluated (to speed up development) with the eval_batch argument of run
- save_step (int): save a checkpoint every save_step steps
- gradient_clipping (float): clip the gradient; important for RNNs
- gradient_accumulate (int): accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
- valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid metrics; see build_task for the supported metrics
- valid_higher_better (bool): whether a higher metric value is better (some metrics are higher-better, others lower-better); this determines how the best validation checkpoint is saved
- auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
- resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
- seed (int): fix the random seed before training starts
- keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
- use_scheduler (bool): whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearESC50#
- class s3prl.problem.HearESC50[source][source]#
Bases:
HearFSD

- default_config() dict[source][source]#
The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 4000
  log_step: 100
  eval_step: 500
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 4
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
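Since this recipe's prepare_data exposes test_fold and num_folds, one way to evaluate across all folds is to launch one run per held-out fold and aggregate the scores afterwards. The following is a hedged sketch of that idea; the paths, the upstream name, and the loop itself are illustrative assumptions rather than a prescribed workflow.

from s3prl.problem import HearESC50

problem = HearESC50()

for fold in range(5):  # num_folds defaults to 5 in the config above
    problem.run(
        target_dir=f"exp/hear_esc50-hubert/fold{fold}",  # one experiment dir per fold
        prepare_data={
            "dataset_root": "/data/hear/esc50",          # placeholder corpus path
            "test_fold": fold,
            "num_folds": 5,
        },
        build_upstream={"name": "hubert"},               # assumed upstream identifier
    )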
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (str) – Directly return the filepaths no matter they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not presented or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform from
end_secseconds. If not presented or ismath.nan, load to the end.
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (str) – Directly return the filepaths no matter they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, and build_downstream, and are concatenated to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logics for every train/valid/test forward step for the
model, and the logics for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
- stage 0: Parse the corpus and save the metadata file (waveform path, label, …)
- stage 1: Build the encoder to encode the labels
- stage 2: Train the model
- stage 3: Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoaders
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is 1; for distributed training it should be > 1. Default: 1
rank (int) – Used for distributed training, when world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and each process needs to know which of the 8 it is. In this case, rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.
**kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See each method's documentation for its supported arguments and their meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hyperparameters inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in default_config

- total_steps (int): the total number of optimization steps
- log_step (int): logging frequency; log every log_step steps
- eval_step (int): evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches are evaluated (to speed up development) with the eval_batch argument of run
- save_step (int): save a checkpoint every save_step steps
- gradient_clipping (float): clip the gradient; important for RNNs
- gradient_accumulate (int): accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
- valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid metrics; see build_task for the supported metrics
- valid_higher_better (bool): whether a higher metric value is better (some metrics are higher-better, others lower-better); this determines how the best validation checkpoint is saved
- auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
- resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
- seed (int): fix the random seed before training starts
- keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
- use_scheduler (bool): whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearBeijingOpera#
- class s3prl.problem.HearBeijingOpera[source][source]#
Bases:
HearESC50

- default_config() dict[source][source]#
The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (str) – Directly return the filepaths no matter they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, and build_downstream, and are concatenated to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logics for every train/valid/test forward step for the
model, and the logics for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (str) – Directly return the filepaths no matter they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not presented or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform from
end_secseconds. If not presented or ismath.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
- stage 0: Parse the corpus and save the metadata file (waveform path, label, …)
- stage 1: Build the encoder to encode the labels
- stage 2: Train the model
- stage 3: Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoaders
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is 1; for distributed training it should be > 1. Default: 1
rank (int) – Used for distributed training, when world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and each process needs to know which of the 8 it is. In this case, rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.
**kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See each method's documentation for its supported arguments and their meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, e.g. the configuration of the dataset, by duplicating the dataset hyperparameters inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in default_config

- total_steps (int): the total number of optimization steps
- log_step (int): logging frequency; log every log_step steps
- eval_step (int): evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches are evaluated (to speed up development) with the eval_batch argument of run
- save_step (int): save a checkpoint every save_step steps
- gradient_clipping (float): clip the gradient; important for RNNs
- gradient_accumulate (int): accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
- valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid metrics; see build_task for the supported metrics
- valid_higher_better (bool): whether a higher metric value is better (some metrics are higher-better, others lower-better); this determines how the best validation checkpoint is saved
- auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
- resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
- seed (int): fix the random seed before training starts
- keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
- use_scheduler (bool): whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearCremaD#
- class s3prl.problem.HearCremaD[source][source]#
Bases:
HearESC50

- default_config() dict[source][source]#
The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - mAP
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
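A hedged sketch of a custom collate_fn that pads the variable-length waveforms described in build_dataset below; the real default_collate_fn may name or arrange the batch keys differently:

import torch


def padded_collate_fn(items):
    # Each item follows the dictionary contract documented in build_dataset.
    xs = [item["x"] for item in items]                          # (seq_len, 1) each
    x = torch.nn.utils.rnn.pad_sequence(xs, batch_first=True)   # (batch, max_len, 1)
    return {
        "x": x,
        "x_len": torch.LongTensor([item["x_len"] for item in items]),
        "class_id": torch.LongTensor([item["class_id"] for item in items]),
        "label": [item["label"] for item in items],
        "unique_name": [item["unique_name"] for item in items],
    }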
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
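A hedged sketch of a Dataset that fulfils the item contract above, reading the csv produced by prepare_data and a pickled encoder; the encode method used here is an assumption about the encoder API:

import pickle

import pandas as pd
import torchaudio
from torch.utils.data import Dataset


class CsvAudioDataset(Dataset):
    def __init__(self, data_csv: str, encoder_path: str):
        self.df = pd.read_csv(data_csv)
        with open(encoder_path, "rb") as f:
            self.encoder = pickle.load(f)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sr = torchaudio.load(row["wav_path"])   # (channels, seq_len)
        x = wav[0].unsqueeze(-1)                     # (seq_len, 1)
        return {
            "x": x,
            "x_len": x.size(0),
            "class_id": self.encoder.encode(row["label"]),  # assumed encoder method
            "label": row["label"],
            "unique_name": row["id"],
        }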
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths whether or not they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
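A hedged sketch of what this step can look like if you roll your own encoder instead of the default CategoryEncoder: collect the label column of every csv, build a label-to-index mapping, and pickle it under target_dir. The class and helper below are hypothetical, not the s3prl implementation.

import pickle
from pathlib import Path

import pandas as pd


class SimpleLabelEncoder:
    def __init__(self, labels):
        self.idx2label = sorted(set(labels))
        self.label2idx = {label: idx for idx, label in enumerate(self.idx2label)}

    def encode(self, label: str) -> int:
        return self.label2idx[label]

    def decode(self, idx: int) -> str:
        return self.idx2label[idx]


def build_simple_encoder(target_dir, train_csv_path, valid_csv_path, test_csv_paths):
    labels = []
    for csv_path in [train_csv_path, valid_csv_path, *test_csv_paths]:
        labels += pd.read_csv(csv_path)["label"].tolist()
    encoder_path = Path(target_dir) / "encoder.pkl"
    with open(encoder_path, "wb") as f:
        pickle.dump(SimpleLabelEncoder(labels), f)
    return str(encoder_path)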
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
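A hedged sketch of that composition; the forward signatures of the real s3prl upstream, featurizer, and downstream modules are assumptions here, so treat this as an illustration of the data flow rather than the actual UpstreamDownstreamModel:

import torch.nn as nn


class ComposedModel(nn.Module):
    def __init__(self, upstream, featurizer, downstream):
        super().__init__()
        self.upstream = upstream
        self.featurizer = featurizer
        self.downstream = downstream

    def forward(self, wav, wav_len):
        hidden_states, hidden_lens = self.upstream(wav, wav_len)            # multiple layers
        feature, feature_len = self.featurizer(hidden_states, hidden_lens)  # single hidden state
        return self.downstream(feature, feature_len)                        # task-specific output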
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
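A hedged sketch of how the name/conf pattern used by build_optimizer and build_scheduler can be resolved: look the class up by name in torch.optim (or torch.optim.lr_scheduler) and pass the remaining values as keyword arguments. Note that the default config above places gamma next to name rather than under conf, so the scheduler helper simply forwards everything except name:

import torch


def make_optimizer(build_optimizer: dict, parameters):
    cls = getattr(torch.optim, build_optimizer["name"])               # e.g. torch.optim.Adam
    return cls(parameters, **build_optimizer["conf"])                 # e.g. lr=1.0e-4


def make_scheduler(build_scheduler: dict, optimizer):
    cls = getattr(torch.optim.lr_scheduler, build_scheduler["name"])  # e.g. ExponentialLR
    conf = {k: v for k, v in build_scheduler.items() if k != "name"}
    return cls(optimizer, **conf)                                     # e.g. gamma=0.9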
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
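A hedged usage sketch: load a finished experiment back for inference. The checkpoint directory below is a placeholder; the exact layout written during training may differ.

from s3prl.problem import HearCremaD

problem = HearCremaD()
model, task = problem.load_model_and_task("exp/hear_cremad/some_ckpt_dir")  # hypothetical path
model.eval()  # the returned model and task can then be used for inference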
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths whether or not they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform until
end_secseconds. If not present or ismath.nan, load to the end.
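A hedged sketch of writing such a csv with pandas; the id, path, and label are placeholders, and nan in start_sec/end_sec means the whole file is loaded:

import math

import pandas as pd

records = [
    {
        "id": "utt0001",
        "wav_path": "/data/CREMA-D/AudioWAV/some_file.wav",  # hypothetical path
        "label": "angry",                                     # hypothetical label string
        "start_sec": math.nan,                                # nan: load from the beginning
        "end_sec": math.nan,                                  # nan: load to the end
    },
]
pd.DataFrame(records).to_csv("train.csv", index=False)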
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When doing distributed training, world_size > 1. Take
world_size == 8for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut a differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings
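A hedged usage sketch of the staged run described above: the first call stops after training (assuming stop is the last stage to run), and a second call re-runs only the evaluation stage, optionally pointing at a specific checkpoint. All paths are placeholders, and omitted method dicts are assumed to fall back to default_config.

from s3prl.problem import HearCremaD

problem = HearCremaD()

# Stages 0-2: parse the corpus, build the encoder, train the model.
problem.run(
    target_dir="exp/hear_cremad",
    prepare_data={"dataset_root": "/data/CREMA-D", "test_fold": 0},
    build_upstream={"name": "hubert"},
    stop=2,
)

# Stage 3 only: evaluate, optionally with an explicit checkpoint directory.
problem.run(
    target_dir="exp/hear_cremad",
    prepare_data={"dataset_root": "/data/CREMA-D", "test_fold": 0},
    build_upstream={"name": "hubert"},
    start=3,
    test_ckpt_dir="exp/hear_cremad/some_ckpt_dir",  # hypothetical path
)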
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
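A hedged sketch of a matching save_model/load_model override pair: write build_model_all_args and the state_dict to one file, then rebuild from it. The on-disk layout below is hypothetical and may differ from the real default implementation.

from pathlib import Path

import torch

from s3prl.problem import HearCremaD


class MyCheckpointing(HearCremaD):
    def save_model(self, save_model, model_ckpt_dir, build_model_all_args, model):
        ckpt_dir = Path(model_ckpt_dir)
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        torch.save(
            {
                "build_model_all_args": build_model_all_args,
                "state_dict": model.state_dict(),
                "extra_conf": save_model,  # whatever the user put in the save_model field
            },
            ckpt_dir / "model.pt",
        )

    def load_model(self, model_ckpt_dir):
        ckpt = torch.load(Path(model_ckpt_dir) / "model.pt", map_location="cpu")
        # Assumes build_model_all_args matches build_model's keyword arguments.
        model = self.build_model(**ckpt["build_model_all_args"])
        model.load_state_dict(ckpt["state_dict"])
        return model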
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument, since the model is separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development via theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the random seed before training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
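For the gradient_accumulate and gradient_clipping knobs above, here is a hedged, generic PyTorch sketch (not the s3prl trainer) of the loop they imply: gradients from gradient_accumulate batches are summed before one clipped optimizer step, so the effective batch size is batch_size * gradient_accumulate, e.g. 32 * 4 = 128.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-4)
gradient_accumulate, gradient_clipping = 4, 1.0

# Dummy data standing in for the real DataLoader batches.
batches = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(8)]

for step, (x, y) in enumerate(batches):
    loss = nn.functional.cross_entropy(model(x), y) / gradient_accumulate
    loss.backward()
    if (step + 1) % gradient_accumulate == 0:        # one update every 4 batches
        torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clipping)
        optimizer.step()
        optimizer.zero_grad()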
HearGSC5hr#
- class s3prl.problem.HearGSC5hr[source][source]#
Bases:
HearFSD- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - top1_acc build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: top1_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths whether or not they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths whether or not they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform until
end_secseconds. If not present or ismath.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When doing distributed training, world_size > 1. Take
world_size == 8for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut a differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings
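A hedged, generic sketch of the world_size/rank convention above: one process per GPU, each calling run with the same world_size but its own rank. How the problem script initializes the distributed backend internally is not shown; this only illustrates how the two arguments are typically supplied.

import torch.multiprocessing as mp


def worker(rank: int, world_size: int):
    # Each of the world_size processes would call something like:
    #   HearGSC5hr().run(..., world_size=world_size, rank=rank, device="cuda")
    # with rank ranging over 0 .. world_size - 1.
    print(f"process {rank} of {world_size}")


if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)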
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument, since the model is separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development via theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the random seed before training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearGtzanMusicSpeech#
- class s3prl.problem.HearGtzanMusicSpeech[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? test_fold: ??? num_folds: 5 build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - top1_acc - mAP - d_prime - aucroc build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: top1_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths whether or not they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metricsBy default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths whether or not they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform until
end_secseconds. If not present or ismath.nan, load to the end.
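A hedged sketch of honoring the optional start_sec/end_sec columns when loading a segment; it assumes 16 kHz files and uses torchaudio's frame_offset/num_frames arguments:

import math

import torchaudio


def load_segment(wav_path: str, start_sec: float = math.nan, end_sec: float = math.nan,
                 sample_rate: int = 16000):
    frame_offset = 0 if math.isnan(start_sec) else int(start_sec * sample_rate)
    num_frames = -1 if math.isnan(end_sec) else int(end_sec * sample_rate) - frame_offset
    wav, sr = torchaudio.load(wav_path, frame_offset=frame_offset, num_frames=num_frames)
    return wav, sr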
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When doing distributed training, world_size > 1. Take
world_size == 8for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut a differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument, since the model is separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development via theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the random seed before training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearGtzan#
- class s3prl.problem.HearGtzan[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? test_fold: ??? num_folds: 10 build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - top1_acc - mAP - d_prime - aucroc build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: top1_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
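For reference, a minimal stand-in dataset returning the documented keys could look like the sketch below; the random waveforms, class ids, and names are purely illustrative and not part of the s3prl implementation.

import torch
from torch.utils.data import Dataset

class ToyUtteranceDataset(Dataset):
    """Illustrative only: yields items with the keys documented above."""

    def __init__(self, num_items: int = 4, seq_len: int = 16000):
        self.num_items = num_items
        self.seq_len = seq_len

    def __len__(self):
        return self.num_items

    def __getitem__(self, index):
        wav = torch.randn(self.seq_len, 1)        # placeholder waveform of shape (seq_len, 1)
        return {
            "x": wav,                             # torch.FloatTensor waveform
            "x_len": wav.size(0),                 # waveform length (seq_len)
            "class_id": 0,                        # encoded class id (dummy)
            "label": "class-0",                   # class name (dummy)
            "unique_name": f"utt-{index:04d}",    # unique datapoint id
        }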
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the file paths whether or not they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes the reduced hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
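The name/conf pair can be thought of as resolving a class from torch.optim and instantiating it with the inner conf as keyword arguments, roughly as in the hedged sketch below (the helper name make_optimizer is illustrative, not the actual s3prl call site).

import torch

def make_optimizer(parameters, name: str = "Adam", conf: dict = None):
    # Resolve the optimizer class by name, e.g. torch.optim.Adam, then
    # instantiate it with the inner `conf` dict as keyword arguments.
    conf = conf or {"lr": 1.0e-3}
    optimizer_cls = getattr(torch.optim, name)
    return optimizer_cls(parameters, **conf)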
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the file paths whether or not they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or math.nan, load from the beginning.end_sec
(float) - optional, load the waveform up to
end_secseconds. If not present or math.nan, load to the end.
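To make the expected metadata concrete, the sketch below writes a csv with the documented columns using pandas; every id, path, and label shown is made up for illustration.

import pandas as pd

rows = [
    {"id": "utt-0001", "wav_path": "/data/corpus/utt-0001.wav", "label": "classA"},
    # start_sec / end_sec are optional; omit them (or leave NaN) to load the whole file
    {"id": "utt-0002", "wav_path": "/data/corpus/utt-0002.wav", "label": "classB",
     "start_sec": 0.0, "end_sec": 3.5},
]
pd.DataFrame(rows).to_csv("train.csv", index=False)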
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that everything runs without crashing. If set to -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training start
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
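A common pattern is to tweak only a few of these train fields before launching run. A minimal sketch, assuming problem is an instance of this class; the numbers are illustrative, not recommendations, and the remaining required fields still have to be filled in.

config = problem.default_config()
config["train"].update(
    total_steps=50000,        # shorter schedule for a quick experiment
    gradient_accumulate=4,    # simulate a 4x larger effective batch
    keep_num_ckpts=2,         # keep only the two latest checkpoints
)
# Fill in target_dir, prepare_data.dataset_root, and build_upstream.name, then:
# problem.run(**config)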
HearGunshot#
- class s3prl.problem.HearGunshot[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? test_fold: ??? num_folds: 7 build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - top1_acc - d_prime - aucroc - mAP build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: top1_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
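To make the config concrete, the sketch below fills in the required ??? fields and launches all stages; the experiment directory, dataset root, test fold, and upstream name are placeholders to replace with your own values.

from s3prl.problem import HearGunshot

problem = HearGunshot()
config = problem.default_config()
config.update(
    target_dir="exp/hear_gunshot",                  # where all results are stored
    prepare_data=dict(dataset_root="/data/hear-gunshot", test_fold=0, num_folds=7),
    build_upstream=dict(name="hubert"),             # any registered S3PRL upstream name
)
problem.run(**config)                               # stages 0-3: prepare, encode, train, evaluate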
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the file paths whether or not they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes the reduced hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
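Analogously to the optimizer, the name/conf pair can be seen as resolving a class from torch.optim.lr_scheduler, roughly as in this hedged sketch (make_scheduler is an illustrative helper, not the actual s3prl function).

import torch

def make_scheduler(optimizer, name: str = "ExponentialLR", conf: dict = None):
    # Resolve the scheduler class by name, e.g. torch.optim.lr_scheduler.ExponentialLR,
    # then instantiate it with the optimizer and the inner `conf` dict.
    conf = conf or {"gamma": 0.9}
    scheduler_cls = getattr(torch.optim.lr_scheduler, name)
    return scheduler_cls(optimizer, **conf)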
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
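A short usage sketch, useful when the problem name comes from a config file or command line; it assumes the lookup is by the class __name__ as documented.

from s3prl.problem import HearGunshot

problem_cls = HearGunshot.get_class_from_name("HearGunshot")
assert problem_cls is HearGunshot   # the class itself is returned
problem = problem_cls()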
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the file paths whether or not they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or math.nan, load from the beginning.end_sec
(float) - optional, load the waveform up to
end_secseconds. If not present or math.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that everything runs without crashing. If set to -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings.
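The start/stop arguments let you split these stages across invocations, as in the hedged sketch below; the paths are placeholders, problem is a HearGunshot instance, and method_configs is assumed to hold the method-specific fields (prepare_data, build_upstream, and so on) filled in as shown earlier.

# Run only the early stages (up to stage 1; whether `stop` is inclusive is not spelled out above).
problem.run(target_dir="exp/hear_gunshot", start=0, stop=1, **method_configs)

# Resume later from stage 2 (training) and continue through evaluation, pointing
# the test stage at an explicit checkpoint instead of the best validation one.
problem.run(
    target_dir="exp/hear_gunshot",
    start=2,
    test_ckpt_dir="exp/hear_gunshot/some_checkpoint_dir",   # hypothetical path
    **method_configs,
)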
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training start
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearLibriCount#
- class s3prl.problem.HearLibriCount[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? test_fold: ??? num_folds: 5 build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - top1_acc - d_prime - aucroc - mAP build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: top1_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the file paths whether or not they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes the reduced hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
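For inference after training, a hedged sketch of loading everything back: the checkpoint directory below is hypothetical and merely follows the stated assumption that it contains 'model' and 'task' subfolders; the actual location under your target_dir depends on how training saved it.

from s3prl.problem import HearLibriCount

problem = HearLibriCount()
ckpts_dir = "exp/hear_libricount/some_ckpt_dir"   # hypothetical; must contain 'model' and 'task'
model, task = problem.load_model_and_task(ckpts_dir)
model.eval()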
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so can change the loaded task’s behavior. Like, change the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the file paths whether or not they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or math.nan, load from the beginning.end_sec
(float) - optional, load the waveform up to
end_secseconds. If not present or math.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that everything runs without crashing. If set to -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings.
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
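Conceptually, this amounts to persisting the weights together with the arguments needed to rebuild the model, roughly as in the sketch below; the file names and helper are illustrative and not what s3prl actually writes.

import os
import torch

def save_model_sketch(model_ckpt_dir: str, build_model_all_args: dict, model: torch.nn.Module):
    os.makedirs(model_ckpt_dir, exist_ok=True)
    # Weights and rebuild arguments sit side by side, so loading can call
    # build_model(**build_model_all_args) and then restore the state_dict.
    torch.save(model.state_dict(), os.path.join(model_ckpt_dir, "state_dict.pt"))
    torch.save(build_model_all_args, os.path.join(model_ckpt_dir, "build_model_args.pt"))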
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batches to evaluate to speed up development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training start
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearNsynth5hr#
- class s3prl.problem.HearNsynth5hr[source][source]#
Bases:
HearFSD- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - pitch_acc - chroma_acc build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: pitch_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
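Since this recipe computes both pitch_acc and chroma_acc, selecting checkpoints by chroma accuracy is just a config change, as in the sketch below; the remaining required ??? fields still need to be filled in before calling run.

from s3prl.problem import HearNsynth5hr

problem = HearNsynth5hr()
config = problem.default_config()
config["train"]["valid_metric"] = "chroma_acc"    # both pitch_acc and chroma_acc are in `scores`
config["train"]["valid_higher_better"] = True     # accuracies are higher-better
# Fill in target_dir, prepare_data.dataset_root, and build_upstream.name, then problem.run(**config)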
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_data get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
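Because the encoder is pickled, it can be inspected outside of run. A small sketch, assuming only standard pickle serialization (no CategoryEncoder interface beyond loading is assumed here):

import pickle

def load_label_encoder(encoder_path: str):
    # encoder_path is the path returned by build_encoder
    with open(encoder_path, "rb") as f:
        return pickle.load(f)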
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
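The name/conf pair resolves to a standard optimizer class in torch.optim; a sketch of the equivalent lookup (the library's exact internal helper is an assumption), with build_scheduler expected to work analogously against torch.optim.lr_scheduler:

import torch

def build_optimizer_sketch(build_optimizer: dict, parameters):
    # e.g. build_optimizer = {"name": "Adam", "conf": {"lr": 1.0e-4}}
    optim_cls = getattr(torch.optim, build_optimizer["name"])
    return optim_cls(parameters, **build_optimizer.get("conf", {}))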
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
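A short usage sketch of the class lookup; HearStroke is used only as an illustration:

from s3prl.problem import HearStroke

cls = HearStroke.get_class_from_name("HearStroke")
assert cls.__name__ == "HearStroke"  # returns the problem class with this __name__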
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
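A sketch of restoring a finished experiment for inference; the checkpoint directory path is illustrative, and the only assumption beyond the documented layout (ckpts_dir / 'model', ckpts_dir / 'task') is that the checkpoints were produced by a previous run:

from s3prl.problem import HearStroke

model, task = HearStroke().load_model_and_task("exp/hear_stroke/ckpts")  # illustrative path
model.eval()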
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task. task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir) get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not presented or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform from
end_secseconds. If not presented or ismath.nan, load to the end.
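If you override prepare_data for a custom corpus, the csv only needs the columns listed above. A minimal, purely illustrative sketch of writing such a file with pandas:

import math
import pandas as pd

pd.DataFrame(
    [
        {
            "id": "utt-0001",
            "wav_path": "/data/corpus/utt-0001.wav",  # illustrative path
            "label": "class_a",
            "start_sec": math.nan,  # nan: load from the beginning
            "end_sec": math.nan,    # nan: load to the end
        }
    ]
).to_csv("train.csv", index=False)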
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script, set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If set to -1, this feature is disabled and the entire epoch is evaluated. Default: -1
device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1, however if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8 for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which process among the 8 processes it is. In this case, rank can range from 0 to 7. All the 8 processes have the same world_size but different rank (process id). test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meanings
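A sketch of a quick smoke-test invocation using the documented eval_batch and device arguments; HearStroke is used as a concrete class, and the upstream name and paths are placeholders:

from s3prl.problem import HearStroke

HearStroke().run(
    target_dir="exp/smoke_test",
    device="cpu",
    eval_batch=2,  # only 2 batches per evaluation loop, to check nothing crashes
    prepare_data={"dataset_root": "/data/hear", "test_fold": 0},  # illustrative
    build_upstream={"name": "hubert"},                            # assumption
)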
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication. model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary. task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batch to evaluate to speed up the development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient. important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if there are already the last checkpoint in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume which is not necessary in
target_dir(seerun).seed
(int) - fix the seed before the training start
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearStroke#
- class s3prl.problem.HearStroke[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_data
build_batch_sampler
build_upstream
build_featurizer
build_downstream
build_model
build_task
build_optimizer
build_scheduler
save_model
save_task
train
evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
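Since the default config exposes test_fold and num_folds, a common pattern is to train once per fold and aggregate the scores afterwards. A hedged sketch (paths and the upstream name are placeholders; score aggregation is left to the user):

from s3prl.problem import HearStroke

for fold in range(5):  # num_folds: 5 in the default config
    HearStroke().run(
        target_dir=f"exp/hear_stroke/fold{fold}",
        prepare_data={"dataset_root": "/data/hear_stroke", "test_fold": fold},  # illustrative
        build_upstream={"name": "hubert"},                                      # assumption
    )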
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_data get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task. task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir) get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not presented or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform from
end_secseconds. If not presented or ismath.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script, set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If set to -1, this feature is disabled and the entire epoch is evaluated. Default: -1
device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1, however if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8 for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which process among the 8 processes it is. In this case, rank can range from 0 to 7. All the 8 processes have the same world_size but different rank (process id). test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meanings
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication. model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary. task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batch to evaluate to speed up the development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient. important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if there are already the last checkpoint in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume which is not necessary in
target_dir(seerun).seed
(int) - fix the seed before the training start
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearTonic#
- class s3prl.problem.HearTonic[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_data
build_batch_sampler
build_upstream
build_featurizer
build_downstream
build_model
build_task
build_optimizer
build_scheduler
save_model
save_task
train
evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
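To re-run only the final testing stage with a specific checkpoint, the documented start and test_ckpt_dir arguments can be combined. A sketch, assuming stages 0-2 already finished under the same target_dir; all paths are illustrative:

from s3prl.problem import HearTonic

HearTonic().run(
    target_dir="exp/hear_tonic",
    start=3,                                   # stage 3: evaluate on the test sets
    test_ckpt_dir="exp/hear_tonic/best_ckpt",  # illustrative; omit to use the best valid checkpoint
    prepare_data={"dataset_root": "/data/hear_tonic", "test_fold": 0},  # illustrative
    build_upstream={"name": "hubert"},                                  # assumption
)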
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_data get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream, build_featurizer, build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step for the
model, and the logic for how to reduce all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task. task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir) get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not presented or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform from
end_secseconds. If not presented or ismath.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script, set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If set to -1, this feature is disabled and the entire epoch is evaluated. Default: -1
device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1, however if you are running distributed training, this should be > 1. Default: 1
rank (int) – When distributed training, world_size > 1. Take
world_size == 8 for example, this means 8 processes (8 GPUs) are running in parallel. The script needs to know which process among the 8 processes it is. In this case, rank can range from 0 to 7. All the 8 processes have the same world_size but different rank (process id). test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meanings
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication. model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_task except the model argument since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary. task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batch to evaluate to speed up the development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient. important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if there are already the last checkpoint in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume which is not necessary in
target_dir(seerun).seed
(int) - fix the seed before the training start
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearVocal#
- class s3prl.problem.HearVocal[source][source]#
Bases:
HearESC50- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_data
build_batch_sampler
build_upstream
build_featurizer
build_downstream
build_model
build_task
build_optimizer
build_scheduler
save_model
save_task
train
evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 3
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - mAP
  - top1_acc
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: mAP
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
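The world_size and rank arguments documented under run allow one process per GPU. A sketch of wiring them from environment variables, assuming a torchrun-style launcher that sets WORLD_SIZE and RANK (that launcher convention is an assumption, not part of s3prl); paths and the upstream name are placeholders:

import os

from s3prl.problem import HearVocal

HearVocal().run(
    target_dir="exp/hear_vocal",
    world_size=int(os.environ.get("WORLD_SIZE", 1)),
    rank=int(os.environ.get("RANK", 0)),
    prepare_data={"dataset_root": "/data/hear_vocal", "test_fold": 0},  # illustrative
    build_upstream={"name": "hubert"},                                  # assumption
)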
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
modespecific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_data get_path_only (bool) – Directly return the file paths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer
- Parameters:
build_featurizer (dict) – same in default_config, arguments for s3prl.nn.Featurizer
upstream (AbsUpstream) – the upstream model built by build_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer
Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model
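The reduction is commonly a learnable weighted sum over the upstream layers; the snippet below is a simplified sketch of that idea, not the actual s3prl.nn.Featurizer implementation.

import torch

class WeightedSumSketch(torch.nn.Module):
    # Collapse a list of layer outputs, each (batch, seq, hidden), into one tensor.
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        stacked = torch.stack(hidden_states, dim=0)            # (layers, batch, seq, hidden)
        norm = torch.softmax(self.weights, dim=0)              # one scalar weight per layer
        return (norm.view(-1, 1, 1, 1) * stacked).sum(dim=0)   # (batch, seq, hidden)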
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel
- Parameters:
build_model (dict) – same in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
model_output_size (int) – the required model's output hidden size
build_upstream (dict) – same in default_config, refer to build_upstream
build_featurizer (dict) – same in default_config, refer to build_featurizer
build_downstream (dict) – same in default_config, refer to build_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to below
key
description
name
(str) - the optimizer class name in torch.optim
conf
(dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}
parameters (iterable) – the standard params accepted by torch.optim.Optimizer
- Returns:
torch.optim.Optimizer
An optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_config
key
description
name
(str) - the scheduler class name in torch.optim.lr_scheduler
conf
(dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR
optimizer – the standard torch optimizer accepted by schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler
A scheduler following standard torch usage
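Given the name/conf layout documented for build_optimizer and build_scheduler, the config plausibly resolves to standard torch classes roughly as follows; this is an illustrative sketch, not the library's exact code.

import torch

def sketch_build_optimizer(build_optimizer: dict, parameters):
    # e.g. {"name": "Adam", "conf": {"lr": 1.0e-4}} -> torch.optim.Adam(parameters, lr=1e-4)
    cls = getattr(torch.optim, build_optimizer["name"])
    return cls(parameters, **build_optimizer.get("conf", {}))

def sketch_build_scheduler(build_scheduler: dict, optimizer):
    # e.g. {"name": "ExponentialLR", "gamma": 0.9} -> ExponentialLR(optimizer, gamma=0.9)
    conf = {k: v for k, v in build_scheduler.items() if k != "name"}
    cls = getattr(torch.optim.lr_scheduler, build_scheduler["name"])
    return cls(optimizer, **conf)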
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics.
By default build UtteranceClassificationTask
- Parameters:
build_task (dict) – same in default_config, no argument supported for now
model (torch.nn.Module) – the model built by build_model
encoder – the encoder built by build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream
- Parameters:
build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream
- Returns:
s3prl.nn.interface.AbsUpstream
Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).
- Parameters:
evaluate (dict) – same in default_config, no argument supported for now
**others – only meaningful when you want to override this method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__ of the problem class
- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_model and the checkpoint saved in this directory.
- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple
model (torch.nn.Module)
task (s3prl.task.Task)
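A typical inference-time usage might look like the sketch below; the checkpoint directory is a placeholder for whatever ckpts_dir contains the 'model' and 'task' subdirectories, and SuperbASR stands for any problem class with these helpers.

from s3prl.problem import SuperbASR  # every problem class exposes the same helper

problem = SuperbASR()
model, task = problem.load_model_and_task("result/asr_exp_ckpts")  # hypothetical path
model.eval()  # ready for inference with batches from the matching dataset/collate_fn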
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_task and the checkpoint saved in this directory.
model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sid with **prepare_data
- Parameters:
prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from start_sec seconds. If not present or math.nan, load from the beginning.
end_sec
(float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.
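For a custom corpus, a prepare_data override only has to emit csv files with these columns; the sketch below writes one such file with pandas (the ids, paths, and labels are made up).

import math
import pandas as pd

rows = [
    # start_sec / end_sec may be math.nan to use the full waveform
    {"id": "utt-0001", "wav_path": "/data/corpus/utt-0001.wav",
     "label": "speaker_A", "start_sec": math.nan, "end_sec": math.nan},
    {"id": "utt-0002", "wav_path": "/data/corpus/utt-0002.wav",
     "label": "speaker_B", "start_sec": 0.5, "end_sec": 3.0},
]
pd.DataFrame(rows).to_csv("train.csv", index=False)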
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check everything won't crash. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – For distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.
**kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meaning
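Putting the stages and the method-specific overrides together, an invocation could look like the sketch below. SuperbASR is used only as a concrete example (every problem class shares this run signature), and the directory, dataset root, and upstream name are placeholders to replace with your own.

from s3prl.problem import SuperbASR

SuperbASR().run(
    target_dir="result/asr_exp",                         # hypothetical output directory
    prepare_data={"dataset_root": "/data/LibriSpeech"},  # corpus location for this problem
    build_upstream={"name": "fbank"},                    # any upstream name accepted by S3PRLUpstream
    device="cuda",
    start=0,
    stop=None,                                           # run every stage up to testing
)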
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model
- Parameters:
save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
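If you override save_model, the paired load_model must agree on the on-disk layout. A minimal sketch of such a pair is shown below; the file name and the simplified load helper signature are arbitrary choices for illustration, not the library's.

from pathlib import Path
import torch

def sketch_save_model(save_model, model_ckpt_dir, build_model_all_args, model):
    ckpt_dir = Path(model_ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    torch.save(
        {"build_model_all_args": build_model_all_args,
         "state_dict": model.state_dict(),
         "extra_conf": save_model},
        ckpt_dir / "model.ckpt",
    )

def sketch_load_model(model_ckpt_dir, build_model_fn):
    ckpt = torch.load(Path(model_ckpt_dir) / "model.ckpt", map_location="cpu")
    model = build_model_fn(**ckpt["build_model_all_args"])
    model.load_state_dict(ckpt["state_dict"])
    return model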
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.
- Parameters:
save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_config
key
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency: log every log_step steps
eval_step
(int) - evaluation frequency: evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) via the eval_batch argument in run
save_step
(int) - save the checkpoint every save_step steps
gradient_clipping
(float) - clip the gradient; important for RNNs
gradient_accumulate
(int) - accumulate multiple steps' gradients before updating network parameters to simulate large-batch optimization
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics; see build_task for the supported metrics.
valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
auto_resume
(bool) - if there is already a last checkpoint in target_dir (see run), whether to resume from it or delete it and start a new training session
resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside target_dir (see run)
seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only keep the keep_num_ckpts latest checkpoints and delete the old ones
use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearVoxLingual#
- class s3prl.problem.HearVoxLingual[source][source]#
Bases:
HearESC50
- default_config() dict[source][source]#
The default arguments for
run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? test_fold: ??? num_folds: 5 build_batch_sampler: train: batch_size: 32 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 pooling_type: MeanPooling build_model: upstream_trainable: false build_task: prediction_type: multiclass scores: - top1_acc - d_prime - aucroc - mAP build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 150000 log_step: 100 eval_step: 1000 save_step: 100 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: top1_acc valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
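Because each top-level field maps to the method of the same name, one way to customize a recipe is to take default_config, override a few inner values, and feed the merged result to run. The sketch below uses omegaconf for the merge (as suggested elsewhere on this page); the target directory, dataset root, and upstream name are placeholders, and it assumes default_config() returns a plain dict as documented.

from omegaconf import OmegaConf
from s3prl.problem import HearVoxLingual

problem = HearVoxLingual()
defaults = OmegaConf.create(problem.default_config())
overrides = OmegaConf.create({
    "target_dir": "result/voxlingual_exp",                           # hypothetical
    "prepare_data": {"dataset_root": "/data/hear-voxlingual", "test_fold": 0},
    "build_upstream": {"name": "fbank"},                             # any upstream accepted by S3PRLUpstream
    "build_optimizer": {"conf": {"lr": 1.0e-4}},                     # override the default 0.001
})
config = OmegaConf.to_container(OmegaConf.merge(defaults, overrides))
problem.run(**config)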
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_config
key
description
train
(dict) - arguments for FixedBatchSizeBatchSampler
valid
(dict) - arguments for FixedBatchSizeBatchSampler
test
(dict) - arguments for FixedBatchSizeBatchSampler
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
mode (str) – train/valid/test
data_csv (str) – the mode-specific csv from prepare_data
dataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn
- Parameters:
build_collate_fn (dict) – same in default_config, no argument supported for now
mode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
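The default lives in s3prl.dataset.base. As a rough mental model of what a collate_fn for the items documented under build_dataset has to do (pad the variable-length x, stack or list the rest), consider this hypothetical version, which is not the exact default_collate_fn.

import torch
from torch.nn.utils.rnn import pad_sequence

def sketch_collate_fn(items):
    # items: list of dicts with the keys documented for build_dataset
    return {
        "x": pad_sequence([item["x"] for item in items], batch_first=True),  # (batch, max_len, 1)
        "x_len": torch.LongTensor([item["x_len"] for item in items]),
        "class_id": torch.LongTensor([item["class_id"] for item in items]),
        "label": [item["label"] for item in items],
        "unique_name": [item["unique_name"] for item in items],
    }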
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config, with train, valid, test keys, each a dictionary with the following supported options:
key
description
max_secs
(float) - If a waveform is longer than max_secs seconds, randomly crop the waveform into max_secs seconds
sox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific mode
encoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length seq_len
class_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinear model
- Parameters:
build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear
downstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.
- Parameters:
build_encoder (dict) – same in default_config, no argument supported for now
target_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
train_csv_path (str) – the train path from prepare_data
valid_csv_path (str) – the valid path from prepare_data
test_csv_paths (List[str]) – the test paths from prepare_data
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer
- Parameters:
build_featurizer (dict) – same in default_config, arguments for s3prl.nn.Featurizer
upstream (AbsUpstream) – the upstream model built by build_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer
Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel
- Parameters:
build_model (dict) – same in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
model_output_size (int) – the required model's output hidden size
build_upstream (dict) – same in default_config, refer to build_upstream
build_featurizer (dict) – same in default_config, refer to build_featurizer
build_downstream (dict) – same in default_config, refer to build_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to below
key
description
name
(str) - the optimizer class name in torch.optim
conf
(dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}
parameters (iterable) – the standard params accepted by torch.optim.Optimizer
- Returns:
torch.optim.Optimizer
An optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_config
key
description
name
(str) - the scheduler class name in torch.optim.lr_scheduler
conf
(dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR
optimizer – the standard torch optimizer accepted by schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics.
By default build UtteranceClassificationTask
- Parameters:
build_task (dict) – same in default_config, no argument supported for now
model (torch.nn.Module) – the model built by build_model
encoder – the encoder built by build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream
- Parameters:
build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream
- Returns:
s3prl.nn.interface.AbsUpstream
Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).
- Parameters:
evaluate (dict) – same in default_config, no argument supported for now
**others – only meaningful when you want to override this method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__ of the problem class
- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_model and the checkpoint saved in this directory.
- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple
model (torch.nn.Module)
task (s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_task and the checkpoint saved in this directory.
model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sid with **prepare_data
- Parameters:
prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from start_sec seconds. If not present or math.nan, load from the beginning.
end_sec
(float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check everything won't crash. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – For distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.
**kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meaning
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model
- Parameters:
save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.
- Parameters:
save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_config
key
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency: log every log_step steps
eval_step
(int) - evaluation frequency: evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) via the eval_batch argument in run
save_step
(int) - save the checkpoint every save_step steps
gradient_clipping
(float) - clip the gradient; important for RNNs
gradient_accumulate
(int) - accumulate multiple steps' gradients before updating network parameters to simulate large-batch optimization
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics; see build_task for the supported metrics.
valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
auto_resume
(bool) - if there is already a last checkpoint in target_dir (see run), whether to resume from it or delete it and start a new training session
resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside target_dir (see run)
seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only keep the keep_num_ckpts latest checkpoints and delete the old ones
use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearDcase2016Task2#
- class s3prl.problem.HearDcase2016Task2[source][source]#
Bases:
HearFSD
- default_config() dict[source][source]#
The default arguments for
run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_databuild_datasetbuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? build_dataset: train: chunk_secs: 4.0 step_secs: 4.0 valid: chunk_secs: 4.0 step_secs: 4.0 test: chunk_secs: 4.0 step_secs: 4.0 build_batch_sampler: train: batch_size: 5 shuffle: true build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 build_model: upstream_trainable: false build_task: prediction_type: multilabel scores: - event_onset_200ms_fms - segment_1s_er postprocessing_grid: median_filter_ms: - 250 min_duration: - 125 - 250 build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 15000 log_step: 100 eval_step: 500 save_step: 500 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: event_onset_200ms_fms valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sid with **prepare_data
- Parameters:
prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from start_sec seconds. If not present or math.nan, load from the beginning.
end_sec
(float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config, with train, valid, test keys, each a dictionary with the following supported options:
key
description
max_secs
(float) - If a waveform is longer than max_secs seconds, randomly crop the waveform into max_secs seconds
sox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific mode
encoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length seq_len
class_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_config
key
description
train
(dict) - arguments for FixedBatchSizeBatchSampler
valid
(dict) - arguments for FixedBatchSizeBatchSampler
test
(dict) - arguments for FixedBatchSizeBatchSampler
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
mode (str) – train/valid/test
data_csv (str) – the mode-specific csv from prepare_data
dataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source][source]#
Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics.
By default build UtteranceClassificationTask
- Parameters:
build_task (dict) – same in default_config, no argument supported for now
model (torch.nn.Module) – the model built by build_model
encoder – the encoder built by build_encoder
- Returns:
Task
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn
- Parameters:
build_collate_fn (dict) – same in default_config, no argument supported for now
mode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinear model
- Parameters:
build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear
downstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.
- Parameters:
build_encoder (dict) – same in default_config, no argument supported for now
target_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
train_csv_path (str) – the train path from prepare_data
valid_csv_path (str) – the valid path from prepare_data
test_csv_paths (List[str]) – the test paths from prepare_data
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer
- Parameters:
build_featurizer (dict) – same in default_config, arguments for s3prl.nn.Featurizer
upstream (AbsUpstream) – the upstream model built by build_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizer
Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel
- Parameters:
build_model (dict) – same in default_config, arguments for s3prl.nn.upstream.UpstreamDownstreamModel
model_output_size (int) – the required model's output hidden size
build_upstream (dict) – same in default_config, refer to build_upstream
build_featurizer (dict) – same in default_config, refer to build_featurizer
build_downstream (dict) – same in default_config, refer to build_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to below
key
description
name
(str) - the optimizer class name in torch.optim
conf
(dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}
parameters (iterable) – the standard params accepted by torch.optim.Optimizer
- Returns:
torch.optim.Optimizer
An optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_config
key
description
name
(str) - the scheduler class name in torch.optim.lr_scheduler
conf
(dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR
optimizer – the standard torch optimizer accepted by schedulers in torch.optim.lr_scheduler
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream
- Parameters:
build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream
- Returns:
s3prl.nn.interface.AbsUpstream
Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train (during the validation phase) and run (during the testing phase).
- Parameters:
evaluate (dict) – same in default_config, no argument supported for now
**others – only meaningful when you want to override this method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__ of the problem class
- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_model and the checkpoint saved in this directory.
- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'
- Returns:
tuple
model (torch.nn.Module)
task (s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_task and the checkpoint saved in this directory.
model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.
task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoader
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check everything won't crash. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – For distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).
test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.
**kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and will not be used in the core run logic. See the specific method documentation for their supported arguments and meaning
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model
- Parameters:
save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.
model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.
model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.
- Parameters:
save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.
task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.
task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_config
key
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency: log every log_step steps
eval_step
(int) - evaluation frequency: evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) via the eval_batch argument in run
save_step
(int) - save the checkpoint every save_step steps
gradient_clipping
(float) - clip the gradient; important for RNNs
gradient_accumulate
(int) - accumulate multiple steps' gradients before updating network parameters to simulate large-batch optimization
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics; see build_task for the supported metrics.
valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
auto_resume
(bool) - if there is already a last checkpoint in target_dir (see run), whether to resume from it or delete it and start a new training session
resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside target_dir (see run)
seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only keep the keep_num_ckpts latest checkpoints and delete the old ones
use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
HearMaestro#
- class s3prl.problem.HearMaestro[source][source]#
Bases:
HearDcase2016Task2
- default_config() dict[source][source]#
The default arguments for
run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.
The methods affected by the following config are:
prepare_databuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: dataset_root: ??? test_fold: ??? build_batch_sampler: train: batch_size: 5 shuffle: true valid: item: record_id test: item: record_id build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_layers: 2 build_model: upstream_trainable: false build_task: prediction_type: multilabel scores: - event_onset_50ms_fms - event_onset_offset_50ms_20perc_fms postprocessing_grid: median_filter_ms: - 150 min_duration: - 50 build_optimizer: name: Adam conf: lr: 0.001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 15000 log_step: 100 eval_step: 500 save_step: 500 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: event_onset_50ms_fms valid_higher_better: true auto_resume: true resume_ckpt_dir: null evaluate: {}
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sid with **prepare_data
- Parameters:
prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid
target_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from start_sec seconds. If not present or math.nan, load from the beginning.
end_sec
(float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_config
key
description
train
(dict) - arguments for FixedBatchSizeBatchSampler
valid
(dict) - arguments for FixedBatchSizeBatchSampler
test
(dict) - arguments for FixedBatchSizeBatchSampler
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
mode (str) – train/valid/test
data_csv (str) – the mode-specific csv from prepare_data
dataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn
- Parameters:
build_collate_fn (dict) – same in default_config, no argument supported for now
mode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config, with train, valid, test keys, each a dictionary with the following supported options:
key
description
max_secs
(float) - If a waveform is longer than max_secs seconds, randomly crop the waveform into max_secs seconds
sox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific mode
encoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length seq_len
class_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinear model
- Parameters:
build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear
downstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 KHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.
- Parameters:
build_encoder (dict) – same in default_config, no argument supported for now
target_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)
train_csv_path (str) – the train path from prepare_data
valid_csv_path (str) – the valid path from prepare_data
test_csv_paths (List[str]) – the test paths from prepare_data
get_path_only (bool) – Directly return the filepaths regardless of whether they exist.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
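Conceptually, the composition looks like the sketch below; this is not the actual UpstreamDownstreamModel (which handles length propagation and the upstream_trainable flag more carefully), just an illustration of how the three pieces connect:

import torch

class ComposedModel(torch.nn.Module):
    """Rough sketch of how upstream, featurizer, and downstream compose.
    The real UpstreamDownstreamModel also supports freezing the upstream
    (upstream_trainable=False) and has a more complete interface."""

    def __init__(self, upstream, featurizer, downstream):
        super().__init__()
        self.upstream = upstream
        self.featurizer = featurizer
        self.downstream = downstream

    def forward(self, wavs, wavs_len):
        hidden_states, hs_len = self.upstream(wavs, wavs_len)          # multiple layers
        feature, feature_len = self.featurizer(hidden_states, hs_len)  # reduced to one hidden state
        return self.downstream(feature, feature_len)                   # task-specific prediction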
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
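The name/conf convention above maps directly onto torch.optim. A sketch of how such a config is typically resolved (the library's own factory may differ in details):

import torch

def make_optimizer(parameters, build_optimizer: dict) -> torch.optim.Optimizer:
    """Resolve a {'name': ..., 'conf': {...}} config into a torch optimizer.
    A sketch of the convention only; the library's own factory may differ."""
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))

# e.g. make_optimizer(model.parameters(), {"name": "Adam", "conf": {"lr": 1.0e-4}})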
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
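The scheduler follows the same convention, resolved against torch.optim.lr_scheduler instead. Note that the recipe's default config places the scheduler arguments (e.g. gamma) directly beside name, so the sketch below accepts either layout; this resolution logic is an assumption, not the library's exact code:

import torch

def make_scheduler(optimizer: torch.optim.Optimizer, build_scheduler: dict):
    """Resolve the scheduler config against torch.optim.lr_scheduler.
    Sketch only: the extra arguments may sit under a nested 'conf' dict or
    directly beside 'name' (as in the ExponentialLR/gamma default), so both
    layouts are accepted here."""
    scheduler_cls = getattr(torch.optim.lr_scheduler, build_scheduler["name"])
    conf = dict(build_scheduler.get("conf") or {})
    conf.update({k: v for k, v in build_scheduler.items() if k not in ("name", "conf")})
    return scheduler_cls(optimizer, **conf)

# e.g. make_scheduler(optimizer, {"name": "ExponentialLR", "gamma": 0.9})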
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic of every train/valid/test forward step for the
model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
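A quick sanity check of this interface, following the commonly documented S3PRLUpstream usage; the upstream name and tensor shapes below are assumptions and should be verified against your installed s3prl version:

import torch
from s3prl.nn import S3PRLUpstream

# Waveforms in, multiple hidden states out. "hubert" is just an example of a
# registered upstream name; shapes follow the commonly documented usage.
upstream = S3PRLUpstream("hubert")
upstream.eval()

with torch.no_grad():
    wavs = torch.randn(2, 16000 * 2)                     # two 2-second waveforms
    wavs_len = torch.LongTensor([16000 * 1, 16000 * 2])  # true lengths in samples
    all_hs, all_hs_len = upstream(wavs, wavs_len)        # one entry per layer

print(len(all_hs), all_hs[0].shape)  # num_layers, (batch, seq_len, hidden_size)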
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
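For example, after a finished run you can restore both pieces for inference; the checkpoint directory below is a placeholder (it must contain the 'model' and 'task' sub-directories described above), and SuperbSID is used only as an example problem class:

from s3prl.problem import SuperbSID

# Restore a trained model and its task together.
problem = SuperbSID()
ckpts_dir = "exp/my_run/valid_best"   # placeholder checkpoint directory
model, task = problem.load_model_and_task(ckpts_dir)
model.eval()

# task_overrides can tweak the restored task without retraining,
# e.g. decoding hyperparameters (the supported keys depend on the task).
model, task = problem.load_model_and_task(ckpts_dir, task_overrides={})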
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: ~/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to run to the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoaders
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that everything runs without crashing. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – The process id during distributed training, where world_size > 1. Take
world_size == 8for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut a differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument, since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batch to evaluate to speed up the development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
CommonExample#
- class s3prl.problem.CommonExample[source][source]#
Bases:
SuperbSID- default_config() dict[source][source]#
The default arguments for
runin yaml. Note that for the fields with inner values, likebuild_model, the outer field name corresponds to a method name, so you can find the methodbuild_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.The methods affected by the following config are:
prepare_databuild_encoderbuild_datasetbuild_batch_samplerbuild_upstreambuild_featurizerbuild_downstreambuild_modelbuild_taskbuild_optimizerbuild_schedulersave_modelsave_tasktrainevaluatestart: 0 stop: null target_dir: ??? cache_dir: null remove_all_cache: false prepare_data: {} build_encoder: {} build_dataset: train: max_secs: 8.0 build_batch_sampler: train: batch_size: 8 shuffle: true valid: batch_size: 1 test: batch_size: 1 build_upstream: name: ??? build_featurizer: layer_selections: null normalize: false build_downstream: hidden_size: 256 build_model: upstream_trainable: false build_task: {} build_optimizer: name: Adam conf: lr: 0.0001 build_scheduler: name: ExponentialLR gamma: 0.9 save_model: {} save_task: {} train: total_steps: 10 log_step: 1 eval_step: 5 save_step: 5 gradient_clipping: 1.0 gradient_accumulate: 1 valid_metric: accuracy valid_higher_better: true auto_resume: true evaluate: {}
- prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#
Prepare the task-specific data metadata (path, labels…). By default call
voxceleb1_for_sidwith**prepare_data- Parameters:
prepare_data (dict) – same in
default_config, support arguments invoxceleb1_for_sidtarget_dir (str) – Parse your corpus and save the csv file into this directory
cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)get_path_only (bool) – Directly return the filepaths whether they exist or not.
- Returns:
tuple
train_path (str)
valid_path (str)
test_paths (List[str])
Each path (str) should be a csv file containing the following columns:
column
description
id
(str) - the unique id for this data point
wav_path
(str) - the absolute path of the waveform file
label
(str) - a string label of the waveform
start_sec
(float) - optional, load the waveform from
start_secseconds. If not present or ismath.nan, load from the beginning.end_sec
(float) - optional, load the waveform until
end_secseconds. If not present or ismath.nan, load to the end.
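If you override prepare_data for your own corpus, you only need to emit csv files with the columns above. A minimal sketch with pandas (all paths and labels below are placeholders):

import math
import pandas as pd

def write_metadata_csv(csv_path: str) -> str:
    """Write a metadata csv with the columns expected by this recipe:
    id, wav_path, label, plus the optional start_sec / end_sec columns."""
    rows = [
        # full utterance: start_sec / end_sec left as NaN -> load the whole file
        {"id": "utt-0001", "wav_path": "/data/wav/utt-0001.wav",
         "label": "speaker_a", "start_sec": math.nan, "end_sec": math.nan},
        # cropped utterance: only use the 0.5s ~ 3.0s segment of the waveform
        {"id": "utt-0002", "wav_path": "/data/wav/utt-0002.wav",
         "label": "speaker_b", "start_sec": 0.5, "end_sec": 3.0},
    ]
    pd.DataFrame(rows).to_csv(csv_path, index=False)
    return csv_path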
- build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#
Return the batch sampler for torch DataLoader.
- Parameters:
build_batch_sampler (dict) –
same in
default_configkey
description
train
(dict) - arguments for
FixedBatchSizeBatchSamplervalid
(dict) - arguments for
FixedBatchSizeBatchSamplertest
(dict) - arguments for
FixedBatchSizeBatchSamplertarget_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – the
mode-specific csv fromprepare_datadataset – the dataset from
build_dataset
- Returns:
batch sampler for torch DataLoader
- build_collate_fn(build_collate_fn: dict, mode: str)[source]#
By default returns
s3prl.dataset.base.default_collate_fn- Parameters:
build_collate_fn (dict) – same in
default_config, no argument supported for nowmode (str) – train, valid, or test
- Returns:
callable
the collate_fn for torch DataLoader in train/valid/test
mode
- build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#
Build the dataset for train/valid/test.
- Parameters:
build_dataset (dict) –
same in
default_config. withtrain,valid,testkeys, each is a dictionary with the following supported options:key
description
max_secs
(float) - If a waveform is longer than
max_secsseconds, randomly crop the waveform intomax_secssecondssox_effects
(List[List[str]]) - If not None, apply sox effects on the utterance
target_dir (str) – Current experiment directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)mode (str) – train/valid/test
data_csv (str) – The metadata csv file for the specific
modeencoder_path (str) – The pickled encoder path for encoding the labels
- Returns:
torch Dataset
For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:
key
description
x
(torch.FloatTensor) - the waveform in (seq_len, 1)
x_len
(int) - the waveform length
seq_lenclass_id
(int) - the encoded class id
label
(str) - the class name
unique_name
(str) - the unique id for this datapoint
- build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#
Return the task-specific downstream model. By default build the
MeanPoolingLinearmodel- Parameters:
build_downstream (dict) – same in
default_config, support arguments ofMeanPoolingLineardownstream_input_size (int) – the required input size of the model
downstream_output_size (int) – the required output size of the model
downstream_input_stride (int) – the input feature’s stride (from 16 kHz)
- Returns:
- build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#
Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a
s3prl.dataio.encoder.CategoryEncoderfrom thelabelcolumn of all the csv files.- Parameters:
build_encoder (dict) – same in
default_config, no argument supported for nowtarget_dir (str) – Save your encoder into this directory
cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and
target_dir)train_csv_path (str) – the train path from
prepare_datavalid_csv_path (str) – the valid path from
prepare_datatest_csv_paths (List[str]) – the test paths from
prepare_dataget_path_only (bool) – Directly return the filepaths whether they exist or not.
- Returns:
str
encoder_path: The encoder should be saved in the pickle format
- build_featurizer(build_featurizer: dict, upstream)[source]#
By default build the featurizer with
s3prl.nn.Featurizer- Parameters:
build_featurizer (dict) – same in
default_config, arguments fors3prl.nn.Featurizerupstream (
AbsUpstream) – the upstream model built bybuild_upstream
- Returns:
s3prl.nn.interface.AbsFeaturizerReturn the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by
build_upstream) into a single hidden state, so it can be easily fed into the downstream model
- build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#
By default build model with
s3prl.nn.upstream.UpstreamDownstreamModel- Parameters:
build_model (dict) – same in
default_config, arguments fors3prl.nn.upstream.UpstreamDownstreamModelmodel_output_size (int) – the required model’s output hidden size
build_upstream (dict) – same in
default_config, refer tobuild_upstreambuild_featurizer (dict) – same in
default_config, refer tobuild_featurizerbuild_downstream (dict) – same in
default_config, refer tobuild_downstream
- Returns:
torch.nn.Module
Return the entire model for the task, which takes the direct items from DataLoader as the input. Usually, the components can be built by
build_upstream,build_featurizer,build_downstream, and are concatenated to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
- build_optimizer(build_optimizer: dict, parameters)[source]#
- Parameters:
build_optimizer (dict) –
same in
default_config, refer to belowkey
description
name
(str) - the optimizer class name in
torch.optimconf
(dict) - the arguments for initializing the optimizer class. e.g.
{"lr": 1.0e-4}parameters (iterable) – the standard params accepted by
torch.optim.Optimizer.
- Returns:
torch.optim.OptimizerAn optimizer following standard torch usage
- build_scheduler(build_scheduler: dict, optimizer)[source]#
- Parameters:
build_scheduler (dict) –
same in
default_configkey
description
name
(str) - the scheduler class name in
torch.optim.lr_schedulerconf
(dict) - the arguments for initializing the scheduler class. e.g.
{"gamma": 0.01}fortorch.optim.lr_scheduler.StepLRoptimizer – the standard torch optimizer accepted by Scheduler in
torch.optim.lr_scheduler.
- Returns:
torch scheduler
A scheduler following standard torch usage
- build_task(build_task: dict, model: Module, encoder, valid_df: DataFrame = None, test_df: DataFrame = None)[source]#
Build the task, which defines the logic of every train/valid/test forward step for the
model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics. By default build
UtteranceClassificationTask- Parameters:
build_task (dict) – same in
default_config, no argument supported for nowmodel (torch.nn.Module) – the model built by
build_modelencoder – the encoder built by
build_encoder
- Returns:
Task
- build_upstream(build_upstream: dict)[source]#
By default build the upstream with
s3prl.nn.upstream.S3PRLUpstream- Parameters:
build_upstream (dict) – same in
default_config, arguments fors3prl.nn.upstream.S3PRLUpstream- Returns:
s3prl.nn.interface.AbsUpstreamReturn an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
- evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#
The evaluate routine used by
train(during validation phase) andrun(during testing phase).- Parameters:
evaluate (dict) – same in
default_config, no argument supported for now**others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.
- classmethod get_class_from_name(name: str)[source]#
- Parameters:
name (str) – the
__name__of the problem class- Returns:
Problem
- load_model(model_ckpt_dir: str)[source]#
Return the saved model.
- Parameters:
model_ckpt_dir (str) – Restore the model with
build_modeland the checkpoint saved in this directory.- Returns:
torch.nn.Module
- load_model_and_task(ckpts_dir: str, task_overrides: dict = None)[source]#
This is a helper method to combine
load_modelandload_tasktogether to directly load the model and the task. This method assumes the model is saved underckpts_dir / 'model'and the task is saved underckpts_dir / 'task'- Returns:
tuple
model (
torch.nn.Module)task (
s3prl.task.Task)
- load_task(task_ckpt_dir: str, model: Module, task_overrides: dict = None)[source]#
Return the saved task.
- Parameters:
task_ckpt_dir (str) – Restore the task with
build_taskand the checkpoint saved in this directory.model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for
build_task.task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.
- Returns:
- run(target_dir: str, cache_dir: str = None, remove_all_cache: bool = False, start: int = 0, stop: int = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: str = None, prepare_data: dict = None, build_encoder: dict = None, build_dataset: dict = None, build_batch_sampler: dict = None, build_collate_fn: dict = None, build_upstream: dict = None, build_featurizer: dict = None, build_downstream: dict = None, build_model: dict = None, build_task: dict = None, build_optimizer: dict = None, build_scheduler: dict = None, save_model: dict = None, save_task: dict = None, train: dict = None, evaluate: dict = None)[source]#
stage
description
0
Parse the corpus and save the metadata file (waveform path, label…)
1
Build the encoder to encode the labels
2
Train the model
3
Evaluate the model on multiple test sets
- Parameters:
target_dir (str) – The directory that stores the script result.
cache_dir (str) – The directory that caches the processed data. Default: ~/.cache/s3prl/data
remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False
start (int) – The starting stage of the problem script. Default: 0
stop (int) – The stopping stage of the problem script; set None to run to the final stage. Default: None
num_workers (int) – num_workers for all the torch DataLoaders
eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that everything runs without crashing. If -1, disable this feature and evaluate the entire epoch. Default: -1
device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”
world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1
rank (int) – The process id during distributed training, where world_size > 1. Take
world_size == 8for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case,rankcan range from 0 to 7. All 8 processes have the sameworld_sizebut a differentrank(process id).test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given
target_dirdirectory.**kwds – The other arguments like
prepare_dataandbuild_modelare method-specific arguments for methods likeprepare_dataandbuild_model, and will not be used in the corerunlogic. See the specific method documentation for their supported arguments and meanings
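Putting the stages together, a minimal invocation might look like the sketch below. The target_dir and upstream name are placeholders, and depending on the recipe prepare_data may additionally need to point at a corpus root:

from s3prl.problem import CommonExample

# Run all stages (0: prepare data, 1: build the encoder, 2: train, 3: evaluate).
# "fbank" is a lightweight upstream that is handy for a quick CPU smoke test;
# the tiny default total_steps makes this mostly a sanity check.
CommonExample().run(
    target_dir="exp/common_example",      # placeholder experiment directory
    build_upstream={"name": "fbank"},
    device="cpu",                         # switch to "cuda" for real training
    num_workers=2,
)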
- save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#
Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override
load_model- Parameters:
save_model (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset, by duplicating the dataset hypers inside thesave_modelfield. You can rely on theomegaconfpackage to simplify the duplication.model_ckpt_dir (str) – save the model into this directory.
build_model_all_args (dict) – all the arguments of
build_model. By saving this dictionary, you can easily reconstruct the same model by callingbuild_modelwith the saved dictionary.model (torch.nn.Module) – the model to be saved.
- Returns:
None
- save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#
Save the task’s state,
task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to overrideload_task.- Parameters:
save_task (dict) – same in
default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside thesave_taskfield. You can rely on theomegaconfpackage to simplify the duplication.task_ckpt_dir (str) – save the task into this directory.
build_task_all_args_except_model (dict) – all the arguments of
build_taskexcept themodelargument, since the model should be separately saved bysave_model. By saving this dictionary, you can easily reconstruct the same task by callingbuild_taskwith the saved dictionary.task (Task) – the task to be saved.
- Returns:
None
- train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: dict = None)[source]#
- Parameters:
train (dict) –
same in
default_configkey
description
total_steps
(int) - the total optimization steps
log_step
(int) - logging frequency. log every
log_stepstepeval_step
(int) - evaluation frequency. Evaluate every
eval_stepstep. Note that you can control how many batch to evaluate to speed up the development by theeval_batchargument inrunsave_step
(int) - save the checkpoint every
save_stepstep.gradient_clipping
(float) - clip the gradient; important for RNNs.
gradient_accumulate
(int) - accumulate multiple steps’ gradient before updating network parameters to simulate large-batch optimization.
valid_metric
(str) - the metric to select the best valid checkpoint. Different Tasks have different supported valid_metrics. See
build_taskfor the supported metrics.valid_higher_better
(bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved.
auto_resume
(bool) - if the last checkpoint already exists in
target_dir(seerun), whether to resume from it or delete it and start a new training session.resume_ckpt_dir
(str) - you can directly specify the checkpoint path to resume from, which does not need to be inside
target_dir(seerun).seed
(int) - fix the seed before the training starts
keep_num_ckpts
(int) - to prevent saving too many checkpoints, only save the
keep_num_ckptslatest checkpoints and delete the old ones.use_scheduler
(bool) - whether to use the scheduler
**others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.