problem#

(s3prl.problem)

Pre-defined Python recipes with customizable methods

s3prl.problem.asr - Speech-to-text recipes

s3prl.problem.asv - Speaker verification recipes

s3prl.problem.base - The shared backbone of the common ML train/test procedure for all problems

s3prl.problem.common - The most common and simple train/valid/test recipes

s3prl.problem.diarization - Speaker diarization recipes

SuperbASR#

class s3prl.problem.SuperbASR[source][source]#

Bases: ASR

default_config() dict[source][source]#

The default arguments for run, in YAML. Note that for fields with inner values, such as build_model, the outer field name corresponds to a method of the same name, so you can find the method build_model below. The values inside that field are passed directly into that method, so by changing these inner values you directly change the behavior of the corresponding method. See each method's documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, prepare_tokenizer_data, build_tokenizer, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  train_sets:
  - train-clean-100
  valid_sets:
  - dev-clean
  test_sets:
  - test-clean
prepare_tokenizer_data: {}
build_tokenizer:
  vocab_type: character
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 32
    max_length: 2000
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  model_conf:
    module: LSTM
    proj_size: 1024
    hidden_size:
    - 1024
    - 1024
    dropout:
    - 0.2
    - 0.2
    layer_norm:
    - false
    - false
    proj:
    - false
    - false
    sample_rate:
    - 1
    - 1
    sample_style: concat
    bidirectional: true
  specaug_conf:
    freq_mask_width_range: !!python/tuple
    - 0
    - 50
    num_freq_mask: 4
    time_mask_width_range: !!python/tuple
    - 0
    - 40
    num_time_mask: 2
build_model:
  upstream_trainable: false
build_task:
  log_metrics:
  - cer
  - wer
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model:
  extra_conf:
    build_downstream_conf: ${build_downstream}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 2000
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: wer
  valid_higher_better: false
  auto_resume: true
  resume_ckpt_dir: null
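
A minimal sketch of driving this recipe from Python by filling in the required fields of the config above; the target_dir, dataset_root, and upstream name are placeholder assumptions, and the keys of default_config map onto the keyword arguments of run.

# Hedged sketch: override a few fields of default_config and launch the recipe.
# "exp/superb_asr", "/data/LibriSpeech", and the upstream name "fbank" are placeholders.
from s3prl.problem import SuperbASR

problem = SuperbASR()
config = problem.default_config()

config["target_dir"] = "exp/superb_asr"                       # required (??? in the YAML above)
config["prepare_data"]["dataset_root"] = "/data/LibriSpeech"  # required (??? in the YAML above)
config["build_upstream"]["name"] = "fbank"                    # required (??? in the YAML above)
config["build_optimizer"]["conf"]["lr"] = 1.0e-4              # inner values are passed to build_optimizer

problem.run(**config)                                         # runs all stages from start to stop
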
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels, ...). By default, this calls prepare_librispeech with **prepare_data.

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of prepare_librispeech

  • target_dir (str) – Parse your corpus and save the csv files into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • transcription (str) - a text string
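
When the target corpus is not LibriSpeech, prepare_data can be overridden to emit csv files with exactly these columns. The subclass name, paths, and the single example row below are illustrative assumptions, not part of the library.

# Hedged sketch: point the recipe at custom csv metadata with the id / wav_path /
# transcription columns listed above. All paths and values are placeholders.
import pandas as pd
from s3prl.problem import SuperbASR

class MyCorpusASR(SuperbASR):
    def prepare_data(self, prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False):
        train_csv = f"{target_dir}/train.csv"
        valid_csv = f"{target_dir}/valid.csv"
        test_csvs = [f"{target_dir}/test.csv"]
        if get_path_only:
            return train_csv, valid_csv, test_csvs

        rows = [
            {
                "id": "utt-0001",
                "wav_path": "/abs/path/utt-0001.wav",   # must be an absolute path
                "transcription": "hello world",
            }
        ]
        for path in [train_csv, valid_csv, *test_csvs]:
            pd.DataFrame(rows).to_csv(path, index=False)
        return train_csv, valid_csv, test_csvs
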

prepare_tokenizer_data(prepare_tokenizer_data: dict, target_dir: str, cache_dir: str, train_csv: str, valid_csv: str, test_csvs: List[str], get_path_only: bool = False)[source][source]#

Prepare the text file used for training the tokenizer. By default, only the transcriptions in the train_csv returned by prepare_data are used. The default prepare_tokenizer_data prepares the data for a character-based tokenizer.

Parameters:
  • prepare_tokenizer_data (dict) – same in default_config, no supported arguments for now

  • target_dir (str) – Save the text file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv (str) – The train data given by prepare_data

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

str

The text file path; the text file should be in the following format:

This is the first line
This is the second line
These are all text used for training tokenizer

build_tokenizer(build_tokenizer: dict, target_dir: str, cache_dir: str, tokenizer_data_path: str, get_path_only: bool = False)[source][source]#

Build the tokenizer from the data prepared by prepare_tokenizer_data. By default, this calls prepare_common_tokenizer with **build_tokenizer.

Parameters:
  • build_tokenizer (dict) – same in default_config, arguments for prepare_common_tokenizer

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • tokenizer_data_path (str) – The text file from prepare_tokenizer_data

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

str

filepath of the pickled s3prl.dataio.encoder.tokenizer.Tokenizer
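
The returned path can be inspected directly; the sketch below assumes the standard pickle module and that the Tokenizer exposes encode/decode, so check s3prl.dataio.encoder.tokenizer for the exact interface.

# Hedged sketch: peek at the tokenizer produced by build_tokenizer.
# The path is a placeholder; encode/decode are assumed from the Tokenizer interface.
import pickle

with open("exp/superb_asr/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

ids = tokenizer.encode("hello world")
print(ids, tokenizer.decode(ids))
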

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, tokenizer_path: str)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) – same in default_config, not used

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • tokenizer_path (str) – The pickled tokenizer path for encoding transcription

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_ids (torch.LongTensor) - the encoded class ids of a transcription (sentence)

  • labels (str) - the text transcription

  • unique_name (str) - the unique id for this datapoint
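
For a custom build_dataset override, any map-style torch Dataset that yields these keys should work; the sketch below is a minimal illustration, with the row format and the tokenizer's encode method as assumptions.

# Hedged sketch: a map-style Dataset returning the keys listed above.
# The rows format and tokenizer.encode are assumptions for illustration.
import torch
import torchaudio
from torch.utils.data import Dataset

class MinimalAsrDataset(Dataset):
    def __init__(self, rows, tokenizer):
        self.rows = rows            # list of dicts with id / wav_path / transcription
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        row = self.rows[index]
        wav, _ = torchaudio.load(row["wav_path"])            # (channel, seq_len)
        wav = wav.mean(dim=0, keepdim=True).transpose(0, 1)  # -> (seq_len, 1)
        return {
            "x": wav,
            "x_len": wav.size(0),
            "class_ids": torch.LongTensor(self.tokenizer.encode(row["transcription"])),
            "labels": row["transcription"],
            "unique_name": row["id"],
        }
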

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset: Dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    • train (dict) - arguments for SortedBucketingSampler

    • valid (dict) - arguments for FixedBatchSizeBatchSampler

    • test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader
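
Conceptually, this method looks up the mode-specific sub-config and hands it to the matching sampler class; a rough sketch of that dispatch is below, assuming the samplers live in s3prl.dataio.sampler and leaving their exact constructor signatures to the class documentation. This is a sketch of the idea, not the library's exact code.

# Hedged sketch of the dispatch implied by the config above; constructor details may differ.
from s3prl.dataio.sampler import FixedBatchSizeBatchSampler, SortedBucketingSampler

def make_batch_sampler(build_batch_sampler: dict, mode: str, dataset):
    conf = dict(build_batch_sampler.get(mode, {}))
    if mode == "train":
        return SortedBucketingSampler(dataset, **conf)   # batch_size, max_length, shuffle, ...
    return FixedBatchSizeBatchSampler(dataset, **conf)   # batch_size
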

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the RNNEncoder model wrapped with ModelWithSpecaug

Parameters:
  • build_downstream (dict) – same in default_config, has two keys: model_conf is the arguments for RNNEncoder; specaug_conf is the arguments for ModelWithSpecaug

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsFrameModel
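
To plug in a different architecture, override build_downstream and return something that follows the frame-level model interface. The sketch below assumes the contract is forward(x, x_len) -> (y, y_len) plus input_size / output_size properties; verify against s3prl.nn.interface.AbsFrameModel before relying on it.

# Hedged sketch: a frame-level downstream candidate for build_downstream.
# The assumed interface (forward(x, x_len) -> (y, y_len), input_size, output_size)
# should be checked against s3prl.nn.interface.AbsFrameModel.
import torch.nn as nn

class LinearFrameModel(nn.Module):
    def __init__(self, input_size: int, output_size: int, hidden_size: int = 256):
        super().__init__()
        self._input_size = input_size
        self._output_size = output_size
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    @property
    def input_size(self) -> int:
        return self._input_size

    @property
    def output_size(self) -> int:
        return self._output_size

    def forward(self, x, x_len):
        # x: (batch, seq_len, input_size); the stride is unchanged, so x_len passes through
        return self.net(x), x_len
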

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so that it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the input feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    • name (str) - the optimizer class name in torch.optim

    • conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage
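
The name/conf pair amounts to looking up the class in torch.optim and passing conf as keyword arguments; a rough equivalent is sketched below (build_scheduler follows the same pattern against torch.optim.lr_scheduler). This is a sketch of the idea, not the library's exact code.

# Hedged sketch of what the name/conf convention boils down to.
import torch.optim as optim

def make_optimizer(build_optimizer: dict, parameters):
    optimizer_cls = getattr(optim, build_optimizer["name"])      # e.g. "Adam" -> torch.optim.Adam
    return optimizer_cls(parameters, **build_optimizer["conf"])  # e.g. {"lr": 1.0e-4}
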

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    • name (str) - the scheduler class name in torch.optim.lr_scheduler

    • conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model, tokenizer)[source]#
build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task
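
A typical use of these loaders after training, assuming the checkpoints were written by run under target_dir; the exact checkpoint sub-directory name below is a placeholder.

# Hedged sketch: restore a trained checkpoint for offline inference.
# "exp/superb_asr/train/valid_best" is a placeholder checkpoint directory.
from s3prl.problem import SuperbASR

problem = SuperbASR()
model, task = problem.load_model_and_task("exp/superb_asr/train/valid_best")
model.eval()
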

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, prepare_tokenizer_data: Optional[dict] = None, build_tokenizer: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

  • stage 0 - Parse the corpus and save the metadata file for ASR (waveform path, label…)

  • stage 1 - Prepare the metadata file for training tokenizer

  • stage 2 - Train the tokenizer

  • stage 3 - Train the ASR model

  • stage 4 - Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (See test_ckpt_steps)

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoaders

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.

  • **others – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method's documentation for its supported arguments and their meanings
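
Because every stage reads its inputs from target_dir, stages can also be re-run individually. The sketch below re-runs only the evaluation stage on a specific checkpoint; the paths are placeholders and the stage numbering follows the table above.

# Hedged sketch: re-run evaluation only, pointing test_ckpt_dir at a chosen checkpoint.
from s3prl.problem import SuperbASR

problem = SuperbASR()
config = problem.default_config()
config["target_dir"] = "exp/superb_asr"                       # placeholder
config["prepare_data"]["dataset_root"] = "/data/LibriSpeech"  # placeholder
config["build_upstream"]["name"] = "fbank"                    # placeholder

problem.run(**{**config, "start": 4, "test_ckpt_dir": "exp/superb_asr/train/step_200000"})
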

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    • total_steps (int) - the total number of optimization steps

    • log_step (int) - logging frequency; log every log_step steps

    • eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run

    • save_step (int) - save a checkpoint every save_step steps

    • gradient_clipping (float) - clip the gradient; important for RNNs

    • gradient_accumulate (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization

    • valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

    • valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected

    • auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

    • resume_ckpt_dir (str) - directly specify the checkpoint path to resume from, which does not need to be inside target_dir (see run)

    • seed (int) - fix the random seed before training starts

    • keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones

    • use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbPR#

class s3prl.problem.SuperbPR[source][source]#

Bases: SuperbASR

default_config() dict[source][source]#

The default arguments for run, in YAML. Note that for fields with inner values, such as build_model, the outer field name corresponds to a method of the same name, so you can find the method build_model below. The values inside that field are passed directly into that method, so by changing these inner values you directly change the behavior of the corresponding method. See each method's documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, prepare_tokenizer_data, build_tokenizer, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train, evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  train_sets:
  - train-clean-100
  valid_sets:
  - dev-clean
  test_sets:
  - test-clean
prepare_tokenizer_data: {}
build_tokenizer:
  vocab_type: phoneme
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 16
    max_length: 300000
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task:
  log_metrics:
  - per
build_optimizer:
  name: Adam
  conf:
    lr: 0.01
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model:
  extra_conf:
    build_downstream_conf: ${build_downstream}
save_task: {}
train:
  total_steps: 100000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 2
  valid_metric: per
  valid_higher_better: false
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
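
The same Python-side pattern as SuperbASR applies here; the sketch below fills in the required ??? fields, with the paths and upstream name as placeholders.

# Hedged sketch: minimal SuperbPR (phoneme recognition) run; values are placeholders.
from s3prl.problem import SuperbPR

problem = SuperbPR()
config = problem.default_config()
config["target_dir"] = "exp/superb_pr"
config["prepare_data"]["dataset_root"] = "/data/LibriSpeech"
config["build_upstream"]["name"] = "fbank"
config["build_downstream"]["hidden_size"] = 512   # width of the FrameLevelLinear head

problem.run(**config)
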
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels, ...). By default, this calls prepare_librispeech with **prepare_data.

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of prepare_librispeech

  • target_dir (str) – Parse your corpus and save the csv files into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • transcription (str) - a text string

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    • train (dict) - arguments for SortedSliceSampler

    • valid (dict) - arguments for FixedBatchSizeBatchSampler

    • test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the FrameLevelLinear

Parameters:
  • build_downstream (dict) – same in default_config, supports arguments in FrameLevelLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsFrameModel

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, tokenizer_path: str)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) – same in default_config, not used

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • tokenizer_path (str) – The pickled tokenizer path for encoding transcription

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_ids (torch.LongTensor) - the encoded class ids of a transcription (sentence)

  • labels (str) - the text transcription

  • unique_name (str) - the unique id for this datapoint

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so that it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the input feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    • name (str) - the optimizer class name in torch.optim

    • conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    • name (str) - the scheduler class name in torch.optim.lr_scheduler

    • conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model, tokenizer)[source]#
build_tokenizer(build_tokenizer: dict, target_dir: str, cache_dir: str, tokenizer_data_path: str, get_path_only: bool = False)[source]#

Build the tokenizer from the data prepared by prepare_tokenizer_data. By default, this calls prepare_common_tokenizer with **build_tokenizer.

Parameters:
  • build_tokenizer (dict) – same in default_config, arguments for prepare_common_tokenizer

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • tokenizer_data_path (str) – The text file from prepare_tokenizer_data

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

str

filepath of the pickled s3prl.dataio.encoder.tokenizer.Tokenizer

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_tokenizer_data(prepare_tokenizer_data: dict, target_dir: str, cache_dir: str, train_csv: str, valid_csv: str, test_csvs: List[str], get_path_only: bool = False)[source]#

Prepare the text file used for training the tokenizer. By default, only the transcriptions in the train_csv returned by prepare_data are used. The default prepare_tokenizer_data prepares the data for a character-based tokenizer.

Parameters:
  • prepare_tokenizer_data (dict) – same in default_config, no supported arguments for now

  • target_dir (str) – Save the text file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv (str) – The train data given by prepare_data

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

str

The text file path; the text file should be in the following format:

This is the first line
This is the second line
These are all text used for training tokenizer

run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, prepare_tokenizer_data: Optional[dict] = None, build_tokenizer: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

  • stage 0 - Parse the corpus and save the metadata file for ASR (waveform path, label…)

  • stage 1 - Prepare the metadata file for training tokenizer

  • stage 2 - Train the tokenizer

  • stage 3 - Train the ASR model

  • stage 4 - Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (See test_ckpt_steps)

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoaders

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.

  • **others – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method's documentation for its supported arguments and their meanings

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    • total_steps (int) - the total number of optimization steps

    • log_step (int) - logging frequency; log every log_step steps

    • eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run

    • save_step (int) - save a checkpoint every save_step steps

    • gradient_clipping (float) - clip the gradient; important for RNNs

    • gradient_accumulate (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization

    • valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

    • valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected

    • auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

    • resume_ckpt_dir (str) - directly specify the checkpoint path to resume from, which does not need to be inside target_dir (see run)

    • seed (int) - fix the random seed before training starts

    • keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones

    • use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbSF#

class s3prl.problem.SuperbSF[source][source]#

Bases: SuperbASR

default_config() dict[source][source]#

The default arguments for run, in YAML. Note that for fields with inner values, such as build_model, the outer field name corresponds to a method of the same name, so you can find the method build_model below. The values inside that field are passed directly into that method, so by changing these inner values you directly change the behavior of the corresponding method. See each method's documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data, prepare_tokenizer_data, build_tokenizer, build_dataset, build_batch_sampler, build_upstream, build_featurizer, build_downstream, build_model, build_task, build_optimizer, build_scheduler, save_model, save_task, train

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  train_speakers:
  - Ivy
  - Joanna
  - Joey
  - Justin
  - Kendra
  - Kimberly
  - Matthew
  - Salli
  valid_speakers:
  - Aditi
  - Amy
  - Geraint
  - Nicole
  test_speakers:
  - Brian
  - Emma
  - Raveena
  - Russell
prepare_tokenizer_data: {}
build_tokenizer:
  vocab_type: character
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 32
    max_length: 300000
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  model_conf:
    module: LSTM
    proj_size: 1024
    hidden_size:
    - 1024
    - 1024
    dropout:
    - 0.2
    - 0.2
    layer_norm:
    - false
    - false
    proj:
    - false
    - false
    sample_rate:
    - 1
    - 1
    sample_style: concat
    bidirectional: true
  specaug_conf:
    freq_mask_width_range: !!python/tuple
    - 0
    - 50
    num_freq_mask: 4
    time_mask_width_range: !!python/tuple
    - 0
    - 40
    num_time_mask: 2
build_model:
  upstream_trainable: false
build_task:
  log_metrics:
  - wer
  - cer
  - slot_type_f1
  - slot_value_cer
  - slot_value_wer
  - slot_edit_f1_full
  - slot_edit_f1_part
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 2000
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: slot_type_f1
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
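
As with the other recipes, the required ??? fields can be filled and the recipe launched from Python; the dataset location and upstream name below are placeholders (the speaker splits come from the YAML defaults above).

# Hedged sketch: minimal SuperbSF (slot filling) run; values are placeholders.
from s3prl.problem import SuperbSF

problem = SuperbSF()
config = problem.default_config()
config["target_dir"] = "exp/superb_sf"
config["prepare_data"]["dataset_root"] = "/data/audio_snips"   # Audio SNIPS corpus location
config["build_upstream"]["name"] = "fbank"

problem.run(**config)
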
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels, ...). By default, this calls audio_snips_for_slot_filling with **prepare_data.

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of audio_snips_for_slot_filling

  • target_dir (str) – Parse your corpus and save the csv files into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • transcription (str) - a text string in which words are separated by spaces. Eg. “I want to fly from Taipei to New York”

  • iob (str) - IOB tags; use “O” if a word has no tag. Every word should have a tag, separated by spaces. Eg. “O O O O O from_location O to_location to_location”
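
A single metadata row in this format might look as follows; the values and file name are illustrative only.

# Hedged sketch: one illustrative row of the slot-filling metadata csv.
import pandas as pd

row = {
    "id": "snips-0001",
    "wav_path": "/abs/path/snips-0001.wav",
    "transcription": "I want to fly from Taipei to New York",
    "iob": "O O O O O from_location O to_location to_location",
}
pd.DataFrame([row]).to_csv("test-placeholder.csv", index=False)
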

prepare_tokenizer_data(prepare_tokenizer_data: dict, target_dir: str, cache_dir: str, train_csv: str, valid_csv: str, test_csvs: str, get_path_only: bool = False)[source][source]#

Prepare the text file used for training the tokenizer. By default, only the transcriptions in the train_csv returned by prepare_data are used. The default prepare_tokenizer_data prepares the data for a character-based tokenizer.

Parameters:
  • prepare_tokenizer_data (dict) – same in default_config, no supported arguments for now

  • target_dir (str) – Save the text file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv (str) – The train data given by prepare_data

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

str

The text file path; the text file should be in the following format:

This is the first line
This is the second line
These are all text used for training tokenizer

build_tokenizer(build_tokenizer: dict, target_dir: str, cache_dir: str, tokenizer_data_path: str, get_path_only: bool = False)[source][source]#

Build the tokenizer from the data prepared by prepare_tokenizer_data. By default, this calls prepare_common_tokenizer with **build_tokenizer.

Parameters:
  • build_tokenizer (dict) – same in default_config, arguments for prepare_common_tokenizer

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • tokenizer_data_path (str) – The text file from prepare_tokenizer_data

  • get_path_only (bool) – Directly return the file paths, whether or not they exist yet

Returns:

str

filepath of the pickled s3prl.dataio.encoder.tokenizer.Tokenizer

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, tokenizer_path: str)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) – same in default_config, not used

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • tokenizer_path (str) – The pickled tokenizer path for encoding transcription

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_ids (torch.LongTensor) - the encoded class ids of a transcription (sentence)

  • labels (str) - the text transcription

  • unique_name (str) - the unique id for this datapoint

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    • train (dict) - arguments for SortedSliceSampler

    • valid (dict) - arguments for FixedBatchSizeBatchSampler

    • test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the RNNEncoder model wrapped with ModelWithSpecaug

Parameters:
  • build_downstream (dict) – same in default_config, has two keys: model_conf is the arguments for RNNEncoder; specaug_conf is the arguments for ModelWithSpecaug

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsFrameModel

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so that it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the input feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    • name (str) - the optimizer class name in torch.optim

    • conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    • name (str) - the scheduler class name in torch.optim.lr_scheduler

    • conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model, tokenizer)[source]#
build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, prepare_tokenizer_data: Optional[dict] = None, build_tokenizer: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

  • stage 0 - Parse the corpus and save the metadata file for ASR (waveform path, label…)

  • stage 1 - Prepare the metadata file for training tokenizer

  • stage 2 - Train the tokenizer

  • stage 3 - Train the ASR model

  • stage 4 - Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (See test_ckpt_steps)

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoaders

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.

  • **others – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method's documentation for its supported arguments and their meanings

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    key                   description
    total_steps           (int) - the total number of optimization steps
    log_step              (int) - logging frequency; log every log_step steps
    eval_step             (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step             (int) - save a checkpoint every save_step steps
    gradient_clipping     (float) - clip the gradient; important for RNNs
    gradient_accumulate   (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric          (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better   (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
    auto_resume           (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir       (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    seed                  (int) - fix the random seed before training starts
    keep_num_ckpts        (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    use_scheduler         (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbASV#

class s3prl.problem.SuperbASV[source][source]#

Bases: ASV

default_config()[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_dataset build_batch_sampler build_upstream build_featurizer build_model build_task build_optimizer build_scheduler train

target_dir: ???
cache_dir: null
test_ckpt_steps: null
prepare_data:
  dataset_root: ???
build_dataset:
  train:
    min_secs: 2.0
    max_secs: 8.0
build_batch_sampler:
  train:
    batch_size: 10
    shuffle: true
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_model:
  upstream_trainable: false
build_task:
  loss_type: amsoftmax
  loss_conf:
    margin: 0.4
    scale: 30
build_optimizer:
  name: AdamW
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
train:
  total_steps: 200000
  log_step: 500
  eval_step: 1.0e+20
  save_step: 10000
  gradient_clipping: 1000.0
  gradient_accumulate: 5
  valid_metric: null
  valid_higher_better: null
  auto_resume: true
  resume_ckpt_dir: null
  keep_num_ckpts: null
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call prepare_voxceleb1_for_sv with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in prepare_voxceleb1_for_sv

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths no matter they exist or not.

Returns:

tuple

  1. train_path (str)

  2. test_trial_paths (List[str])

The train_path should be a csv file containing the following columns:

column     description
id         (str) - the unique id for this utterance
wav_path   (str) - the absolute path of the waveform file
spk        (str) - a string speaker label

Each test_trial_path should be a csv file containing the following columns:

column      description
id1         (str) - the unique id of the first utterance
id2         (str) - the unique id of the second utterance
wav_path1   (str) - the absolute path of the first utterance
wav_path2   (str) - the absolute path of the second utterance
label       (int) - 0 when the two utterances are from different speakers, 1 when they are from the same speaker
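
As a concrete (purely hypothetical) illustration of the two csv layouts above, the sketch below writes a toy train csv and trial csv with pandas; every id, path, and speaker label is made up:

import pandas as pd

# Toy training metadata: one row per utterance
pd.DataFrame(
    {
        "id": ["utt_0001", "utt_0002"],
        "wav_path": ["/data/vox1/utt_0001.wav", "/data/vox1/utt_0002.wav"],
        "spk": ["spk_A", "spk_B"],
    }
).to_csv("train.csv", index=False)

# Toy trial list: one row per (enrollment, test) pair
pd.DataFrame(
    {
        "id1": ["utt_0001"],
        "id2": ["utt_0002"],
        "wav_path1": ["/data/vox1/utt_0001.wav"],
        "wav_path2": ["/data/vox1/utt_0002.wav"],
        "label": [0],  # 0: different speakers, 1: same speaker
    }
).to_csv("test_trial.csv", index=False)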

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv: str, test_csvs: list, get_path_only: bool)[source][source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of the train csv.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv (str) – the train csv path from prepare_data

  • test_csvs (List[str]) – the test trial csv paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths no matter they exist or not

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config; it has train and test keys, each a dictionary. For the train dictionary:

    key        description
    min_secs   (float) - Drop a waveform if it is not longer than min_secs
    max_secs   (float) - If a waveform is longer than max_secs seconds, randomly crop the waveform to max_secs seconds. Default: None, no cropping

    For the test dictionary, no argument is supported yet

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For train mode, the dataset should return each item as a dictionary containing the following keys:

key           description
x             (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len         (int) - the waveform length seq_len
class_id      (str) - the label class id encoded by encoder_path
unique_name   (str) - the unique id for this datapoint

For test mode, the dataset should return each item as a dictionary containing the following keys:

key           description
x             (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len         (int) - the waveform length seq_len
unique_name   (str) - the unique id for this datapoint
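
If you override build_dataset, the returned object only has to honor the item contract above. Below is a minimal sketch of a test-mode dataset that satisfies it; the (unique_name, wav_path) pair list and the torchaudio-based loading are assumptions made for illustration:

import torchaudio
from torch.utils.data import Dataset

class ToyWaveformDataset(Dataset):
    """Return items following the documented contract: x, x_len, unique_name."""

    def __init__(self, names_and_paths):
        # names_and_paths: list of (unique_name, wav_path) tuples (illustrative)
        self.items = list(names_and_paths)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        unique_name, wav_path = self.items[index]
        wav, _ = torchaudio.load(wav_path)    # (channel, seq_len)
        x = wav[0].unsqueeze(-1).float()      # -> (seq_len, 1) as documented
        return {"x": x, "x_len": x.size(0), "unique_name": unique_name}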

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    key     description
    train   (dict) - arguments for FixedBatchSizeBatchSampler
    test    (dict) - arguments for FixedBatchSizeBatchSampler

    Note that ASV does not support valid

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the SuperbXvector model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of SuperbXvector

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as the input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    key    description
    name   (str) - the optimizer class name in torch.optim
    conf   (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage
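
The name/conf pattern above maps directly onto the classes in torch.optim. The helper below is an illustration of that convention, not the actual s3prl implementation:

import torch

def build_optimizer_from_conf(build_optimizer: dict, parameters):
    # e.g. build_optimizer = {"name": "AdamW", "conf": {"lr": 1.0e-4}}
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))

model = torch.nn.Linear(10, 2)  # toy model
optimizer = build_optimizer_from_conf(
    {"name": "AdamW", "conf": {"lr": 1.0e-4}}, model.parameters()
)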

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    key    description
    name   (str) - the scheduler class name in torch.optim.lr_scheduler
    conf   (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model, encoder, test_trials=None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build SpeakerVerification

Parameters:
  • build_task (dict) – same in default_config, no argument supported for now

  • model (torch.nn.Module) – the model built by build_model

  • encoder – the encoder built by build_encoder

  • test_trials (List[Tuple[int, str, str]]) – each tuple in the list consists of (label, enroll_utt_id, test_utt_id). label is either 0 or 1

Returns:

Task
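
For reference, a test_trials list in the tuple format described above can be assembled from a trial csv produced by prepare_data; this small helper is hypothetical and only illustrates the expected shape of the data:

import pandas as pd

def trials_from_csv(trial_csv: str):
    """Build (label, enroll_utt_id, test_utt_id) tuples from a trial csv."""
    df = pd.read_csv(trial_csv)
    return [
        (int(row["label"]), str(row["id1"]), str(row["id2"]))
        for _, row in df.iterrows()
    ]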

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)
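
A minimal usage sketch for inference, assuming a finished experiment whose checkpoint directory follows the layout above (the path is a placeholder):

import torch
from s3prl.problem import SuperbASV

problem = SuperbASV()
model, task = problem.load_model_and_task("result/superb_asv/train/step_200000")

model.eval()
with torch.no_grad():
    pass  # feed batches prepared by the matching dataset / collate_fn here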

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, test_ckpt_steps: Optional[List[int]] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage  description
0      Parse the corpus and save the metadata file (waveform path, label…)
1      Build the encoder for encoding the speaker labels
2      Train the model
3      Evaluate the model on multiple test sets; multiple checkpoints will be evaluated for each test set (see test_ckpt_steps)
4      Report the best result found on each test set

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Used for distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpt_steps.

  • test_ckpt_steps (List[int]) – After training, checkpoints are saved at multiple steps. This option specifies which of those checkpoints will be used for evaluation.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings
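
Because the default config above sets eval_step to a very large value and valid_metric to null, validation is effectively disabled during training, and the checkpoints to score are chosen explicitly through test_ckpt_steps. A hedged sketch (all values and paths are placeholders):

from s3prl.problem import SuperbASV

problem = SuperbASV()
config = problem.default_config()
config["target_dir"] = "result/superb_asv"
config["prepare_data"]["dataset_root"] = "/data/VoxCeleb1"
config["build_upstream"]["name"] = "fbank"

# Score these saved steps on every test trial list (stage 3 above)
config["test_ckpt_steps"] = [100000, 150000, 200000]

problem.run(**config)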

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    key                   description
    total_steps           (int) - the total number of optimization steps
    log_step              (int) - logging frequency; log every log_step steps
    eval_step             (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step             (int) - save a checkpoint every save_step steps
    gradient_clipping     (float) - clip the gradient; important for RNNs
    gradient_accumulate   (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric          (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better   (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
    auto_resume           (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir       (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    seed                  (int) - fix the random seed before training starts
    keep_num_ckpts        (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    use_scheduler         (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbER#

class s3prl.problem.SuperbER[source][source]#

Bases: SuperbSID

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_encoder build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  iemocap: ???
  test_fold: ???
build_encoder: {}
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 4
    shuffle: true
  valid:
    batch_size: 4
  test:
    batch_size: 4
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 30000
  log_step: 500
  eval_step: 1000
  save_step: 1000
  gradient_clipping: 1.0
  gradient_accumulate: 8
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
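
All the ??? fields above (target_dir, prepare_data.iemocap, prepare_data.test_fold, build_upstream.name) must be filled in before run is called. A brief, hedged sketch with placeholder values; consult iemocap_for_superb for the exact format expected for test_fold:

from s3prl.problem import SuperbER

problem = SuperbER()
config = problem.default_config()
config["target_dir"] = "result/superb_er"                          # placeholder
config["prepare_data"]["iemocap"] = "/data/IEMOCAP_full_release"   # placeholder
config["prepare_data"]["test_fold"] = "fold1"                      # placeholder fold id
config["build_upstream"]["name"] = "fbank"                         # placeholder upstream

problem.run(**config)
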
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call iemocap_for_superb with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in iemocap_for_superb

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

column      description
id          (str) - the unique id for this data point
wav_path    (str) - the absolute path of the waveform file
label       (str) - a string label of the waveform
start_sec   (float) - optional; load the waveform from start_sec seconds. If not present or math.nan, load from the beginning.
end_sec     (float) - optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end.
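
To make the start_sec / end_sec semantics concrete, the snippet below shows one way such a segment could be loaded; it only illustrates the column meanings and is not the loader the recipe actually uses:

import math
import torchaudio

def load_segment(wav_path: str, start_sec: float, end_sec: float):
    """Load [start_sec, end_sec) of a waveform; NaN means 'use the file boundary'."""
    info = torchaudio.info(wav_path)
    start = 0 if math.isnan(start_sec) else int(start_sec * info.sample_rate)
    end = info.num_frames if math.isnan(end_sec) else int(end_sec * info.sample_rate)
    wav, sr = torchaudio.load(wav_path, frame_offset=start, num_frames=end - start)
    return wav  # (channel, num_frames)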

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    key     description
    train   (dict) - arguments for FixedBatchSizeBatchSampler
    valid   (dict) - arguments for FixedBatchSizeBatchSampler
    test    (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    key           description
    max_secs      (float) - If a waveform is longer than max_secs seconds, randomly crop the waveform to max_secs seconds
    sox_effects   (List[List[str]]) - If not None, apply sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

key           description
x             (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len         (int) - the waveform length seq_len
class_id      (int) - the encoded class id
label         (str) - the class name
unique_name   (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as the input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    key    description
    name   (str) - the optimizer class name in torch.optim
    conf   (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    key    description
    name   (str) - the scheduler class name in torch.optim.lr_scheduler
    conf   (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage  description
0      Parse the corpus and save the metadata file (waveform path, label…)
1      Build the encoder to encode the labels
2      Train the model
3      Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Used for distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model should be separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    key                   description
    total_steps           (int) - the total number of optimization steps
    log_step              (int) - logging frequency; log every log_step steps
    eval_step             (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step             (int) - save a checkpoint every save_step steps
    gradient_clipping     (float) - clip the gradient; important for RNNs
    gradient_accumulate   (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric          (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better   (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is saved
    auto_resume           (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir       (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    seed                  (int) - fix the random seed before training starts
    keep_num_ckpts        (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    use_scheduler         (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbIC#

class s3prl.problem.SuperbIC[source][source]#

Bases: Common

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_encoder build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_encoder: {}
build_dataset: {}
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 32
  test:
    batch_size: 32
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 5000
  save_step: 250
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call fsc_for_multi_classification with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, arguments for fsc_for_multi_classification

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

column     description
id         (str) - the unique id for this data point
wav_path   (str) - the absolute path of the waveform file
labels     (str) - the string labels of the waveform, separated by a ‘;’

The number of the label columns can be arbitrary.

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default, generate and save a s3prl.dataio.encoder.CategoryEncoders from all the columns whose names start with label, across all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths no matter they exist or not.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

key           description
x             (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len         (int) - the waveform length seq_len
class_ids     (torch.LongTensor) - the encoded class ids. shape: (num_class, )
labels        (List[str]) - the class name. length: num_class
unique_name   (str) - the unique id for this datapoint
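
As an illustration of how a ‘;’-separated labels string relates to the class_ids and labels keys above, here is a toy encoding; the per-slot vocabularies and helper are hypothetical, while the real recipe uses the CategoryEncoders produced by build_encoder:

import torch

# Hypothetical per-slot vocabularies; the real ones come from build_encoder
vocabs = [
    {"activate": 0, "deactivate": 1},   # slot 1 (illustrative)
    {"lights": 0, "music": 1},          # slot 2 (illustrative)
]

def encode(labels_str: str):
    labels = labels_str.split(";")      # e.g. "activate;lights"
    class_ids = torch.LongTensor(
        [vocab[label] for vocab, label in zip(vocabs, labels)]
    )                                   # shape: (num_class,)
    return {"class_ids": class_ids, "labels": labels}

item = encode("activate;lights")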

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset: Dataset)[source][source]#

Return the batch sampler for torch DataLoader. By default call superb_sid_batch_sampler with **build_batch_sampler.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    key     description
    train   (dict) - arguments for FixedBatchSizeBatchSampler
    valid   (dict) - arguments for FixedBatchSizeBatchSampler
    test    (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

AbsUtteranceModel

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source][source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceMultiClassClassificationTask

Parameters:
  • build_task (dict) – same in default_config, no argument supported for now

  • model (torch.nn.Module) – the model built by build_model

  • encoder – the encoder built by build_encoder

  • valid_df (pd.DataFrame) – metadata of the valid set

  • test_df (pd.DataFrame) – metadata of the test set

Returns:

Task

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as the input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    key    description
    name   (str) - the optimizer class name in torch.optim
    conf   (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    key    description
    name   (str) - the scheduler class name in torch.optim.lr_scheduler
    conf   (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage  description
0      Parse the corpus and save the metadata file (waveform path, label…)
1      Build the encoder to encode the labels
2      Train the model
3      Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Used for distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None
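
As a rough mental model only (not the actual implementation), an overridden save_model could persist the two pieces like this; the file names are assumptions:

import os
import torch

def save_model_sketch(save_model: dict, model_ckpt_dir: str,
                      build_model_all_args: dict, model: torch.nn.Module) -> None:
    # conceptual sketch: store the weights, the build arguments, and any extra settings
    os.makedirs(model_ckpt_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(model_ckpt_dir, "model.pt"))
    torch.save(build_model_all_args, os.path.join(model_ckpt_dir, "build_model_all_args.pt"))
    torch.save(save_model, os.path.join(model_ckpt_dir, "extra_conf.pt"))

A matching load_model override would then load these files back and call build_model with the restored arguments.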

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    - total_steps (int): the total optimization steps
    - log_step (int): logging frequency; log every log_step steps
    - eval_step (int): evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument in run
    - save_step (int): save a checkpoint every save_step steps
    - gradient_clipping (float): clip the gradient; important for RNNs (a sketch follows this parameter list)
    - gradient_accumulate (int): accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization (a sketch follows this parameter list)
    - valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    - valid_higher_better (bool): some metrics are higher-is-better while others are lower-is-better; this affects how the best validation checkpoint is selected
    - auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    - resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    - seed (int): fix the random seed before training starts
    - keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    - use_scheduler (bool): whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
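
To make gradient_clipping and gradient_accumulate concrete (as referenced in the list above), here is a hedged sketch of the kind of inner loop they imply; it is not the actual s3prl trainer:

import torch

def train_steps(model, compute_loss, optimizer, dataloader,
                gradient_accumulate: int = 1, gradient_clipping: float = 1.0):
    # accumulate gradients over `gradient_accumulate` batches, clip, then update once
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader, start=1):
        loss = compute_loss(model, batch)          # placeholder for the task's forward step
        (loss / gradient_accumulate).backward()    # scale so the sum matches one large batch
        if step % gradient_accumulate == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clipping)
            optimizer.step()
            optimizer.zero_grad()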

SuperbKS#

class s3prl.problem.SuperbKS[source][source]#

Bases: SuperbSID

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_encoder build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  gsc1: ???
  gsc1_test: ???
build_encoder: {}
build_dataset:
  train:
    sox_effects:
    - - channels
      - '1'
    - - rate
      - '16000'
    - - gain
      - '-3.0'
  valid:
    sox_effects:
    - - channels
      - '1'
    - - rate
      - '16000'
    - - gain
      - '-3.0'
  test:
    sox_effects:
    - - channels
      - '1'
    - - rate
      - '16000'
    - - gain
      - '-3.0'
build_batch_sampler:
  train:
    batch_size: 32
  valid:
    batch_size: 32
  test:
    batch_size: 32
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 100
  eval_step: 5000
  save_step: 1000
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
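
For example, a hedged sketch of overriding a few inner values before launching in Python (the dataset paths and the upstream name are placeholders you must replace):

from s3prl.problem import SuperbKS

config = SuperbKS().default_config()
config["target_dir"] = "result/superb_ks"                              # required; '???' by default
config["prepare_data"]["gsc1"] = "/path/to/speech_commands_v1"         # placeholder path
config["prepare_data"]["gsc1_test"] = "/path/to/speech_commands_test"  # placeholder path
config["build_upstream"]["name"] = "fbank"                             # any registered upstream name
config["build_batch_sampler"]["train"]["batch_size"] = 16              # directly affects build_batch_sampler

SuperbKS().run(**config)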
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call gsc1_for_classification with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in gsc1_for_classification

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they already exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  - id (str): the unique id for this data point
  - wav_path (str): the absolute path of the waveform file
  - label (str): a string label of the waveform
  - start_sec (float): optional; load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
  - end_sec (float): optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end
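
If you override prepare_data for a custom corpus, such a csv can be produced with plain pandas; a minimal sketch with made-up paths and labels:

import pandas as pd

df = pd.DataFrame(
    [
        {"id": "utt_0001", "wav_path": "/corpus/yes/utt_0001.wav", "label": "yes"},
        {"id": "utt_0002", "wav_path": "/corpus/no/utt_0002.wav", "label": "no"},
    ]
)
# start_sec / end_sec are optional and can be omitted entirely
df.to_csv("/path/to/target_dir/train.csv", index=False)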

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths whether or not they already exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset: Dataset)[source][source]#

Return the batch sampler for torch DataLoader. By default for train and valid, use BalancedWeightedSampler; for test use FixedBatchSizeBatchSampler

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    - train (dict): arguments for BalancedWeightedSampler
    - valid (dict): arguments for BalancedWeightedSampler
    - test (dict): arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_downsample_rate: int)[source][source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_downsample_rate (int) – the input feature’s stride (downsample rate from 16 kHz)

Returns:

AbsUtteranceModel

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    - max_secs (float): if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    - sox_effects (List[List[str]]): if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

  - x (torch.FloatTensor): the waveform in (seq_len, 1)
  - x_len (int): the waveform length seq_len
  - class_id (int): the encoded class id
  - label (str): the class name
  - unique_name (str): the unique id for this datapoint
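
A hedged sketch of a dataset that yields this dictionary (torchaudio and the encoder's encode method are assumptions; adapt them to the actual encoder you saved):

import pandas as pd
import torchaudio
from torch.utils.data import Dataset

class UtteranceClassificationDataset(Dataset):
    # illustrative only: returns the item dictionary described above
    def __init__(self, data_csv: str, encoder):
        self.df = pd.read_csv(data_csv)
        self.encoder = encoder  # the unpickled label encoder

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sr = torchaudio.load(row["wav_path"])  # (channel, seq_len)
        x = wav[0].unsqueeze(-1)                    # (seq_len, 1)
        return {
            "x": x,
            "x_len": x.size(0),
            "class_id": self.encoder.encode(row["label"]),  # assumed encoder API
            "label": row["label"],
            "unique_name": row["id"],
        }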

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    - name (str): the optimizer class name in torch.optim
    - conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    - name (str): the scheduler class name in torch.optim.lr_scheduler
    - conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage
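
The name/conf pattern used by both builders boils down to a dynamic lookup in torch; a hedged sketch of the equivalent logic (not the exact s3prl implementation):

import torch

def build_optimizer_sketch(build_optimizer: dict, parameters):
    # e.g. {"name": "Adam", "conf": {"lr": 1.0e-4}}
    cls = getattr(torch.optim, build_optimizer["name"])
    return cls(parameters, **build_optimizer.get("conf", {}))

def build_scheduler_sketch(build_scheduler: dict, optimizer):
    # the default config flattens the arguments next to `name`
    # (e.g. {"name": "ExponentialLR", "gamma": 0.9}); a nested "conf" is treated the same way
    conf = dict(build_scheduler.get("conf", {}))
    conf.update({k: v for k, v in build_scheduler.items() if k not in ("name", "conf")})
    cls = getattr(torch.optim.lr_scheduler, build_scheduler["name"])
    return cls(optimizer, **conf)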

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
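
In isolation, that upstream interface looks roughly like the following sketch (the import path and forward signature are assumed from the s3prl.nn documentation; "fbank" is only a lightweight example name):

import torch
from s3prl.nn import S3PRLUpstream  # assumed import path

upstream = S3PRLUpstream("fbank")
upstream.eval()

wavs = torch.randn(2, 32000)                 # (batch_size, num_samples), 2 seconds at 16 kHz
wavs_len = torch.LongTensor([32000, 24000])
with torch.no_grad():
    hidden_states, hidden_states_len = upstream(wavs, wavs_len)  # one entry per layer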

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1, but if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Only relevant for distributed training (world_size > 1). Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The remaining arguments, like prepare_data and build_model, are method-specific arguments passed to the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    - total_steps (int): the total optimization steps
    - log_step (int): logging frequency; log every log_step steps
    - eval_step (int): evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument in run
    - save_step (int): save a checkpoint every save_step steps
    - gradient_clipping (float): clip the gradient; important for RNNs
    - gradient_accumulate (int): accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    - valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    - valid_higher_better (bool): some metrics are higher-is-better while others are lower-is-better; this affects how the best validation checkpoint is selected
    - auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    - resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    - seed (int): fix the random seed before training starts
    - keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    - use_scheduler (bool): whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbSID#

class s3prl.problem.SuperbSID[source][source]#

Bases: Common

The standard SUPERB SID task

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_encoder build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_encoder: {}
build_dataset:
  train:
    max_secs: 8.0
build_batch_sampler:
  train:
    batch_size: 8
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 200000
  log_step: 500
  eval_step: 5000
  save_step: 1000
  gradient_clipping: 1.0
  gradient_accumulate: 4
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
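
A minimal hedged launch sketch in Python (the VoxCeleb1 path and the upstream name are placeholders):

from s3prl.problem import SuperbSID

SuperbSID().run(
    target_dir="result/superb_sid",
    prepare_data={"dataset_root": "/path/to/VoxCeleb1"},
    build_upstream={"name": "fbank"},
)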
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they already exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  - id (str): the unique id for this data point
  - wav_path (str): the absolute path of the waveform file
  - label (str): a string label of the waveform
  - start_sec (float): optional; load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
  - end_sec (float): optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths whether or not they already exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    - max_secs (float): if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    - sox_effects (List[List[str]]): if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

  - x (torch.FloatTensor): the waveform in (seq_len, 1)
  - x_len (int): the waveform length seq_len
  - class_id (int): the encoded class id
  - label (str): the class name
  - unique_name (str): the unique id for this datapoint
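
The sox_effects option above follows torchaudio's effect-chain format (the same format used in SuperbKS's default config); a hedged sketch of applying such a chain outside the recipe:

import torchaudio

# mono, 16 kHz, -3 dB gain
sox_effects = [["channels", "1"], ["rate", "16000"], ["gain", "-3.0"]]

wav, sr = torchaudio.load("/path/to/utterance.wav")  # placeholder path
wav, sr = torchaudio.sox_effects.apply_effects_tensor(wav, sr, sox_effects)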

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    - train (dict): arguments for FixedBatchSizeBatchSampler
    - valid (dict): arguments for FixedBatchSizeBatchSampler
    - test (dict): arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (downsample rate from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    - name (str): the optimizer class name in torch.optim
    - conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    - name (str): the scheduler class name in torch.optim.lr_scheduler
    - conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1, but if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Only relevant for distributed training (world_size > 1). Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The remaining arguments, like prepare_data and build_model, are method-specific arguments passed to the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    - total_steps (int): the total optimization steps
    - log_step (int): logging frequency; log every log_step steps
    - eval_step (int): evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument in run
    - save_step (int): save a checkpoint every save_step steps
    - gradient_clipping (float): clip the gradient; important for RNNs
    - gradient_accumulate (int): accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    - valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    - valid_higher_better (bool): some metrics are higher-is-better while others are lower-is-better; this affects how the best validation checkpoint is selected
    - auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    - resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    - seed (int): fix the random seed before training starts
    - keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    - use_scheduler (bool): whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

SuperbSD#

class s3prl.problem.SuperbSD[source][source]#

Bases: Diarization

default_config()[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_optimizer build_scheduler save_model save_task train scoring

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  data_dir: ???
build_dataset:
  chunk_size: 2000
  subsampling: 1
  rate: 16000
  use_last_samples: true
  label_delay: 0
build_batch_sampler:
  train:
    batch_size: 8
    shuffle: true
  valid:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 512
  rnn_layers: 1
build_model:
  upstream_trainable: false
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model:
  extra_conf:
    build_downstream_conf: ${build_downstream}
save_task: {}
train:
  total_steps: 30000
  log_step: 500
  eval_step: 500
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 4
  valid_metric: der
  valid_higher_better: false
  auto_resume: true
  resume_ckpt_dir: null
scoring:
  thresholds:
  - 0.3
  - 0.4
  - 0.5
  - 0.6
  - 0.7
  median_filters:
  - 1
  - 11
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only=False)[source][source]#

Prepare the task-specific data metadata (path, labels…).

Parameters:
  • prepare_data (dict) –

    same in default_config

    - data_dir (str): the standard Kaldi data directory

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they already exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  - record_id (str): the id for the recording
  - duration (float): the total seconds of the recording
  - wav_path (str): the absolute path of the recording
  - utt_id (str): the id for the segmented utterance; should be globally unique across all recordings, not just within a single recording
  - speaker (str): the speaker label for the segmented utterance
  - start_sec (float): segment start second in the recording
  - end_sec (float): segment end second in the recording

Instead of one waveform file per row, the above file format is one segment per row, and a waveform file can have multiple overlapped segments uttered by different speakers.

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, data_dir: str, num_speakers: int, frame_shift: int)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) – same in default_config, supports arguments for DiarizationDataset

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • data_dir (str) – The converted kaldi data directory from data_csv

  • num_speakers (int) – The number of speaker per utterance

  • frame_shift (int) – The frame shift of the upstream model (downsample rate from 16 kHz)

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

  - x (torch.FloatTensor): the waveform in (seq_len, 1)
  - x_len (int): the waveform length seq_len
  - label (torch.LongTensor): the binary label for each upstream frame, shape: (upstream_len, 2)
  - label_len (int): the upstream feature’s sequence length upstream_len
  - record_id (str): the unique id for the recording
  - chunk_id (int): since a recording can be chunked into several segments for efficient training, this field indicates the segment’s original position (order, 0-indexed) in the recording. This field is only useful during the testing stage

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, data_dir: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    - train (dict): arguments for FixedBatchSizeBatchSampler
    - valid (dict): arguments for FixedBatchSizeBatchSampler
    - test (dict): arguments for GroupSameItemSampler; always use this batch sampler for the testing stage

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • data_dir (str) – The converted kaldi data directory from data_csv

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the SuperbDiarizationModel model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of SuperbDiarizationModel

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (downsample rate from 16 kHz)

Returns:

s3prl.nn.interface.AbsFrameModel

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    - name (str): the optimizer class name in torch.optim
    - conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    - name (str): the scheduler class name in torch.optim.lr_scheduler
    - conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build DiarizationPIT

Parameters:
  • build_task (dict) – same in default_config, no argument supported for now

  • model (torch.nn.Module) – the model built by build_model

Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: str, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, num_speaker: int = 2, prepare_data: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None, scoring: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the Kaldi-style data directory for speaker diarization
stage 1: Train the model
stage 2: Run inference to produce the predictions
stage 3: Score the predictions

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes share the same world_size but have different ranks (process ids).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the checkpoints specified by test_ckpts_steps.

  • num_speaker (int) – How many speakers per utterance

  • **others – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meanings.
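
As a rough sketch of driving the stages above from Python: it assumes the surrounding diarization problem class is importable as s3prl.problem.SuperbSD and that the dict returned by default_config maps directly onto run; the paths and upstream name are placeholders:

# Sketch only: run stages 0-1 first, then stages 2-3 later, reusing the same target_dir.
from s3prl.problem import SuperbSD

problem = SuperbSD()
config = problem.default_config()
config["target_dir"] = "result/sd_exp"
config["prepare_data"]["dataset_root"] = "/path/to/diarization/corpus"  # placeholder
config["build_upstream"]["name"] = "fbank"                               # placeholder upstream

config["start"], config["stop"] = 0, 1   # stages 0-1: data preparation and training
problem.run(**config)

config["start"], config["stop"] = 2, None  # stages 2-3: inference and scoring
problem.run(**config)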

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

scoring(scoring: dict, stage_id: int, test_dirs: List[str], test_rttms: List[str], frame_shift: int)[source]#

Score the prediction

Parameters:
  • scoring (dict) –

    thresholds: (List[float]) - Given the 0~1 (float) soft prediction, a threshold decides how to binarize it into the 0/1 hard prediction. All the thresholds in this list are tried.

    median_filters: (List[int]) - After getting the hard prediction, apply a median filter to smooth it out. All the filter sizes in this list are tried.

  • *others – This method is not designed to be overridden
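
For instance, a sweep over several binarization thresholds and median filter sizes could be configured as below; the values are illustrative, not recommendations:

# Illustrative values only: the scoring stage scores the predictions with
# every (threshold, median_filter) combination.
scoring_config = {
    "thresholds": [0.3, 0.4, 0.5, 0.6, 0.7],
    "median_filters": [1, 11],
}
# e.g. problem.run(..., scoring=scoring_config)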

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps: (int) - the total number of optimization steps
    log_step: (int) - logging frequency; log every log_step steps
    eval_step: (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step: (int) - save a checkpoint every save_step steps
    gradient_clipping: (float) - clip the gradient; important for RNNs
    gradient_accumulate: (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric: (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better: (bool) - whether a higher metric value is better; some metrics are higher-better while others are lower-better, and this decides how the best validation checkpoint is saved
    auto_resume: (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir: (str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    seed: (int) - fix the random seed before training starts
    keep_num_ckpts: (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    use_scheduler: (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
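
As an example of how these keys interact, here is a hedged sketch of a train dict that simulates a 4x larger effective batch size via gradient_accumulate; the values, including "der" as the validation metric, are assumptions for illustration rather than quotes of the default config:

train_config = {
    "total_steps": 200000,
    "log_step": 100,
    "eval_step": 2000,
    "save_step": 500,
    "gradient_clipping": 1.0,
    "gradient_accumulate": 4,     # update parameters once every 4 batches
    "valid_metric": "der",        # assumed metric name for diarization
    "valid_higher_better": False, # DER is lower-better
}
# e.g. problem.run(..., train=train_config)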

HearFSD#

class s3prl.problem.HearFSD[source][source]#

Bases: SuperbSID

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_batch_sampler:
  train:
    batch_size: 10
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multilabel
  scores:
  - mAP
  - top1_acc
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 40000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: mAP
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
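
A hedged sketch of overriding a few of these inner values from Python; the upstream name and dataset root are placeholders, and any field not touched keeps the default shown above:

from s3prl.problem import HearFSD

problem = HearFSD()
config = problem.default_config()
config["target_dir"] = "result/hear_fsd"
config["prepare_data"]["dataset_root"] = "/path/to/fsd_corpus"  # placeholder
config["build_upstream"]["name"] = "hubert"                      # placeholder upstream
config["build_optimizer"]["conf"]["lr"] = 1.0e-4                 # override an inner value
problem.run(**config)
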
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id: (str) - the unique id for this data point
wav_path: (str) - the absolute path of the waveform file
label: (str) - a string label of the waveform
start_sec: (float) - optional, start loading the waveform from start_sec seconds; if not present or math.nan, load from the beginning
end_sec: (float) - optional, load the waveform up to end_sec seconds; if not present or math.nan, load to the end
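
For concreteness, a hedged sketch of writing such a metadata csv with pandas; the paths and labels are made up:

import math
import pandas as pd

# Made-up rows; each row is one data point following the columns above.
df = pd.DataFrame(
    [
        {"id": "clip_0001", "wav_path": "/data/audio/clip_0001.wav",
         "label": "dog_bark", "start_sec": 0.0, "end_sec": 3.2},
        {"id": "clip_0002", "wav_path": "/data/audio/clip_0002.wav",
         "label": "siren", "start_sec": math.nan, "end_sec": math.nan},
    ]
)
df.to_csv("train.csv", index=False)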

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source][source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs: (float) - If a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects: (List[List[str]]) - If not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

x: (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len: (int) - the waveform length seq_len
class_id: (int) - the encoded class id
label: (str) - the class name
unique_name: (str) - the unique id for this datapoint
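
A minimal sketch of a custom torch Dataset that satisfies this contract; the csv columns follow prepare_data above, and the label-to-id dict is a stand-in for the saved encoder:

import pandas as pd
import torch
import torchaudio
from torch.utils.data import Dataset

class MinimalAudioClassificationDataset(Dataset):
    """Returns the keys expected by the task: x, x_len, class_id, label, unique_name."""

    def __init__(self, data_csv: str, label_to_id: dict):
        self.df = pd.read_csv(data_csv)
        self.label_to_id = label_to_id  # stand-in for the pickled CategoryEncoder

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sr = torchaudio.load(row["wav_path"])  # (channel, seq_len)
        wav = wav.mean(dim=0, keepdim=True).t()     # mono, shaped (seq_len, 1)
        return {
            "x": wav.float(),
            "x_len": wav.size(0),
            "class_id": self.label_to_id[row["label"]],
            "label": row["label"],
            "unique_name": row["id"],
        }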

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train: (dict) - arguments for FixedBatchSizeBatchSampler
    valid: (dict) - arguments for FixedBatchSizeBatchSampler
    test: (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source][source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source][source]#

Build the task, which defines the logic of every train/valid/test forward step for the model, and the logic of reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so that it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the input feature for the downstream-specific model.
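
Conceptually, the composition looks like the following sketch; the interfaces are simplified, and the real UpstreamDownstreamModel also handles padding and the trainability of the upstream:

import torch.nn as nn

class SketchUpstreamDownstreamModel(nn.Module):
    """Simplified composition: upstream -> featurizer -> downstream."""

    def __init__(self, upstream, featurizer, downstream):
        super().__init__()
        self.upstream = upstream        # waveform -> list of hidden states
        self.featurizer = featurizer    # list of hidden states -> single hidden state
        self.downstream = downstream    # single hidden state -> task prediction

    def forward(self, wav, wav_len):
        hidden_states, hs_len = self.upstream(wav, wav_len)
        feature, feat_len = self.featurizer(hidden_states, hs_len)
        return self.downstream(feature, feat_len)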

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config

    name: (str) - the optimizer class name in torch.optim
    conf: (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name: (str) - the scheduler class name in torch.optim.lr_scheduler
    conf: (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage
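
The two configs map onto the torch classes roughly as sketched below; the resolution code is illustrative, not the library's exact implementation, and the scheduler layout follows default_config, where the remaining keys are the scheduler kwargs:

import torch

build_optimizer = {"name": "Adam", "conf": {"lr": 1.0e-4}}
build_scheduler = {"name": "ExponentialLR", "gamma": 0.9}  # layout as in default_config

model = torch.nn.Linear(256, 10)  # stand-in for the real model's parameters

optimizer_cls = getattr(torch.optim, build_optimizer["name"])
optimizer = optimizer_cls(model.parameters(), **build_optimizer["conf"])

scheduler_cls = getattr(torch.optim.lr_scheduler, build_scheduler["name"])
scheduler_kwargs = {k: v for k, v in build_scheduler.items() if k != "name"}
scheduler = scheduler_cls(optimizer, **scheduler_kwargs)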

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem
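
For instance, a sketch based on this documented classmethod, useful when the problem name comes from a config file or the command line:

from s3prl.problem import HearFSD

# get_class_from_name is a classmethod, so any problem class can resolve
# another problem class from its __name__.
problem_cls = HearFSD.get_class_from_name("HearESC50")
problem = problem_cls()
print(problem_cls.__name__)  # HearESC50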

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes share the same world_size but have different ranks (process ids).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps: (int) - the total number of optimization steps
    log_step: (int) - logging frequency; log every log_step steps
    eval_step: (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step: (int) - save a checkpoint every save_step steps
    gradient_clipping: (float) - clip the gradient; important for RNNs
    gradient_accumulate: (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric: (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better: (bool) - whether a higher metric value is better; some metrics are higher-better while others are lower-better, and this decides how the best validation checkpoint is saved
    auto_resume: (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir: (str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    seed: (int) - fix the random seed before training starts
    keep_num_ckpts: (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    use_scheduler: (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearESC50#

class s3prl.problem.HearESC50[source][source]#

Bases: HearFSD

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 4000
  log_step: 100
  eval_step: 500
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 4
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
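
Since ESC-50 is evaluated with k-fold cross-validation, here is a hedged sketch of sweeping all five folds; the upstream name and dataset root are placeholders, and the 0-based fold numbering is an assumption:

from s3prl.problem import HearESC50

problem = HearESC50()

for fold in range(5):  # num_folds: 5 in the default config
    config = problem.default_config()
    config["target_dir"] = f"result/hear_esc50/fold{fold}"
    config["prepare_data"]["dataset_root"] = "/path/to/esc50"  # placeholder
    config["prepare_data"]["test_fold"] = fold
    config["build_upstream"]["name"] = "hubert"                # placeholder upstream
    problem.run(**config)
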
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id: (str) - the unique id for this data point
wav_path: (str) - the absolute path of the waveform file
label: (str) - a string label of the waveform
start_sec: (float) - optional, start loading the waveform from start_sec seconds; if not present or math.nan, load from the beginning
end_sec: (float) - optional, load the waveform up to end_sec seconds; if not present or math.nan, load to the end

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train: (dict) - arguments for FixedBatchSizeBatchSampler
    valid: (dict) - arguments for FixedBatchSizeBatchSampler
    test: (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs: (float) - If a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects: (List[List[str]]) - If not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

x: (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len: (int) - the waveform length seq_len
class_id: (int) - the encoded class id
label: (str) - the class name
unique_name: (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so that it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the input feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config

    name: (str) - the optimizer class name in torch.optim
    conf: (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name: (str) - the scheduler class name in torch.optim.lr_scheduler
    conf: (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic of every train/valid/test forward step for the model, and the logic of reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes share the same world_size but have different ranks (process ids).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps: (int) - the total number of optimization steps
    log_step: (int) - logging frequency; log every log_step steps
    eval_step: (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step: (int) - save a checkpoint every save_step steps
    gradient_clipping: (float) - clip the gradient; important for RNNs
    gradient_accumulate: (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric: (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better: (bool) - whether a higher metric value is better; some metrics are higher-better while others are lower-better, and this decides how the best validation checkpoint is saved
    auto_resume: (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir: (str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    seed: (int) - fix the random seed before training starts
    keep_num_ckpts: (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones
    use_scheduler: (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearBeijingOpera#

class s3prl.problem.HearBeijingOpera[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
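
HearBeijingOpera is essentially HearESC50 with a different default_config; the same pattern can be used to define a new fold-based recipe, as in this hedged sketch where the class name and values are made up:

from s3prl.problem import HearESC50

class MyFoldedSoundEvent(HearESC50):
    """A hypothetical recipe: reuse every method, only change the defaults."""

    def default_config(self) -> dict:
        config = super().default_config()
        config["prepare_data"]["num_folds"] = 10      # made-up value
        config["train"]["total_steps"] = 50000        # made-up value
        config["build_task"]["scores"] = ["top1_acc"]
        return config
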
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train: (dict) - arguments for FixedBatchSizeBatchSampler
    valid: (dict) - arguments for FixedBatchSizeBatchSampler
    test: (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs: (float) - If a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects: (List[List[str]]) - If not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

x: (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len: (int) - the waveform length seq_len
class_id: (int) - the encoded class id
label: (str) - the class name
unique_name: (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer reduces the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so that it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the input feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config

    name: (str) - the optimizer class name in torch.optim
    conf: (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name: (str) - the scheduler class name in torch.optim.lr_scheduler
    conf: (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic of every train/valid/test forward step for the model, and the logic of reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id: (str) - the unique id for this data point
wav_path: (str) - the absolute path of the waveform file
label: (str) - a string label of the waveform
start_sec: (float) - optional, start loading the waveform from start_sec seconds; if not present or math.nan, load from the beginning
end_sec: (float) - optional, load the waveform up to end_sec seconds; if not present or math.nan, load to the end

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to run through the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) run in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes share the same world_size but have different ranks (process ids).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config, with the following keys:

      - total_steps (int) - the total number of optimization steps

      - log_step (int) - logging frequency; log every log_step steps

      - eval_step (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches are evaluated (to speed up development) with the eval_batch argument of run

      - save_step (int) - save a checkpoint every save_step steps

      - gradient_clipping (float) - clip the gradient; important for RNNs

      - gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization

      - valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

      - valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this determines how the best validation checkpoint is selected

      - auto_resume (bool) - if a last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

      - resume_ckpt_dir (str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)

      - seed (int) - fix the random seed before training starts

      - keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones

      - use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
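
As an example of how these train keys are used in practice, a quick debugging run can override only a few of them on top of default_config. This is a hedged sketch: problem stands for an instance of any Problem subclass on this page, and the values are illustrative rather than recommended settings.

# 'problem' is an instance of a Problem subclass documented on this page (placeholder).
config = problem.default_config()
config["train"].update({
    "total_steps": 2000,   # much shorter than the default
    "eval_step": 200,
    "save_step": 200,
    "seed": 1337,          # documented key that is not in the default yaml (assumed accepted)
})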

HearCremaD#

class s3prl.problem.HearCremaD[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - mAP
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
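
The recipe can be launched directly from Python with the documented run method. This is a hedged sketch: the target directory, dataset root, and upstream name are placeholders, and it assumes that every top-level key of default_config matches an argument of run (as listed in the signature below).

from s3prl.problem import HearCremaD

problem = HearCremaD()
config = problem.default_config()
config["target_dir"] = "result/hear_cremad"                  # placeholder output dir
config["prepare_data"]["dataset_root"] = "/path/to/CREMA-D"  # placeholder corpus path
config["prepare_data"]["test_fold"] = 0
config["build_upstream"]["name"] = "fbank"                   # any registered upstream name

problem.run(**config)
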
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config, with the following keys:

      - train (dict) - arguments for FixedBatchSizeBatchSampler

      - valid (dict) - arguments for FixedBatchSizeBatchSampler

      - test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader
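
If the default FixedBatchSizeBatchSampler does not fit your data, this method can be overridden in a subclass. The following is a hedged sketch that swaps in plain torch samplers (not the recipe's default) and relies only on the documented method signature.

from torch.utils.data import BatchSampler, RandomSampler, SequentialSampler

from s3prl.problem import HearCremaD

class HearCremaDWithTorchSampler(HearCremaD):
    def build_batch_sampler(
        self, build_batch_sampler: dict, target_dir: str, cache_dir: str,
        mode: str, data_csv: str, dataset,
    ):
        # Read the mode-specific sub-dict (train/valid/test) from the config.
        conf = (build_batch_sampler or {}).get(mode, {})
        base = RandomSampler(dataset) if conf.get("shuffle", False) else SequentialSampler(dataset)
        return BatchSampler(base, batch_size=conf.get("batch_size", 1), drop_last=False)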

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

      - max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds

      - sox_effects (List[List[str]]) - if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all of the train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_id (int) - the encoded class id

  • label (str) - the class name

  • unique_name (str) - the unique id for this datapoint
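
A hedged sketch of a dataset that yields exactly these keys from a metadata csv; the encoder is assumed to expose an encode(label) -> int method, which may differ from the real CategoryEncoder API.

import pandas as pd
import torchaudio
from torch.utils.data import Dataset

class MinimalWaveformDataset(Dataset):
    def __init__(self, data_csv: str, encoder):
        self.df = pd.read_csv(data_csv)
        self.encoder = encoder

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sample_rate = torchaudio.load(row["wav_path"])  # (channel, seq_len)
        wav = wav.mean(dim=0, keepdim=True).transpose(0, 1)  # (seq_len, 1)
        return {
            "x": wav.float(),
            "x_len": wav.size(0),
            "class_id": self.encoder.encode(row["label"]),
            "label": row["label"],
            "unique_name": row["id"],
        }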

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel
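
To replace the default MeanPoolingLinear, return any module with the expected interface from an overridden build_downstream. The sketch below is a plain torch module under an assumed forward contract of (features, features_len) -> (logits, features_len); check s3prl.nn.interface.AbsUtteranceModel for the exact signature.

import torch
import torch.nn as nn

class MeanPoolClassifier(nn.Module):
    """A hedged stand-in for the default downstream model."""

    def __init__(self, input_size: int, output_size: int, hidden_size: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x: torch.Tensor, x_len: torch.Tensor):
        # x: (batch, seq_len, input_size); average only the valid (unpadded) frames.
        mask = torch.arange(x.size(1), device=x.device)[None, :] < x_len[:, None]
        pooled = (x * mask.unsqueeze(-1)).sum(dim=1) / x_len.unsqueeze(-1)
        return self.net(pooled), x_len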

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the file paths, whether or not they already exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, with the following keys:

      - name (str) - the optimizer class name in torch.optim

      - conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage
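
The name/conf convention above maps directly onto torch.optim. A hedged sketch of that mapping (not necessarily the library's exact implementation):

import torch

def optimizer_from_conf(parameters, name: str, conf: dict) -> torch.optim.Optimizer:
    # Look the class up by name in torch.optim and pass conf as keyword arguments.
    optimizer_cls = getattr(torch.optim, name)  # e.g. torch.optim.Adam
    return optimizer_cls(parameters, **conf)    # e.g. lr=1.0e-4

# optimizer = optimizer_from_conf(model.parameters(), "Adam", {"lr": 1.0e-4})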

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config, with the following keys:

      - name (str) - the scheduler class name in torch.optim.lr_scheduler

      - conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
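
For intuition about what the upstream returns, the snippet below drives S3PRLUpstream directly on random waveforms. The from s3prl.nn import path and the (wavs, wavs_len) forward signature follow the library's public usage but should be treated as assumptions here.

import torch
from s3prl.nn import S3PRLUpstream

upstream = S3PRLUpstream("fbank")  # a lightweight upstream; any registered name works
upstream.eval()

wavs = torch.randn(2, 2 * 16000)                 # two 2-second waveforms at 16 kHz
wavs_len = torch.LongTensor([2 * 16000, 16000])  # true lengths before padding
with torch.no_grad():
    hidden_states, hidden_states_len = upstream(wavs, wavs_len)
print(len(hidden_states), hidden_states[0].shape)  # num layers, (batch, frames, hidden size)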

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case, so we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)
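
A hedged sketch of restoring a finished HearCremaD run: the checkpoint directory is a placeholder that is assumed to contain the model and task sub-directories described above, and the overridden key is assumed to be an accepted build_task argument.

from s3prl.problem import HearCremaD

problem = HearCremaD()
model, task = problem.load_model_and_task(
    "result/hear_cremad/valid_best",          # placeholder checkpoint directory
    task_overrides={"scores": ["top1_acc"]},  # assumed overridable build_task argument
)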

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • get_path_only (bool) – Directly return the file paths, whether or not they already exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • label (str) - a string label of the waveform

  • start_sec (float) - optional, start loading the waveform at start_sec seconds. If not present or math.nan, load from the beginning.

  • end_sec (float) - optional, stop loading the waveform at end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

The stages of run are:

  • stage 0 - Parse the corpus and save the metadata file (waveform path, label, ...)

  • stage 1 - Build the encoder to encode the labels

  • stage 2 - Train the model

  • stage 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoaders

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that nothing crashes. If it is -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – During distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank ranges from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See the specific method documentation for their supported arguments and meaning.
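
Building on the earlier HearCremaD launch sketch, start and stop let you re-run only part of this pipeline; for example, skipping the already-finished data preparation on a second invocation (a hedged illustration):

# 'problem' and 'config' are the ones prepared in the earlier HearCremaD sketch.
problem.run(**{**config, "start": 2})  # reuse the stage 0-1 outputs in target_dir and go straight to training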

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config, with the following keys:

      - total_steps (int) - the total number of optimization steps

      - log_step (int) - logging frequency; log every log_step steps

      - eval_step (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches are evaluated (to speed up development) with the eval_batch argument of run

      - save_step (int) - save a checkpoint every save_step steps

      - gradient_clipping (float) - clip the gradient; important for RNNs

      - gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization

      - valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

      - valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this determines how the best validation checkpoint is selected

      - auto_resume (bool) - if a last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

      - resume_ckpt_dir (str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)

      - seed (int) - fix the random seed before training starts

      - keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones

      - use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearGSC5hr#

class s3prl.problem.HearGSC5hr[source][source]#

Bases: HearFSD

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config, with the following keys:

      - train (dict) - arguments for FixedBatchSizeBatchSampler

      - valid (dict) - arguments for FixedBatchSizeBatchSampler

      - test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

      - max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds

      - sox_effects (List[List[str]]) - if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all of the train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_id (int) - the encoded class id

  • label (str) - the class name

  • unique_name (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the file paths, whether or not they already exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, with the following keys:

      - name (str) - the optimizer class name in torch.optim

      - conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config, with the following keys:

      - name (str) - the scheduler class name in torch.optim.lr_scheduler

      - conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case, so we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem
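
A hedged sketch of resolving a problem class from its name string, which is convenient when the class name comes from a command line or a config file:

from s3prl.problem import HearGSC5hr

# get_class_from_name is documented as a classmethod that returns a Problem class.
cls = HearGSC5hr.get_class_from_name("HearGSC5hr")
assert cls is HearGSC5hr
problem = cls()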

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • get_path_only (bool) – Directly return the file paths, whether or not they already exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • label (str) - a string label of the waveform

  • start_sec (float) - optional, start loading the waveform at start_sec seconds. If not present or math.nan, load from the beginning.

  • end_sec (float) - optional, stop loading the waveform at end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

The stages of run are:

  • stage 0 - Parse the corpus and save the metadata file (waveform path, label, ...)

  • stage 1 - Build the encoder to encode the labels

  • stage 2 - Train the model

  • stage 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoaders

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that nothing crashes. If it is -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: "cpu" or "cuda". Default: "cuda"

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – During distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank ranges from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See the specific method documentation for their supported arguments and meaning.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config, with the following keys:

      - total_steps (int) - the total number of optimization steps

      - log_step (int) - logging frequency; log every log_step steps

      - eval_step (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches are evaluated (to speed up development) with the eval_batch argument of run

      - save_step (int) - save a checkpoint every save_step steps

      - gradient_clipping (float) - clip the gradient; important for RNNs

      - gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization

      - valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

      - valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this determines how the best validation checkpoint is selected

      - auto_resume (bool) - if a last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

      - resume_ckpt_dir (str) - directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)

      - seed (int) - fix the random seed before training starts

      - keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones

      - use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearGtzanMusicSpeech#

class s3prl.problem.HearGtzanMusicSpeech[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - mAP
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config, with the following keys:

      - train (dict) - arguments for FixedBatchSizeBatchSampler

      - valid (dict) - arguments for FixedBatchSizeBatchSampler

      - test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

      - max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds

      - sox_effects (List[List[str]]) - if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all of the train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_id (int) - the encoded class id

  • label (str) - the class name

  • unique_name (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 KHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the file paths, whether or not they already exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, with the following keys:

      - name (str) - the optimizer class name in torch.optim

      - conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config, with the following keys:

      - name (str) - the scheduler class name in torch.optim.lr_scheduler

      - conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage
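
Analogous to the optimizer, the scheduler can be resolved by name from torch.optim.lr_scheduler. Note that the default_config above places gamma at the top level of build_scheduler rather than under conf, so the exact override layout should be double-checked; the lookup itself is sketched below.

import torch

def scheduler_from_conf(optimizer, name: str, **conf):
    # Look the class up by name in torch.optim.lr_scheduler, e.g. ExponentialLR.
    scheduler_cls = getattr(torch.optim.lr_scheduler, name)
    return scheduler_cls(optimizer, **conf)

# scheduler = scheduler_from_conf(optimizer, "ExponentialLR", gamma=0.9)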

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case, so we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task's behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • get_path_only (bool) – Directly return the file paths, whether or not they already exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • label (str) - a string label of the waveform

  • start_sec (float) - optional, start loading the waveform at start_sec seconds. If not present or math.nan, load from the beginning.

  • end_sec (float) - optional, stop loading the waveform at end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

The stages of run are:

  • stage 0 - Parse the corpus and save the metadata file (waveform path, label, ...)

  • stage 1 - Build the encoder to encode the labels

  • stage 2 - Train the model

  • stage 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meaning
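
For example (a sketch with placeholder paths; HearGtzan, documented below, is used here since its required config fields are shown there), a full run from data preparation through evaluation:

from s3prl.problem import HearGtzan

HearGtzan().run(
    target_dir="exp/gtzan_hubert",                                # placeholder experiment directory
    prepare_data=dict(dataset_root="/data/gtzan", test_fold=0),   # placeholder corpus location
    build_upstream=dict(name="hubert"),                           # any s3prl upstream name
    start=0,     # begin at stage 0 (corpus parsing)
    stop=None,   # run through to the final stage (evaluation)
)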

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    • total_steps (int) - the total optimization steps
    • log_step (int) - logging frequency; log every log_step steps
    • eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    • save_step (int) - save a checkpoint every save_step steps
    • gradient_clipping (float) - clip the gradient; important for RNNs
    • gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    • valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    • valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected and saved
    • auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    • resume_ckpt_dir (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    • seed (int) - fix the random seed before training starts
    • keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    • use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearGtzan#

class s3prl.problem.HearGtzan[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 10
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - mAP
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
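
A common pattern (sketched with placeholder values) is to take default_config, fill in the required ??? fields, tweak a few inner values, and pass the whole dictionary to run:

from s3prl.problem import HearGtzan

problem = HearGtzan()
config = problem.default_config()

config["target_dir"] = "exp/gtzan_wavlm"                 # placeholder
config["prepare_data"]["dataset_root"] = "/data/gtzan"   # placeholder
config["prepare_data"]["test_fold"] = 0
config["build_upstream"]["name"] = "wavlm"
config["build_optimizer"]["conf"]["lr"] = 1.0e-4

problem.run(**config)
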
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    • train (dict) - arguments for FixedBatchSizeBatchSampler
    • valid (dict) - arguments for FixedBatchSizeBatchSampler
    • test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, test keys; each is a dictionary with the following supported options:

    • max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop the waveform to max_secs seconds
    • sox_effects (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

• x (torch.FloatTensor) - the waveform in (seq_len, 1)
• x_len (int) - the waveform length seq_len
• class_id (int) - the encoded class id
• label (str) - the class name
• unique_name (str) - the unique id for this datapoint
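
If you override build_dataset, any torch Dataset that yields the keys above will do. A minimal hypothetical sketch (not the actual default implementation; the encoder is assumed to expose an encode() method mapping a label string to an integer id):

import pandas as pd
import torchaudio
from torch.utils.data import Dataset

class SimpleUtteranceDataset(Dataset):
    """Hypothetical dataset yielding the keys expected by the task."""

    def __init__(self, data_csv: str, encoder):
        self.df = pd.read_csv(data_csv)
        self.encoder = encoder

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sr = torchaudio.load(row["wav_path"])  # (channel, seq_len)
        x = wav[0].unsqueeze(-1)                    # (seq_len, 1)
        return {
            "x": x,
            "x_len": x.size(0),
            "class_id": self.encoder.encode(row["label"]),  # assumed encoder API
            "label": row["label"],
            "unique_name": row["id"],
        }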

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel
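
Since every build_* method is an ordinary method, swapping in your own downstream head is just a matter of subclassing. A sketch (MyPoolingHead is hypothetical; a real replacement should satisfy the s3prl.nn.interface.AbsUtteranceModel interface, e.g. its expected forward signature and size attributes):

import torch.nn as nn
from s3prl.problem import HearGtzan

class MyPoolingHead(nn.Module):
    """Hypothetical mean-pooling + linear head."""

    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, hs, hs_len):
        pooled = hs.mean(dim=1)   # naive mean over time; ignores padding
        return self.linear(pooled)

class HearGtzanCustomHead(HearGtzan):
    def build_downstream(self, build_downstream, downstream_input_size,
                         downstream_output_size, downstream_input_stride):
        return MyPoolingHead(downstream_input_size, downstream_output_size)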

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the file paths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    • name (str) - the optimizer class name in torch.optim
    • conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage
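
Conceptually, the name/conf convention maps onto standard torch construction; a simplified sketch of the behavior (not the exact default implementation):

import torch

def build_optimizer_sketch(build_optimizer: dict, parameters):
    # e.g. build_optimizer = {"name": "Adam", "conf": {"lr": 1.0e-4}}
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))

The same pattern applies to build_scheduler with torch.optim.lr_scheduler.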

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    • name (str) - the scheduler class name in torch.optim.lr_scheduler
    • conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem
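
This is handy when the problem is selected by a string, for example from a command line. A small sketch:

from s3prl.problem import HearGtzan

problem_cls = HearGtzan.get_class_from_name("HearGtzan")
problem = problem_cls()   # problem_cls is the class whose __name__ matches the given string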

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, for example to change the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the file paths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

• id (str) - the unique id for this data point
• wav_path (str) - the absolute path of the waveform file
• label (str) - a string label of the waveform
• start_sec (float) - optional, load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning.
• end_sec (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

Stages:
• 0 - Parse the corpus and save the metadata file (waveform path, label…)
• 1 - Build the encoder to encode the labels
• 2 - Train the model
• 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meaning

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    • total_steps (int) - the total optimization steps
    • log_step (int) - logging frequency; log every log_step steps
    • eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    • save_step (int) - save a checkpoint every save_step steps
    • gradient_clipping (float) - clip the gradient; important for RNNs
    • gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    • valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    • valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected and saved
    • auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    • resume_ckpt_dir (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    • seed (int) - fix the random seed before training starts
    • keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    • use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearGunshot#

class s3prl.problem.HearGunshot[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 7
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    • train (dict) - arguments for FixedBatchSizeBatchSampler
    • valid (dict) - arguments for FixedBatchSizeBatchSampler
    • test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, test keys; each is a dictionary with the following supported options:

    • max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop the waveform to max_secs seconds
    • sox_effects (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

• x (torch.FloatTensor) - the waveform in (seq_len, 1)
• x_len (int) - the waveform length seq_len
• class_id (int) - the encoded class id
• label (str) - the class name
• unique_name (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the file paths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format
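
Because the encoder is saved as a pickle, it can be reloaded directly outside the recipe. A sketch (the path is a placeholder, and the encoder's exact API should be checked against s3prl.dataio.encoder.CategoryEncoder):

import pickle

with open("exp/gunshot_hubert/encoder.pkl", "rb") as f:   # placeholder path
    encoder = pickle.load(f)

num_classes = len(encoder)   # assumes the encoder supports len(); see the CategoryEncoder docs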

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    • name (str) - the optimizer class name in torch.optim
    • conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    • name (str) - the scheduler class name in torch.optim.lr_scheduler
    • conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
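
As a rough sketch of using such an upstream in isolation (the tensor layout follows the common s3prl convention but should be double-checked against the S3PRLUpstream documentation):

import torch
from s3prl.nn.upstream import S3PRLUpstream

upstream = S3PRLUpstream("hubert")   # the "name" field of the build_upstream config
upstream.eval()

with torch.no_grad():
    wavs = torch.randn(2, 16000 * 2, 1)              # assumed (batch, samples, 1) layout
    wavs_len = torch.LongTensor([16000, 16000 * 2])  # valid samples per utterance
    hidden_states, hidden_states_len = upstream(wavs, wavs_len)

# hidden_states is expected to be a list/tuple of per-layer features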

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, for example to change the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the file paths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

• id (str) - the unique id for this data point
• wav_path (str) - the absolute path of the waveform file
• label (str) - a string label of the waveform
• start_sec (float) - optional, load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning.
• end_sec (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

Stages:
• 0 - Parse the corpus and save the metadata file (waveform path, label…)
• 1 - Build the encoder to encode the labels
• 2 - Train the model
• 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development, to check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – In distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank can range from 0 to 7. All 8 processes have the same world_size but a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments, like prepare_data and build_model, are method-specific arguments for the methods of the same names and are not used in the core run logic. See each method’s documentation for its supported arguments and their meaning

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    • total_steps (int) - the total optimization steps
    • log_step (int) - logging frequency; log every log_step steps
    • eval_step (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    • save_step (int) - save a checkpoint every save_step steps
    • gradient_clipping (float) - clip the gradient; important for RNNs
    • gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization
    • valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    • valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected and saved
    • auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    • resume_ckpt_dir (str) - directly specify a checkpoint path to resume from, which does not need to be inside target_dir (see run)
    • seed (int) - fix the random seed before training starts
    • keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    • use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearLibriCount#

class s3prl.problem.HearLibriCount[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    • train (dict) - arguments for FixedBatchSizeBatchSampler
    • valid (dict) - arguments for FixedBatchSizeBatchSampler
    • test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, test keys; each is a dictionary with the following supported options:

    • max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop the waveform to max_secs seconds
    • sox_effects (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

• x (torch.FloatTensor) - the waveform in (seq_len, 1)
• x_len (int) - the waveform length seq_len
• class_id (int) - the encoded class id
• label (str) - the class name
• unique_name (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the file paths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    • name (str) - the optimizer class name in torch.optim
    • conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    • name (str) - the scheduler class name in torch.optim.lr_scheduler
    • conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, for example to change the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the file paths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

• id (str) - the unique id for this data point
• wav_path (str) - the absolute path of the waveform file
• label (str) - a string label of the waveform
• start_sec (float) - optional, load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning.
• end_sec (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

  • stage 0 - Parse the corpus and save the metadata file (waveform path, label…)

  • stage 1 - Build the encoder to encode the labels

  • stage 2 - Train the model

  • stage 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Used in distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps (int) - the total number of optimization steps

    log_step (int) - logging frequency; log every log_step steps

    eval_step (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run

    save_step (int) - save the checkpoint every save_step steps

    gradient_clipping (float) - clip the gradient norm; important for RNNs

    gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization

    valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

    valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this determines how the best validation checkpoint is selected

    auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

    resume_ckpt_dir (str) - directly specify a checkpoint path to resume from; it does not have to be inside target_dir (see run)

    seed (int) - fix the random seed before training starts

    keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones

    use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearNsynth5hr#

class s3prl.problem.HearNsynth5hr[source][source]#

Bases: HearFSD

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - pitch_acc
  - chroma_acc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: pitch_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
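As a minimal sketch, the mandatory ??? fields above can be filled in directly through run's keyword arguments; the paths and the upstream name below are placeholders, not defaults:

from s3prl.problem import HearNsynth5hr

# A sketch: fill the required ??? fields of the config above and launch all
# stages. dataset_root and target_dir are hypothetical paths; "hubert" is just
# one example upstream name for build_upstream.
HearNsynth5hr().run(
    target_dir="exp/hear_nsynth5hr_hubert",
    prepare_data={"dataset_root": "/data/hear/nsynth_pitch_5h"},
    build_upstream={"name": "hubert"},
)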
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train (dict) - arguments for FixedBatchSizeBatchSampler

    valid (dict) - arguments for FixedBatchSizeBatchSampler

    test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode-specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds

    sox_effects (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_id (int) - the encoded class id

  • label (str) - the class name

  • unique_name (str) - the unique id for this datapoint
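To make the expected item format concrete, a single dataset item is a dictionary shaped roughly like this (all values are illustrative):

import torch

# An illustrative item from the dataset returned by build_dataset; the actual
# waveform, ids, and labels depend on your corpus.
item = {
    "x": torch.randn(16000 * 3, 1),  # (seq_len, 1) float waveform, e.g. 3 seconds at 16 kHz
    "x_len": 16000 * 3,              # the waveform length seq_len
    "class_id": 7,                   # the encoded class id
    "label": "organ",                # the class name (hypothetical)
    "unique_name": "utt-0042",       # the unique id for this datapoint
}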

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths, whether or not they already exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format
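Because the encoder is stored as a pickle file, it can be restored with the standard pickle module; the path and the encode call below are assumptions for illustration, since the exact CategoryEncoder API is documented elsewhere:

import pickle

# Load the label encoder that build_encoder pickled (hypothetical path).
with open("exp/hear_nsynth5hr_hubert/encoder.pkl", "rb") as f:
    encoder = pickle.load(f)

# Assumed usage: map a label string to its integer class id.
class_id = encoder.encode("organ")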

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    name (str) - the optimizer class name in torch.optim

    conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name (str) - the scheduler class name in torch.optim.lr_scheduler

    conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage
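Conceptually, the name / conf pairs above select a standard torch class and its constructor arguments. A rough sketch of that pattern (not the library's exact implementation) is:

import torch

def build_from_name(module, name: str, conf: dict, *args):
    # Look up the class by name inside the given torch module and
    # instantiate it with the conf dictionary as keyword arguments.
    cls = getattr(module, name)
    return cls(*args, **conf)

model = torch.nn.Linear(4, 2)
optimizer = build_from_name(torch.optim, "Adam", {"lr": 1.0e-4}, model.parameters())
scheduler = build_from_name(torch.optim.lr_scheduler, "ExponentialLR", {"gamma": 0.9}, optimizer)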

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logics for every train/valid/test forward step for the model, and the logics for how to reduce all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.
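For reference, a standalone upstream can be used roughly as follows; this is a sketch based on the common s3prl upstream interface (padded waveforms plus their lengths in, multiple hidden states plus their lengths out), and "hubert" is only an example name:

import torch
from s3prl.nn import S3PRLUpstream

model = S3PRLUpstream("hubert")
model.eval()

with torch.no_grad():
    wavs = torch.randn(2, 16000 * 2)                     # (batch, samples), padded waveforms
    wavs_len = torch.LongTensor([16000 * 1, 16000 * 2])  # valid length of each waveform
    all_hs, all_hs_len = model(wavs, wavs_len)           # hidden states from every layer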

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem
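A small sketch of resolving a problem class from its name string, which is handy when the class name comes from a config file or the command line:

from s3prl.problem import HearNsynth5hr

# get_class_from_name is a classmethod, so any problem class can be used to
# resolve a problem class by its __name__ string.
cls = HearNsynth5hr.get_class_from_name("HearNsynth5hr")
assert cls is HearNsynth5hr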

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task
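After training, the saved checkpoints can be reloaded for inference. A sketch, where the checkpoint directory is hypothetical and the available task_overrides depend on the concrete task:

from s3prl.problem import HearNsynth5hr

problem = HearNsynth5hr()

# ckpts_dir is expected to contain the 'model' and 'task' subdirectories
# written by save_model / save_task (the path below is hypothetical).
model, task = problem.load_model_and_task(
    "exp/hear_nsynth5hr_hubert/valid_best",
    task_overrides={},  # optionally override task settings saved at training time
)
model.eval()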

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • get_path_only (bool) – Directly return the filepaths, whether or not they already exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • label (str) - a string label of the waveform

  • start_sec (float) - optional, start loading the waveform from start_sec seconds. If not present or math.nan, load from the beginning.

  • end_sec (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

  • stage 0 - Parse the corpus and save the metadata file (waveform path, label…)

  • stage 1 - Build the encoder to encode the labels

  • stage 2 - Train the model

  • stage 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Used in distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings
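The start / stop and eval_batch arguments are convenient while developing. A sketch that first runs only the early data-preparation stages and later resumes from training with a truncated evaluation (paths and the upstream name are placeholders):

from s3prl.problem import HearNsynth5hr

common = dict(
    target_dir="exp/debug_run",
    prepare_data={"dataset_root": "/data/hear/nsynth_pitch_5h"},
    build_upstream={"name": "hubert"},
)

# Limit the stages executed via start / stop (see the stage table above).
HearNsynth5hr().run(start=0, stop=1, **common)

# Later: continue from the training stage, evaluating only 5 batches per
# validation pass to quickly check that nothing crashes.
HearNsynth5hr().run(start=2, eval_batch=5, **common)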

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps (int) - the total number of optimization steps

    log_step (int) - logging frequency; log every log_step steps

    eval_step (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run

    save_step (int) - save the checkpoint every save_step steps

    gradient_clipping (float) - clip the gradient norm; important for RNNs

    gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization

    valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

    valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this determines how the best validation checkpoint is selected

    auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

    resume_ckpt_dir (str) - directly specify a checkpoint path to resume from; it does not have to be inside target_dir (see run)

    seed (int) - fix the random seed before training starts

    keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones

    use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
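As a sketch, the train options documented above can be tweaked by editing the dictionary returned by default_config and passing it back to run; the paths, the upstream name, and the changed values below are placeholders:

from s3prl.problem import HearNsynth5hr

problem = HearNsynth5hr()
config = problem.default_config()

# Fill the mandatory fields and adjust the training schedule.
config["target_dir"] = "exp/hear_nsynth5hr_hubert"
config["prepare_data"]["dataset_root"] = "/data/hear/nsynth_pitch_5h"
config["build_upstream"]["name"] = "hubert"
config["train"]["total_steps"] = 50000
config["train"]["gradient_accumulate"] = 4   # simulate a 4x larger batch
config["train"]["valid_metric"] = "chroma_acc"

problem.run(**config)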

HearStroke#

class s3prl.problem.HearStroke[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
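Since the config above exposes test_fold and num_folds, a common pattern is to sweep the folds and train one model per held-out fold. A sketch with placeholder paths (0-based fold indexing is an assumption):

from s3prl.problem import HearStroke

NUM_FOLDS = 5  # matches num_folds in the config above

for fold in range(NUM_FOLDS):
    # One training/evaluation run per held-out fold; whether folds are
    # 0-indexed is an assumption, so check prepare_data's expectations.
    HearStroke().run(
        target_dir=f"exp/hear_stroke_fold{fold}",
        prepare_data={"dataset_root": "/data/hear/stroke", "test_fold": fold},
        build_upstream={"name": "hubert"},
    )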
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train (dict) - arguments for FixedBatchSizeBatchSampler

    valid (dict) - arguments for FixedBatchSizeBatchSampler

    test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode-specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds

    sox_effects (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_id (int) - the encoded class id

  • label (str) - the class name

  • unique_name (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths, whether or not they already exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    name (str) - the optimizer class name in torch.optim

    conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name (str) - the scheduler class name in torch.optim.lr_scheduler

    conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logics for every train/valid/test forward step for the model, and the logics for how to reduce all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • get_path_only (bool) – Directly return the filepaths, whether or not they already exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • label (str) - a string label of the waveform

  • start_sec (float) - optional, start loading the waveform from start_sec seconds. If not present or math.nan, load from the beginning.

  • end_sec (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

  • stage 0 - Parse the corpus and save the metadata file (waveform path, label…)

  • stage 1 - Build the encoder to encode the labels

  • stage 2 - Train the model

  • stage 3 - Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful during development to quickly check that nothing crashes. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operation: “cpu” or “cuda” Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – Used in distributed training, where world_size > 1. Take world_size == 8 for example: 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is. In this case, rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps (int) - the total number of optimization steps

    log_step (int) - logging frequency; log every log_step steps

    eval_step (int) - evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run

    save_step (int) - save the checkpoint every save_step steps

    gradient_clipping (float) - clip the gradient norm; important for RNNs

    gradient_accumulate (int) - accumulate the gradients of multiple steps before updating the network parameters, to simulate large-batch optimization

    valid_metric (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics

    valid_higher_better (bool) - some metrics are higher-better while others are lower-better; this determines how the best validation checkpoint is selected

    auto_resume (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session

    resume_ckpt_dir (str) - directly specify a checkpoint path to resume from; it does not have to be inside target_dir (see run)

    seed (int) - fix the random seed before training starts

    keep_num_ckpts (int) - to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete the older ones

    use_scheduler (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearTonic#

class s3prl.problem.HearTonic[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
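A sketch of an evaluation-only run: jump straight to the final stage and point test_ckpt_dir at a specific checkpoint instead of the best validation one, assuming the earlier stages were completed in the same target_dir (all paths below are hypothetical):

from s3prl.problem import HearTonic

# start=3 skips directly to the evaluation stage (see the stage table in run);
# test_ckpt_dir selects an explicit checkpoint to test.
HearTonic().run(
    target_dir="exp/hear_tonic_fold0",
    prepare_data={"dataset_root": "/data/hear/tonic", "test_fold": 0},
    build_upstream={"name": "hubert"},
    start=3,
    test_ckpt_dir="exp/hear_tonic_fold0/some_saved_checkpoint",
)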
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train (dict) - arguments for FixedBatchSizeBatchSampler

    valid (dict) - arguments for FixedBatchSizeBatchSampler

    test (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode-specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds

    sox_effects (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

  • x (torch.FloatTensor) - the waveform in (seq_len, 1)

  • x_len (int) - the waveform length seq_len

  • class_id (int) - the encoded class id

  • label (str) - the class name

  • unique_name (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths, whether or not they already exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    name (str) - the optimizer class name in torch.optim

    conf (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name (str) - the scheduler class name in torch.optim.lr_scheduler

    conf (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logics for every train/valid/test forward step for the model, and the logics for how to reduce all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes a long time, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hyperparameters and target_dir)

  • get_path_only (bool) – Directly return the filepaths, whether or not they already exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

  • id (str) - the unique id for this data point

  • wav_path (str) - the absolute path of the waveform file

  • label (str) - a string label of the waveform

  • start_sec (float) - optional, start loading the waveform from start_sec seconds. If not present or math.nan, load from the beginning.

  • end_sec (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end.

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that everything won’t crash. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – During distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps (int): the total number of optimization steps
    log_step (int): logging frequency; log every log_step steps
    eval_step (int): evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step (int): save a checkpoint every save_step steps
    gradient_clipping (float): clip the gradient; important for RNNs
    gradient_accumulate (int): accumulate the gradients of several steps before updating the network parameters, to simulate large-batch optimization (a small worked example follows this parameter list)
    valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better (bool): some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected
    auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    seed (int): fix the random seed before training starts
    keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    use_scheduler (bool): whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.
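
As noted in the gradient_accumulate entry above, a small worked example (plain arithmetic, not library code) of the effective batch size:

# Parameters are updated once every `gradient_accumulate` forward/backward passes,
# so the effective batch size per optimizer update is their product.
batch_size = 32            # e.g. build_batch_sampler.train.batch_size
gradient_accumulate = 4    # e.g. train.gradient_accumulate
effective_batch_size = batch_size * gradient_accumulate
print(effective_batch_size)  # 128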

HearVocal#

class s3prl.problem.HearVocal[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 3
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - mAP
  - top1_acc
  - d_prime
  - aucroc
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: mAP
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
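
A hedged sketch of how these defaults are typically consumed: take default_config(), fill in the required ??? fields, tweak any inner value (which is forwarded to the method of the same name), and pass everything to run. All paths and the upstream name below are illustrative.

from s3prl.problem import HearVocal

problem = HearVocal()
config = problem.default_config()

config["target_dir"] = "result/hear_vocal"
config["prepare_data"]["dataset_root"] = "/data/hear/vocal_imitation"  # illustrative path
config["prepare_data"]["test_fold"] = 0
config["build_upstream"]["name"] = "fbank"          # any registered upstream name
config["build_optimizer"]["conf"]["lr"] = 1.0e-4    # override an inner value

problem.run(**config)
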
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train (dict): arguments for FixedBatchSizeBatchSampler
    valid (dict): arguments for FixedBatchSizeBatchSampler
    test (dict): arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs (float): if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects (List[List[str]]): if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

x (torch.FloatTensor): the waveform in (seq_len, 1)
x_len (int): the waveform length seq_len
class_id (int): the encoded class id
label (str): the class name
unique_name (str): the unique id for this datapoint
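
A minimal sketch (toy data, not the library's dataset classes) of a dataset that yields items with exactly these keys:

import torch
from torch.utils.data import Dataset

class ToyUtteranceDataset(Dataset):
    """Yields dictionaries with the keys documented above (random waveforms)."""

    def __init__(self, num_items: int = 4, sample_rate: int = 16000):
        self.num_items = num_items
        self.sample_rate = sample_rate

    def __len__(self):
        return self.num_items

    def __getitem__(self, index):
        wav = torch.randn(self.sample_rate * 2, 1)  # 2 seconds, shape (seq_len, 1)
        return {
            "x": wav,
            "x_len": wav.size(0),
            "class_id": index % 2,
            "label": f"class_{index % 2}",
            "unique_name": f"utt-{index:04d}",
        }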

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format
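
Since the encoder is stored as a pickle, it can be restored directly; the path below is illustrative, and the encode call assumes a CategoryEncoder-style interface as mentioned above.

import pickle

with open("result/hear_vocal/encoder.pkl", "rb") as f:  # illustrative path
    encoder = pickle.load(f)

# Assumed CategoryEncoder-style usage: map a label string to its class id.
class_id = encoder.encode("some_label")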

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the items from the DataLoader directly as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.
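
A conceptual sketch of this composition with toy modules (these are not the library's classes; the shapes and the weighted-sum featurizer only mirror the description above):

import torch

class ToyUpstream(torch.nn.Module):
    def forward(self, wav):                        # wav: (batch, samples)
        frames = wav.unfold(1, 400, 320)           # (batch, num_frames, 400)
        return [frames.mean(-1, keepdim=True),     # "layer 0": (batch, num_frames, 1)
                frames.std(-1, keepdim=True)]      # "layer 1": (batch, num_frames, 1)

class ToyFeaturizer(torch.nn.Module):
    def __init__(self, num_layers: int = 2):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.zeros(num_layers))
    def forward(self, hidden_states):              # weighted sum over layers
        norm = torch.softmax(self.weights, dim=0)
        return sum(w * h for w, h in zip(norm, hidden_states))

class ToyDownstream(torch.nn.Module):
    def __init__(self, input_size: int = 1, output_size: int = 10):
        super().__init__()
        self.linear = torch.nn.Linear(input_size, output_size)
    def forward(self, feature):                    # mean-pool over time, then classify
        return self.linear(feature.mean(dim=1))

wav = torch.randn(2, 16000)
upstream, featurizer, downstream = ToyUpstream(), ToyFeaturizer(), ToyDownstream()
logits = downstream(featurizer(upstream(wav)))     # (2, 10)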

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    name (str): the optimizer class name in torch.optim
    conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage
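
A sketch of how the name / conf pair can map onto torch.optim (and, analogously for build_scheduler documented next, onto torch.optim.lr_scheduler); this mirrors the documented keys rather than the library's exact code:

import torch

build_optimizer = {"name": "Adam", "conf": {"lr": 1.0e-4}}
build_scheduler = {"name": "ExponentialLR", "gamma": 0.9}

params = [torch.nn.Parameter(torch.zeros(4))]      # stand-in for model.parameters()
optimizer = getattr(torch.optim, build_optimizer["name"])(params, **build_optimizer["conf"])

scheduler_conf = {k: v for k, v in build_scheduler.items() if k != "name"}
scheduler = getattr(torch.optim.lr_scheduler, build_scheduler["name"])(optimizer, **scheduler_conf)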

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name (str): the scheduler class name in torch.optim.lr_scheduler
    conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv files into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id (str): the unique id for this data point
wav_path (str): the absolute path of the waveform file
label (str): a string label of the waveform
start_sec (float): optional; load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
end_sec (float): optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that everything won’t crash. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – During distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps (int): the total number of optimization steps
    log_step (int): logging frequency; log every log_step steps
    eval_step (int): evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step (int): save a checkpoint every save_step steps
    gradient_clipping (float): clip the gradient; important for RNNs
    gradient_accumulate (int): accumulate the gradients of several steps before updating the network parameters, to simulate large-batch optimization
    valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better (bool): some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected
    auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    seed (int): fix the random seed before training starts
    keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    use_scheduler (bool): whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearVoxLingual#

class s3prl.problem.HearVoxLingual[source][source]#

Bases: HearESC50

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
  num_folds: 5
build_batch_sampler:
  train:
    batch_size: 32
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
  pooling_type: MeanPooling
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multiclass
  scores:
  - top1_acc
  - d_prime
  - aucroc
  - mAP
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 150000
  log_step: 100
  eval_step: 1000
  save_step: 100
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: top1_acc
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
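
Since this recipe is fold-based (num_folds: 5), cross-validated results are usually obtained by looping over test_fold. A hedged sketch; the dataset path and upstream name are illustrative.

from s3prl.problem import HearVoxLingual

for fold in range(5):
    HearVoxLingual().run(
        target_dir=f"result/hear_voxlingual/fold{fold}",
        prepare_data={
            "dataset_root": "/data/hear/voxlingua107",  # illustrative path
            "test_fold": fold,
            "num_folds": 5,
        },
        build_upstream={"name": "fbank"},  # any registered upstream name
    )
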
build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train (dict): arguments for FixedBatchSizeBatchSampler
    valid (dict): arguments for FixedBatchSizeBatchSampler
    test (dict): arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs (float): if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects (List[List[str]]): if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

x (torch.FloatTensor): the waveform in (seq_len, 1)
x_len (int): the waveform length seq_len
class_id (int): the encoded class id
label (str): the class name
unique_name (str): the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the items from the DataLoader directly as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    name (str): the optimizer class name in torch.optim
    conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name (str): the scheduler class name in torch.optim.lr_scheduler
    conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv files into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id (str): the unique id for this data point
wav_path (str): the absolute path of the waveform file
label (str): a string label of the waveform
start_sec (float): optional; load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
end_sec (float): optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end

run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check that everything won’t crash. If -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – During distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have a different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the best validation checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for the methods of the same names and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config

    total_steps (int): the total number of optimization steps
    log_step (int): logging frequency; log every log_step steps
    eval_step (int): evaluation frequency; evaluate every eval_step steps. You can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step (int): save a checkpoint every save_step steps
    gradient_clipping (float): clip the gradient; important for RNNs
    gradient_accumulate (int): accumulate the gradients of several steps before updating the network parameters, to simulate large-batch optimization
    valid_metric (str): the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better (bool): some metrics are higher-better while others are lower-better; this affects how the best validation checkpoint is selected
    auto_resume (bool): if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir (str): directly specify a checkpoint path to resume from; it does not need to be inside target_dir (see run)
    seed (int): fix the random seed before training starts
    keep_num_ckpts (int): to avoid saving too many checkpoints, keep only the keep_num_ckpts latest checkpoints and delete older ones
    use_scheduler (bool): whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearDcase2016Task2#

class s3prl.problem.HearDcase2016Task2[source][source]#

Bases: HearFSD

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
build_dataset:
  train:
    chunk_secs: 4.0
    step_secs: 4.0
  valid:
    chunk_secs: 4.0
    step_secs: 4.0
  test:
    chunk_secs: 4.0
    step_secs: 4.0
build_batch_sampler:
  train:
    batch_size: 5
    shuffle: true
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multilabel
  scores:
  - event_onset_200ms_fms
  - segment_1s_er
  postprocessing_grid:
    median_filter_ms:
    - 250
    min_duration:
    - 125
    - 250
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 15000
  log_step: 100
  eval_step: 500
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: event_onset_200ms_fms
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
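
The chunk_secs / step_secs options tile each clip into fixed-length chunks; a small illustrative calculation (not the library's exact implementation) of how many chunks a clip produces:

import math

def num_chunks(clip_secs: float, chunk_secs: float = 4.0, step_secs: float = 4.0) -> int:
    """Count the windows of chunk_secs (hop = step_secs) needed to cover a clip."""
    if clip_secs <= chunk_secs:
        return 1
    return math.ceil((clip_secs - chunk_secs) / step_secs) + 1

print(num_chunks(10.0))  # a 10 s clip with 4 s chunks and 4 s hop -> 3 chunks
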
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, supports the arguments of voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv files into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id (str): the unique id for this data point
wav_path (str): the absolute path of the waveform file
label (str): a string label of the waveform
start_sec (float): optional; load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
end_sec (float): optional; load the waveform up to end_sec seconds. If not present or math.nan, load to the end

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source][source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs (float): if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects (List[List[str]]): if not None, apply the sox effects to the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test modes, the dataset should return each item as a dictionary containing the following keys:

x (torch.FloatTensor): the waveform in (seq_len, 1)
x_len (int): the waveform length seq_len
class_id (int): the encoded class id
label (str): the class name
unique_name (str): the unique id for this datapoint

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source][source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config

    train (dict): arguments for FixedBatchSizeBatchSampler
    valid (dict): arguments for FixedBatchSizeBatchSampler
    test (dict): arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source][source]#

Build the task, which defines the logic for every train/valid/test forward step of the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
Returns:

Task

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (from 16 kHz)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths whether or not they exist.

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned from the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
Returns:

torch.nn.Module

Return the entire model for the task, which takes the items from the DataLoader directly as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to form the final model: the upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, refer to below

    name (str): the optimizer class name in torch.optim
    conf (dict): the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config

    name (str): the scheduler class name in torch.optim.lr_scheduler
    conf (dict): the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check everything won’t crash. If set to -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – When running distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the validation-best checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.
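
A minimal invocation of the whole pipeline from Python might look like the following sketch (the dataset root and upstream name are placeholders; see build_upstream for the supported names):

from s3prl.problem import SuperbASR

# Placeholders: point dataset_root at a LibriSpeech download and pick any
# upstream name supported by s3prl.nn.upstream.S3PRLUpstream.
SuperbASR().run(
    target_dir="result/superb_asr",
    prepare_data={"dataset_root": "/path/to/LibriSpeech"},
    build_upstream={"name": "fbank"},
    device="cuda",
)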

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config, with the following keys:

    total_steps: (int) - the total number of optimization steps
    log_step: (int) - logging frequency; log every log_step steps
    eval_step: (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step: (int) - save the checkpoint every save_step steps
    gradient_clipping: (float) - clip the gradient; important for RNNs
    gradient_accumulate: (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric: (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better: (bool) - whether a higher metric value is better; this affects how the best validation checkpoint is selected
    auto_resume: (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir: (str) - directly specify a checkpoint path to resume from, which does not have to be inside target_dir (see run)
    seed: (int) - fix the random seed before training starts
    keep_num_ckpts: (int) - keep only the keep_num_ckpts latest checkpoints and delete older ones, to avoid saving too many checkpoints
    use_scheduler: (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

HearMaestro#

class s3prl.problem.HearMaestro[source][source]#

Bases: HearDcase2016Task2

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data:
  dataset_root: ???
  test_fold: ???
build_batch_sampler:
  train:
    batch_size: 5
    shuffle: true
  valid:
    item: record_id
  test:
    item: record_id
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_layers: 2
build_model:
  upstream_trainable: false
build_task:
  prediction_type: multilabel
  scores:
  - event_onset_50ms_fms
  - event_onset_offset_50ms_20perc_fms
  postprocessing_grid:
    median_filter_ms:
    - 150
    min_duration:
    - 50
build_optimizer:
  name: Adam
  conf:
    lr: 0.001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 15000
  log_step: 100
  eval_step: 500
  save_step: 500
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: event_onset_50ms_fms
  valid_higher_better: true
  auto_resume: true
  resume_ckpt_dir: null
evaluate: {}
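
Because the inner fields are forwarded directly to the corresponding methods, a common pattern is to start from default_config, override the nested values, and pass the result to run. A hedged sketch (paths, fold, and upstream name are placeholders):

from s3prl.problem import HearMaestro

problem = HearMaestro()
config = problem.default_config()
config["target_dir"] = "result/hear_maestro"                  # placeholder
config["prepare_data"]["dataset_root"] = "/path/to/maestro"   # placeholder
config["prepare_data"]["test_fold"] = 0                       # placeholder fold
config["build_upstream"]["name"] = "fbank"                    # placeholder upstream
config["build_optimizer"]["conf"]["lr"] = 5.0e-4              # e.g. tweak the learning rate
problem.run(**config)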
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id: (str) - the unique id for this data point
wav_path: (str) - the absolute path of the waveform file
label: (str) - a string label of the waveform
start_sec: (float) - optional, load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
end_sec: (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end
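
For illustration only, a metadata csv with these columns could be written like this (ids, paths, and labels are made up):

import pandas as pd

rows = [
    {"id": "utt-0001", "wav_path": "/data/wav/utt-0001.wav", "label": "dog_bark",
     "start_sec": 0.0, "end_sec": 2.5},
    # omitting start_sec/end_sec means the whole waveform is loaded
    {"id": "utt-0002", "wav_path": "/data/wav/utt-0002.wav", "label": "siren"},
]
pd.DataFrame(rows).to_csv("train.csv", index=False)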

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config, with the following keys:

    train: (dict) - arguments for FixedBatchSizeBatchSampler
    valid: (dict) - arguments for FixedBatchSizeBatchSampler
    test: (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs: (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects: (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

x: (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len: (int) - the waveform length seq_len
class_id: (int) - the encoded class id
label: (str) - the class name
unique_name: (str) - the unique id for this datapoint
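
A minimal sketch of a dataset that yields items in this format (illustrative only, assuming torchaudio for loading; the recipe's actual dataset class handles more details):

import pandas as pd
import torch
import torchaudio

class SimpleUtteranceDataset(torch.utils.data.Dataset):
    """Hypothetical dataset returning the keys expected by the task."""

    def __init__(self, data_csv: str, label_to_id: dict):
        self.df = pd.read_csv(data_csv)
        self.label_to_id = label_to_id

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.iloc[index]
        wav, sample_rate = torchaudio.load(row["wav_path"])  # (channel, seq_len)
        wav = wav.mean(dim=0, keepdim=True).t()              # mono, shape (seq_len, 1)
        return {
            "x": wav,
            "x_len": wav.size(0),
            "class_id": self.label_to_id[row["label"]],
            "label": row["label"],
            "unique_name": row["id"],
        }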

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (relative to the 16 kHz waveform)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
  • build_featurizer (dict) – same in default_config, arguments for s3prl.nn.Featurizer

  • upstream – the upstream model built by build_upstream

Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
  • build_model (dict) – same in default_config

  • model_output_size (int) – the required output size of the model

  • build_upstream (dict) – same in default_config, passed to build_upstream

  • build_featurizer (dict) – same in default_config, passed to build_featurizer

  • build_downstream (dict) – same in default_config, passed to build_downstream

Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, with the following keys:

    name: (str) - the optimizer class name in torch.optim
    conf: (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage
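
Analogously to the scheduler, the optimizer config resolves against torch.optim; a minimal sketch (again a hypothetical helper, not the library's exact code):

import torch

def build_optimizer_from_config(build_optimizer: dict, parameters):
    # e.g. {"name": "Adam", "conf": {"lr": 1.0e-3}} -> torch.optim.Adam(parameters, lr=1e-3)
    optimizer_cls = getattr(torch.optim, build_optimizer["name"])
    return optimizer_cls(parameters, **build_optimizer.get("conf", {}))

model = torch.nn.Linear(4, 2)
optimizer = build_optimizer_from_config({"name": "Adam", "conf": {"lr": 1.0e-3}}, model.parameters())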

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config, with the following keys:

    name: (str) - the scheduler class name in torch.optim.lr_scheduler
    conf: (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step for the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
  • build_task (dict) – same in default_config

  • model (torch.nn.Module) – the model built by build_model

  • encoder – the label encoder built by build_encoder

  • valid_df (DataFrame) – optional, the valid-set metadata

  • test_df (DataFrame) – optional, the test-set metadata

Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check everything won’t crash. If set to -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – When running distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the validation-best checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config, with the following keys:

    total_steps: (int) - the total number of optimization steps
    log_step: (int) - logging frequency; log every log_step steps
    eval_step: (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step: (int) - save the checkpoint every save_step steps
    gradient_clipping: (float) - clip the gradient; important for RNNs
    gradient_accumulate: (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric: (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better: (bool) - whether a higher metric value is better; this affects how the best validation checkpoint is selected
    auto_resume: (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir: (str) - directly specify a checkpoint path to resume from, which does not have to be inside target_dir (see run)
    seed: (int) - fix the random seed before training starts
    keep_num_ckpts: (int) - keep only the keep_num_ckpts latest checkpoints and delete older ones, to avoid saving too many checkpoints
    use_scheduler: (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.

CommonExample#

class s3prl.problem.CommonExample[source][source]#

Bases: SuperbSID

default_config() dict[source][source]#

The default arguments for run in yaml. Note that for the fields with inner values, like build_model, the outer field name corresponds to a method name, so you can find the method build_model. Furthermore, the values inside that field will be directly passed into the method. So by changing these inner values, you can directly affect the behavior of the corresponding method. See the method documentation for all the supported arguments and their meanings.

The methods affected by the following config are: prepare_data build_encoder build_dataset build_batch_sampler build_upstream build_featurizer build_downstream build_model build_task build_optimizer build_scheduler save_model save_task train evaluate

start: 0
stop: null
target_dir: ???
cache_dir: null
remove_all_cache: false
prepare_data: {}
build_encoder: {}
build_dataset:
  train:
    max_secs: 8.0
build_batch_sampler:
  train:
    batch_size: 8
    shuffle: true
  valid:
    batch_size: 1
  test:
    batch_size: 1
build_upstream:
  name: ???
build_featurizer:
  layer_selections: null
  normalize: false
build_downstream:
  hidden_size: 256
build_model:
  upstream_trainable: false
build_task: {}
build_optimizer:
  name: Adam
  conf:
    lr: 0.0001
build_scheduler:
  name: ExponentialLR
  gamma: 0.9
save_model: {}
save_task: {}
train:
  total_steps: 10
  log_step: 1
  eval_step: 5
  save_step: 5
  gradient_clipping: 1.0
  gradient_accumulate: 1
  valid_metric: accuracy
  valid_higher_better: true
  auto_resume: true
evaluate: {}
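
With only 10 total steps in the default config, this recipe is mainly useful as a quick end-to-end sanity check; a hedged example (directory and upstream name are placeholders):

from s3prl.problem import CommonExample

CommonExample().run(
    target_dir="result/common_example",  # placeholder experiment directory
    build_upstream={"name": "fbank"},    # placeholder upstream name
    device="cpu",                        # a run this small can reasonably be tried on CPU
)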
prepare_data(prepare_data: dict, target_dir: str, cache_dir: str, get_path_only: bool = False)[source][source]#

Prepare the task-specific data metadata (path, labels…). By default call voxceleb1_for_sid with **prepare_data

Parameters:
  • prepare_data (dict) – same in default_config, support arguments in voxceleb1_for_sid

  • target_dir (str) – Parse your corpus and save the csv file into this directory

  • cache_dir (str) – If the parsing or preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

tuple

  1. train_path (str)

  2. valid_path (str)

  3. test_paths (List[str])

Each path (str) should be a csv file containing the following columns:

id: (str) - the unique id for this data point
wav_path: (str) - the absolute path of the waveform file
label: (str) - a string label of the waveform
start_sec: (float) - optional, load the waveform starting from start_sec seconds. If not present or math.nan, load from the beginning
end_sec: (float) - optional, load the waveform up to end_sec seconds. If not present or math.nan, load to the end

build_batch_sampler(build_batch_sampler: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, dataset)[source]#

Return the batch sampler for torch DataLoader.

Parameters:
  • build_batch_sampler (dict) –

    same in default_config, with the following keys:

    train: (dict) - arguments for FixedBatchSizeBatchSampler
    valid: (dict) - arguments for FixedBatchSizeBatchSampler
    test: (dict) - arguments for FixedBatchSizeBatchSampler

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – the mode specific csv from prepare_data

  • dataset – the dataset from build_dataset

Returns:

batch sampler for torch DataLoader

build_collate_fn(build_collate_fn: dict, mode: str)[source]#

By default returns s3prl.dataset.base.default_collate_fn

Parameters:
  • build_collate_fn (dict) – same in default_config, no argument supported for now

  • mode (str) – train, valid, or test

Returns:

callable

the collate_fn for torch DataLoader in train/valid/test mode

build_dataset(build_dataset: dict, target_dir: str, cache_dir: str, mode: str, data_csv: str, encoder_path: str, frame_shift: int)[source]#

Build the dataset for train/valid/test.

Parameters:
  • build_dataset (dict) –

    same in default_config, with train, valid, and test keys; each is a dictionary with the following supported options:

    max_secs: (float) - if a waveform is longer than max_secs seconds, randomly crop it to max_secs seconds
    sox_effects: (List[List[str]]) - if not None, apply the sox effects on the utterance

  • target_dir (str) – Current experiment directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • mode (str) – train/valid/test

  • data_csv (str) – The metadata csv file for the specific mode

  • encoder_path (str) – The pickled encoder path for encoding the labels

Returns:

torch Dataset

For all train/valid/test mode, the dataset should return each item as a dictionary containing the following keys:

x: (torch.FloatTensor) - the waveform in (seq_len, 1)
x_len: (int) - the waveform length seq_len
class_id: (int) - the encoded class id
label: (str) - the class name
unique_name: (str) - the unique id for this datapoint

build_downstream(build_downstream: dict, downstream_input_size: int, downstream_output_size: int, downstream_input_stride: int)[source]#

Return the task-specific downstream model. By default build the MeanPoolingLinear model

Parameters:
  • build_downstream (dict) – same in default_config, support arguments of MeanPoolingLinear

  • downstream_input_size (int) – the required input size of the model

  • downstream_output_size (int) – the required output size of the model

  • downstream_input_stride (int) – the input feature’s stride (relative to the 16 kHz waveform)

Returns:

s3prl.nn.interface.AbsUtteranceModel

build_encoder(build_encoder: dict, target_dir: str, cache_dir: str, train_csv_path: str, valid_csv_path: str, test_csv_paths: list, get_path_only: bool = False)[source]#

Build the encoder (for the labels) given the data metadata, and return the saved encoder path. By default generate and save a s3prl.dataio.encoder.CategoryEncoder from the label column of all the csv files.

Parameters:
  • build_encoder (dict) – same in default_config, no argument supported for now

  • target_dir (str) – Save your encoder into this directory

  • cache_dir (str) – If the preprocessing takes too long, you can save the temporary files into this directory. This directory is expected to be shared across different training sessions (different hypers and target_dir)

  • train_csv_path (str) – the train path from prepare_data

  • valid_csv_path (str) – the valid path from prepare_data

  • test_csv_paths (List[str]) – the test paths from prepare_data

  • get_path_only (bool) – Directly return the filepaths regardless of whether they exist

Returns:

str

encoder_path: The encoder should be saved in the pickle format

build_featurizer(build_featurizer: dict, upstream)[source]#

By default build the featurizer with s3prl.nn.Featurizer

Parameters:
  • build_featurizer (dict) – same in default_config, arguments for s3prl.nn.Featurizer

  • upstream – the upstream model built by build_upstream

Returns:

s3prl.nn.interface.AbsFeaturizer

Return the featurizer model. The featurizer is used to reduce the multiple hidden states returned by the upstream model (built by build_upstream) into a single hidden state, so it can be easily fed into the downstream model.

build_model(build_model: dict, model_output_size: int, build_upstream: dict, build_featurizer: dict, build_downstream: dict)[source]#

By default build model with s3prl.nn.upstream.UpstreamDownstreamModel

Parameters:
  • build_model (dict) – same in default_config

  • model_output_size (int) – the required output size of the model

  • build_upstream (dict) – same in default_config, passed to build_upstream

  • build_featurizer (dict) – same in default_config, passed to build_featurizer

  • build_downstream (dict) – same in default_config, passed to build_downstream

Returns:

torch.nn.Module

Return the entire model for the task, which takes the direct items from the DataLoader as input. Usually, the components are built by build_upstream, build_featurizer, and build_downstream, and are concatenated together to get the final model. The upstream extracts multiple hidden states, the featurizer reduces them into a single hidden state, and the downstream takes that hidden state as the feature for the downstream-specific model.

build_optimizer(build_optimizer: dict, parameters)[source]#
Parameters:
  • build_optimizer (dict) –

    same in default_config, with the following keys:

    name: (str) - the optimizer class name in torch.optim
    conf: (dict) - the arguments for initializing the optimizer class, e.g. {"lr": 1.0e-4}

  • parameters (iterable) – the standard params accepted by torch.optim.Optimizer.

Returns:

torch.optim.Optimizer

An optimizer following standard torch usage

build_scheduler(build_scheduler: dict, optimizer)[source]#
Parameters:
  • build_scheduler (dict) –

    same in default_config, with the following keys:

    name: (str) - the scheduler class name in torch.optim.lr_scheduler
    conf: (dict) - the arguments for initializing the scheduler class, e.g. {"gamma": 0.01} for torch.optim.lr_scheduler.StepLR

  • optimizer – the standard torch optimizer accepted by Scheduler in torch.optim.lr_scheduler.

Returns:

torch scheduler

A scheduler following standard torch usage

build_task(build_task: dict, model: Module, encoder, valid_df: Optional[DataFrame] = None, test_df: Optional[DataFrame] = None)[source]#

Build the task, which defines the logic for every train/valid/test forward step for the model, and the logic for reducing all the batch results from multiple train/valid/test steps into metrics

By default build UtteranceClassificationTask

Parameters:
  • build_task (dict) – same in default_config

  • model (torch.nn.Module) – the model built by build_model

  • encoder – the label encoder built by build_encoder

  • valid_df (DataFrame) – optional, the valid-set metadata

  • test_df (DataFrame) – optional, the test-set metadata

Returns:

Task

build_upstream(build_upstream: dict)[source]#

By default build the upstream with s3prl.nn.upstream.S3PRLUpstream

Parameters:

build_upstream (dict) – same in default_config, arguments for s3prl.nn.upstream.S3PRLUpstream

Returns:

s3prl.nn.interface.AbsUpstream

Return an upstream model, whose forward takes the waveform input and returns multiple hidden states as features.

evaluate(evaluate: dict, mode: str, task, dataset, batch_sampler, collate_fn, eval_batch: int, dump_dir: str, device: str, num_workers: int)[source]#

The evaluate routine used by train (during validation phase) and run (during testing phase).

Parameters:
  • evaluate (dict) – same in default_config, no argument supported for now

  • **others – only meaningful when you want to override this evaluate method, which is not the common case. Hence we skip the documentation for now.

classmethod get_class_from_name(name: str)[source]#
Parameters:

name (str) – the __name__ of the problem class

Returns:

Problem

load_model(model_ckpt_dir: str)[source]#

Return the saved model.

Parameters:

model_ckpt_dir (str) – Restore the model with build_model and the checkpoint saved in this directory.

Returns:

torch.nn.Module

load_model_and_task(ckpts_dir: str, task_overrides: Optional[dict] = None)[source]#

This is a helper method to combine load_model and load_task together to directly load the model and the task. This method assumes the model is saved under ckpts_dir / 'model' and the task is saved under ckpts_dir / 'task'

Returns:

tuple

  1. model (torch.nn.Module)

  2. task (s3prl.task.Task)

load_task(task_ckpt_dir: str, model: Module, task_overrides: Optional[dict] = None)[source]#

Return the saved task.

Parameters:
  • task_ckpt_dir (str) – Restore the task with build_task and the checkpoint saved in this directory.

  • model (torch.nn.Module) – the model for the task, since the model is separately saved and is required for build_task.

  • task_overrides (dict) – overrides the saved initialization arguments, so you can change the loaded task’s behavior, e.g. the decoding hyperparameters.

Returns:

s3prl.task.Task

main(args: Optional[List[str]] = None)[source]#
run(target_dir: str, cache_dir: Optional[str] = None, remove_all_cache: bool = False, start: int = 0, stop: Optional[int] = None, num_workers: int = 6, eval_batch: int = -1, device: str = 'cuda', world_size: int = 1, rank: int = 0, test_ckpt_dir: Optional[str] = None, prepare_data: Optional[dict] = None, build_encoder: Optional[dict] = None, build_dataset: Optional[dict] = None, build_batch_sampler: Optional[dict] = None, build_collate_fn: Optional[dict] = None, build_upstream: Optional[dict] = None, build_featurizer: Optional[dict] = None, build_downstream: Optional[dict] = None, build_model: Optional[dict] = None, build_task: Optional[dict] = None, build_optimizer: Optional[dict] = None, build_scheduler: Optional[dict] = None, save_model: Optional[dict] = None, save_task: Optional[dict] = None, train: Optional[dict] = None, evaluate: Optional[dict] = None)[source]#

stage 0: Parse the corpus and save the metadata file (waveform path, label…)
stage 1: Build the encoder to encode the labels
stage 2: Train the model
stage 3: Evaluate the model on multiple test sets

Parameters:
  • target_dir (str) – The directory that stores the script result.

  • cache_dir (str) – The directory that caches the processed data. Default: /home/user/.cache/s3prl/data

  • remove_all_cache (bool) – Whether to remove all the cache stored under cache_dir. Default: False

  • start (int) – The starting stage of the problem script. Default: 0

  • stop (int) – The stopping stage of the problem script; set None to reach the final stage. Default: None

  • num_workers (int) – num_workers for all the torch DataLoader

  • eval_batch (int) – During evaluation (valid or test), limit the number of batches. This is helpful for fast development to check everything won’t crash. If set to -1, disable this feature and evaluate the entire epoch. Default: -1

  • device (str) – The device type for all torch-related operations: “cpu” or “cuda”. Default: “cuda”

  • world_size (int) – How many processes are running this script simultaneously (in parallel). Usually this is just 1; however, if you are running distributed training, this should be > 1. Default: 1

  • rank (int) – When running distributed training, world_size > 1. Take world_size == 8 for example: this means 8 processes (8 GPUs) are running in parallel, and the script needs to know which of the 8 processes it is, so rank ranges from 0 to 7. All 8 processes share the same world_size but have different rank (process id).

  • test_ckpt_dir (str) – Specify the checkpoint path for testing. If not given, use the validation-best checkpoint under the given target_dir directory.

  • **kwds – The other arguments like prepare_data and build_model are method-specific arguments for methods like prepare_data and build_model, and are not used in the core run logic. See the specific method documentation for their supported arguments and meanings.

save_model(save_model: dict, model_ckpt_dir: str, build_model_all_args: dict, model: Module)[source]#

Save the model state_dict and the model initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_model

Parameters:
  • save_model (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_model field. You can rely on the omegaconf package to simplify the duplication.

  • model_ckpt_dir (str) – save the model into this directory.

  • build_model_all_args (dict) – all the arguments of build_model. By saving this dictionary, you can easily reconstruct the same model by calling build_model with the saved dictionary.

  • model (torch.nn.Module) – the model to be saved.

Returns:

None

save_task(save_task: dict, task_ckpt_dir: str, build_task_all_args_except_model: dict, task: Task)[source]#

Save the task’s state, task.get_state(), and the initialization arguments into the given directory. If you override this method, it is highly possible you also need to override load_task.

Parameters:
  • save_task (dict) – same in default_config, so the user can save additional settings, like the configuration of the dataset by duplicating the dataset hypers inside the save_task field. You can rely on the omegaconf package to simplify the duplication.

  • task_ckpt_dir (str) – save the task into this directory.

  • build_task_all_args_except_model (dict) – all the arguments of build_task except the model argument, since the model is separately saved by save_model. By saving this dictionary, you can easily reconstruct the same task by calling build_task with the saved dictionary.

  • task (Task) – the task to be saved.

Returns:

None

train(train: dict, train_dir: str, build_model_all_args: dict, build_task_all_args_except_model: dict, save_model: dict, save_task: dict, build_optimizer: dict, build_scheduler: dict, evaluate: dict, train_dataset, train_batch_sampler, train_collate_fn, valid_dataset, valid_batch_sampler, valid_collate_fn, num_workers: int, world_size: int, rank: int, eval_batch: int, device: str, global_config: Optional[dict] = None)[source]#
Parameters:
  • train (dict) –

    same in default_config, with the following keys:

    total_steps: (int) - the total number of optimization steps
    log_step: (int) - logging frequency; log every log_step steps
    eval_step: (int) - evaluation frequency; evaluate every eval_step steps. Note that you can control how many batches to evaluate (to speed up development) with the eval_batch argument of run
    save_step: (int) - save the checkpoint every save_step steps
    gradient_clipping: (float) - clip the gradient; important for RNNs
    gradient_accumulate: (int) - accumulate the gradient over multiple steps before updating the network parameters, to simulate large-batch optimization
    valid_metric: (str) - the metric used to select the best validation checkpoint. Different Tasks support different valid_metrics; see build_task for the supported metrics
    valid_higher_better: (bool) - whether a higher metric value is better; this affects how the best validation checkpoint is selected
    auto_resume: (bool) - if the last checkpoint already exists in target_dir (see run), whether to resume from it or delete it and start a new training session
    resume_ckpt_dir: (str) - directly specify a checkpoint path to resume from, which does not have to be inside target_dir (see run)
    seed: (int) - fix the random seed before training starts
    keep_num_ckpts: (int) - keep only the keep_num_ckpts latest checkpoints and delete older ones, to avoid saving too many checkpoints
    use_scheduler: (bool) - whether to use the scheduler

  • **others – only meaningful when you want to override this train method, which is not the common case. Hence we skip the documentation for now.