corpus#
(s3prl.dataio.corpus)
Parse commonly used corpora into a standardized dictionary structure
| Class | Description |
| --- | --- |
| FluentSpeechCommands | Parse the Fluent Speech Command corpus |
| IEMOCAP | Parse the IEMOCAP corpus |
| LibriLight | Parse the LibriLight corpus |
| LibriSpeech | Parse the LibriSpeech corpus |
| Quesst14 | Parse the QUESST14 corpus |
| SNIPS | Parse the Audio SNIPS corpus |
| SpeechCommandsV1 | Parse the Google Speech Commands V1 corpus |
| VoxCeleb1SID | Parse the VoxCeleb1 corpus for classification |
| VoxCeleb1SV | Parse the VoxCeleb1 corpus for verification |
FluentSpeechCommands#
- class s3prl.dataio.corpus.FluentSpeechCommands(dataset_root: str, n_jobs: int = 4)[source]#
Bases: Corpus
Parse the Fluent Speech Command dataset
- Parameters:
dataset_root – (str) The dataset root of Fluent Speech Command
- property all_data[source]#
Return all the data points in a dict of the following format:

    data_id1:
        path: (str) The waveform path
        speakerId: (str) The speaker name
        transcription: (str) The transcription
        action: (str) The action
        object: (str) The action's targeting object
        location: (str) The location where the action happens
    data_id2:
        ...
- property data_split[source]#
Return a list: train_data, valid_data, test_data. Each is a dict following the format specified in all_data.
- property data_split_ids[source]#
Return a list: train_ids, valid_ids, test_ids. Each is a list containing data_ids, which can be used as keys to access all_data.
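A minimal usage sketch tying the three properties above together; the dataset path is a placeholder for your local copy of the corpus:

```python
from s3prl.dataio.corpus import FluentSpeechCommands

# "/path/to/fluent_speech_commands" is a placeholder for your local dataset root
corpus = FluentSpeechCommands("/path/to/fluent_speech_commands")

# the three splits follow the all_data format documented above
train_data, valid_data, test_data = corpus.data_split
train_ids, valid_ids, test_ids = corpus.data_split_ids

# data_ids index into all_data
for data_id in train_ids[:3]:
    point = corpus.all_data[data_id]
    print(data_id, point["path"], point["action"], point["location"])
```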
IEMOCAP#
- class s3prl.dataio.corpus.IEMOCAP(dataset_root: str, n_jobs: int = 4)[source]#
Bases: Corpus
Parse the IEMOCAP dataset
- Parameters:
dataset_root – (str) The dataset root of IEMOCAP
- property all_data[source]#
Return all the data points of IEMOCAP in a dict of the following format:

    data_id1:
        wav_path: (str) The waveform path
        speaker: (str) The speaker name
        act: (str) improvised / scripted
        emotion: (str) The emotion label
        session_id: (int) The session
    data_id2:
        ...
- get_whole_session(session_id: int)[source]#
- Parameters:
session_id (int) – The session index, selected from 1, 2, 3, 4, 5
- Returns:
dict – data points in a single session (containing both improvised and scripted recordings) in the same format as all_data
- get_session_with_act(session_id: int, act: str)[source]#
- Parameters:
session_id (int) – The session index, selected from 1, 2, 3, 4, 5
act (str) – ‘improvised’ or ‘scripted’
- Returns:
s3prl.base.container.Container – data points in a single session with a specific act (either improvised or scripted) in the same format as all_data
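A minimal usage sketch of the session accessors above; the dataset path is a placeholder, and the example assumes the returned Container can be iterated like a dict:

```python
from s3prl.dataio.corpus import IEMOCAP

# "/path/to/IEMOCAP" is a placeholder for your local dataset root
corpus = IEMOCAP("/path/to/IEMOCAP")

# all recordings of session 1, both improvised and scripted
session_1 = corpus.get_whole_session(1)

# only the improvised recordings of session 1
improvised = corpus.get_session_with_act(1, "improvised")
for data_id, point in improvised.items():
    print(data_id, point["wav_path"], point["emotion"])
```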
LibriSpeech#
- class s3prl.dataio.corpus.LibriSpeech(dataset_root: str, n_jobs: int = 4, train_split: List[str] = ['train-clean-100'], valid_split: List[str] = ['dev-clean'], test_split: List[str] = ['test-clean'])[source]#
Bases: Corpus
LibriSpeech Corpus Link: https://www.openslr.org/12
- Parameters:
dataset_root (str) – Path to LibriSpeech corpus directory.
n_jobs (int, optional) – Number of jobs. Defaults to 4.
train_split (List[str], optional) – Training splits. Defaults to ["train-clean-100"].
valid_split (List[str], optional) – Validation splits. Defaults to ["dev-clean"].
test_split (List[str], optional) – Testing splits. Defaults to ["test-clean"].
- property all_data[source]#
Return all the data points in a dict of the following format:

    data_id1:
        wav_path: (str) The waveform path
        transcription: (str) The transcription
        speaker: (str) The speaker name
        gender: (str) The speaker's gender
        corpus_split: (str) The split of corpus this sample belongs to
    data_id2:
        ...
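A minimal usage sketch with the default splits; the dataset path is a placeholder:

```python
from s3prl.dataio.corpus import LibriSpeech

# "/path/to/LibriSpeech" is a placeholder; split names follow the defaults above
corpus = LibriSpeech(
    "/path/to/LibriSpeech",
    train_split=["train-clean-100"],
    valid_split=["dev-clean"],
    test_split=["test-clean"],
)

# each point follows the all_data format documented above
for data_id, point in corpus.all_data.items():
    print(data_id, point["speaker"], point["corpus_split"], point["transcription"])
    break  # inspect a single sample
```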
LibriLight#
Quesst14#
SNIPS#
SpeechCommandsV1#
- class s3prl.dataio.corpus.SpeechCommandsV1(gsc1: str, gsc1_test: str, n_jobs: int = 4)[source]#
Bases: Corpus
- Parameters:
gsc1 (str) – Root of the training/validation set (the ‘dev’ portion of the corpus)
gsc1_test (str) – Root of the testing set (the ‘test’ portion of the corpus)
- static split_dataset(root_dir: Union[str, Path], max_uttr_per_class=134217727) → Tuple[List[Tuple[str, str]], List[Tuple[str, str]]][source]#
Split the Speech Commands dataset into train and valid lists.
- Parameters:
root_dir – Speech Commands dataset root dir
max_uttr_per_class – Maximum number of utterances per class; a predefined value in the original paper
- Returns:
train_list – [(class_name, audio_path), …]
valid_list – same format as train_list
- Return type:
Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]
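A minimal sketch of the static splitter above; the path is a placeholder pointing at an extracted Speech Commands V1 archive:

```python
from s3prl.dataio.corpus import SpeechCommandsV1

# "/path/to/speech_commands_v0.01" is a placeholder for the extracted archive
train_list, valid_list = SpeechCommandsV1.split_dataset(
    "/path/to/speech_commands_v0.01"
)

# each entry is a (class_name, audio_path) tuple
class_name, audio_path = train_list[0]
print(class_name, audio_path)
```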