librispeech
(s3prl.dataio.corpus.librispeech)
Parse the LibriSpeech corpus
- Authors:
Heng-Jui Chang 2022
LibriSpeech
- class s3prl.dataio.corpus.librispeech.LibriSpeech(dataset_root: str, n_jobs: int = 4, train_split: List[str] = ['train-clean-100'], valid_split: List[str] = ['dev-clean'], test_split: List[str] = ['test-clean'])
Bases:
Corpus
LibriSpeech Corpus. Link: https://www.openslr.org/12
- Parameters:
dataset_root (str) – Path to LibriSpeech corpus directory.
n_jobs (int, optional) – Number of parallel jobs. Defaults to 4.
train_split (List[str], optional) – Training splits. Defaults to ["train-clean-100"].
valid_split (List[str], optional) – Validation splits. Defaults to ["dev-clean"].
test_split (List[str], optional) – Testing splits. Defaults to ["test-clean"].
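A minimal sketch of constructing the class with the parameters listed above. The helper names `librispeech_kwargs` and `build_corpus` and the path `/path/to/LibriSpeech` are placeholders for illustration and are not part of s3prl. The import sits inside the function so the snippet loads even when s3prl is not installed.

```python
from typing import Any, Dict


def librispeech_kwargs(dataset_root: str) -> Dict[str, Any]:
    """Collect keyword arguments that mirror the documented defaults."""
    return {
        "dataset_root": dataset_root,
        "n_jobs": 4,
        "train_split": ["train-clean-100"],
        "valid_split": ["dev-clean"],
        "test_split": ["test-clean"],
    }


def build_corpus(dataset_root: str):
    """Instantiate the corpus (hypothetical helper; requires s3prl and the data)."""
    # Imported lazily: s3prl is only needed when the corpus is actually built.
    from s3prl.dataio.corpus.librispeech import LibriSpeech

    return LibriSpeech(**librispeech_kwargs(dataset_root))


# Example call, assuming the corpus was extracted to this placeholder path:
# corpus = build_corpus("/path/to/LibriSpeech")
# data = corpus.all_data
```

Each split argument is a list, so the signature allows passing more than one split name; which names are accepted depends on what is present under `dataset_root`.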
- property all_data
Return all the data points in a dict of the following format:

    data_id1:
        wav_path: (str) The waveform path
        transcription: (str) The transcription
        speaker: (str) The speaker name
        gender: (str) The speaker's gender
        corpus_split: (str) The split of the corpus this sample belongs to
    data_id2:
        ...
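As an illustration of working with a dict in this format, the sketch below groups data IDs by their `corpus_split` field. The function `group_ids_by_split` and the `sample` entries (IDs, paths, transcriptions, speaker and gender values) are made up for the example; only the field names come from the description above.

```python
from collections import defaultdict
from typing import Dict, List


def group_ids_by_split(all_data: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each corpus split name to the list of data IDs that belong to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for data_id, item in all_data.items():
        groups[item["corpus_split"]].append(data_id)
    return dict(groups)


# Hypothetical stand-in for `LibriSpeech(...).all_data`; all values are invented.
sample = {
    "utt-a": {
        "wav_path": "/path/to/a.flac",
        "transcription": "FIRST EXAMPLE",
        "speaker": "spk1",
        "gender": "F",
        "corpus_split": "train-clean-100",
    },
    "utt-b": {
        "wav_path": "/path/to/b.flac",
        "transcription": "SECOND EXAMPLE",
        "speaker": "spk2",
        "gender": "M",
        "corpus_split": "train-clean-100",
    },
    "utt-c": {
        "wav_path": "/path/to/c.flac",
        "transcription": "THIRD EXAMPLE",
        "speaker": "spk3",
        "gender": "F",
        "corpus_split": "dev-clean",
    },
}

by_split = group_ids_by_split(sample)
```

The same loop pattern works for grouping by `speaker` or `gender`, since every entry carries those keys.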