frame_label#
(s3prl.dataio.dataset.frame_label)
- Authors:
Leo (2022)
FrameLabelDataset#
- class s3prl.dataio.dataset.frame_label.FrameLabelDataset(df: DataFrame, num_class: int, frame_shift: int, chunk_secs: float, step_secs: float, use_unfull_chunks: bool = True, load_audio_conf: Optional[dict] = None, sample_rate: int = 16000)[source]#
Bases:
Dataset
- Parameters:
df (pd.DataFrame) – the dataframe should have the following columns: record_id (str), wav_path (str), duration (float), utt_id (str), label (int), start_sec (float), end_sec (float)
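A minimal sketch of assembling such a dataframe and constructing the dataset is shown below; the wav paths, durations, labels, and constructor values are hypothetical placeholders rather than values from the library's documentation.

```python
# Hypothetical sketch: the wav paths, durations, and labels are placeholders.
import pandas as pd

from s3prl.dataio.dataset.frame_label import FrameLabelDataset

df = pd.DataFrame(
    [
        # record_id, wav_path, duration, utt_id, label, start_sec, end_sec
        ("rec1", "/data/rec1.wav", 10.0, "rec1_utt1", 0, 0.0, 3.2),
        ("rec1", "/data/rec1.wav", 10.0, "rec1_utt2", 1, 3.2, 7.5),
    ],
    columns=["record_id", "wav_path", "duration", "utt_id", "label", "start_sec", "end_sec"],
)

dataset = FrameLabelDataset(
    df,
    num_class=2,      # two event classes in this toy example
    frame_shift=160,  # one frame every 160 samples (10 ms at 16 kHz)
    chunk_secs=2.0,   # cut each recording into 2-second chunks
    step_secs=2.0,    # non-overlapping chunks
)
```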
chunking#
- s3prl.dataio.dataset.frame_label.chunking(start_sec: float, end_sec: float, chunk_secs: float, step_secs: float, use_unfull_chunks: bool = True) → List[Tuple[float, float]][source]#
Produce chunks (start/end points) from the given start and end seconds
- Parameters:
start_sec (float) – The start second of the utterance
end_sec (float) – The end second of the utterance
chunk_secs (float) – The length (in seconds) of a chunked chunk
step_secs (float) – The stride seconds between chunks
use_unfull_chunks (bool) – Whether to produce chunks shorter than chunk_secs at the end of the recording
- Returns:
Each tuple describes the starting point (in sec) and the ending point (in sec) of each chunk, in order
- Return type:
List[Tuple[float, float]]
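A small usage sketch for chunking follows; the listed output is an expectation derived from the parameter descriptions above, not a verified result.

```python
from s3prl.dataio.dataset.frame_label import chunking

chunks = chunking(start_sec=0.0, end_sec=5.0, chunk_secs=2.0, step_secs=1.0)
# Expected roughly: [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (3.0, 5.0), (4.0, 5.0)]
# where the final (4.0, 5.0) chunk is shorter than chunk_secs and would be
# dropped with use_unfull_chunks=False.
print(chunks)
```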
scale_labels_secs#
- s3prl.dataio.dataset.frame_label.scale_labels_secs(labels: List[Tuple[Any, float, float]], ratio: float)[source]#
When the recording length is changed, e.g. by pitch or speed manipulation, the start/end timestamps (in seconds) should be scaled accordingly
- Parameters:
labels (List[Tuple[Any, float, float]]) – each chunk label is in (label, start_sec, end_sec)
ratio (float) – the scaling ratio
- Returns:
the scaled labels
- Return type:
List[Tuple[Any, float, float]]
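A sketch of rescaling labels after a speed change; the assumption here is that ratio directly multiplies each start/end second.

```python
from s3prl.dataio.dataset.frame_label import scale_labels_secs

labels = [("speech", 0.5, 1.0), ("music", 1.0, 2.4)]

# If the audio is slowed to half speed, every timestamp doubles (assumed semantics).
scaled = scale_labels_secs(labels, ratio=2.0)
# Expected roughly: [("speech", 1.0, 2.0), ("music", 2.0, 4.8)]
```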
get_chunk_labels#
- s3prl.dataio.dataset.frame_label.get_chunk_labels(start_sec: float, end_sec: float, labels: List[Tuple[Any, float, float]])[source]#
Given a pair of start/end points, filter out the relevant labels from the given labels and refine the start/end points of each label to reside between start_sec and end_sec
- Parameters:
start_sec (float) – the starting point
end_sec (float) – the ending point
labels (List[Tuple[Any, float, float]]) – the chunk labels
- Returns:
the filtered labels. Only the labels relevant to the assigned start/end points are kept
- Return type:
List[Tuple[str, float, float]]
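A sketch of filtering labels for one chunk window; the expected output assumes the boundary-clipping behavior described above.

```python
from s3prl.dataio.dataset.frame_label import get_chunk_labels

labels = [("speech", 0.0, 3.0), ("music", 2.5, 6.0), ("noise", 8.0, 9.0)]

chunk_labels = get_chunk_labels(start_sec=2.0, end_sec=5.0, labels=labels)
# Expected roughly: [("speech", 2.0, 3.0), ("music", 2.5, 5.0)]
# "noise" is dropped because it does not overlap the 2.0-5.0 window, and the
# overlapping labels are clipped to reside between start_sec and end_sec.
```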
chunk_labels_to_frame_tensor_label#
- s3prl.dataio.dataset.frame_label.chunk_labels_to_frame_tensor_label(start_sec: float, end_sec: float, labels: List[Tuple[int, float, float]], num_class: int, frame_shift: int, sample_rate: int = 16000)[source]#
Produce frame-level labels for the given chunk labels
- Parameters:
start_sec (float) – the starting point of the chunk
end_sec (float) – the ending point of the chunk
labels (List[Tuple[int, float, float]]) – the chunk labels, each label is a tuple in (class_id, start_sec, end_sec)
num_class (int) – number of classes
frame_shift (int) – produce a frame per frame_shift samples
sample_rate (int) – the sample rate of the recording. default: 16000
- Returns:
the binary frame labels for the given labels, with shape (num_frames, num_class)
- Return type:
torch.FloatTensor
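A sketch of converting chunk labels into a frame-level tensor; the frame count and overlap behavior noted in the comments are assumptions based on the parameter descriptions above.

```python
from s3prl.dataio.dataset.frame_label import chunk_labels_to_frame_tensor_label

labels = [(0, 0.0, 0.5), (1, 0.3, 1.0)]  # (class_id, start_sec, end_sec)

frame_label = chunk_labels_to_frame_tensor_label(
    start_sec=0.0,
    end_sec=1.0,
    labels=labels,
    num_class=2,
    frame_shift=160,    # 10 ms per frame at 16 kHz
    sample_rate=16000,
)
# frame_label.shape should be roughly (100, 2) for this 1-second chunk; frames
# between 0.3 s and 0.5 s presumably have both classes set to 1, since the
# labels overlap there.
```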