frame_label#

(s3prl.dataio.dataset.frame_label)

Authors:
  • Leo (2022)

FrameLabelDataset#

class s3prl.dataio.dataset.frame_label.FrameLabelDataset(df: DataFrame, num_class: int, frame_shift: int, chunk_secs: float, step_secs: float, use_unfull_chunks: bool = True, load_audio_conf: Optional[dict] = None, sample_rate: int = 16000)[source]#

Bases: Dataset

Parameters:

df (pd.DataFrame) – the dataframe should have the following columns: record_id (str), wav_path (str), duration (float), utt_id (str), label (int), start_sec (float), end_sec (float)

getinfo(index: int)[source]#
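The dataframe expected by the constructor can be assembled as below. This is a minimal sketch assuming pandas is available; the file paths and IDs are hypothetical, and the dataset class itself is not exercised here:

```python
import pandas as pd

# One row per labeled segment; column names follow the FrameLabelDataset contract above.
records = [
    {
        "record_id": "rec_001",
        "wav_path": "/data/rec_001.wav",  # hypothetical path
        "duration": 12.5,
        "utt_id": "utt_001",
        "label": 0,
        "start_sec": 0.0,
        "end_sec": 3.2,
    },
    {
        "record_id": "rec_001",
        "wav_path": "/data/rec_001.wav",
        "duration": 12.5,
        "utt_id": "utt_002",
        "label": 1,
        "start_sec": 3.2,
        "end_sec": 7.8,
    },
]
df = pd.DataFrame(records)
```

Multiple segments of the same recording share a record_id while keeping distinct utt_ids and timestamps.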

chunking#

s3prl.dataio.dataset.frame_label.chunking(start_sec: float, end_sec: float, chunk_secs: float, step_secs: float, use_unfull_chunks: bool = True) → List[Tuple[float, float]][source]#

Produce chunks (start/end points) from the given start and end seconds

Parameters:
  • start_sec (float) – The start second of the utterance

  • end_sec (float) – The end second of the utterance

  • chunk_secs (float) – The length (in seconds) of each chunk

  • step_secs (float) – The stride (in seconds) between consecutive chunks

  • use_unfull_chunks (bool) – Whether to produce chunks shorter than chunk_secs at the end of the recording

Returns:

Each tuple describes the starting point (in sec) and the ending point (in sec) of each chunk, in order

Return type:

List[Tuple[float, float]]
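The chunking behavior can be sketched as a sliding window. This is an illustrative reimplementation of the documented semantics, not the library's actual code:

```python
from typing import List, Tuple

def chunking_sketch(
    start_sec: float,
    end_sec: float,
    chunk_secs: float,
    step_secs: float,
    use_unfull_chunks: bool = True,
) -> List[Tuple[float, float]]:
    """Slide a window of chunk_secs over [start_sec, end_sec] with stride step_secs."""
    chunks = []
    start = start_sec
    while start < end_sec:
        end = start + chunk_secs
        if end <= end_sec:
            chunks.append((start, end))
        elif use_unfull_chunks:
            # Keep the trailing chunk even though it is shorter than chunk_secs
            chunks.append((start, end_sec))
        start += step_secs
    return chunks

# A 5-second utterance cut into 2-second chunks with a 2-second stride:
print(chunking_sketch(0.0, 5.0, 2.0, 2.0))         # [(0.0, 2.0), (2.0, 4.0), (4.0, 5.0)]
print(chunking_sketch(0.0, 5.0, 2.0, 2.0, False))  # [(0.0, 2.0), (2.0, 4.0)]
```

With use_unfull_chunks=False, the trailing 1-second remainder is simply dropped.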

scale_labels_secs#

s3prl.dataio.dataset.frame_label.scale_labels_secs(labels: List[Tuple[Any, float, float]], ratio: float)[source]#

When the recording length is changed, e.g. by pitch or speed manipulation, the start/end timestamps (in seconds) should be scaled accordingly

Parameters:
  • labels (List[Tuple[Any, float, float]]) – each chunk label is in (label, start_sec, end_sec)

  • ratio (float) – the scaling ratio

Returns:

the scaled labels

Return type:

List[Tuple[Any, float, float]]
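In effect this is a linear rescaling of every timestamp. A minimal sketch of the documented behavior (not the library's actual code):

```python
from typing import Any, List, Tuple

def scale_labels_secs_sketch(
    labels: List[Tuple[Any, float, float]], ratio: float
) -> List[Tuple[Any, float, float]]:
    """Multiply every start/end second by the scaling ratio; the labels themselves are untouched."""
    return [(label, start * ratio, end * ratio) for label, start, end in labels]

# Speeding a recording up to 2x halves every timestamp (ratio = 0.5):
print(scale_labels_secs_sketch([("speech", 1.0, 3.0)], 0.5))  # [('speech', 0.5, 1.5)]
```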

get_chunk_labels#

s3prl.dataio.dataset.frame_label.get_chunk_labels(start_sec: float, end_sec: float, labels: List[Tuple[Any, float, float]])[source]#

Given a pair of start/end points, filter out the relevant labels from the given labels and refine the start/end points of each label to reside between start_sec and end_sec

Parameters:
  • start_sec (float) – the starting point

  • end_sec (float) – the ending point

  • labels (List[Tuple[Any, float, float]]) – the chunk labels

Returns:

the filtered labels. Only the labels relevant to the assigned start/end points are kept

Return type:

List[Tuple[str, float, float]]
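The filter-and-clamp behavior can be sketched as follows; an illustrative reimplementation of the documented semantics, not the library's actual code:

```python
from typing import Any, List, Tuple

def get_chunk_labels_sketch(
    start_sec: float,
    end_sec: float,
    labels: List[Tuple[Any, float, float]],
) -> List[Tuple[Any, float, float]]:
    """Keep only labels overlapping [start_sec, end_sec] and clamp them to that range."""
    refined = []
    for label, s, e in labels:
        if e <= start_sec or s >= end_sec:
            continue  # no overlap with the requested range
        refined.append((label, max(s, start_sec), min(e, end_sec)))
    return refined

# A 1s-3s chunk keeps only the overlapping portion of each label:
print(get_chunk_labels_sketch(1.0, 3.0, [("a", 0.0, 2.0), ("b", 2.5, 4.0), ("c", 5.0, 6.0)]))
# [('a', 1.0, 2.0), ('b', 2.5, 3.0)]
```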

chunk_labels_to_frame_tensor_label#

s3prl.dataio.dataset.frame_label.chunk_labels_to_frame_tensor_label(start_sec: float, end_sec: float, labels: List[Tuple[int, float, float]], num_class: int, frame_shift: int, sample_rate: int = 16000)[source]#

Produce frame-level labels for the given chunk labels

Parameters:
  • start_sec (float) – the starting point of the chunk

  • end_sec (float) – the ending point of the chunk

  • labels (List[Tuple[int, float, float]]) – the chunk labels, each label is a tuple in (class_id, start_sec, end_sec)

  • num_class (int) – number of classes

  • frame_shift (int) – produce one frame per frame_shift samples

  • sample_rate (int) – the sample rate of the recording. default: 16000

Returns:

the binary frame labels for the given chunk labels, with shape (num_frames, num_class)

Return type:

torch.FloatTensor
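The frame-level expansion can be sketched without torch using nested lists (the real function returns a torch.FloatTensor); the exact frame-boundary rounding here is an assumption of this sketch:

```python
from typing import List, Tuple

def chunk_labels_to_frame_label_sketch(
    start_sec: float,
    end_sec: float,
    labels: List[Tuple[int, float, float]],
    num_class: int,
    frame_shift: int,
    sample_rate: int = 16000,
) -> List[List[float]]:
    """Expand chunk labels into binary per-frame labels of shape (num_frames, num_class)."""
    frame_secs = frame_shift / sample_rate  # duration of one frame in seconds
    num_frames = round((end_sec - start_sec) / frame_secs)
    frames = [[0.0] * num_class for _ in range(num_frames)]
    for class_id, s, e in labels:
        # Frame-index rounding is an assumption of this sketch
        first = max(0, int((s - start_sec) / frame_secs))
        last = min(num_frames, round((e - start_sec) / frame_secs))
        for f in range(first, last):
            frames[f][class_id] = 1.0
    return frames

# A 4-frame chunk (frame_shift=160 at 16 kHz, i.e. 10 ms frames) with class 1 active in the middle:
out = chunk_labels_to_frame_label_sketch(0.0, 0.04, [(1, 0.01, 0.03)], 2, 160)
print(out)  # [[0.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
```

To obtain the tensor the real function documents, the nested list could be wrapped with torch.FloatTensor(frames).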