sorted_sampler

(s3prl.dataio.sampler.sorted_sampler)

The most commonly used batch sampler in the S3PRL legacy codebase. It sorts all data points by length and groups instances of similar length into the same batch.
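The sort-and-group idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the S3PRL implementation; `sorted_length_batches` is a hypothetical helper name, and sorting from longest to shortest is assumed from the `in_batch_shuffle` description on this page.

```python
from typing import List


def sorted_length_batches(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group dataset indices into batches of similar length (illustrative only)."""
    # Sort the indices by item length, longest first.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    # Slice the sorted order into consecutive batches of at most batch_size.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


lengths = [5, 30, 12, 28, 7, 11]
batches = sorted_length_batches(lengths, batch_size=2)
print(batches)
```

Each batch then holds indices whose lengths are close to one another, which keeps padding small when the items are collated.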

Authors:

Leo 2022

SortedSliceSampler

class s3prl.dataio.sampler.sorted_sampler.SortedSliceSampler(lengths: List[int], batch_size: int, max_length: int = 300000, seed: int = 12345678, in_batch_shuffle: bool = False)

Bases: object

This sampler should only be used for training, so it always runs in random-shuffle mode.

Parameters:
  • lengths (List[int]) – the length of each item in the dataset

  • batch_size (int) – the default batch size

  • max_length (int) – if a batch contains at least one utterance longer than max_length, halve the batch size

  • get_length_func (callable) – a function that returns the length of each item in the dataset; if None, a default function is used

  • in_batch_shuffle (bool) – if False, batches are sorted by length from long to short
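The `max_length` rule above can be read as a simple check on each candidate batch. The sketch below shows one plausible interpretation; `effective_batch_size` is a hypothetical helper, and the floor division and the lower bound of 1 are assumptions rather than documented behavior.

```python
from typing import List


def effective_batch_size(batch_lengths: List[int], batch_size: int, max_length: int) -> int:
    """Return the batch size after applying the halving rule (hypothetical helper)."""
    # If any utterance is strictly longer than max_length, halve the batch size.
    if any(length > max_length for length in batch_lengths):
        # Assumption: round down, but never go below one item per batch.
        return max(1, batch_size // 2)
    return batch_size


short_batch = effective_batch_size([100_000, 200_000], batch_size=8, max_length=300_000)
long_batch = effective_batch_size([100_000, 400_000], batch_size=8, max_length=300_000)
print(short_batch, long_batch)
```

Halving batches that contain very long utterances limits the memory spent on a single padded batch.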

set_epoch(epoch: int)

SortedBucketingSampler

class s3prl.dataio.sampler.sorted_sampler.SortedBucketingSampler(lengths: List[int], batch_size: int, max_length: int = 300000, shuffle: bool = False, in_batch_shuffle: bool = False, seed: int = 12345678)

Bases: object

Parameters:
  • lengths (List[int]) – the length of each item in the dataset

  • batch_size (int) – the default batch size

  • max_length (int) – if a batch contains at least one utterance longer than max_length, halve the batch size

  • get_length_func (callable) – a function that returns the length of each item in the dataset; if None, a default function is used

  • shuffle (bool) – Whether to shuffle the batches

  • in_batch_shuffle (bool) – if False, batches are sorted by length from long to short

set_epoch(epoch: int)
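This page does not describe what `set_epoch` does. A common convention, used for example by PyTorch's `DistributedSampler`, is to combine the seed with the epoch number so that each epoch gets a different but reproducible order. The sketch below assumes that convention; `EpochShuffledBatches` is a hypothetical stand-in and not the S3PRL class.

```python
import random
from typing import Iterator, List


class EpochShuffledBatches:
    """Hypothetical stand-in showing seed + epoch shuffling; not the S3PRL class."""

    def __init__(self, batches: List[List[int]], seed: int = 12345678) -> None:
        self.batches = batches
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Record the epoch so the next iteration derives its order from it.
        self.epoch = epoch

    def __iter__(self) -> Iterator[List[int]]:
        # Assumption: the shuffle order depends only on seed + epoch.
        rng = random.Random(self.seed + self.epoch)
        order = list(range(len(self.batches)))
        rng.shuffle(order)
        for i in order:
            yield self.batches[i]


sampler = EpochShuffledBatches([[1, 3], [2, 5], [4, 0]])
sampler.set_epoch(1)
first_pass = list(sampler)
second_pass = list(sampler)  # same epoch, so the same order is produced
```

Under this assumption, calling `set_epoch` at the start of every training epoch changes the batch order while keeping runs reproducible for a fixed seed.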