sorted_sampler
(s3prl.dataio.sampler.sorted_sampler)
The most commonly used batch sampler in the S3PRL legacy codebase. It sorts all the data points by length and groups instances of similar length together.
- Authors:
Leo 2022
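The sort-and-group idea described above can be sketched in plain Python. This is a minimal illustration, not the S3PRL implementation: the function name `sorted_length_batches` is hypothetical, and the sketch assumes that batches are formed by cutting the length-sorted index list into consecutive chunks.

```python
from typing import List


def sorted_length_batches(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group dataset indices into batches of similar length (illustrative only)."""
    # Sort indices from the longest item to the shortest.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    # Cut the sorted order into consecutive chunks of at most batch_size indices.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


lengths = [120, 30, 95, 400, 33, 110]
print(sorted_length_batches(lengths, batch_size=2))  # [[3, 0], [5, 2], [4, 1]]
```

Because neighbouring items in each batch have similar lengths, less padding is needed when the batch is collated. In PyTorch, an iterable that yields lists of indices like this can be passed to `torch.utils.data.DataLoader` through its `batch_sampler` argument.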
SortedSliceSampler
- class s3prl.dataio.sampler.sorted_sampler.SortedSliceSampler(lengths: List[int], batch_size: int, max_length: int = 300000, seed: int = 12345678, in_batch_shuffle: bool = False)
Bases:
object
This sampler should only be used for training, so it always runs in random shuffle mode.
- Parameters:
lengths (List[int]) – the length of each data point in the dataset
batch_size (int) – the default batch size
max_length (int) – if a batch contains at least one utterance longer than max_length, halve the batch
get_length_func (callable) – get the length of each item in the dataset, if None, a default function will be used
in_batch_shuffle (bool) – if False, batches are sorted by length from long to short
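The `max_length` rule can be illustrated with a short sketch. The helper name `split_if_too_long` is hypothetical, and the sketch assumes that "halve the batch" means splitting it into two smaller batches so that no item is dropped; the actual sampler may instead shrink the batch size.

```python
from typing import List


def split_if_too_long(batch: List[int], lengths: List[int], max_length: int) -> List[List[int]]:
    """Return the batch unchanged, or as two halves if any item exceeds max_length."""
    # Only split when there is more than one item and the longest one is over the limit.
    if len(batch) > 1 and max(lengths[i] for i in batch) > max_length:
        half = len(batch) // 2
        return [batch[:half], batch[half:]]
    return [batch]


lengths = [350000, 280000, 270000, 260000]
print(split_if_too_long([0, 1, 2, 3], lengths, max_length=300000))  # [[0, 1], [2, 3]]
print(split_if_too_long([1, 2, 3], lengths, max_length=300000))     # [[1, 2, 3]]
```

Halving batches that contain very long utterances keeps the padded batch from growing too large for memory, while short utterances still use the full `batch_size`.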
SortedBucketingSampler
- class s3prl.dataio.sampler.sorted_sampler.SortedBucketingSampler(lengths: List[int], batch_size: int, max_length: int = 300000, shuffle: bool = False, in_batch_shuffle: bool = False, seed: int = 12345678)
Bases:
object
- Parameters:
lengths (List[int]) – the length of each data point in the dataset
batch_size (int) – the default batch size
max_length (int) – if a batch contains at least one utterance longer than max_length, halve the batch
get_length_func (callable) – get the length of each item in the dataset, if None, a default function will be used
shuffle (bool) – Whether to shuffle the batches
in_batch_shuffle (bool) – if False, batches are sorted by length from long to short
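The difference between `shuffle` and `in_batch_shuffle` can be sketched as follows. This is an illustration under assumptions, not the library code: the helper `order_batches` is hypothetical, it takes already-built batches of indices, and it uses Python's `random.Random` for seeding, which may differ from the random generator S3PRL uses.

```python
import random
from typing import List


def order_batches(
    batches: List[List[int]],
    shuffle: bool = False,
    in_batch_shuffle: bool = False,
    seed: int = 12345678,
) -> List[List[int]]:
    """Optionally shuffle items inside each batch and the order of the batches."""
    rng = random.Random(seed)  # a fixed seed makes the ordering reproducible
    result = [list(batch) for batch in batches]  # copy so the input is not modified
    if in_batch_shuffle:
        # Items inside a batch lose their long-to-short order.
        for batch in result:
            rng.shuffle(batch)
    if shuffle:
        # Batch membership is unchanged; only the order of the batches changes.
        rng.shuffle(result)
    return result


sorted_batches = [[3, 0], [5, 2], [4, 1]]
print(order_batches(sorted_batches))  # [[3, 0], [5, 2], [4, 1]]
print(order_batches(sorted_batches, shuffle=True, in_batch_shuffle=True))
```

With both flags left at `False` the batches come out exactly as they were built, sorted from long to short. Neither flag changes which items share a batch, so the similar-length grouping is preserved in every case.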