beam_decoder

(s3prl.nn.beam_decoder)

Beam search decoder built on flashlight.

Authors:
  • Heng-Jui Chang 2022

BeamDecoder

class s3prl.nn.beam_decoder.BeamDecoder(token: str = '', lexicon: str = '', lm: str = '', nbest: int = 1, beam: int = 5, beam_size_token: int = -1, beam_threshold: float = 25.0, lm_weight: float = 2.0, word_score: float = -1.0, unk_score: float = -inf, sil_score: float = 0.0)

Bases: object

Beam decoder powered by flashlight.

Parameters:
  • token (str, optional) – Path to the token dictionary file. Defaults to “”.

  • lexicon (str, optional) – Path to the lexicon file. Defaults to “”.

  • lm (str, optional) – Path to the KenLM language model file. Defaults to “”.

  • nbest (int, optional) – Number of best hypotheses to return. Defaults to 1.

  • beam (int, optional) – Beam size. Defaults to 5.

  • beam_size_token (int, optional) – Token beam size. Defaults to -1.

  • beam_threshold (float, optional) – Log-probability threshold for beam search. Defaults to 25.0.

  • lm_weight (float, optional) – Language model weight. Defaults to 2.0.

  • word_score (float, optional) – Score for each word that appears in the transcription. Defaults to -1.0.

  • unk_score (float, optional) – Score for each unknown word that appears in the transcription. Defaults to -math.inf.

  • sil_score (float, optional) – Score for each silence that appears in the transcription. Defaults to 0.0.
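The constructor arguments can be collected in one place before building the decoder. The following is a minimal sketch, not part of s3prl: the three file paths are hypothetical placeholders, and `decoder_kwargs` and `build_decoder` are illustrative names. The remaining values mirror the documented defaults.

```python
import math

# Keyword arguments mirroring the documented defaults of BeamDecoder.
# The three paths are hypothetical placeholders, not files shipped with s3prl.
decoder_kwargs = {
    "token": "path/to/tokens.txt",     # hypothetical path
    "lexicon": "path/to/lexicon.txt",  # hypothetical path
    "lm": "path/to/lm.bin",            # hypothetical path
    "nbest": 1,
    "beam": 5,
    "beam_size_token": -1,
    "beam_threshold": 25.0,
    "lm_weight": 2.0,
    "word_score": -1.0,
    "unk_score": -math.inf,
    "sil_score": 0.0,
}


def build_decoder(**overrides):
    """Create a BeamDecoder, importing s3prl only when this is called."""
    # Requires s3prl and flashlight to be installed, and the paths to exist.
    from s3prl.nn.beam_decoder import BeamDecoder

    return BeamDecoder(**{**decoder_kwargs, **overrides})
```

Calling `build_decoder(nbest=5)` would then override a single setting while keeping the others. The function is only defined here, not called, because it needs real dictionary, lexicon, and language model files.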

get_tokens(idxs: Iterable) → LongTensor

Normalize tokens by handling CTC blank, ASG replabels, etc.

Parameters:

idxs (Iterable) – Token ID list output by self.decoder

Returns:

Token ID list after normalization.

Return type:

torch.LongTensor
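To illustrate what CTC-style normalization typically involves, here is a minimal pure-Python sketch: collapse consecutive repeats, then drop blanks. It is an assumption about the general technique, not the library's implementation. `normalize_ctc_tokens` and `BLANK_ID` are hypothetical names, the blank index of 0 is assumed, ASG replabels are not handled, and the result is a plain list rather than a `torch.LongTensor`.

```python
from itertools import groupby
from typing import Iterable, List

BLANK_ID = 0  # assumed blank index; the real value depends on the token dictionary


def normalize_ctc_tokens(idxs: Iterable[int], blank: int = BLANK_ID) -> List[int]:
    """Collapse consecutive repeats, then drop blank tokens (CTC collapsing)."""
    collapsed = (token for token, _ in groupby(idxs))
    return [token for token in collapsed if token != blank]


# The blank between the two runs of 3 keeps them as separate tokens.
print(normalize_ctc_tokens([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

Collapsing repeats before removing blanks matters: doing it in the other order would merge the two separate occurrences of token 3.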

get_timesteps(token_idxs: List[int]) → List[int]

Returns frame numbers corresponding to every non-blank token.

Parameters:

token_idxs (List[int]) – IDs of decoded tokens.

Returns:

Frame numbers corresponding to every non-blank token.

Return type:

List[int]
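One way to read "frame numbers corresponding to every non-blank token" is the frame at which each run of a non-blank token begins. The sketch below works under that assumption and is not taken from the library. `token_start_frames` and `BLANK_ID` are hypothetical names, and the blank index of 0 is assumed.

```python
from typing import List

BLANK_ID = 0  # assumed blank index; the real value depends on the token dictionary


def token_start_frames(token_idxs: List[int], blank: int = BLANK_ID) -> List[int]:
    """Return the frame index at which each run of a non-blank token begins."""
    frames = []
    for i, token in enumerate(token_idxs):
        if token == blank:
            continue
        # A new run starts at frame 0 or whenever the token differs from the previous frame.
        if i == 0 or token != token_idxs[i - 1]:
            frames.append(i)
    return frames


print(token_start_frames([0, 3, 3, 0, 3, 5, 5, 0]))  # [1, 4, 5]
```

Under this reading, the number of returned frames equals the number of tokens left after collapsing repeats and removing blanks from the same sequence.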

decode(emissions: Tensor) → List[List[dict]]

Decode sequence.

Parameters:

emissions (torch.Tensor) – Emission probabilities (in log scale).

Returns:

Decoded hypotheses.

Return type:

List[List[dict]]
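Because the emissions are expected in log scale, raw model logits would usually be passed through a log-softmax first. The following pure-Python sketch shows that step for a single frame. The helper and the example logits are illustrative only; with PyTorch tensors, `torch.nn.functional.log_softmax` does the same job. The expected tensor shape is not stated in this section, so none is assumed here.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one frame of token logits."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


frame_logits = [2.0, 0.5, -1.0]  # made-up logits for three tokens
frame_log_probs = log_softmax(frame_logits)

# Every log-probability is at most 0, and the probabilities sum to 1.
total = sum(math.exp(v) for v in frame_log_probs)
print(round(total, 6))
```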