Enabling word-level timestamps for all W2L Decoders #5403

abarcovschi · 2023-12-17T22:37:58Z

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
Did you read the contributor guideline?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Fixes #3371 and extends #3627 so that all wav2letter decoder classes, not just W2lKenLMDecoder, can return the frame numbers of every non-blank character in a hypothesis. It also adds a get_symbols() method to the common parent class of the decoders (W2lDecoder), which returns the non-blank characters of the hypothesis as a list of natural-language characters rather than token ids. This makes it easier to locate the word-boundary tokens later, when computing word-level timestamps with the following formula:

timestamp (in seconds) = frame_num * (audio_len / (num_frames * sample_rate))

where:

frame_num = the timestep of the symbol, as returned in the 'timesteps' field of the W2lDecoder.decode() outputs.
audio_len = the number of samples in the loaded audio file corresponding to the transcript (with batched w2v2 acoustic model inference, this is the zero-padded length, i.e. the length of the longest loaded audio file in the batch).
num_frames = the number of frames in the emission matrix returned by w2v2 acoustic model inference for that audio file (with batched inference, this is the same for every audio file, because all loaded audio files are padded to the length of the longest one in the batch).
sample_rate = the sample rate of the loaded audio files (usually 16000 Hz).
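A minimal sketch of how the decoder outputs could be turned into word-level timestamps with the formula above. It assumes `symbols` is the list returned by get_symbols() and `timesteps` is the 'timesteps' field from W2lDecoder.decode(), with one entry per non-blank symbol, and that "|" is the word-boundary symbol (the convention commonly used in fairseq's wav2vec 2.0 letter dictionaries). The helpers `frame_to_seconds` and `word_timestamps` are hypothetical illustrations, not part of fairseq.

```python
from typing import List, Tuple


def frame_to_seconds(frame_num: int, audio_len: int, num_frames: int,
                     sample_rate: int = 16000) -> float:
    """timestamp = frame_num * (audio_len / (num_frames * sample_rate))."""
    return frame_num * (audio_len / (num_frames * sample_rate))


def word_timestamps(symbols: List[str], timesteps: List[int], audio_len: int,
                    num_frames: int, sample_rate: int = 16000,
                    boundary: str = "|") -> List[Tuple[str, float, float]]:
    """Group non-blank symbols into words; return (word, start_s, end_s).

    Assumption: symbols[i] was emitted at frame timesteps[i], and words are
    separated by the hypothetical `boundary` symbol (default "|").
    end_s is the time of the word's last character frame, not the exact
    acoustic end of the word.
    """
    if len(symbols) != len(timesteps):
        raise ValueError("symbols and timesteps must have the same length")

    spans = []              # (word, first_frame, last_frame)
    chars, frames = [], []  # characters/frames of the word being built
    for sym, frame in zip(symbols, timesteps):
        if sym == boundary:
            # A boundary closes the current word (if any characters were seen).
            if chars:
                spans.append(("".join(chars), frames[0], frames[-1]))
            chars, frames = [], []
        else:
            chars.append(sym)
            frames.append(frame)
    if chars:  # last word may not be followed by a boundary symbol
        spans.append(("".join(chars), frames[0], frames[-1]))

    # Convert frame indices to seconds with the formula from the PR description.
    return [
        (word,
         frame_to_seconds(start, audio_len, num_frames, sample_rate),
         frame_to_seconds(end, audio_len, num_frames, sample_rate))
        for word, start, end in spans
    ]
```

For example, with audio_len = 32000, num_frames = 100 and sample_rate = 16000, each frame spans 0.02 s, so symbols ["H", "I", "|", "Y", "O", "U"] at timesteps [10, 12, 15, 20, 22, 25] yield "HI" from roughly 0.2 s to 0.24 s and "YOU" from roughly 0.4 s to 0.5 s.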

PR review

@alexeib


alexeib

lgtm bar comments. also i am no longer at meta so can't merge PRs into this repo

examples/speech_recognition/w2l_decoder.py

alexeib

thanks! hopefully someone from meta will merge!

abarcovschi added 3 commits December 9, 2023 20:12

implement word-level time alignment functionality to all CTC-based ASR decoders

84df912

update w2l decoders

5eadefc

check if pre-commit hooks pass

28b7c02

facebook-github-bot added the CLA Signed label Dec 17, 2023

alexeib suggested changes Dec 18, 2023


examples/speech_recognition/w2l_decoder.py (two review comments, both outdated and resolved)

call get_tokens once in decode

ad40afd

alexeib approved these changes Dec 19, 2023


Merge branch 'main' into main

7bfed86
