
How to use whisperX to do forced alignment #939

Open
jhkonan opened this issue Dec 5, 2024 · 1 comment

jhkonan commented Dec 5, 2024

I understand Whisper can produce a predicted transcript along with an alignment. However, I already have both the audio and the original transcript, so I would like to do forced alignment with Whisper (turbo) using whisperX.

Is this possible within the current framework? A working example would be very helpful.

Here is an example speech/text pair to start with:

# Download a sample utterance and its reference transcript (LDC addenda)
!wget https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav -O LDC93S1.wav
!wget https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.txt -O LDC93S1.txt

import librosa
import numpy

# Load the audio at 16 kHz (the rate alignment models expect)
speech, sr = librosa.load("LDC93S1.wav", sr=16000)
# The .txt file begins with two sample-offset numbers; keep only the words
text = " ".join(numpy.loadtxt("LDC93S1.txt", dtype=str)[2:])
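One way to approach this with the current whisperX framework: its alignment stage (a wav2vec2-based model, separate from Whisper itself) can be driven directly with a known transcript by wrapping the text in a segment dict and passing it to `whisperx.align`, skipping the ASR step entirely. A minimal sketch, assuming whisperX is installed and the reference text is short enough to treat as a single segment; the call shapes follow the whisperX README, but exact signatures may differ across versions:

```python
# Sketch: forced alignment of a known transcript via whisperX's
# alignment stage, bypassing Whisper ASR. Assumes the snippet above
# already produced `speech` (16 kHz float array) and `text`.

def build_segments(text, duration):
    """Wrap a known transcript as one whisperX-style segment spanning
    the whole file; whisperx.align refines the word-level timing."""
    return [{"text": text, "start": 0.0, "end": duration}]

if __name__ == "__main__":
    import librosa
    import whisperx  # pip install whisperx

    speech, sr = librosa.load("LDC93S1.wav", sr=16000)
    with open("LDC93S1.txt") as f:
        text = " ".join(f.read().split()[2:])  # drop the sample offsets

    segments = build_segments(text, len(speech) / sr)

    device = "cpu"  # or "cuda" if available
    model_a, metadata = whisperx.load_align_model(language_code="en",
                                                  device=device)
    result = whisperx.align(segments, model_a, metadata, speech, device,
                            return_char_alignments=False)

    # Word-level timestamps from the aligner
    for word in result["segments"][0]["words"]:
        print(word["word"], word.get("start"), word.get("end"))
```

Note that in this setup the Whisper (turbo) model is never loaded: whisperX only uses Whisper to produce the transcript, and the alignment itself is done by the separate wav2vec2 aligner, so a known transcript can go straight into the alignment step.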
@jhkonan jhkonan changed the title How do I use whisperX to do forced alignment? How to use whisperX to do forced alignment Dec 5, 2024