
How to use whisperX to do forced alignment #939

Open
jhkonan opened this issue Dec 5, 2024 · 1 comment

jhkonan commented Dec 5, 2024

I understand Whisper can produce a predicted transcript along with an alignment. However, I already have both the audio and the original transcript, so I would like to do forced alignment with Whisper (turbo) using whisperX.

Is this possible within the current framework? A working example would be very helpful.

Here is an example speech/text pair to start with:

# Download a sample utterance and its reference transcript (LDC addenda)
!wget https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav -O LDC93S1.wav
!wget https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.txt -O LDC93S1.txt

import librosa
import numpy

# Load the audio at 16 kHz (the rate alignment models expect)
speech, sr = librosa.load("LDC93S1.wav", sr=16000)
# The .txt file begins with two sample-offset numbers; keep only the words
text = " ".join(numpy.loadtxt("LDC93S1.txt", dtype=str)[2:])
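One way to approach this with the current whisperX framework: its alignment stage (a wav2vec2-based model, separate from Whisper itself) can be driven directly with a known transcript by wrapping the text in a segment dict and passing it to `whisperx.align`, skipping the ASR step entirely. A minimal sketch, assuming whisperX is installed and the reference text is short enough to treat as a single segment; the call shapes follow the whisperX README, but exact signatures may differ across versions:

```python
# Sketch: forced alignment of a known transcript via whisperX's
# alignment stage, bypassing Whisper ASR. Assumes the snippet above
# already produced `speech` (16 kHz float array) and `text`.

def build_segments(text, duration):
    """Wrap a known transcript as one whisperX-style segment spanning
    the whole file; whisperx.align refines the word-level timing."""
    return [{"text": text, "start": 0.0, "end": duration}]

if __name__ == "__main__":
    import librosa
    import whisperx  # pip install whisperx

    speech, sr = librosa.load("LDC93S1.wav", sr=16000)
    with open("LDC93S1.txt") as f:
        text = " ".join(f.read().split()[2:])  # drop the sample offsets

    segments = build_segments(text, len(speech) / sr)

    device = "cpu"  # or "cuda" if available
    model_a, metadata = whisperx.load_align_model(language_code="en",
                                                  device=device)
    result = whisperx.align(segments, model_a, metadata, speech, device,
                            return_char_alignments=False)

    # Word-level timestamps from the aligner
    for word in result["segments"][0]["words"]:
        print(word["word"], word.get("start"), word.get("end"))
```

Note that in this setup the Whisper (turbo) model is never loaded: whisperX only uses Whisper to produce the transcript, and the alignment itself is done by the separate wav2vec2 aligner, so a known transcript can go straight into the alignment step.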
@jhkonan jhkonan changed the title How do I use whisperX to do forced alignment? How to use whisperX to do forced alignment Dec 5, 2024