I understand that Whisper can produce both a predicted transcript and its alignment. However, I already have both the audio and the original transcript, so I would like to do forced alignment with Whisper (turbo) using whisperX.
Is this possible within the current framework? A working example would be very helpful.
Here is an example speech text pair to start...
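To make the request concrete, here is a minimal sketch of what aligning a known transcript might look like. It assumes whisperX's `load_audio`, `load_align_model`, and `align` functions behave as shown in its README, and that `load_audio` returns 16 kHz mono samples. The helper names `build_segments` and `force_align` are hypothetical. Note that, as far as I can tell, whisperX's alignment step runs a separate wav2vec2-style model rather than Whisper (turbo) itself, so this may not be exactly the capability asked about.

```python
from typing import Dict, List


def build_segments(transcript: str, duration_s: float) -> List[Dict]:
    """Wrap a known transcript as a single segment spanning the whole clip.

    Hypothetical helper: it mimics the list of {"text", "start", "end"}
    dicts that whisperX's transcription step normally produces.
    """
    text = " ".join(transcript.split())  # collapse newlines and repeated spaces
    if not text:
        raise ValueError("transcript is empty")
    return [{"text": text, "start": 0.0, "end": float(duration_s)}]


def force_align(audio_path: str, transcript: str,
                language: str = "en", device: str = "cpu") -> Dict:
    """Align an existing transcript to audio, skipping transcription."""
    import whisperx  # imported lazily so build_segments works without it

    audio = whisperx.load_audio(audio_path)
    duration_s = len(audio) / 16000  # assumption: audio is resampled to 16 kHz
    segments = build_segments(transcript, duration_s)

    # Assumption: this loads the default alignment model for the language.
    align_model, metadata = whisperx.load_align_model(
        language_code=language, device=device
    )
    return whisperx.align(
        segments, align_model, metadata, audio, device,
        return_char_alignments=False,
    )
```

If this works as assumed, `force_align("speech.wav", "the original transcript")["segments"]` should carry word-level timings. For long recordings, a single segment covering the whole file may align poorly, so splitting the transcript into shorter segments with rough start and end times is probably safer.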