
[Feature suggestion] Integrate something like demucs as a preprocessing stage before Whisper #9189

Open
justinkb opened this issue Jan 8, 2025 · 2 comments

Comments

@justinkb commented Jan 8, 2025

I find it helps to isolate the actual spoken voices from videos first, before letting Whisper do its thing.
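The pipeline being suggested could be sketched as two CLI invocations: Demucs to split out a vocals-only stem, then Whisper on that stem. This is a minimal sketch, assuming the upstream `demucs` and `whisper` CLIs with their documented flags; the input path is hypothetical, and the output location follows Demucs' default `separated/<model>/<track>/` layout:

```python
import shlex

def build_pipeline(video_path: str, model: str = "base") -> list[list[str]]:
    """Build two argv lists: Demucs vocal isolation, then Whisper transcription.

    `--two-stems=vocals` asks Demucs for only vocals/no_vocals stems; the
    vocals path below assumes Demucs' default output layout with the
    default htdemucs model.
    """
    stem = video_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    vocals = f"separated/htdemucs/{stem}/vocals.wav"
    return [
        ["demucs", "--two-stems=vocals", video_path],
        ["whisper", vocals, "--model", model, "--output_format", "srt"],
    ]

# Hypothetical input file; print the commands rather than running them.
cmds = build_pipeline("/media/clip.mp4")
print(" && ".join(shlex.join(c) for c in cmds))
```

In practice each argv list would go to `subprocess.run`, with the second step gated on the first succeeding.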

@niksedk (Member) commented Jan 8, 2025

Purfview Faster Whisper XXL has several VAD (Voice Activity Detection) engines included.

@justinkb (Author) commented Jan 8, 2025

Thanks for pointing me in that direction. Unfortunately, I don't think their builds support AMD GPUs through HIP. I'm currently using whisper.cpp built for the HIP SDK, while preprocessing with WSL ROCm builds of PyTorch.

As a workaround, maybe you could add an option to define our own Whisper invocations with placeholder variables. That way I could invoke it from Windows into a WSL distro, for example:

```
wsl -d Ubuntu-22.04 -- source /home/paul/dev/venvs/ml/bin/activate `&`& python3 my-fw-script.py %videofile% %output
```
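The requested feature could work roughly as follows: the application holds a user-defined command template and substitutes the placeholders before spawning the process. This is a hedged sketch, not Subtitle Edit's actual code; the `expand_template` name is made up, and the `%videofile%`/`%output%` placeholder names follow the example above:

```python
import shlex

def expand_template(template: str, video_file: str, output: str) -> list[str]:
    """Substitute %videofile% and %output% placeholders, then split the
    result into an argv list for process spawning (no shell involved).

    shlex uses POSIX quoting rules, so on Windows this is only an
    approximation of how the host shell would split the command.
    """
    expanded = (
        template.replace("%videofile%", shlex.quote(video_file))
                .replace("%output%", shlex.quote(output))
    )
    return shlex.split(expanded)

# Hypothetical template invoking a script inside a WSL distro.
argv = expand_template(
    "wsl -d Ubuntu-22.04 -- python3 my-fw-script.py %videofile% %output%",
    "C:/videos/episode 01.mkv",
    "C:/videos/episode 01.srt",
)
# argv could now go to subprocess.run(argv); quoting keeps the
# space-containing paths intact as single arguments.
```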

If you don't want to do that, I understand; I'll implement it locally and build my own Subtitle Edit.
