
[Feature suggestion] Integrate something like demucs as a preprocessing stage before Whisper #9189

Open
justinkb opened this issue Jan 8, 2025 · 2 comments

Comments

@justinkb commented Jan 8, 2025

I find it helps to isolate the actual spoken voices from videos first, before letting Whisper do its thing.
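The pipeline being suggested could be sketched as two CLI invocations: Demucs to split out a vocals-only stem, then Whisper on that stem. This is a minimal sketch, assuming the upstream `demucs` and `whisper` CLIs with their documented flags; the input path is hypothetical, and the output location follows Demucs' default `separated/<model>/<track>/` layout:

```python
import shlex

def build_pipeline(video_path: str, model: str = "base") -> list[list[str]]:
    """Build two argv lists: Demucs vocal isolation, then Whisper transcription.

    `--two-stems=vocals` asks Demucs for only vocals/no_vocals stems; the
    vocals path below assumes Demucs' default output layout with the
    default htdemucs model.
    """
    stem = video_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    vocals = f"separated/htdemucs/{stem}/vocals.wav"
    return [
        ["demucs", "--two-stems=vocals", video_path],
        ["whisper", vocals, "--model", model, "--output_format", "srt"],
    ]

# Hypothetical input file; print the commands rather than running them.
cmds = build_pipeline("/media/clip.mp4")
print(" && ".join(shlex.join(c) for c in cmds))
```

In practice each argv list would go to `subprocess.run`, with the second step gated on the first succeeding.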

@niksedk (Member) commented Jan 8, 2025

Purfview Faster Whisper XXL has several VAD (Voice Activity Detection) engines included.

@justinkb (Author) commented Jan 8, 2025

Thanks for pointing me in that direction. Unfortunately, I don't think their builds support AMD GPUs through HIP. I'm currently using whisper.cpp built for the HIP SDK, while preprocessing with WSL ROCm builds of PyTorch.

As a workaround, maybe you could add an option to define our own Whisper invocations with placeholder variables. That way I could invoke it from Windows into a WSL distro, for example:

```
wsl -d Ubuntu-22.04 -- source /home/paul/dev/venvs/ml/bin/activate `&`& python3 my-fw-script.py %videofile% %output
```
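The requested feature could work roughly as follows: the application holds a user-defined command template and substitutes the placeholders before spawning the process. This is a hedged sketch, not Subtitle Edit's actual code; the `expand_template` name is made up, and the `%videofile%`/`%output%` placeholder names follow the example above:

```python
import shlex

def expand_template(template: str, video_file: str, output: str) -> list[str]:
    """Substitute %videofile% and %output% placeholders, then split the
    result into an argv list for process spawning (no shell involved).

    shlex uses POSIX quoting rules, so on Windows this is only an
    approximation of how the host shell would split the command.
    """
    expanded = (
        template.replace("%videofile%", shlex.quote(video_file))
                .replace("%output%", shlex.quote(output))
    )
    return shlex.split(expanded)

# Hypothetical template invoking a script inside a WSL distro.
argv = expand_template(
    "wsl -d Ubuntu-22.04 -- python3 my-fw-script.py %videofile% %output%",
    "C:/videos/episode 01.mkv",
    "C:/videos/episode 01.srt",
)
# argv could now go to subprocess.run(argv); quoting keeps the
# space-containing paths intact as single arguments.
```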

If you don't want to do that, I understand; I'll implement it locally and build my own Subtitle Edit.
