Training Matcha-TTS on my own data: synthesized speech is silent. #122

Open
zhaojingxin123 opened this issue Dec 1, 2024 · 2 comments

Comments

@zhaojingxin123

I trained the Matcha-TTS model on speech from KeQing, a character in the game Genshin, and then used the trained model to synthesize speech. The output audio is silent, even though the model does produce a mel spectrogram.
The synthesized mel spectrogram is shown in the attached image:
[image]
Has this ever happened to you? What do you think the cause is?
Thank you.

@shivammehta25
Owner

Could you try replacing the vocoder with BigVGAN, or even Griffin-Lim, for testing? I haven't run into this issue before; HiFi-GAN usually works fine on the audio I have tested it with.
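
When comparing vocoder outputs, it may help to measure whether a file is exactly zero or just very quiet, since the two point to different causes. The following is a minimal sketch, assuming the synthesized audio was saved as 16-bit PCM WAV; `peak_amplitude` is a hypothetical helper and not part of Matcha-TTS.

```python
import array
import sys
import wave


def peak_amplitude(wav_path):
    """Return the peak absolute sample of a 16-bit PCM WAV, scaled to [0, 1].

    Hypothetical helper for illustration only; it is not part of Matcha-TTS.
    """
    with wave.open(str(wav_path), "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav_file.readframes(wav_file.getnframes())

    # WAV stores samples as little-endian signed 16-bit integers.
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    if not samples:
        return 0.0
    return max(abs(sample) for sample in samples) / 32768.0
```

A result of exactly 0.0 means every sample is zero, while a small non-zero value suggests the waveform is present but very quiet.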

@shivammehta25
Owner

shivammehta25 commented Dec 2, 2024

Also, could you verify that the sample rate of the input audio is 22050 Hz?

sample_rate: 22050
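
One way to run this check over a whole training set is sketched below. It assumes the recordings are PCM `.wav` files under a single directory and uses only the Python standard library; `find_sample_rate_mismatches` is a hypothetical helper, not something shipped with Matcha-TTS.

```python
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 22050  # the value quoted in the config line above


def find_sample_rate_mismatches(wav_dir, expected=EXPECTED_SAMPLE_RATE):
    """Return {path: sample_rate} for each .wav under wav_dir whose rate differs.

    Hypothetical helper for illustration. The standard-library `wave` module
    only reads PCM WAV files; other encodings raise `wave.Error`.
    """
    mismatches = {}
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected:
            mismatches[str(path)] = rate
    return mismatches
```

For example, `find_sample_rate_mismatches("path/to/wavs")` (a placeholder path) returns an empty dict when every file is at 22050 Hz. Any file it lists would presumably need resampling to 22050 Hz before training.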
