some question about prior_loss #96
Thanks for your great work. Recently I have been using the hidden_state output from a large language model as the input to the matcha_tts encoder for training. I have overfitted a single sample tens of thousands of times, but the loss is still very large; in particular, the prior_loss has stayed between 1 and 2. Is there a solution to this problem?
Comments
What do you mean?
Is your LLM frozen, or are you training any aspect of it?
Yes, I froze my LLM. I noticed that your input text is first converted into a phoneme sequence with the phonemizer library before being fed to the speech synthesis model, whereas I directly use the hidden_state output by the LLM as input. Between the two, is the former easier to train? Have you ever tried feeding discretized text directly into the model?
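(For reference, the phonemizer-based front end mentioned above is roughly the following. This is only a minimal sketch; the backend, language, and flags here are illustrative and may not match the exact settings Matcha-TTS uses.)

```python
# Minimal sketch of a phonemizer-based text front end (illustrative only).
from phonemizer import phonemize

text = "How much wood would a woodchuck chuck?"

# Convert raw text into an IPA phoneme string using the espeak backend.
phonemes = phonemize(
    text,
    language="en-us",
    backend="espeak",
    strip=True,
    preserve_punctuation=True,
    with_stress=True,
)
print(phonemes)  # e.g. something like "haʊ mʌtʃ wʊd wʊd ɐ wʊdtʃʌk tʃʌk?"
```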
I think since your input is not raw text but a representation that should already capture the hidden nuances of phonemization, it should be fine. The mapping is definitely easier if the input is phonetised, but the model should still be able to learn. I am actually not sure why the prior loss is so high. Did you try listening to the outputs of the model? Is it utter garbage? (The prior loss, being an MSE, can be a bit high sometimes.)
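(For context, the prior loss in Grad-TTS-style models, which Matcha-TTS follows, is a masked Gaussian negative log-likelihood between the encoder output and the aligned target mel frames. A rough sketch, with tensor shapes assumed for illustration; the actual implementation may differ in detail:)

```python
import math
import torch

def prior_loss(mu_y: torch.Tensor, y: torch.Tensor, y_mask: torch.Tensor, n_feats: int) -> torch.Tensor:
    """Masked Gaussian NLL between the encoder prediction mu_y and the target mel y.

    Assumed shapes: mu_y, y are (batch, n_feats, time); y_mask is (batch, 1, time).
    Up to the constant log(2*pi) term this is a per-dimension MSE.
    """
    loss = torch.sum(0.5 * ((y - mu_y) ** 2 + math.log(2 * math.pi)) * y_mask)
    return loss / (torch.sum(y_mask) * n_feats)
```

If the constant log(2π) term is included, as in Grad-TTS, it alone contributes roughly 0.92, so a value between 1 and 2 does not necessarily mean the squared error itself is huge.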
Thanks for such a quick reply! I have generated the model's inference results; the ground truth (gt) is
Then I would have to believe that the hidden representations might not capture what is required to synthesise speech. I am not sure what would be an easy fix for this; perhaps train some part of the output embeddings using LoRA?
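(A very rough sketch of what that could look like: wrap the projection that produces the hidden states in a small LoRA adapter and train only the low-rank matrices. The module names below are assumptions for illustration, not the actual model's API.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter: freeze the base projection, train only a low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # the update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hypothetical usage: adapt the projection that produces the hidden states fed to
# the TTS encoder, without unfreezing the rest of the LLM.
# llm.lm_head = LoRALinear(llm.lm_head, rank=8)  # "llm.lm_head" is an assumed attribute name
```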
Any ideas? |