Training Matcha for other languages: Fine-tuning HiFiGAN to specific speakers #124

Open
ibonsc opened this issue Dec 6, 2024 · Discussed in #120 · 1 comment

ibonsc commented Dec 6, 2024

Discussed in #120

Originally posted by ibonsc November 28, 2024
Hello,
I'm training Matcha for different languages, and I think it would be good to fine-tune the universal HiFiGAN vocoder on these new languages. Have you done such a thing?
To do that, I need the ground-truth-aligned spectrograms of the training material, which would be used as input for the HiFiGAN fine-tuning. Is there any way to obtain them?
Thank you very much for your help.

@shivammehta25 (Owner)

Yeah! Of course. What you can do is extract alignments from a trained model, using the first part of: https://github.com/shivammehta25/Matcha-TTS/wiki/Extracting-phoneme-alignments-and-improving-GPU-utilisation

Then, you would have to hack the inference code and use these alignments instead. One way I can suggest is to use the batched dataset in the CLI and also load the transcripts and durations:

Matcha-TTS/matcha/cli.py, lines 292 to 300 in 108906c:

```python
class BatchedSynthesisDataset(torch.utils.data.Dataset):
    def __init__(self, processed_texts):
        self.processed_texts = processed_texts

    def __len__(self):
        return len(self.processed_texts)

    def __getitem__(self, idx):
        return self.processed_texts[idx]
```
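
For illustration, here is a minimal sketch of how that dataset could be extended to also return the extracted durations. The class name, the `.npy` file layout, and the per-phoneme frame-count format are assumptions for this example, not something the repo ships:

```python
import numpy as np
import torch


class BatchedSynthesisWithDurationsDataset(torch.utils.data.Dataset):
    """Hypothetical variant of BatchedSynthesisDataset that also returns the
    ground-truth per-phoneme durations saved during alignment extraction."""

    def __init__(self, processed_texts, duration_paths):
        assert len(processed_texts) == len(duration_paths)
        self.processed_texts = processed_texts
        self.duration_paths = duration_paths

    def __len__(self):
        return len(self.processed_texts)

    def __getitem__(self, idx):
        item = self.processed_texts[idx]
        # Durations assumed to be stored as one .npy file of integer frame
        # counts per phoneme; adapt the loading to however you saved them.
        durations = torch.from_numpy(np.load(self.duration_paths[idx])).long()
        return item, durations
```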

Then, use these saved durations in place of the predicted durations `w`:

```python
w = torch.exp(logw) * x_mask
```
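
A rough sketch of what that substitution could look like inside the synthesis path. Everything except `w`, `logw`, and `x_mask` is a hypothetical name introduced for this example; adapt it to wherever your checkout computes the durations:

```python
# Inside the model's synthesise step, after the encoder has produced logw
# and x_mask (a hypothetical `durations` argument added for this example):
if durations is not None:
    # Use the ground-truth per-phoneme frame counts extracted earlier, so the
    # generated mels line up with the original audio for HiFiGAN fine-tuning.
    # Assumes durations is [batch, n_phonemes] and x_mask is [batch, 1, n_phonemes].
    w = durations.to(x_mask.dtype).unsqueeze(1) * x_mask
else:
    # Fall back to the durations predicted by the model.
    w = torch.exp(logw) * x_mask
```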

And it should be fine. Hope this helps, let me know if you have any more doubts.
