Cannot continue training from last checkpoint #188

chunping-xt · 2024-07-16T02:49:58Z

After a first finetune from your checkpoint on ~30 hours of data (epoch=6), I tried inference, but the result did not sound anything like speech. I was going to train for a few more epochs to see whether that improved things, but I got an error when resuming from my last checkpoint.

Inference with the last checkpoint, ckpt_0035000.pt:

from IPython.display import Audio, display
from fam.llm.fast_inference import TTS

tts = TTS(first_stage_path = '/mnt/f/ckpt_0035000.pt')
wav_file = tts.synthesise( text, spk_ref_path="/mnt/f/sample.mp3" )
display(Audio(wav_file, autoplay=True)) # bad result

Continuing training from the last checkpoint:

!python fam/llm/finetune.py \
--train '/mnt/f/train.csv' --val '/mnt/f/eval.csv' \
--ckpt '/mnt/f/ckpt_0035000.pt' \
--spk-emb-ckpt '/mnt/f/metavoice-1B-v0.1/speaker_encoder.pt'

... error...:
/usr/local/envs/env_metavoice/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/usr/local/envs/env_metavoice/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Training: 90%|███████████████████████████████▎ | 35000/39060 [00:00<?, ?it/s]Before layer freezing trainable_count(model)=1243191296...
After freezing excl. last 1 transformer blocks: trainable_count(model)=51386368...
Traceback (most recent call last):
File "/mnt/f/repo_metavoice-src/fam/llm/finetune.py", line 387, in <module>
main()
File "/usr/local/envs/env_metavoice/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/envs/env_metavoice/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/envs/env_metavoice/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/envs/env_metavoice/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/mnt/f/repo_metavoice-src/fam/llm/finetune.py", line 263, in main
finetune_jobid = hash_dictionary(properties)
File "/mnt/f/repo_metavoice-src/fam/llm/utils.py", line 97, in hash_dictionary
serialized = json.dumps(d, sort_keys=True)
File "/usr/local/envs/env_metavoice/lib/python3.10/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/usr/local/envs/env_metavoice/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/envs/env_metavoice/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/local/envs/env_metavoice/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type PosixPath is not JSON serializable
...

CaptEug · 2024-12-06T06:46:06Z

Not sure if you have already solved this problem, but the solution is:

Change the type of the ckpt value from PosixPath to str, so that json.dumps can serialize it.
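For anyone hitting the same traceback, here is a minimal sketch of that fix. It assumes the dictionary passed to hash_dictionary contains a pathlib path under a "ckpt" key. The `properties` contents, the `stringify_paths` helper, and the SHA-256 hashing step are hypothetical stand-ins for illustration, not the repository's actual code; only the `json.dumps(d, sort_keys=True)` call is taken from the traceback.

```python
import hashlib
import json
from pathlib import Path, PurePath


def stringify_paths(d: dict) -> dict:
    """Return a shallow copy of d with pathlib paths converted to str."""
    return {k: str(v) if isinstance(v, PurePath) else v for k, v in d.items()}


# Hypothetical stand-in for the `properties` dict built in finetune.py.
properties = {"ckpt": Path("/mnt/f/ckpt_0035000.pt"), "epochs": 6}

# json.dumps cannot serialize pathlib objects, which reproduces the TypeError.
try:
    json.dumps(properties, sort_keys=True)
    raised = False
except TypeError:
    raised = True

# After converting paths to str, serialization succeeds.
serialized = json.dumps(stringify_paths(properties), sort_keys=True)

# Assumed hashing step; the real hash_dictionary may derive its id differently.
job_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

An alternative that avoids touching the dictionary is to pass `default=str` to `json.dumps`, which tells the encoder to fall back to `str()` for any object it cannot serialize. Either approach would need to be applied where the ckpt value is handled in finetune.py or in hash_dictionary.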
