When starting the training, there is no response #131

Open
EobardThawne721 opened this issue Dec 26, 2024 · 7 comments

Comments

@EobardThawne721

Based on the output below: when I start the training, nothing more is printed after the model parameters are shown; it stays stuck at the last line and nothing happens afterward. Why is that?

Also, no information shows up in TensorBoard.

[2024-12-26 15:33:23,763][matcha.utils.utils][INFO] - Enforcing tags! <cfg.extras.enforce_tags=True>
[2024-12-26 15:33:23,787][matcha.utils.utils][INFO] - Printing config tree with Rich! <cfg.extras.print_config=True>
CONFIG
├── data
│ └── _target_: matcha.data.text_mel_datamodule.TextMelDataModule
│ name: ljspeech
│ train_filelist_path: D:/Artificial Intelligence/python_project/Matcha-T
│ valid_filelist_path: D:/Artificial Intelligence/python_project/Matcha-T
│ batch_size: 32
│ num_workers: 20
│ pin_memory: true
│ cleaners:
│ - english_cleaners2
│ add_blank: true
│ n_spks: 1
│ n_fft: 1024
│ n_feats: 80
│ sample_rate: 22050
│ hop_length: 256
│ win_length: 1024
│ f_min: 0
│ f_max: 8000
│ data_statistics:
│ mel_mean: -5.517027
│ mel_std: 2.064394
│ seed: 1234
│ load_durations: false

├── model
│ └── _target_: matcha.models.matcha_tts.MatchaTTS
│ n_vocab: 178
│ n_spks: 1
│ spk_emb_dim: 64
│ n_feats: 80
│ data_statistics:
│ mel_mean: -5.517027
│ mel_std: 2.064394
│ out_size: null
│ prior_loss: true
│ use_precomputed_durations: false
│ encoder:
│ encoder_type: RoPE Encoder
│ encoder_params:
│ n_feats: 80
│ n_channels: 192
│ filter_channels: 768
│ filter_channels_dp: 256
│ n_heads: 2
│ n_layers: 6
│ kernel_size: 3
│ p_dropout: 0.1
│ spk_emb_dim: 64
│ n_spks: 1
│ prenet: true
│ duration_predictor_params:
│ filter_channels_dp: 256
│ kernel_size: 3
│ p_dropout: 0.1
│ decoder:
│ channels:
│ - 256
│ - 256
│ dropout: 0.05
│ attention_head_dim: 64
│ n_blocks: 1
│ num_mid_blocks: 2
│ num_heads: 2
│ act_fn: snakebeta
│ cfm:
│ name: CFM
│ solver: euler
│ sigma_min: 0.0001
│ optimizer:
│ _target_: torch.optim.Adam
│ _partial_: true
│ lr: 0.0001
│ weight_decay: 0.0

├── callbacks
│ └── model_checkpoint:
│ _target_: lightning.pytorch.callbacks.ModelCheckpoint
│ dirpath: D:\Artificial Intelligence\python_project\Matcha-TTS-main\lo
│ filename: checkpoint_{epoch:03d}
│ monitor: epoch
│ verbose: false
│ save_last: true
│ save_top_k: 10
│ mode: max
│ auto_insert_metric_name: true
│ save_weights_only: false
│ every_n_train_steps: null
│ train_time_interval: null
│ every_n_epochs: 100
│ save_on_train_epoch_end: null
│ model_summary:
│ _target_: lightning.pytorch.callbacks.RichModelSummary
│ max_depth: 3
│ rich_progress_bar:
│ _target_: lightning.pytorch.callbacks.RichProgressBar

├── logger
│ └── tensorboard:
│ _target_: lightning.pytorch.loggers.tensorboard.TensorBoardLogger
│ save_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-main\l
│ name: null
│ log_graph: false
│ default_hp_metric: true
│ prefix: ''

├── trainer
│ └── _target_: lightning.pytorch.trainer.Trainer
│ default_root_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-
│ max_epochs: -1
│ accelerator: gpu
│ devices:
│ - 0
│ precision: 16-mixed
│ check_val_every_n_epoch: 1
│ deterministic: false
│ gradient_clip_val: 5.0

├── paths
│ └── root_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-main
│ data_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-main/dat
│ log_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-main/logs
│ output_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-main\l
│ work_dir: D:\Artificial Intelligence\python_project\Matcha-TTS-main\mat

├── extras
│ └── ignore_warnings: false
│ enforce_tags: true
│ print_config: true

├── task_name
│ └── train
├── run_name
│ └── ljspeech
├── tags
│ └── ['ljspeech']
├── train
│ └── True
├── test
│ └── True
├── ckpt_path
│ └── None
└── seed
└── 1234
Global seed set to 1234
[2024-12-26 15:33:24,106][__main__][INFO] - Instantiating datamodule <matcha.data.text_mel_datamodule.TextMelDataModule>
[2024-12-26 15:33:27,319][__main__][INFO] - Instantiating model <matcha.models.matcha_tts.MatchaTTS>
D:\anaconda\envs\matcha-tts\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: LoRACompatibleLinear is deprecated and will be removed in version 1.0.0. Use of LoRACompatibleLinear is deprecated. Please switch to PEFT backend by installing PEFT: pip install peft.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
[2024-12-26 15:33:28,402][__main__][INFO] - Instantiating callbacks...
[2024-12-26 15:33:28,403][matcha.utils.instantiators][INFO] - Instantiating callback <lightning.pytorch.callbacks.ModelCheckpoint>
[2024-12-26 15:33:28,410][matcha.utils.instantiators][INFO] - Instantiating callback <lightning.pytorch.callbacks.RichModelSummary>
[2024-12-26 15:33:28,414][matcha.utils.instantiators][INFO] - Instantiating callback <lightning.pytorch.callbacks.RichProgressBar>
[2024-12-26 15:33:28,419][__main__][INFO] - Instantiating loggers...
[2024-12-26 15:33:28,420][matcha.utils.instantiators][INFO] - Instantiating logger <lightning.pytorch.loggers.tensorboard.TensorBoardLogger>
[2024-12-26 15:33:28,428][__main__][INFO] - Instantiating trainer <lightning.pytorch.trainer.Trainer>
Using 16bit Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'lightning.pytorch.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default ModelSummary callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-12-26 15:33:28,590][__main__][INFO] - Logging hyperparameters!
[2024-12-26 15:33:28,932][__main__][INFO] - Starting training!
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┌────┬───────────────────────────────────┬───────────────────┬────────┐
│ │ Name │ Type │ Params │
├────┼───────────────────────────────────┼───────────────────┼────────┤
│ 0 │ encoder │ TextEncoder │ 7.2 M │
│ 1 │ encoder.emb │ Embedding │ 34.2 K │
│ 2 │ encoder.prenet │ ConvReluNorm │ 591 K │
│ 3 │ encoder.prenet.conv_layers │ ModuleList │ 553 K │
│ 4 │ encoder.prenet.norm_layers │ ModuleList │ 1.2 K │
│ 5 │ encoder.prenet.relu_drop │ Sequential │ 0 │
│ 6 │ encoder.prenet.proj │ Conv1d │ 37.1 K │
│ 7 │ encoder.encoder │ Encoder │ 6.2 M │
│ 8 │ encoder.encoder.drop │ Dropout │ 0 │
│ 9 │ encoder.encoder.attn_layers │ ModuleList │ 889 K │
│ 10 │ encoder.encoder.norm_layers_1 │ ModuleList │ 2.3 K │
│ 11 │ encoder.encoder.ffn_layers │ ModuleList │ 5.3 M │
│ 12 │ encoder.encoder.norm_layers_2 │ ModuleList │ 2.3 K │
│ 13 │ encoder.proj_m │ Conv1d │ 15.4 K │
│ 14 │ encoder.proj_w │ DurationPredictor │ 345 K │
│ 15 │ encoder.proj_w.drop │ Dropout │ 0 │
│ 16 │ encoder.proj_w.conv_1 │ Conv1d │ 147 K │
│ 17 │ encoder.proj_w.norm_1 │ LayerNorm │ 512 │
│ 18 │ encoder.proj_w.conv_2 │ Conv1d │ 196 K │
│ 19 │ encoder.proj_w.norm_2 │ LayerNorm │ 512 │
│ 20 │ encoder.proj_w.proj │ Conv1d │ 257 │
│ 21 │ decoder │ CFM │ 11.0 M │
│ 22 │ decoder.estimator │ Decoder │ 11.0 M │
│ 23 │ decoder.estimator.time_embeddings │ SinusoidalPosEmb │ 0 │
│ 24 │ decoder.estimator.time_mlp │ TimestepEmbedding │ 1.2 M │
│ 25 │ decoder.estimator.down_blocks │ ModuleList │ 3.1 M │
│ 26 │ decoder.estimator.mid_blocks │ ModuleList │ 2.8 M │
│ 27 │ decoder.estimator.up_blocks │ ModuleList │ 3.7 M │
│ 28 │ decoder.estimator.final_block │ Block1D │ 197 K │
│ 29 │ decoder.estimator.final_proj │ Conv1d │ 20.6 K │
└────┴───────────────────────────────────┴───────────────────┴────────┘
Trainable params: 18.2 M
Non-trainable params: 0
Total params: 18.2 M
Total estimated model params size (MB): 72


@EobardThawne721 (Author)

Is there anything else I need to change? I followed the steps in the README.

@shivammehta25 (Owner)

I am not sure what the reason could be; I could not reproduce it. Could you update your CUDA version, create a new environment, and retry?
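For example, something along these lines (a sketch assuming the conda/pip flow from the README; the Python version and CUDA build are examples, not requirements):

```bash
# Hypothetical fresh environment; pick versions matching your setup.
conda create -n matcha-tts-new python=3.10 -y
conda activate matcha-tts-new
# Install a PyTorch build matching your CUDA driver (cu121 is just an example):
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Install Matcha-TTS from your source checkout:
pip install -e .
```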

@shivammehta25 (Owner)

Are you seeing real-time console logs, or are these logs from a .log file? I know the Rich progress bar can have issues when its output is redirected to a .log file.
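If the Rich bar turns out to be the culprit, one thing to try is dropping that callback. A sketch using a Hydra command-line override (the entry point is the one from the README; the `~callbacks.rich_progress_bar` delete syntax assumes the callback group names shown in the config tree above):

```bash
# Remove the Rich progress bar callback for this run;
# Lightning then falls back to its default plain TQDM progress bar on stdout.
python matcha/train.py experiment=ljspeech ~callbacks.rich_progress_bar
```

Alternatively, configuring lightning.pytorch.callbacks.TQDMProgressBar in place of RichProgressBar writes plain-text progress, which tends to survive redirection to .log files better.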

@EobardThawne721 (Author)

Thank you for your reply. I found that during training neither the real-time console nor the log files output any information, yet training is in fact running. Could this be caused by a conflict in the packages that print this information? The model is still saved normally, but nothing is printed.
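One way to confirm that training is progressing despite the silent console is to read the TensorBoard event files directly. A minimal sketch (the path is a placeholder for whatever run directory the TensorBoardLogger above creates under paths.log_dir):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Placeholder path: point this at the run directory created by TensorBoardLogger.
ea = EventAccumulator("logs/.../tensorboard")
ea.Reload()
# Print the latest value of every scalar that has been logged so far.
for tag in ea.Tags()["scalars"]:
    last = ea.Scalars(tag)[-1]
    print(tag, "step:", last.step, "value:", last.value)
```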

@EobardThawne721 (Author)


I'm not sure whether it's caused by a conflict among the packages that do the printing.

@shivammehta25 (Owner)

shivammehta25 commented Jan 8, 2025

You can just launch TensorBoard and also get more information: tensorboard --logdir logs
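With the TensorBoardLogger config above, the event files land under paths.log_dir, so from the project root something like this should work (the port is just an example):

```bash
tensorboard --logdir logs --port 6006
# then open http://localhost:6006 in a browser
```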

@EobardThawne721 (Author)

> You can just launch TensorBoard and also get more information: tensorboard --logdir logs

Anyway, thank you for your contribution and the timely response. Perhaps it was a package conflict.
