Colab training error #244

astro9ine · 2023-07-14T00:06:29Z

Describe the bug

Thank you so much for this repo and all the hard work. I tried to train a realistic model using these settings:

!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=3434554 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --shuffle_after_epoch \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=354 \
  --sample_batch_size=1 \
  --max_train_steps=3000 \
  --save_sample_prompt="photo of sks person" \
  --concepts_list="concepts_list.json"

# Reduce the `--save_interval` to lower than `--max_train_steps` to save weights from intermediate steps.
# `--save_sample_prompt` can be same as `--instance_prompt` to generate intermediate samples (saved along with weights in samples directory).

After about an hour and 40 minutes, once all 3000 steps had finished and the script tried to save the weights, I got this error:

Reproduction

Steps: 100% 3000/3000 [1:46:53<00:00, 1.69s/it, loss=0.259, lr=1e-6]
Downloading (…)on_pytorch_model.bin: 0% 0.00/335M [00:00<?, ?B/s]
Downloading (…)on_pytorch_model.bin: 13% 41.9M/335M [00:00<00:00, 371MB/s]
Downloading (…)on_pytorch_model.bin: 25% 83.9M/335M [00:00<00:00, 378MB/s]
Downloading (…)on_pytorch_model.bin: 38% 126M/335M [00:00<00:00, 328MB/s]
Downloading (…)on_pytorch_model.bin: 50% 168M/335M [00:00<00:00, 304MB/s]
Downloading (…)on_pytorch_model.bin: 60% 199M/335M [00:00<00:00, 286MB/s]
Downloading (…)on_pytorch_model.bin: 69% 231M/335M [00:00<00:00, 289MB/s]
Downloading (…)on_pytorch_model.bin: 81% 273M/335M [00:00<00:00, 298MB/s]
Downloading (…)on_pytorch_model.bin: 91% 304M/335M [00:01<00:00, 88.4MB/s]
Downloading (…)on_pytorch_model.bin: 100% 335M/335M [00:02<00:00, 166MB/s]

Downloading (…)lve/main/config.json: 100% 547/547 [00:00<00:00, 2.52MB/s]
You have passed None for safety_checker to disable its functionality in <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'>. Note that this might lead to problems when using <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> and is not recommended.
Traceback (most recent call last):
File "/content/train_dreambooth.py", line 870, in <module>
main(args)
File "/content/train_dreambooth.py", line 863, in main
save_weights(global_step)
File "/content/train_dreambooth.py", line 727, in save_weights
pipeline = StableDiffusionPipeline.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipeline_utils.py", line 537, in from_pretrained
raise ValueError(
ValueError: The component <class 'transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor'> of <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> cannot be loaded as it does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': ['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.
Steps: 100% 3000/3000 [1:46:57<00:00, 2.14s/it, loss=0.259, lr=1e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=/content/realistic', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--output_dir=/content/stable_diffusion_weights/ksi2', '--with_prior_preservation', '--prior_loss_weight=1.0', '--seed=3434554', '--resolution=512', '--train_batch_size=1', '--train_text_encoder', '--shuffle_after_epoch', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=354', '--sample_batch_size=1', '--max_train_steps=3000', '--save_sample_prompt=photo of sks person', '--concepts_list=concepts_list.json']' returned non-zero exit status 1.
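
Reading the ValueError above, the loader seems to accept a component only when its class derives from one of the base classes listed in the message. The sketch below imitates that kind of check with stand-in classes. It is an illustration under assumptions, not the actual diffusers code: `find_loading_methods`, `ImageProcessingMixin`, `OldStyleExtractor` and `NewStyleExtractor` are hypothetical names, and the idea that the installed transformers defines the CLIP feature extractor on a base class missing from the list is a guess, not something the log confirms.

```python
# Stand-in base classes; these are NOT imported from diffusers or transformers.
class FeatureExtractionMixin:
    """Stand-in for a base class that appears in the error message."""


class ImageProcessingMixin:
    """Hypothetical base class that does not appear in the error message."""


# Two of the entries from the mapping printed in the ValueError.
LOADABLE_BASES = {
    "PreTrainedModel": ["save_pretrained", "from_pretrained"],
    "FeatureExtractionMixin": ["save_pretrained", "from_pretrained"],
}


def find_loading_methods(cls, loadable_bases):
    """Return the save/load method names if cls derives from a listed base."""
    ancestor_names = {base.__name__ for base in cls.__mro__}
    for base_name, methods in loadable_bases.items():
        if base_name in ancestor_names:
            return methods
    return None  # a loader built this way would raise ValueError here


class OldStyleExtractor(FeatureExtractionMixin):
    """Hypothetical component built on a listed base class."""


class NewStyleExtractor(ImageProcessingMixin):
    """Hypothetical component built on an unlisted base class."""


print(find_loading_methods(OldStyleExtractor, LOADABLE_BASES))
print(find_loading_methods(NewStyleExtractor, LOADABLE_BASES))
```

If this reading is right, the failure would come from the combination of installed library versions and not from the training settings, which is why exact version numbers matter for this report.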

Logs

No response

System Info

Colab diffusers version
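
"Colab diffusers version" does not say which release is actually installed. A minimal sketch for collecting the exact versions from a Colab cell, assuming the packages were installed with pip under the distribution names shown (`installed_versions` is a helper name made up for this example):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Report the gap instead of failing, so the output can be pasted as-is.
            found[name] = "not installed"
    return found


# Distribution names are assumptions based on the modules in the traceback.
for name, ver in installed_versions(
    ["diffusers", "transformers", "accelerate", "torch"]
).items():
    print(f"{name}: {ver}")
```

Pasting the printed lines under System Info would make it easier to tell whether the diffusers and transformers releases in the runtime match each other.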

astro9ine added the bug Something isn't working label Jul 14, 2023
