GitHub repository for the kohya-ss/sd-scripts Colab notebook implementation
Notebook Name | Description | Link |
---|---|---|
Kohya LoRA Dreambooth | LoRA Training (Dreambooth method) | |
Kohya LoRA Fine-Tuning | LoRA Training (Fine-tune method) | |
Kohya Trainer | Native Training | |
Kohya Dreambooth | Dreambooth Training | |
Kohya Textual Inversion | Textual Inversion Training | SOON |
What Changes?
- Refactored the 4 notebooks (again)
- Restored the `--learning_rate` option in `kohya-LoRA-dreambooth.ipynb` and `kohya-LoRA-finetuner.ipynb` #52
- Fixed the cell for inputting custom tags #48 and added the `--keep_tokens` option to prevent custom tags from being shuffled.
- Added a cell to check if all LoRA modules have been trained properly.
- Added descriptions for each notebook and links to the relevant notebooks to prevent "training on the wrong notebook" from happening again.
- Added a cell to check the metadata in the LoRA model.
- Added a cell to change the transparent background in the train data.
- Added a cell to upscale the train data using R-ESRGAN
- Divided the Data Annotation section into two cells:
  - Removed BLIP and replaced it with `Microsoft/GIT` as the auto-captioning for natural language (`git-large-textcaps` is the default model).
  - Updated the Waifu Diffusion 1.4 Tagger to version v2 (SwinV2 is the default model).
    - The user can adjust the threshold for general tags. It is recommended to set the threshold higher (e.g. `0.85`) if you are training on objects or characters, and lower the threshold (e.g. `0.35`) for training on general, style, or environment.
    - The user can choose from three available models.
- Added a field for uploading to the Huggingface organization account.
- Added the `--min_bucket_reso=320` and `--max_bucket_reso=1280` options for training resolutions above 512 (e.g. 640 and 768). Thanks Trauter!
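To illustrate how `--min_bucket_reso` and `--max_bucket_reso` bound the set of training resolutions, here is a minimal sketch of aspect ratio bucket enumeration. It is not the sd-scripts implementation: the function name `make_bucket_resolutions`, the 64-pixel step, and the rule that a bucket's area must not exceed the square of the base resolution are all assumptions made for illustration.

```python
def make_bucket_resolutions(base_reso, min_reso=320, max_reso=1280, step=64):
    """Enumerate (width, height) buckets whose area fits within base_reso**2.

    Hypothetical helper: the step size and the area rule are assumptions,
    not taken from the training scripts.
    """
    max_area = base_reso * base_reso
    buckets = {(base_reso, base_reso)}
    width = min_reso
    while width <= max_reso:
        # Largest height (a multiple of step) that keeps the area within budget.
        height = min(max_reso, (max_area // width) // step * step)
        if height >= min_reso:
            # Store both orientations so portrait and landscape are covered.
            buckets.add((width, height))
            buckets.add((height, width))
        width += step
    return sorted(buckets)


# Example: buckets for a 768 base resolution with the limits quoted above.
buckets_768 = make_bucket_resolutions(768)
for w, h in buckets_768:
    print(f"{w}x{h}")
```

With a base resolution above 512, the wider limits let long, narrow images land in a bucket close to their own aspect ratio instead of being cropped heavily.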
Training script Changes (kohya_ss)
- Please read Updates 3 Feb. 2023, 2023/2/3 for recent updates.
- Official repository : kohya-ss/sd-scripts
- Gradio Web UI Implementation : bmaltais/kohya_ss
- Automatic1111 Web UI extensions : dPn08/kohya-sd-scripts-webui
- Fine tuning of Stable Diffusion's U-Net using Diffusers
- Addressing improvements from the NovelAI article, such as using the output of the penultimate layer of CLIP (Text Encoder) instead of the last layer and learning at non-square resolutions with aspect ratio bucketing.
- Extends token length from 75 to 225 and offers automatic caption and automatic tagging with BLIP, DeepDanbooru, and WD14Tagger
- Supports hypernetwork learning and is compatible with Stable Diffusion v2.0 (base and 768/v)
- By default, does not train the Text Encoder when fine tuning the entire model, but an option to train the Text Encoder is available.
- Ability to make learning even more flexible than with DreamBooth by preparing a certain number of images (several hundred or more seems to be desirable).
- gen_img_diffusers
- merge_vae
- convert_diffusers20_original_sd
- detect_face_rotate
- diffusers_fine_tuning
- train_db_fixed
- merge_block_weighted
What Changes?
- Refactored the 4 notebooks, removing unhelpful comments and making some code more efficient.
- Removed the `download and generate` regularization images function from `kohya-dreambooth.ipynb` and `kohya-LoRA-dreambooth.ipynb`.
- Simplified the cells that create the `train_folder_directory` and `reg_folder_directory` folders in `kohya-dreambooth.ipynb` and `kohya-LoRA-dreambooth.ipynb`.
- Improved the function for downloading links from outside `huggingface` by using `aria2c`.
- Set `Anything V3.1`, which has improved CLIP and VAE models, as the default pretrained model.
- Fixed the `parameter table` and created the remaining tables for the dreambooth notebooks.
- Added `network_alpha` as a supporting hyperparameter for `network_dim` in the LoRA notebook.
- Added the `lr_scheduler_num_cycles` option for `cosine_with_restarts` and the `lr_scheduler_power` option for `polynomial`.
- Removed the global `--learning_rate` option in each LoRA notebook because `unet_lr` and `text_encoder_lr` are already available.
- Fixed the `upload to hf_hub` cell function.
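As a rough illustration of how `network_alpha` relates to `network_dim`, LoRA implementations commonly scale the low-rank update by `alpha / dim`. The sketch below assumes that convention; the function name `lora_scale` is hypothetical and the values are examples only, not recommended settings.

```python
def lora_scale(network_dim: int, network_alpha: float) -> float:
    """Multiplier applied to the low-rank update, assuming scale = alpha / dim."""
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


# Compare a few (dim, alpha) pairs; these numbers are illustrative only.
for dim, alpha in [(128, 128), (128, 64), (32, 1)]:
    scale = lora_scale(dim, alpha)
    print(f"network_dim={dim:>3} network_alpha={alpha:>3} -> scale={scale:.5f}")
```

Under this assumption, setting `network_alpha` equal to `network_dim` leaves the update unscaled, while a smaller alpha dampens it, which may call for a higher learning rate.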
Training script Changes (kohya_ss)
- Please read release version 0.4.0 for recent updates.
- Reformatted the notebook:
  - Added the `%store` IPython magic command to store important variables.
  - You can now change the active directory just by editing the directory path in the `1.1. Clone Kohya Trainer` cell and saving it with the `%store` magic command.
  - Deleted the `unzip` cell and adjusted the `download zip` cell to unzip automatically if it detects a path that starts with `/content/`.
  - Added `--flip_aug` to the Buckets and Latents cell.
  - Added the `--output_name (your-project)` cell to save the trained model with a custom name.
  - Added the ability to automatically compress `train_data_dir`, `last-state`, and `training_logs` before uploading them to Huggingface.
- Added `colab_ram_patch` as a temporary fix for the newest version of Colab after the Ubuntu update, to load the Stable Diffusion model in GPU instead of RAM.
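The auto-compress step can be approximated with the Python standard library. This is a minimal sketch rather than the notebook's actual cell: the helper name `compress_for_upload` and the directory layout in the demo are assumptions, and the only library call relied on is `shutil.make_archive`.

```python
import shutil
import tempfile
from pathlib import Path


def compress_for_upload(src_dir, out_dir):
    """Zip src_dir into out_dir/<dirname>.zip and return the archive path.

    Hypothetical helper; the archive is written outside src_dir so it
    never ends up inside itself.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".zip" suffix to base_name itself.
    archive = shutil.make_archive(str(Path(out_dir) / src.name), "zip", root_dir=str(src))
    return Path(archive)


# Demo with a throwaway directory standing in for train_data_dir.
with tempfile.TemporaryDirectory() as tmp:
    train_data_dir = Path(tmp) / "train_data_dir"
    train_data_dir.mkdir()
    (train_data_dir / "image_001.txt").write_text("1girl, solo")
    archive_path = compress_for_upload(train_data_dir, Path(tmp) / "upload")
    archive_name = archive_path.name
    archive_exists = archive_path.exists()
    print(archive_name, archive_exists)
```

Uploading one archive per folder keeps the number of files pushed to the Hub small, which is presumably the motivation for compressing first.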
Training script Changes (kohya_ss)
- Please read release version 0.3.0 for recent updates.
- Added a function to automatically download the BLIP weight in `make_caption.py`
- Added functions for LoRA training and generation
- Fixed issue where text encoder training was not stopped
- Fixed conversion error for v1 Diffusers->ckpt in `convert_diffusers20_original_sd.py`
- Fixed npz file name for images with dots in `prepare_buckets_latents.py`
Colab UI changes:
- Integrated the repository's format with kohya-ss/sd-scripts to facilitate merging
- You can no longer choose older script versions in the clone cell because the new format does not support it
- The requirements for both blip and wd tagger have been merged into one requirements.txt file
- The blip cell has been simplified because `make_caption.py` will now automatically download the BLIP weight, as will the wd tagger
- A list of sdv2 models has been added to the "download pretrained model" cell
- The "v2" option has been added to the bucketing and training cells
- An image generation cell using `gen_img_diffusers.py` has been added below the training cell
- Added the `save_model_as` option to `fine_tune.py`, which allows you to save the model in any format.
- Added the `keep_tokens` option to `fine_tune.py`, which allows you to fix the first n tokens of the caption and not shuffle them.
- Added support for left-right flipping augmentation in `prepare_buckets_latents.py` and `fine_tune.py` with the `flip_aug` option.
- Added support for training with fp16 gradients (experimental feature). This allows training with 8GB VRAM on SD1.x. See "Training with fp16 gradients (experimental feature)" for details.
- Updated WD14Tagger script to automatically download weights.
- Requires Diffusers 0.10.2 (0.10.0 or later will work, but there are reported issues with 0.10.0 so we recommend using 0.10.2). To update, run `pip install -U diffusers[torch]==0.10.2` in your virtual environment.
- Added support for Diffusers 0.10 (uses code in Diffusers for `v-parameterization` training and also supports `safetensors`).
- Added support for accelerate 0.15.0.
- Added support for multiple teacher data folders. For caption and tag preprocessing, use the `--full_path` option. The arguments for the cleaning script have also changed, see "Caption and Tag Preprocessing" for details.
- Temporary fix for an error when saving in the .safetensors format with some models. If you experienced this error with v5, please try v6.
- Added support for the .safetensors format. Install safetensors with `pip install safetensors` and specify the `use_safetensors` option when saving.
- Added the `log_prefix` option.
- The cleaning script can now be used even when one of the captions or tags is missing.
- The script name has changed to fine_tune.py.
- Added the option `--train_text_encoder` to train the Text Encoder.
- Added the option `--save_precision` to specify the data format of the saved checkpoint. Can be selected from float, fp16, or bf16.
- Added the option `--save_state` to save the training state, including the optimizer. Can be resumed with the `--resume` option.
- Requires Diffusers 0.9.0. To update it, run `pip install -U diffusers[torch]==0.9.0`.
- Supports Stable Diffusion v2.0. Use the `--v2` option when training (and when pre-acquiring latents). If you are using 768-v-ema.ckpt or stable-diffusion-2 instead of stable-diffusion-v2-base, also use the `--v_parameterization` option when training.
- Added options to specify the minimum and maximum resolutions of the bucket when pre-acquiring latents.
- Modified the loss calculation formula.
- Added options for the learning rate scheduler.
- Added support for downloading Diffusers models directly from Hugging Face and for saving during training.
- The cleaning script can now be used even when only one of the captions or tags is missing.
- Added options for the learning rate scheduler.
- Implemented Waifu Diffusion 1.4 Tagger as an alternative to DeepDanbooru for auto-tagging
- Added a tagging script using WD14Tagger.
- Fixed a bug that caused data to be shuffled twice.
- Corrected spelling mistakes in the options for each script.
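The `keep_tokens` behavior mentioned in these notes (fix the first n tokens of a caption and shuffle the rest) can be sketched as follows. This is an illustrative approximation rather than the code in `fine_tune.py`: the function name `shuffle_caption` and the comma-separated tag format are assumptions.

```python
import random


def shuffle_caption(caption, keep_tokens=0, rng=None):
    """Shuffle comma-separated tags, leaving the first keep_tokens in place.

    Hypothetical helper that treats each comma-separated tag as one token.
    """
    rng = rng or random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # only the tags after the fixed prefix are reordered
    return ", ".join(fixed + rest)


caption = "mychar, red hair, smile, outdoors, looking at viewer"
shuffled = shuffle_caption(caption, keep_tokens=1, rng=random.Random(0))
print(shuffled)
```

Keeping a custom trigger tag such as the hypothetical `mychar` at the front means it always occupies the same position in the caption, while the descriptive tags behind it still vary between steps.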
While Stable Diffusion fine tuning is typically based on CompVis, using Diffusers as a base allows for efficient and fast fine tuning with less memory usage. We have also added support for the features proposed by Novel AI, so we hope this article will be useful for those who want to fine tune their models.
— kohya_ss