ide-cap-chan

ide-cap-chan is a utility for batch captioning images with natural language using various VL models.

Features

Batch caption generation for Idefics3-8B-Llama3, llava-v1.6, Llama JoyCaption Alpha Two, Qwen2-VL-7B-Instruct, Molmo-7B-D, Molmo-7B-O, Molmo-72B, Pixtral models
Support for multi-GPU captioning
Support for fp16 or nf4 quantization to lower VRAM requirements
Default models are provided (you can specify any other model of a supported architecture)
Support for huggingface/local/external models
Support for additional tag files to enhance captions
Interrupting and resuming the captioning process
Recursive processing of subfolders in the specified input folder
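
The multi-GPU support listed above is described in the version history as using proportional workload balancing. The README does not say what the split is proportional to, so the following is only a minimal sketch that assumes each GPU is given a relative weight (for example, its speed or free VRAM). The function name split_proportionally and the weights mapping are hypothetical and are not taken from the project's code.

```python
from typing import Dict, List, Sequence


def split_proportionally(
    items: Sequence[str], weights: Dict[int, float]
) -> Dict[int, List[str]]:
    """Split items into contiguous chunks sized in proportion to each device weight.

    `weights` maps a device index to a positive relative weight (an assumption
    made for this sketch; the real tool may balance work differently).
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and strictly positive")

    total = sum(weights.values())
    devices = sorted(weights)
    shares: Dict[int, List[str]] = {}
    start = 0
    cumulative = 0.0
    for position, device in enumerate(devices):
        cumulative += weights[device]
        if position == len(devices) - 1:
            # The last device takes the remainder so rounding never drops an item.
            end = len(items)
        else:
            end = round(len(items) * cumulative / total)
        shares[device] = list(items[start:end])
        start = end
    return shares
```

With ten files and weights of 3.0 for device 0 and 2.0 for device 1, this sketch assigns six files to device 0 and four to device 1.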

Requirements

A CUDA-capable GPU (from 8 GB of VRAM for llava up to 24 GB for Qwen2/Molmo 7B, and 2x24 GB for Molmo 72B)

Installation

Clone the repository: git clone https://github.com/2dameneko/ide-cap-chan
On Windows: run install.bat. On Linux: create a virtual environment and run pip install -r requirements.txt

Usage

Place images and corresponding tag files in the input folder (default: 2tag).
On Windows: run batch_processing.bat, on Linux: run the script with the following command: python ide-cap-chan.py
You can use other models of the supported architectures by specifying them on the command line.
You can optionally adapt the prompts to your needs by editing system_prompt and user_prompt in the model_handler.py file.
- Note: molmo72b uses its own shorter prompt to prevent out-of-memory (OOM) errors
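
To illustrate how a tag file placed next to an image might feed into the prompt, here is a minimal sketch. It assumes the tag file shares the image's base name and uses the documented default .txt suffix. The helper names read_tags and build_user_prompt, and the wording appended to the prompt, are hypothetical; the actual logic in model_handler.py may differ.

```python
from pathlib import Path
from typing import Optional


def read_tags(image_path: Path, tags_suffix: str = ".txt") -> Optional[str]:
    """Return the text of the tag file next to image_path, or None if missing/empty."""
    tag_path = image_path.with_suffix(tags_suffix)
    if not tag_path.is_file():
        return None
    text = tag_path.read_text(encoding="utf-8").strip()
    return text or None


def build_user_prompt(base_prompt: str, tags: Optional[str]) -> str:
    """Append tags to the base prompt when they are available.

    The appended sentence is illustrative wording, not the project's prompt.
    """
    if tags is None:
        return base_prompt
    return f"{base_prompt}\nExisting tags for this image: {tags}"
```

Under these assumptions, an image 2tag/cat.png with a sibling 2tag/cat.txt would have its tags appended to the user prompt, while an image without a tag file would use the base prompt unchanged.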

Update

On Windows: run update.cmd

Options

By default, no command line arguments are required. To list the available arguments, run: python ide-cap-chan.py -h

--model_path - Path to the model to use. Default: cyan2k/molmo-7B-O-bnb-4bit
--model_type - Model type (supported architectures: idefics3, llava, joy-caption, molmo, molmo72b, qwen2vl, pixtral). Default: molmo
--input_dir - Path to the folder containing images. Default: 2tag
--CUDA_VISIBLE_DEVICES - Comma-separated list of CUDA devices. Default: 0
- WARNING: multi-GPU captioning can overload your power supply unit
- Note: the molmo72b model ignores the CUDA_VISIBLE_DEVICES argument and uses GPUs 0 and 1
--caption_suffix - Extension for generated caption files. Default: .ttxt
--dont_use_tags - Don't use existing *booru tags to enhance captioning. Default: False
--tags_suffix - Extension for existing *booru tag files. Default: .txt
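
For readers who want to see how the options above fit together, the following is a sketch of an equivalent argparse declaration. The option names and defaults come from the list above. Everything else is an assumption made for illustration: the real arg_parser.py is not shown in this README, --dont_use_tags is assumed to be a boolean flag, and parse_gpu_ids is a hypothetical helper.

```python
import argparse
from typing import List

SUPPORTED_TYPES = [
    "idefics3", "llava", "joy-caption", "molmo", "molmo72b", "qwen2vl", "pixtral",
]


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Batch image captioning (sketch)")
    parser.add_argument("--model_path", default="cyan2k/molmo-7B-O-bnb-4bit")
    parser.add_argument("--model_type", default="molmo", choices=SUPPORTED_TYPES)
    parser.add_argument("--input_dir", default="2tag")
    parser.add_argument("--CUDA_VISIBLE_DEVICES", default="0")
    parser.add_argument("--caption_suffix", default=".ttxt")
    # Assumed to be a simple on/off flag; the real parser may take a value instead.
    parser.add_argument("--dont_use_tags", action="store_true")
    parser.add_argument("--tags_suffix", default=".txt")
    return parser


def parse_gpu_ids(value: str) -> List[int]:
    """Turn a comma-separated device list such as "0,1" into [0, 1]."""
    return [int(part) for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.model_type, parse_gpu_ids(args.CUDA_VISIBLE_DEVICES))
```

Declaring model_type with choices makes an unsupported architecture fail at parse time rather than partway through a batch.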

File formats supported:

.jpg, .jpeg, .png, .webp
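
The supported extensions, recursive folder processing, and the ability to resume an interrupted run can be combined in one file-discovery step. The sketch below assumes that resuming works by skipping any image that already has a caption file next to it, which this README does not confirm. The function name find_pending_images is hypothetical.

```python
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_pending_images(input_dir: Path, caption_suffix: str = ".ttxt") -> List[Path]:
    """Recursively list supported images that do not yet have a caption file."""
    pending: List[Path] = []
    for path in sorted(input_dir.rglob("*")):
        if not path.is_file():
            continue
        # Compare in lower case so that files such as photo.JPG are also found.
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        # Assumed resume rule: an existing caption means the image is already done.
        if path.with_suffix(caption_suffix).exists():
            continue
        pending.append(path)
    return pending
```

Because the check uses the full path of each image, two files with the same name in different subfolders are treated independently.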

Version history

0.7: Added support for the molmo, molmo72b, qwen2vl, and pixtral architectures. Set the default to molmo. Fixed multi-GPU processing with quantized models; it now runs at full speed. Changed the project structure. Refactored.
0.6: Refactored, internal.
0.5: Added joy-caption architecture support. Refactored.
0.4: Added llava architecture support. Reworked args. Removed the temporary pin of PyTorch to 2.4.1, which was a workaround for the buggy 2.5 release; PyTorch 2.5.1 works fine.
0.3: Reworked 'using' args, fixed a minor bug with file extension case
0.2:
- Support for multi-GPU captioning (-h for command line args) with proportional workload balancing
- Support for nf4 quants, enabled by default. ~5 GB model size instead of ~18 GB
- Fixed a filtering bug that affected files with the same name in different folders within one batch
- Reworked scripts for VENV creation and update
- Code refactoring
0.1: Initial release

License

https://www.apache.org/licenses/LICENSE-2.0

Credits

Thank you for your interest in ide-cap-chan!
