🎉 We've updated the CLMP training and fine-tuning code and documentation! Come check it out~ 🚀 [2024-11-09]
🎉 We've released the MelodySet dataset. [2024-11-08]
🎉 We've released the MusicSet dataset! Come and try it out~ 🎵 [2024-11-05]
This repository contains the implementation of the music generation model MG2, the first approach to use melody to guide music generation; despite a simple method and extremely limited resources, it achieves excellent performance.
Anyone can use this model to generate personalized background music for short videos on platforms such as TikTok, YouTube Shorts, and Meta Reels. It is also very cost-effective to fine-tune the model on your own private music dataset.
You can watch the introduction video on
Now you can try music generation with your own prompt on our
Tips: To generate high-quality music with MG2, craft detailed, descriptive prompts that provide rich context and specific musical elements.
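For example, an illustrative prompt (not from the official docs) might be: "An upbeat pop track with a bright acoustic guitar melody, steady drums, warm synth pads, and a catchy whistled hook, around 120 BPM."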
To get started with MG2, follow the steps below:
git clone https://github.com/shaopengw/Awesome-Music-Generation.git
cd Awesome-Music-Generation
# Create and activate the environment from the provided environment file
conda env create -f environment.yml
conda activate MMGen_quickstart
# Ensure that the checkpoints are stored in the following directory structure
Awesome-Music-Generation/
└── data/
└── checkpoints/
# Update the paths to reflect your local environment setup
# Replace:
export PYTHONPATH=/mnt/sda/quick_start_demonstration/Awesome-Music-Generation:$PYTHONPATH
export PYTHONPATH=/mnt/sda/quick_start_demonstration/Awesome-Music-Generation/data:$PYTHONPATH
# With:
export PYTHONPATH=/your/local/path/Awesome-Music-Generation:$PYTHONPATH
export PYTHONPATH=/your/local/path/Awesome-Music-Generation/data:$PYTHONPATH
chmod +x quick_start.sh
bash quick_start.sh
Allow the script to run for several minutes. Upon completion, the results will be available in the following directory:
Awesome-Music-Generation/log/latent_diffusion/quick_start/quick_start
We introduce the newly proposed MusicSet dataset, featuring approximately 150,000 high-quality 10-second music-melody-text pairs.
We propose CLMP (Contrastive Language-Music Pretraining) to align text descriptions, music waveforms, and melodies before training the diffusion module. We use WebDataset as the dataloader for music waveforms and text descriptions, and a separate dataloader for melodies. MusicSet has been organized as follows for the training of CLMP:
# Ensure that the training data packaged in WebDataset format is organized as follows:
clmp/
└── dataset/
    └── MusicSet/
        ├── train/
        │   ├── pretrain0.tar
        │   ├── pretrain1.tar
        │   ├── pretrain2.tar
        │   └── ...
        ├── valid/
        └── test/
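As a quick sanity check, you can iterate over the shards with the webdataset package. Below is a minimal sketch; the assumption that each sample pairs an audio file with a caption is illustrative only, and the repository's own dataloader remains authoritative.

import webdataset as wds

# Brace expansion covers pretrain0.tar .. pretrain2.tar; extend the range as needed.
dataset = wds.WebDataset("clmp/dataset/MusicSet/train/pretrain{0..2}.tar")
for sample in dataset:
    # Each sample is a dict keyed by file extension (plus __key__/__url__);
    # inspect the keys to see how audio and caption are stored.
    print(sorted(sample.keys()))
    break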
The dataset structure for the diffusion module is as follows. (Note that you must convert .flac files to .wav format; a conversion sketch follows the directory tree below.)
Awesome-Music-Generation/
└── data/
    └── dataset/
        ├── audioset/
        │   └── wav/
        │       ├── 00040020.wav
        │       ├── 00009570.wav
        │       └── ...
        └── metadata/
            ├── dataset_root.json
            └── MusicSet/
                └── datafiles/
                    ├── train.json
                    ├── valid.json
                    └── test.json
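Since the diffusion dataloader expects .wav files, here is a minimal conversion sketch using the soundfile package; the source directory name is an assumption, so point it at wherever your .flac files live.

import soundfile as sf
from pathlib import Path

src_dir = Path("data/dataset/audioset/flac")  # hypothetical location of the .flac files
dst_dir = Path("data/dataset/audioset/wav")
dst_dir.mkdir(parents=True, exist_ok=True)

for flac_path in src_dir.glob("*.flac"):
    audio, sr = sf.read(flac_path)  # decode the FLAC audio
    sf.write(dst_dir / (flac_path.stem + ".wav"), audio, sr)  # re-encode as WAV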
Below is an example of dataset_root.json:
{
    "MusicSet": "/mnt/data/wmz/Awesome-Music-Generation/data/dataset/audioset",
    "comments": {},
    "metadata": {
        "path": {
            "MusicSet": {
                "train": "./data/dataset/metadata/MusicSet/datafiles/train.json",
                "test": "./data/dataset/metadata/MusicSet/datafiles/test.json",
                "val": "./data/dataset/metadata/MusicSet/datafiles/valid.json",
                "class_label_indices": ""
            }
        }
    }
}
Below is an example of train.json:
{
    "data": [
        {
            "wav": "wav/00040020.wav",
            "seg_label": "",
            "labels": "",
            "caption": "The song starts with the high and fuzzy tone of an alarm bell beeping until a button is pressed, which triggers the grungy sound of an electric guitar being played in a rock style. The beat then counts to four, enhancing the overall rhythm."
        },
        {
            "wav": "wav/00009570.wav",
            "seg_label": "",
            "labels": "",
            "caption": "This lively song features a male vocalist singing humorous lyrics over a medium-fast tempo of 106.0 beats per minute. Accompanied by keyboard harmony, acoustic guitar, steady drumming, and simple bass lines, the catchy tune is easy to sing along with. Set in the key of B major, the chord sequence includes Abm7, F#/G#, and Emaj7. With its spirited and animated feel, this fun track is sure to keep listeners engaged from start to finish."
        }
    ]
}
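Before training, it can help to sanity-check that every entry in the datafile points at an existing .wav file. A minimal sketch, with paths assuming the layout above:

import json
from pathlib import Path

root = Path("data/dataset/audioset")
with open("data/dataset/metadata/MusicSet/datafiles/train.json") as f:
    entries = json.load(f)["data"]

# Report entries whose wav path does not resolve on disk.
missing = [e["wav"] for e in entries if not (root / e["wav"]).exists()]
print(f"{len(entries)} entries, {len(missing)} missing wav files")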
We release MelodySet, containing processed melodies for MusicCaps and MusicBench. We extract the melodies using basic-pitch and organize them as melody triplets. MelodySet is a subset of MusicSet. Each waveform file (.wav) has a corresponding melody file (.txt) with the same filename prefix; for example, 00040020.wav corresponds to 00040020.txt, and all melodies are placed in a single directory.
The organization of the music waveforms and text descriptions is the same as in MusicSet, so we only show the dataset structure of the melody part:
your_path/
└── melody_text/
    ├── 00040020.txt
    ├── 00009570.txt
    └── ...
Below is an example of a melody, which consists of melody triplets separated by |:
<G4>,<114>,<79>|<A4>,<119>,<81>|<B2>,<159>,<0>|<G4>,<117>,<62>|<A4>,<91>,<77>|<D3>,<202>,<0>|<B4>,<92>,<72>|<A4>,<95>,<77>|<B4>,<98>,<80>|<G3>,<200>,<0>|<A4>,<151>,<30>|<G4>,<95>,<77>|<A4>,<93>,<82>|<F#3>,<146>,<0>|<A2>,<201>,<0>|<G2>,<116>,<117>|<G3>,<149>,<0>|<B2>,<122>,<75>|<D3>,<110>,<77>|<B4>,<206>,<0>|<B4>,<113>,<111>|<B3>,<90>,<95>|<A3>,<110>,<57>|<E5>,<113>,<41>|<G3>,<177>,<0>|<D#5>,<119>,<73>|<B3>,<119>,<32>|<C4>,<108>,<78>|<E5>,<111>,<49>|<F#5>,<117>,<82>|<E5>,<111>,<78>|<F#5>,<114>,<82>|<G3>,<151>,<0>|<G5>,<95>,<73>|<F#5>,<91>,<81>|<G5>,<92>,<78>|<A3>,<143>,<43>|<E4>,<202>,<0>|<F#5>,<152>,<30>|<E5>,<98>,<86>|<D#4>,<139>,<8>|<B3>,<142>,<0>|<F#5>,<94>,<68>|<B3>,<111>,<120>|<G3>,<114>,<84>|<B3>,<118>,<83>|<E3>,<122>,<81>|<G5>,<231>,<0>|<E4>,<234>,<0>|<F#5>,<118>,<63>|<E5>,<114>,<79>|<G3>,<118>,<37>|<D5>,<122>,<76>|<C#5>,<119>,<78>|<E5>,<119>,<77>|<B3>,<100>,<78>|<B4>,<123>,<57>|<E5>,<112>,<71>|<A3>,<209>,<0>|<G5>,<123>,<105>|<A4>,<154>,<0>|<F#5>,<124>,<73>|<A3>,<136>,<22>|<C#4>,<205>,<0>|<E5>,<125>,<28>|<F#5>,<121>,<74>|<A5>,<115>,<72>|<D3>,<144>,<0>|<E3>,<95>,<81>|<E5>,<122>,<62>|<A5>,<115>,<76>|<F#3>,<106>,<84>|<D5>,<117>,<48>|<C5>,<125>,<74>|<D3>,<102>,<74>|<B4>,<120>,<50>|<A4>,<123>,<76>|<B4>,<116>,<80>|<D5>,<117>,<79>|<D4>,<319>,<0>|<A4>,<113>,<65>|<C4>,<114>,<42>|<D5>,<116>,<78>|<B3>,<108>,<84>|<G4>,<114>,<43>
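To work with these files programmatically, here is a minimal parsing sketch; treating the two numeric fields as plain integers is an assumption, and their exact semantics follow the MG2 paper.

# Parse a melody string of the form <pitch>,<n1>,<n2>|<pitch>,<n1>,<n2>|...
with open("your_path/melody_text/00040020.txt") as f:
    melody = f.read().strip()

triplets = []
for token in melody.split("|"):
    pitch, a, b = (field.strip("<>") for field in token.split(","))
    triplets.append((pitch, int(a), int(b)))

print(triplets[:3])  # e.g. [('G4', 114, 79), ('A4', 119, 81), ('B2', 159, 0)]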
Assuming you've gone through the Quick Start guide, let's dive into the training and fine-tuning process! 🚀
conda activate MMGen_quickstart
This section covers the training and fine-tuning process for the CLMP.
cd your_path/MMGen_train/modules/clmp
Before running the training script, review and update (crucial) the paths in Awesome-Music-Generation/MMGen_train/modules/clmp/training.sh as needed; this file contains the necessary training details.
bash training.sh
Similarly, review and update (crucial) the paths in Awesome-Music-Generation/MMGen_train/modules/clmp/fine_tuning.sh before proceeding with fine-tuning.
bash fine_tuning.sh
After CLMP model training or fine-tuning, you'll need to generate embeddings and construct FAISS indices to enable efficient similarity search during the Latent Diffusion training phase. Follow this two-step process:
- Step 1: Generate CLMP Embeddings
Enable embedding extraction by adding the following flag to your training configuration:
--collect-audio-melody-feature True
Execute the training or fine-tuning script with this flag:
bash training.sh # or fine_tuning.sh
The model will generate audio and melody feature embeddings in the following directory:
your_path/Awesome-Music-Generation/MMGen_train/modules/clmp/faiss_indexing/clmp_embeddings
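A quick way to verify the export worked is to inspect the saved arrays. This sketch assumes the embeddings are stored as .npy files; the actual filenames may differ.

import numpy as np
from pathlib import Path

emb_dir = Path("MMGen_train/modules/clmp/faiss_indexing/clmp_embeddings")
for path in sorted(emb_dir.glob("*.npy")):
    arr = np.load(path)
    print(path.name, arr.shape, arr.dtype)  # expect [num_samples, embedding_dim]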
- Step 2: Construct FAISS Indices
Navigate to the indexing directory and execute the index construction script:
cd your_path/Awesome-Music-Generation/MMGen_train/modules/clmp/faiss_indexing
# You should modify the embedding paths in this script before running it
python build_faiss_indices.py
The script will generate optimized FAISS indices in:
your_path/Awesome-Music-Generation/MMGen_train/modules/clmp/faiss_indexing/faiss_indices
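For reference, the core of such index construction typically looks like the following minimal sketch; the embedding filename is an assumption, and build_faiss_indices.py remains the authoritative script.

import faiss
import numpy as np
from pathlib import Path

embeddings = np.load("clmp_embeddings/melody_embeddings.npy")  # hypothetical filename
index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 search over d-dimensional vectors
index.add(embeddings.astype(np.float32))

Path("faiss_indices").mkdir(exist_ok=True)
faiss.write_index(index, "faiss_indices/melody.faiss")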
Before training or fine-tuning the diffusion module, you should prepare the required files and update the corresponding file paths in the scripts.
First, set the mode. In the script MMGen_train/train/latent_diffusion.py, set only_validation = True for evaluation, or only_validation = False for training.
Then, prepare the required files for the melody vector database, namely the .faiss and .npy files, which can be found on HuggingFace. Replace the paths of the .faiss and .npy files in the script MMGen_train/modules/latent_diffusion/ddpm.py:
# Change melody.npy and melody.faiss to your local paths
melody_npy = np.load("MMGen/melody.npy")
melody_builder = FaissDatasetBuilder(melody_npy)
melody_builder.load_index("MMGen/melody.faiss")
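For intuition, this database enables nearest-neighbor melody lookups during diffusion training. A minimal sketch of such a query with the raw faiss API (FaissDatasetBuilder is the repository's own wrapper, so this is only illustrative):

import faiss
import numpy as np

melody_npy = np.load("/your/local/path/melody.npy")
index = faiss.read_index("/your/local/path/melody.faiss")

query = melody_npy[:1].astype(np.float32)  # use the first embedding as a dummy query
distances, ids = index.search(query, 4)    # retrieve the 4 most similar melodies
print(ids[0], distances[0])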
Afterwards, you can run the following command to train from scratch:
python3 MMGen_train/train/latent_diffusion.py -c MMGen_train/config/train.yaml
Regarding the training dataset, please refer to the Dataset section.
You can also fine-tune from our pretrained model; the checkpoint mg2-diffusion-checkpoint.ckpt can be found here.
Then, you can run the following command to fine-tune your own model:
python3 MMGen_train/train/latent_diffusion.py -c MMGen_train/config/train.yaml --reload_from_ckpt data/checkpoints/mg2-diffusion-checkpoint.ckpt
Note that MG2 is not permitted for commercial use.
- Demo website
- Huggingface checkpoints
- Quick start (Inference)
- Training Datasets
- Training/fine-tuning code
- Online free generation service
We sincerely acknowledge the developers of the following open-source code bases. These resources are invaluable sparks that ignite innovation and progress in the real world 🎆!
- https://github.com/CompVis/stable-diffusion
- https://github.com/haoheliu/AudioLDM-training-finetuning
- https://github.com/LAION-AI/CLAP
- https://github.com/jik876/hifi-gan
- https://github.com/facebookresearch/faiss
- https://mtg.github.io/mtg-jamendo-dataset
This research is supported by the Key Technologies Research and Development Program under Grant No. 2020YFC0832702; the National Natural Science Foundation of China under Grant Nos. 71910107002, 62376227, 61906159, 62302400, and 62176014; the Sichuan Science and Technology Program under Grant Nos. 2023NSFSC0032 and 2023NSFSC0114; and the Guanghua Talent Project of Southwestern University of Finance and Economics.
@article{wei2024melodyneedmusicgeneration,
title={Melody-Guided Music Generation},
author={Shaopeng Wei and Manzhen Wei and Haoyu Wang and Yu Zhao and Gang Kou},
year={2024},
eprint={2409.20196},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2409.20196},
}