A deep learning project that uses Diffusion Transformers (DiT) to generate Grand Theft Auto V driving footage. This project is based on the Open-Oasis project.
👉 Also check out TEDD1104: Self Driving Car in GTAV
This project implements a diffusion-based video generation model trained on GTA V gameplay footage using:
- Vision Transformer (ViT) for encoding/decoding frames
- Diffusion Transformer (DiT) for the generative process
- Optional action conditioning for controlled generation
- ✨ Pretrained Models
- 🚀 Inference code for generating driving sequences
- 💻 Complete training pipeline
- 📊 Training dataset with 1.2M sequences
⚠️ This is a personal exploration project for video diffusion models. The code prioritizes readability and visualization over performance. While functional, results may be imperfect due to limited training resources. Feel free to experiment with the code!
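As a rough mental model of how the pieces fit together, here is a toy sketch of the generation loop with stand-in modules. Every class, dimension, and the simplified denoising update below are illustrative assumptions, not the actual implementation in this repository:

```python
# Toy sketch (NOT the real models): a ViT-style autoencoder maps frames to
# latents, and a DiT iteratively denoises each new frame's latent conditioned
# on the previous one. All dimensions are made up for illustration.
import torch
import torch.nn as nn

latent_dim, num_frames, noise_steps = 64, 32, 100
vae_encode = nn.Linear(3 * 32 * 32, latent_dim)   # stand-in for the ViT encoder
vae_decode = nn.Linear(latent_dim, 3 * 32 * 32)   # stand-in for the ViT decoder
dit = nn.Linear(2 * latent_dim, latent_dim)       # stand-in for the DiT

start_frame = torch.rand(1, 3 * 32 * 32)          # flattened RGB start frame
latents = [vae_encode(start_frame)]

for _ in range(num_frames - 1):
    z = torch.randn(1, latent_dim)                # start the new frame from pure noise
    for _ in range(noise_steps):                  # iterative denoising,
        z = dit(torch.cat([z, latents[-1]], dim=-1))  # conditioned on the previous frame
    latents.append(z)

video = torch.stack([vae_decode(z) for z in latents], dim=1)
print(video.shape)  # torch.Size([1, 32, 3072]) -> 32 flattened frames
```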
- Python 3.8+
- PyTorch 2.0+
- TorchVision
- PyTorch Image Models (timm)
- Hugging Face Accelerate
- Hugging Face Transformers
- Hugging Face Datasets
- Weights & Biases (wandb) for logging
```bash
pip install --upgrade torch torchvision transformers accelerate datasets einops wandb webdataset matplotlib timm
```
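Optionally, run a quick sanity check that the main libraries import and that a GPU is visible:

```python
# Optional environment check; a CUDA GPU is strongly recommended for both
# training and inference.
import torch, torchvision, transformers, accelerate, datasets, timm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__, "| accelerate:", accelerate.__version__)
```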
First, download the 🤖 Pretrained Models from 🤗 Iker/AI-Generated-GTA-V.
Generate a 32-frame video from random start frames from the test dataset, without action conditioning:
```bash
python3 generate.py \
--total-frames 32 \
--dit_model_path download_path/dit.safetensors \
--vae_model_path download_path/vit-l-20.safetensors \
--noise_steps 100 \
--output_path your_video.mp4
```
Generate from a custom start image, without action conditioning:
```bash
python3 generate.py \
--total-frames 32 \
--dit_model_path download_path/dit.safetensors \
--vae_model_path download_path/vit-l-20.safetensors \
--noise_steps 100 \
--output_path your_video.mp4 \
--start_frame images/start_image_1.jpg
```
Enable action conditioning. By default, all the actions will be pressing the W key to go forward. You should use the `dit_action.safetensors` model:
```bash
python3 generate.py \
--total-frames 32 \
--dit_model_path download_path/dit_action.safetensors \
--vae_model_path download_path/vit-l-20.safetensors \
--noise_steps 100 \
--output_path your_video_action_conditioning.mp4 \
--use_actions
```
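Whichever variant you run, the output is a standard .mp4 file. If you want to inspect it programmatically (for example, to verify the frame count), something like this works, assuming torchvision was installed with a video backend such as PyAV:

```python
# Load the generated clip and print its shape; the file name below matches
# the --output_path passed to generate.py in the previous command.
from torchvision.io import read_video

frames, _, info = read_video("your_video_action_conditioning.mp4", pts_unit="sec")
print(frames.shape)  # (num_frames, height, width, channels)
print(info)          # e.g. {'video_fps': ...}
```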
The 📊 full training dataset, with 1.2M driving sequences, is available at Iker/GTAV-Driving-Dataset.
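If you want to peek at the data before committing to the full download, you can stream a few samples with 🤗 Datasets. The split name is an assumption here; print the first sample to see the actual fields:

```python
# Stream a handful of samples from the Hugging Face Hub without downloading
# the full dataset (~130GB). Adjust the split name if needed.
from datasets import load_dataset

ds = load_dataset("Iker/GTAV-Driving-Dataset", split="train", streaming=True)
for i, sample in enumerate(ds):
    print(sample.keys())  # inspect the available fields
    if i == 2:
        break
```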
In order to train your own model, you first need to create a configuration file. See `configs/train_dit_actions.yaml` for an example of a training config with action conditioning, and `configs/train_dit.yaml` for an example of a training config without action conditioning.
Most of the params in the config files are self-explanatory. You can choose between `dataset_type: hfdataset` and `dataset_type: webdataset`. `hfdataset` is the most stable and fastest setting, but it will download the entire dataset to your disk (~130GB) and then load it into your RAM, so it requires A LOT OF RAM. `webdataset` will stream the dataset from the Hugging Face repository, so you will only keep ~6GB chunks in RAM at a time. It is more memory-efficient but less stable, and you might get connection errors.
The training will store the latest checkpoint in the output folder. If you set `resume_from_checkpoint: true` and a checkpoint exists, the training state (optimizer, step, scheduler, dataset, etc.) will be restored from that checkpoint.
You can run the training with the following command; it will use as many GPUs as are available (data parallelism):
```bash
accelerate launch --mixed_precision bf16 train_dit.py configs/train_dit_actions.yaml
```
See `train_scripts/` for a Slurm example to launch the training runs.
View generated samples: