This repository provides code for training and evaluating Quantized Autoencoders (QAE) as part of the ECON project. The code is organized into various scripts for data processing, model training, and evaluation.
To set up the environment, create and activate the Conda environment using the provided YAML file:
```bash
conda env create -f environment.yml
conda activate econ_qae
```
The Conditional Autoencoder (CAE) consists of a quantized encoder and an unquantized decoder, with additional conditioning in the latent space for known wafer information. Specifically, for HGCAL wafer encoding, the following conditional variables are used:
- eta
- waferu
- waferv
- wafertype (one-hot encoded into 3 possible types)
- sumCALQ
- layers
Altogether, these 8 conditional variables are concatenated with a 16D latent code, resulting in a 24D input to the decoder.
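As a rough illustration, the decoder input could be assembled as in the following Keras sketch. The layer sizes and the 8x8 wafer output shape here are assumptions for illustration, not the exact architecture in `train_CAE_simon_data.py`:

```python
# Minimal sketch of the CAE conditioning described above, assuming a
# TensorFlow/Keras model; hidden sizes and output shape are illustrative.
import tensorflow as tf

latent_dim = 16   # latent code from the quantized encoder
cond_dim = 8      # eta, waferu, waferv, wafertype (3 one-hot), sumCALQ, layers

latent_in = tf.keras.Input(shape=(latent_dim,), name="latent")
cond_in = tf.keras.Input(shape=(cond_dim,), name="conditions")

# Concatenate the conditional variables onto the latent code: 16 + 8 = 24.
decoder_in = tf.keras.layers.Concatenate(name="decoder_input")([latent_in, cond_in])

x = tf.keras.layers.Dense(128, activation="relu")(decoder_in)
x = tf.keras.layers.Dense(8 * 8, activation="sigmoid")(x)  # e.g. an 8x8 wafer image
decoder_out = tf.keras.layers.Reshape((8, 8, 1))(x)

decoder = tf.keras.Model([latent_in, cond_in], decoder_out, name="conditional_decoder")
```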
- Training is performed via `train_CAE_simon_data.py`.
- CMSSW integration is handled by `preprocess_CMSSW.py`, which slightly modifies how conditioning is applied (without affecting model performance) to ensure CMSSW compatibility.
Use the `process_data.py` script to generate or preprocess the dataset. Below is an example command:
```bash
python process_data.py --opath test_data_saving --num_files 2 --model_per_eLink --biased 0.90 --save_every_n_files 1 --alloc_geom old
```
Arguments:
- `--opath`: Output directory for saved data.
- `--num_files`: Number of ntuples to preprocess.
- `--model_per_eLink`: Trains a unique CAE per possible eLink allocation.
- `--model_per_bit_config`: Trains a unique CAE per possible bit allocation.
- `--biased`: Resamples the dataset so that a fraction n of the data is signal and (1 - n) is background (specify n as a float, e.g. 0.90); see the sketch after this list.
- `--save_every_n_files`: Number of ntuples to combine per preprocessed output file.
- `--alloc_geom`: Allocation geometry (`old` or `new`).
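For intuition, a biased resampling step of the kind `--biased` describes could look like the sketch below. The function, array names, and boolean signal labels are hypothetical, not the actual `process_data.py` internals:

```python
# Hypothetical sketch of biased resampling: keep all signal, then draw just
# enough background so that signal makes up a fraction n of the output.
import numpy as np

def resample_biased(data, is_signal, n=0.90, seed=0):
    """Resample `data` so a fraction `n` of rows are signal (`is_signal` is boolean)."""
    rng = np.random.default_rng(seed)
    sig_idx = np.flatnonzero(is_signal)
    bkg_idx = np.flatnonzero(~is_signal)
    # Solve len(sig) / (len(sig) + n_bkg) = n for the background count.
    n_bkg = int(len(sig_idx) * (1 - n) / n)
    keep = np.concatenate([sig_idx, rng.choice(bkg_idx, size=n_bkg, replace=False)])
    rng.shuffle(keep)
    return data[keep]
```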
Use the `train_ECON_AE_CAE.py` script to train the model. It automatically runs `preprocess_CMSSW.py`, which generates the files needed to run the trained CAE in CMSSW. Below is an example command:
```bash
python train_ECON_AE_CAE.py --opath test_new_run --mname test --model_per_eLink --alloc_geom old --data_path test_data_saving --loss tele --optim lion --lr 1e-4 --lr_sched cos --train_dataset_size 2000 --test_dataset_size 1000 --val_dataset_size 1000 --batchsize 128 --num_files 1 --nepochs 10
```
Arguments:
- `--opath`: Output directory for the training run.
- `--mname`: Model name.
- `--model_per_eLink`: Trains a unique CAE per possible eLink allocation.
- `--model_per_bit_config`: Trains a unique CAE per possible bit allocation.
- `--alloc_geom`: Allocation geometry (`old` or `new`).
- `--data_path`: Path to the preprocessed dataset.
- `--loss`: Loss function (`tele` or `mse`).
- `--optim`: Optimizer (`lion` or `adam`).
- `--lr`: Learning rate.
- `--lr_sched`: Learning rate scheduler (`cos` or `cos_warm_restarts`); see the sketch after this list.
- `--train_dataset_size`: Number of samples in the training dataset.
- `--test_dataset_size`: Number of samples in the test dataset.
- `--val_dataset_size`: Number of samples in the validation dataset.
- `--batchsize`: Training batch size.
- `--num_files`: Number of preprocessed files to use.
- `--nepochs`: Number of training epochs.
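For reference, the two `--lr_sched` options correspond to standard cosine schedules. A minimal sketch using stock Keras schedules, assuming this is what the flags map to (step counts below are placeholders derived from the example command, not values read from the training script):

```python
# Sketch of the two scheduler options using standard Keras LR schedules.
import tensorflow as tf

total_steps = 10 * (2000 // 128)  # nepochs * steps_per_epoch for the example run

# --lr_sched cos: a single cosine decay from the initial learning rate.
cos = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-4, decay_steps=total_steps)

# --lr_sched cos_warm_restarts: cosine decay that periodically restarts.
cos_warm = tf.keras.optimizers.schedules.CosineDecayRestarts(
    initial_learning_rate=1e-4, first_decay_steps=total_steps // 4)
```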
- `gen_latent_samples.py`: Generates latent samples for analysis.
- `process_data.py`: Processes raw data and creates datasets.
- `train_ECON_AE_CAE.py`: Trains the ECON Conditional Autoencoder models.
- `preprocess_CMSSW.py`: Preprocesses CAE models for CMSSW.
- `utils/graph.py`: Utility functions for graph operations.
- `utils/utils.py`: General utility functions.
- `utils/telescope.py`: Telescope loss function.
- `utils/files.py`: File I/O helper functions.
For more details on each script and its usage, please refer to inline comments and docstrings within the code files. If you encounter any issues, feel free to open an issue or submit a pull request.