Removing Biases from Molecular Representations via Information Maximization

This repository contains the code for InfoCORE, the method presented in the paper Removing Biases from Molecular Representations via Information Maximization by Chenyu Wang, Sharut Gupta, Caroline Uhler, and Tommi Jaakkola (2023). If you have any questions, feel free to open an issue or reach out via email: [email protected].

Set up the environment

conda create -n infocore python=3.10.9
conda activate infocore
bash env.sh
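
Before running env.sh or any experiment, it can help to confirm that the new environment is the active one. The snippet below is a minimal sketch, not part of the repository: the helper name check_env is hypothetical, and it relies only on the CONDA_DEFAULT_ENV variable that conda sets on activation.

```shell
# Sketch: confirm that the conda environment created above is active.
# CONDA_DEFAULT_ENV is set by `conda activate`; check_env is a hypothetical helper.
check_env() {
  expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "active environment: $expected"
  else
    echo "expected environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# The environment name "infocore" comes from the conda commands above.
check_env infocore || echo "run: conda activate infocore"
```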

Drug representation experiments

The data used in the drug representation experiments and pretrained models can be downloaded from https://www.dropbox.com/scl/fo/bcj0jf5gacgiapwl33f37/h?rlkey=8eh3v1ilm6r2scgh3mn7syci5&dl=0.

Gene expression dataset (GE)

Data preparation

cd GE
mkdir model_save

Then place the downloaded GE/data in the GE directory.
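
The placement step can be sketched as follows, run from inside GE. This is an illustration under assumptions: DOWNLOAD_DIR is a hypothetical path to wherever the Dropbox download was unpacked, and place_data is a hypothetical helper that refuses to overwrite an existing data directory.

```shell
# Sketch: copy the downloaded GE data into the current directory (run from inside GE).
# DOWNLOAD_DIR is a hypothetical location for the unpacked download; adjust it.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-$HOME/Downloads/infocore_download}"

place_data() {
  src="$1"   # e.g. "$DOWNLOAD_DIR/GE/data"
  dest="$2"  # e.g. "./data"
  if [ ! -d "$src" ]; then
    echo "not found: $src (set DOWNLOAD_DIR to the unpacked download)" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  cp -r "$src" "$dest"
}

place_data "$DOWNLOAD_DIR/GE/data" ./data || echo "data not copied; see the message above"
```

The same helper works for the CP dataset by swapping GE for CP in the source path.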

Model training

The command for InfoCORE training can be found in scripts/script_training.sh.

Model evaluation

The command for the molecule-phenotype retrieval task can be found in scripts/script_evalacc.sh; the command for the property prediction task can be found in scripts/script_finetune.sh.
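
One way to chain training and the two evaluation tasks is sketched below. It assumes each script can be launched whole with bash from inside GE, which the repository does not state; the scripts may instead hold several alternative commands meant to be copied individually. The run_step helper and the logs directory are hypothetical.

```shell
# Sketch: run the GE scripts in order from inside GE, keeping one log per step.
# Assumption: each script is runnable as a whole with bash; logs/ is a hypothetical directory.
run_step() {
  script="$1"
  log_dir="${2:-logs}"
  if [ ! -f "$script" ]; then
    echo "missing script: $script" >&2
    return 1
  fi
  mkdir -p "$log_dir"
  log="$log_dir/$(basename "$script" .sh).log"
  # Capture stdout and stderr in the log; the function returns the script's exit status.
  bash "$script" > "$log" 2>&1
}

# Stop at the first step that is missing or fails.
for s in scripts/script_training.sh scripts/script_evalacc.sh scripts/script_finetune.sh; do
  run_step "$s" || { echo "stopped at $s" >&2; break; }
done
```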

Cell imaging dataset (CP)

Data preparation

cd CP
mkdir model_save

Then place the downloaded CP/data in the CP directory.
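
Before starting training, a quick layout check can catch a misplaced data directory. The sketch below is run from inside CP; the preflight helper is hypothetical, and the paths it checks are the ones named in this README.

```shell
# Sketch: verify that the CP directory contains what the later steps expect.
# preflight is a hypothetical helper; the paths are taken from this README.
preflight() {
  root="${1:-.}"
  rc=0
  for d in data model_save scripts; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      rc=1
    fi
  done
  for f in script_training.sh script_evalacc.sh script_finetune.sh; do
    if [ ! -f "$root/scripts/$f" ]; then
      echo "missing file: $root/scripts/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}

preflight . && echo "CP layout looks complete" || echo "fix the items listed above before training"
```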

Model training

The command for InfoCORE training can be found in scripts/script_training.sh.

Model evaluation

The command for the molecule-phenotype retrieval task can be found in scripts/script_evalacc.sh; the command for the property prediction task can be found in scripts/script_finetune.sh.

Citation

@inproceedings{wang2023removing,
  title={Removing Biases from Molecular Representations via Information Maximization},
  author={Wang, Chenyu and Gupta, Sharut and Uhler, Caroline and Jaakkola, Tommi S},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2023}
}
