Auriau Vincent, Belahcene Khaled, Mousseau Vincent
- Install git-lfs; it is needed to download the data.
- Clone the repository, preferably using SSH.
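- Make sure that git-lfs actually downloaded the files in data/. Run the command
du -sh *
inside data/: each file should weigh several MB. Files that were not downloaded remain as small LFS pointer stubs of only a few hundred bytes.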
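To check the download from Python instead, the minimal sketch below lists the files under data/ that are still Git LFS pointer stubs rather than real content. It assumes the standard LFS pointer format, a small text file whose first line is 'version https://git-lfs.github.com/spec/v1'. The helper name find_lfs_pointers is hypothetical and not part of the repository.

```python
from pathlib import Path

# Standard first line of every Git LFS pointer file.
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(data_dir="data"):
    """Return the files under data_dir that are still LFS pointer stubs.

    Hypothetical helper: it only compares the first bytes of each file
    with the LFS pointer header.
    """
    pointers = []
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_HEADER))
        if head == LFS_POINTER_HEADER:
            pointers.append(path)
    return pointers
```

For instance, find_lfs_pointers("data") returning an empty list means every file was downloaded; otherwise 'git lfs pull' fetches the missing content.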
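The following commands create and activate the conda environment, then run the evaluation script:
conda env create -f config/env.yml
conda activate cs_td
python evaluation.py
The same script will be used to evaluate your work on two additional test datasets, so make sure it runs correctly.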
You can find the first dataset in data/dataset_4. It contains three files: X.npy, Y.npy and Z.npy. They are organised so that
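A minimal sketch for loading the three arrays with NumPy, for example to inspect their shapes before modelling. It only relies on the file names listed above and does not assume how X, Y and Z relate to each other. The helper name load_dataset is hypothetical.

```python
from pathlib import Path

import numpy as np


def load_dataset(dataset_dir="data/dataset_4"):
    """Load X.npy, Y.npy and Z.npy from dataset_dir (hypothetical helper).

    Returns the tuple (X, Y, Z) of NumPy arrays, in that order.
    """
    dataset_dir = Path(dataset_dir)
    return tuple(np.load(dataset_dir / f"{name}.npy") for name in ("X", "Y", "Z"))
```

Usage: X, Y, Z = load_dataset(), then print(X.shape, Y.shape, Z.shape) to see the number of rows and columns of each array.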
The second dataset (the cars dataset) must be downloaded through the choice-learn package. This notebook provides some guidance.
You are asked to:
- Write a Mixed-Integer Programming (MIP) model that simultaneously clusters the data and learns a UTA model on each cluster.
- Implement this MIP in the TwoClustersMIP class in python/models.py. It should work on dataset_4.
- Explain and implement, in the HeuristicModel class, a heuristic model that can handle the cars dataset.
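As a reminder of what "a UTA model" computes: the utility of an alternative is the sum, over criteria, of marginal utilities that are piecewise-linear on a fixed set of breakpoints, and in the usual UTA formulation the marginal utility values at the breakpoints are the decision variables. The sketch below only evaluates such a utility for given breakpoint values. Equally spaced breakpoints on [0, 1], criteria already scaled to [0, 1], the array layout and the name uta_utility are assumptions for illustration, not the interface expected by TwoClustersMIP.

```python
import numpy as np


def uta_utility(x, breakpoint_values):
    """Evaluate a UTA utility for one or several alternatives (hypothetical helper).

    x: array of shape (n_alternatives, n_criteria), criteria scaled to [0, 1].
    breakpoint_values: array of shape (n_criteria, n_pieces + 1) giving the
        marginal utility u_i at equally spaced breakpoints 0, 1/L, ..., 1.
    In UTA the u_i are non-decreasing and usually normalised so that
    u_i(0) = 0 and the u_i(1) sum to 1; this sketch does not enforce it.
    """
    x = np.atleast_2d(x)
    n_criteria, n_points = breakpoint_values.shape
    grid = np.linspace(0.0, 1.0, n_points)
    # Linear interpolation between breakpoints gives each marginal utility.
    marginals = [np.interp(x[:, i], grid, breakpoint_values[i]) for i in range(n_criteria)]
    # The global utility is the sum of the marginal utilities.
    return np.sum(marginals, axis=0)
```

For example, with breakpoint values [[0, 0.2, 0.3], [0, 0.5, 0.7]] (two criteria, two pieces each), the alternatives [0.25, 1.0] and [0.5, 0.5] get utilities 0.1 + 0.7 = 0.8 and 0.2 + 0.5 = 0.7, so the first one is preferred.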
You will present your results during an oral presentation on Tuesday. Along with it, you are expected to deliver:
- A report summarizing your results as well as your thought process, including non-working models if you consider them interesting.
- Your solution to the first task (the MIP) must be clearly written in this report, explicitly stating its variables, constraints and objective.
- A well-organized git repository with all the Python code behind the presented results. A GitHub fork of the repository is preferred. Add documentation directly in the code or in the report to make it easier to understand. The code must be easy to run for testing purposes.
- In particular, the repository should contain your solutions in the TwoClustersMIP and HeuristicModel classes in the models.py file. If you use additional libraries, add them to the config/env.yml file. The command 'python evaluation.py' will be used to check your models, so make sure it runs and that your code is compatible with it. It will be run on a new dataset that follows the same format as 'dataset_4'.