Ardennes

Ardennes is an Estimation of Distribution Algorithm (EDA) for decision-tree induction, as presented in the paper:

CAGNINI, Henry E. L; BARROS, R. C; BASGALUPP, M. P. Estimation of Distribution Algorithms for Decision-Tree Induction. IEEE Congress on Evolutionary Computation (IEEE CEC 2017), San Sebastián, Spain, June 5-8, 2017.

Citation

If you find this code useful in your work, please cite it:

@inproceedings{cagnini2017ardennes,
  author    = {Henry E. L. Cagnini and
               Rodrigo C. Barros and
               M\'{a}rcio P. Basgalupp},
  title     = {{Estimation of Distribution Algorithms for Decision-Tree Induction}},
  booktitle = {{IEEE} Congress on Evolutionary Computation, {CEC} 2017, San Sebastián, Spain, June 5-8, 2017},
  year      = {2017}
}

Capabilities

Datasets with numerical predictive attributes;
Categorical class attributes;
Multiclass and binary problems;
Support for more types of datasets is planned for future updates.

Limitations

This algorithm only works:

for datasets with the class as the last attribute;
for datasets with numerical predictive attributes and a categorical class attribute;
with binary splits.

It has been tested only on Ubuntu 16.04, but will probably work on other operating systems once you find the equivalents of the libraries described in Installation.
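
The dataset constraints above can be checked before training. The following is only a sketch and not part of Ardennes: the function name check_dataset is hypothetical, and it assumes the dataset has already been loaded into a pandas DataFrame. It also assumes that a float-typed last column is not a class label, which is a heuristic.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_numeric_dtype


def check_dataset(df):
    """Return a list of reasons why `df` would not fit the limitations above."""
    if df.shape[1] < 2:
        return ["expected at least one predictive attribute plus the class"]

    problems = []
    predictive, target = df.iloc[:, :-1], df.iloc[:, -1]

    # Every column except the last one must be numerical.
    for name in predictive.columns:
        if not is_numeric_dtype(predictive[name]):
            problems.append("predictive attribute {!r} is not numerical".format(name))

    # Assumption (heuristic): a float-typed last column is not a class label.
    if is_float_dtype(target):
        problems.append("last attribute {!r} does not look categorical".format(target.name))

    return problems


# Class in the last position, numerical predictors: no problems reported.
ok = pd.DataFrame({"a": [1.0, 2.0], "b": [3, 4], "class": ["yes", "no"]})
# Class in the first position: both checks fail.
bad = pd.DataFrame({"class": ["yes", "no"], "a": [1.0, 2.0]})

print(check_dataset(ok))
print(check_dataset(bad))
```

An empty list means the DataFrame passes these checks; it does not guarantee that Ardennes will accept the file.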

Installation

Essential:

pip install networkx liac-arff numpy scikit-learn pandas scipy

For plotting trees and interpreting graphical models:

sudo apt-get install graphviz libgraphviz-dev pkg-config
pip install pygraphviz matplotlib plotly

For running J48 inside Python:

sudo apt-get install default-jre default-jdk
pip install additional_packages/python-weka-wrapper-0.3.9.tar.gz

For parallel processing (greatly increases performance):

sudo apt-get install libffi-dev g++
sudo apt-get install ocl-icd-opencl-dev
pip install mako

Then follow the instructions at https://wiki.tiker.net/PyOpenCL/Installation/Linux, or alternatively build the bundled copy of PyOpenCL:

NOTICE: If you use a virtual environment, you must activate it before running the following commands.

tar xfz additional_packages/pyopencl-2016.2.1.tar.gz
cd pyopencl-2016.2.1
python configure.py
sudo su -c "make install"

And you're done!
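
To see which of the packages above are importable in the active environment, a small check like the one below may help. It is a sketch, not part of the repository: the helper name missing_modules and the grouping of the optional packages are assumptions. The import names differ from some pip names (liac-arff is imported as arff, scikit-learn as sklearn, python-weka-wrapper as weka).

```python
import importlib

# Import names for the packages listed under "Essential".
ESSENTIAL = ["networkx", "arff", "numpy", "sklearn", "pandas", "scipy"]

# Import names for the optional features; the grouping labels are illustrative.
OPTIONAL = {
    "plotting": ["pygraphviz", "matplotlib", "plotly"],
    "J48": ["weka"],
    "OpenCL": ["pyopencl", "mako"],
}


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("essential, missing: {}".format(missing_modules(ESSENTIAL)))
    for feature, names in OPTIONAL.items():
        print("{}, missing: {}".format(feature, missing_modules(names)))
```

If you use a virtual environment, run the check with that environment activated so it inspects the same interpreter that will run main.py.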

First steps

Start by reading the main.py script. Once you understand what it does (it is fairly simple), you can run it from a terminal:

python main.py

The output should look something like this:

NOTICE: Using single-threaded CPU as device.
training ardennes for dataset liver-disorders
iter: 000 mean: 0.690761 median: 0.688406 max: 0.818841 ET: 22.49sec  height:  9  n_nodes: 45  test acc: 0.536232
iter: 001 mean: 0.674928 median: 0.692029 max: 0.818841 ET:  4.09sec  height:  9  n_nodes: 45  test acc: 0.536232
...
iter: 099 mean: 0.730978 median: 0.789855 max: 0.818841 ET: 2.68sec  height:  9  n_nodes: 27  test acc: 0.637681
Test acc: 0.64 Height: 9 n_nodes: 27 Time: 342.39 secs

The first line (NOTICE: Using single-threaded CPU as device.) indicates which device is used to compute the splitting criterion and the individuals' fitness. Currently there are two possible devices: OpenCL and single-threaded CPU, the latter being slower.
The second line names the dataset on which Ardennes is training.
The remaining fields are:
- iter: current iteration/generation
- mean: mean training accuracy in the current population
- median: median training accuracy in the current population
- max: maximum training accuracy in the current population
- ET: estimated time that this generation took to process
- height: height of the best individual in the current population
- n_nodes: number of nodes of the best individual in the current population
- test acc: test accuracy of the best individual in the current population. This value is not used during the evolutionary process; it is displayed for information only, and only if you pass a test set to Ardennes.
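
If you want to plot or compare runs, the per-iteration lines can be turned into records. The sketch below is not part of Ardennes; the pattern is written against the sample output shown above, and parse_iteration is a hypothetical helper name. The test acc field is treated as optional because it only appears when a test set is supplied.

```python
import re

# Pattern derived from the sample output above; it may need adjusting if the
# log format differs in your version.
ITER_LINE = re.compile(
    r"iter:\s*(?P<iter>\d+)\s+"
    r"mean:\s*(?P<mean>[\d.]+)\s+"
    r"median:\s*(?P<median>[\d.]+)\s+"
    r"max:\s*(?P<max>[\d.]+)\s+"
    r"ET:\s*(?P<et>[\d.]+)sec\s+"
    r"height:\s*(?P<height>\d+)\s+"
    r"n_nodes:\s*(?P<n_nodes>\d+)"
    r"(?:\s+test acc:\s*(?P<test_acc>[\d.]+))?"
)


def parse_iteration(line):
    """Parse one 'iter: ...' line into a dict, or return None if it does not match."""
    match = ITER_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    test_acc = fields["test_acc"]
    return {
        "iter": int(fields["iter"]),
        "mean": float(fields["mean"]),
        "median": float(fields["median"]),
        "max": float(fields["max"]),
        "et_seconds": float(fields["et"]),
        "height": int(fields["height"]),
        "n_nodes": int(fields["n_nodes"]),
        "test_acc": float(test_acc) if test_acc is not None else None,
    }


sample = ("iter: 099 mean: 0.730978 median: 0.789855 max: 0.818841 "
          "ET: 2.68sec  height:  9  n_nodes: 27  test acc: 0.637681")
record = parse_iteration(sample)
print(record["iter"], record["n_nodes"], record["test_acc"])
```

Lines that are not iteration reports, such as the NOTICE line or the final summary, return None and can be skipped.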

Structure of the code

config.json: where you set the algorithm parameters, such as the number of individuals, the number of iterations, the decile and the maximum tree height.
main.py: starting point for running the algorithm.
evaluate.py: the module called from main.py. It has several functions that perform holdout, cross-validation and similar operations.
treelib: directory for the main Ardennes code.
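
Since a typo in config.json only shows up once a run has started, it can be worth validating the parameters first. The sketch below is purely illustrative: the key names (n_individuals, n_iterations, decile, max_height), the example values, and the assumption that decile is a fraction in (0, 1] are all hypothetical and not taken from the repository. Check the real config.json for the actual names and ranges.

```python
import json

# Hypothetical key names; the real config.json may use different ones.
POSITIVE_INT_KEYS = ("n_individuals", "n_iterations", "max_height")


def validate_config(config):
    """Return a list of error messages for the (assumed) parameters."""
    errors = []
    for key in POSITIVE_INT_KEYS:
        value = config.get(key)
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            errors.append("{} must be a positive integer".format(key))

    decile = config.get("decile")
    # Assumption: decile is expressed as a fraction in (0, 1].
    if (not isinstance(decile, (int, float)) or isinstance(decile, bool)
            or not 0 < decile <= 1):
        errors.append("decile must be a number in (0, 1]")
    return errors


# Illustrative values only, not the defaults shipped with Ardennes.
example = json.loads(
    '{"n_individuals": 100, "n_iterations": 100, "decile": 0.9, "max_height": 9}'
)
print(validate_config(example))
```

In practice you would replace the inline string with the contents of config.json, read through json.load.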
