DM Project

Project implementation for the Data Mining course at the University of Pisa (academic year 2020-2021). The project consists of the analysis of an unknown customer dataset.
The project was done by Simone Baccile, Lorenzo Simone and Marco Sorrenti.

Introduction

The analysis of the dataset was divided into four steps:

Data understanding
Clustering analysis
Predictive analysis
Sequential pattern mining

1. Data understanding

Analysis of the dataset, aimed at understanding what kind of data it contains, what the attributes are and how the values are distributed. In this phase we also performed feature analysis and data cleaning, and we added new features useful for the analysis in the following steps.
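As an illustration of the cleaning and feature-construction step, the sketch below aggregates raw purchase records into per-customer features. It assumes a hypothetical record layout (customer id, product id, quantity, unit price) and hypothetical cleaning rules (drop rows with a missing customer or a non-positive quantity or price); the actual attributes and rules used in 11-DM_DataUnderstanding.ipynb may differ.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, product_id, quantity, unit_price).
# Field layout and values are illustrative, not the project's actual schema.
transactions = [
    ("C1", "P1", 2, 1.50),
    ("C1", "P2", 1, 3.00),
    ("C2", "P1", 5, 1.50),
    ("C2", "P1", -1, 1.50),  # negative quantity (e.g. a return): dropped
    (None, "P3", 1, 9.99),   # missing customer id: dropped
]

def clean(rows):
    """Keep only rows with a known customer and a positive quantity and price."""
    return [r for r in rows if r[0] is not None and r[2] > 0 and r[3] > 0]

def customer_features(rows):
    """Aggregate cleaned transactions into hypothetical per-customer features."""
    items = defaultdict(int)     # total number of items bought
    spend = defaultdict(float)   # total amount spent
    products = defaultdict(set)  # distinct products bought
    for cust, prod, qty, price in rows:
        items[cust] += qty
        spend[cust] += qty * price
        products[cust].add(prod)
    return {
        c: {
            "total_items": items[c],
            "total_spend": round(spend[c], 2),
            "distinct_products": len(products[c]),
        }
        for c in items
    }

features = customer_features(clean(transactions))
print(features)
```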

2. Clustering analysis

In this step we tested different clustering algorithms to better understand how the data are distributed and how different customers can be grouped. The algorithms tested are:

K-Means
Hierarchical clustering
DBSCAN
X-Means
G-Means
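A minimal from-scratch sketch of K-Means (Lloyd's algorithm), the first algorithm in the list, applied to hypothetical two-dimensional customer features. It is not the code used in 12-DM_Clustering.ipynb; in particular, initialising the centroids with the first k points is a simplification chosen only to keep the example deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable.

    Centroids start at the first k points (a simplification for determinism;
    k-means++ initialisation or random restarts are the usual choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical customers described by (total_items, total_spend), already scaled.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster index of each customer
print(centroids)  # final cluster centres
```

Because K-Means depends on its initialisation, library implementations such as scikit-learn's KMeans typically use k-means++ seeding and several restarts.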

3. Predictive analysis

During the predictive analysis we used supervised learning algorithms to classify different kinds of customers. The models used in this phase are:

Decision Tree
AdaBoost
Random Forest
K-NN
MLP
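To illustrate how a supervised model assigns a customer to a class, here is a minimal k-NN classifier (majority vote among the k nearest training points under Euclidean distance). The training points and the "low"/"high" spender labels are hypothetical stand-ins, not the classes or data used in 13-DM_Classification.ipynb.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the class of x by majority vote among its k nearest training points."""
    # Indices of the training points sorted by Euclidean distance to x.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (total_items, total_spend) with a spender label.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (8.5, 8.5)))
```

With an odd k and two classes no vote can tie; with an even k, `Counter.most_common` breaks ties by the order in which labels are first encountered among the neighbours.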

4. Sequential Pattern Mining

In this step we tried to find statistically relevant patterns in the data. The idea is to discover hidden relationships between products and baskets in order to extract discriminative behaviors. The pattern mining algorithms and libraries tested are:

PrefixSpan
GSP
SPMF
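A simplified PrefixSpan sketch, assuming each element of a sequence is a single item (full PrefixSpan also handles elements that are itemsets, such as whole baskets). Patterns are grown one item at a time, and each prefix is extended using only the suffixes of the sequences that contain it (the projected database). The example sequences and the `min_support` threshold are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return frequent sequential patterns as {pattern_tuple: support}.

    Simplified PrefixSpan: every sequence element is a single item.
    Support = number of sequences containing the pattern."""
    patterns = {}

    def mine(prefix, projected):
        # Count each item once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = support
            # Project: keep what follows the first occurrence of the item.
            new_projected = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, new_projected)

    mine((), sequences)
    return patterns

# Hypothetical purchase sequences, one per customer, in time order.
sequences = [
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
    ["eggs", "milk"],
]
result = prefixspan(sequences, min_support=2)
print(result)
```

Projecting on the first occurrence of an item is enough here because it leaves the longest possible suffix, so no later extension of the prefix is missed.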

More details about the project can be found in DM_Report.pdf and Project Presentation.pdf.

Installation

Clone the repository and create a Python environment with Conda:

$ git clone https://github.com/Simoniuss/DM-Project
$ cd DM-Project
$ conda create -n DMProject python=3.8
$ conda activate DMProject
$ pip install -r requirements.txt

Repository contents:

data
11-DM_DataUnderstanding.ipynb
12-DM_Clustering.ipynb
13-DM_Classification.ipynb
14-DM_SequentialPatternMining.ipynb
DM_Report.pdf
Project Presentation.pdf
README.md
requirements.txt
