Analyzing a data set to summarize its main characteristics, often with visual methods

n-e-e-l/ExploratoryDataAnalysis_Haberman


ExploratoryDataAnalysis_Haberman

This repository is an exploratory data analysis (EDA) of the Haberman dataset: analyzing a data set to summarize its main characteristics, often with visual methods.
The dataset used for this analysis is available here: data.
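As a minimal sketch of how the data could be read for the analysis below, the following parses header-less CSV rows using only the standard library. The column order, the snake_case column names, and the sample rows are assumptions made for illustration; the sample rows are made up and are not taken from the dataset.

```python
import csv
import io

# Assumed column order; the names loosely follow the labels used in this README.
COLUMNS = ["age", "operated_year", "axil_nodes", "survival_status"]


def load_rows(text_stream):
    """Parse header-less CSV rows into dicts of ints keyed by COLUMNS."""
    reader = csv.reader(text_stream)
    return [dict(zip(COLUMNS, map(int, row))) for row in reader if row]


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO("40,60,0,1\n55,63,12,2\n62,65,3,1\n")
rows = load_rows(sample)
print(len(rows), rows[0])
```

With a real file, the same function could be called on an open file handle, for example `open("haberman.csv", newline="")`, where the file name is hypothetical.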

Goal: To understand the correlation between the features and the class label.

Probability Distribution of 'Survival' based on 'Age'

Probability Distribution of 'Survival' based on 'Operated Year'

Probability Distribution of 'Survival' based on 'axil_nodes'
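The distribution plots above compare how one feature is spread within each survival class. A rough sketch of the underlying idea, assuming a simple fixed-width binning rather than whatever plotting routine produced the figures, is a per-class normalized histogram. The function name, the bin width, and the sample values are hypothetical.

```python
from collections import Counter, defaultdict


def class_histograms(values, labels, bin_width):
    """Return {label: {bin_start: fraction of that class falling in the bin}}."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        bin_start = (value // bin_width) * bin_width
        counts[label][bin_start] += 1
    return {
        label: {b: n / sum(c.values()) for b, n in sorted(c.items())}
        for label, c in counts.items()
    }


# Made-up values for illustration only; they are not taken from the dataset.
ages = [34, 38, 45, 47, 52, 58, 61, 66]
status = [1, 1, 1, 2, 1, 2, 2, 1]

hist = class_histograms(ages, status, bin_width=10)
for label, bins in sorted(hist.items()):
    print(label, bins)
```

If the per-class fractions overlap heavily across bins, the feature on its own does little to separate the two classes.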

Box plot for 'Age'

Box plot for 'Operated Year'

Box plot for 'axil_nodes'
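A box plot summarizes a feature by its quartiles and flags points far from the box. The sketch below computes those numbers with the standard library, assuming the common 1.5 × IQR rule for outliers; the plotting library used for the figures may use a different quartile convention. The function name and sample values are hypothetical.

```python
import statistics


def box_stats(values):
    """Quartiles, IQR, and 1.5 * IQR outliers for one feature."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": iqr,
        "outliers": outliers,
    }


# Made-up node counts for illustration only; not taken from the dataset.
nodes = [0, 0, 1, 2, 3, 4, 8, 30]
stats = box_stats(nodes)
print(stats)
```

Running this once per survival class gives the numbers that each box in a grouped box plot is drawn from.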

Violin plot for 'Age'

Violin plot for 'Operated Year'

Violin plot for 'axil_nodes'
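A violin plot combines the box-plot summary with a smoothed density estimate of the feature. As a sketch of that density part, the code below implements a plain Gaussian kernel density estimate. The bandwidth of 2.0 and the sample values are assumptions chosen for illustration; plotting libraries typically pick the bandwidth automatically.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density of `values` at a point x."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Made-up operation years for illustration only; not taken from the dataset.
years = [60, 61, 62, 63, 64, 65, 66, 67]
density = gaussian_kde(years, bandwidth=2.0)

# Evaluate the estimate on a small grid, as a violin outline would.
for x in range(58, 70, 2):
    print(x, round(density(x), 4))
```

The width of the violin at a given value is proportional to this estimated density.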

Scatter plot showing the relationships between the fields present in the data.

Heatmap showing the correlation between the features and the class label.
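The values shown in a correlation heatmap can be sketched as a matrix of pairwise Pearson coefficients. This assumes Pearson correlation, which is a common default but is not confirmed by the text above. The function names, column names, and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: correlation}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only; they are not taken from the dataset.
data = {
    "age": [34, 45, 52, 61, 66, 47],
    "axil_nodes": [0, 1, 4, 13, 2, 22],
    "survival_status": [1, 1, 1, 2, 1, 2],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Each cell of the heatmap corresponds to one entry of this matrix; values near 0 indicate little linear relationship between the two columns.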

Conclusion:

Observation: Simple EDA is not conclusive enough to reveal any strong correlation between the features and the class label. From the plots generated, all we can say is that the classes are not linearly separable. The heatmap shows some correlation between 'axil_nodes' and 'survival_status'.

One discrepancy lies in the class label categories themselves: '1' stands for patients who lived for 5 or more years after the operation, and '2' stands for patients who died within 5 years of being operated on. A better parameter would have been the cause of death, since other factors may have contributed to the death of patients who died within 5 years. In addition, category '1' is itself ambiguous: these patients may or may not still be alive, so their current status is uncertain.
