This repository contains tools to address fairness issues in classification problems.
Authors: Kirill Myasoedov, Simona Nitti, Bekarys Nurtay (bekiichone), Ksenia Osipova, and Gabriel Rozzonelli.
The module contains the following:

- A few `classifiers` for a fairer approach to classification problems:

  | Classifier | Related paper |
  |---|---|
  | `AdaFairClassifier` | *AdaFair: Cumulative Fairness Adaptive Boosting*, by Iosifidis et al. |
  | `AdaptiveWeightsClassifier` | *Adaptive Sensitive Reweighting to Mitigate Bias in Fairness-aware Classification*, by Krasanakis et al. |
  | `SMOTEBoostClassifier` | *SMOTEBoost: Improving Prediction of the Minority Class in Boosting*, by Chawla et al. |

- Some `metrics` to help assess fairness:
  - DFPR, DFNR, Eq.Odds
  - p-rule
  - Sensitive TPR and TNR
- Some popular `datasets` to run experiments and play around with.
- A couple of `utils` functions to ease possible preprocessing steps.
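For orientation, here is a minimal sketch of the corresponding imports; the import paths for `AdaFairClassifier` and `SMOTEBoostClassifier` are assumptions, mirroring the path of `AdaptiveWeightsClassifier` shown in the examples further down.

```python
# Assumed package layout, mirroring the examples further down this README
from fairness_aware_classification.classifiers import (
    AdaFairClassifier,          # import path assumed
    AdaptiveWeightsClassifier,
    SMOTEBoostClassifier,       # import path assumed
)
from fairness_aware_classification.metrics import dfpr_score, dfnr_score
from fairness_aware_classification.datasets import COMPASDataset
from fairness_aware_classification.utils import sensitive_mask_from_features
```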
In order to run the provided modules, the following packages are needed:

```
numpy==1.19.5
pandas==1.1.5
scikit-learn==0.24.1
```

The repository can be cloned with:

```
git clone https://github.com/rozzong/Fairness-Aware-Classification.git
```
The `datasets` module contains some popular, already preprocessed datasets for imbalanced classification problems that lead to fairness issues.
```python
from sklearn.model_selection import train_test_split

from fairness_aware_classification.datasets import COMPASDataset

# Load the data
data = COMPASDataset()

# Split the data
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    data.X,
    data.y,
    data.sensitive,
)
```
In addition to the usual samples and targets, some classifiers require a mask identifying sensitive samples as input. This mask can be retrieved by accessing `data.sensitive`.
For custom datasets, `utils` comes with a couple of functions to generate sensitive masks.
```python
import pandas as pd

from fairness_aware_classification.utils import sensitive_mask_from_features

# Load the data
df = pd.read_csv("my_dataset.csv")

# Set the target and do some feature selection
y = df.pop("target")
X = df.drop(["useless_feature_1"], axis=1)

# Compute the sensitive samples mask
sensitive_features = ["gender"]
sensitive_values = [0]
sensitive = sensitive_mask_from_features(X, sensitive_features, sensitive_values)
```
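From there, the custom data can be split in the same way as the provided datasets, keeping features, targets, and the sensitive mask aligned:

```python
from sklearn.model_selection import train_test_split

# Split features, targets, and the sensitive mask consistently
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    X,
    y,
    sensitive,
)
```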
Classifiers from the module are meant to be used in a scikit-learn fashion. Some functions contained in `metrics` can be useful to define fairness-oriented objective functions.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from fairness_aware_classification.metrics import dfpr_score, dfnr_score
from fairness_aware_classification.classifiers import AdaptiveWeightsClassifier

# The criterion function `objective` should be customized
# depending on the data. It should be maximized.
def objective(y_true, y_pred, sensitive):
    acc = accuracy_score(y_true, y_pred)
    dfpr = dfpr_score(y_true, y_pred, sensitive)
    dfnr = dfnr_score(y_true, y_pred, sensitive)
    return 2 * acc - abs(dfpr) - abs(dfnr)

# Wrap a base classifier with adaptive sensitive reweighting
base_clf = LogisticRegression(solver="liblinear")
awc = AdaptiveWeightsClassifier(base_clf, objective)

# Fit using the sensitive mask, then predict
awc.fit(X_train, y_train, s_train)
y_pred = awc.predict(X_test)
```
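The same metric functions can then be used to check how fair the resulting predictions are, for instance:

```python
# Evaluate accuracy and fairness of the predictions on the test split
acc = accuracy_score(y_test, y_pred)
dfpr = dfpr_score(y_test, y_pred, s_test)
dfnr = dfnr_score(y_test, y_pred, s_test)
print(f"Accuracy: {acc:.3f}, DFPR: {dfpr:.3f}, DFNR: {dfnr:.3f}")
```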
For each provided toy dataset, its suggested objective function is accessible as `data.objective`.
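As a sketch, reusing the COMPAS data and the base classifier from the examples above, this suggested objective can be passed directly to a classifier:

```python
# Use the dataset's suggested objective instead of a hand-written one
awc = AdaptiveWeightsClassifier(base_clf, data.objective)
awc.fit(X_train, y_train, s_train)
y_pred = awc.predict(X_test)
```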
In `main.ipynb`, the implemented classifiers are compared with a standard AdaBoost classifier. The results of these runs on the four provided datasets are presented below.
*(Result figures for the Adult Census Income, Bank Marketing, COMPAS, and KDD Census Income datasets.)*