Scikit-transformers : Scikit-learn + Custom transformers

About

scikit-transformers is a package that provides custom scikit-learn transformers such as LogColumnTransformer, BoolColumnTransformer and others.

It was created to provide a simple way to use custom transformers in scikit-learn pipelines, alongside a scikit-learn model, so that their hyperparameters can be tested and tuned with GridSearchCV.

The starting point was a simple LogColumnTransformer: a thin wrapper around the numpy log function that takes a skew threshold and applies the log transformation only to columns whose skew exceeds that threshold.
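To make the idea concrete, here is a minimal sketch of a skew-thresholded log transformation written with plain pandas and numpy. It is not the LogColumnTransformer implementation: the function name `log_skewed_columns` and its `threshold` parameter are hypothetical, and the sketch assumes every column is numeric and strictly positive so that the log is defined.

```python
import numpy as np
import pandas as pd


def log_skewed_columns(df: pd.DataFrame, threshold: float = 1.0) -> pd.DataFrame:
    """Return a copy of df with np.log applied to columns whose skew exceeds threshold.

    Hypothetical helper for illustration only; assumes numeric, strictly positive data.
    """
    out = df.copy()
    skew = out.skew()  # sample skewness, one value per column
    for col in skew[skew > threshold].index:
        out[col] = np.log(out[col])
    return out


df = pd.DataFrame(
    {
        "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],   # skew of 0, left untouched
        "skewed": [1.0, 1.0, 1.0, 1.0, 1000.0],   # one large outlier, gets logged
    }
)

result = log_skewed_columns(df, threshold=1.0)
```

Only the column whose skew is above the threshold changes; the input DataFrame itself is not modified.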

With scikit-transformers, this LogColumnTransformer can be used inside a GridSearchCV, with the skew threshold as a hyperparameter, to find which columns benefit from a log transformation.
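The sketch below shows how such a search might look. Because the exact parameter names of LogColumnTransformer are not given here, it uses a hypothetical stand-in class, `SkewLogTransformer`, with an assumed `threshold` parameter. The data is synthetic and strictly positive, and only standard scikit-learn calls are used.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class SkewLogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in, NOT the scikit-transformers implementation."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Remember which columns are skewed on the training data only.
        skew = pd.DataFrame(X).skew()
        self.columns_ = list(skew[skew > self.threshold].index)
        return self

    def transform(self, X):
        out = pd.DataFrame(X).copy()
        for col in self.columns_:
            out[col] = np.log(out[col])  # assumes strictly positive values
        return out


# Synthetic data: "a" is heavily right-skewed, "b" is roughly symmetric.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "a": rng.lognormal(mean=0.0, sigma=1.0, size=60),
        "b": rng.uniform(1.0, 2.0, size=60),
    }
)
y = np.log(X["a"]) + X["b"]  # the target is linear in log(a)

pipe = Pipeline([("log", SkewLogTransformer()), ("model", LinearRegression())])

# The step name "log" plus "__threshold" addresses the transformer's parameter.
grid = GridSearchCV(pipe, param_grid={"log__threshold": [0.5, 100.0]}, cv=3)
grid.fit(X, y)

best_threshold = grid.best_params_["log__threshold"]
```

On this data the search is expected to prefer the lower threshold, since logging the skewed column makes the target linear in the features, while a threshold of 100 disables the transformation entirely.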

LogColumnTransformer is one of the many transformers implemented in scikit-transformers.

Installation

Using regular pip and venv tools:

python3 -m venv .venv
source .venv/bin/activate
pip install scikit-transformers

Usage

For a very basic usage:

import pandas as pd

from sktransf.transformer import LogColumnTransformer

df = pd.DataFrame(
    { "a": range(10),
      "b": range(10)
    }
)

logger = LogColumnTransformer()
# fit and transform in one call
df_transf = logger.fit_transform(df)

Using common transformers:

import pandas as pd

from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector

df = pd.DataFrame(
    { "a": range(10),
      "b": range(10)
    }
)

df_bool = BoolColumnTransformer().fit_transform(df)
df_unique = DropUniqueColumnSelector().fit_transform(df)
df_logged = LogColumnTransformer().fit_transform(df)

Using a pipeline with a scikit-learn model:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector

pipe = Pipeline([
    ('bool', BoolColumnTransformer()),
    ('unique', DropUniqueColumnSelector()),
    ('log', LogColumnTransformer()),
    ('model', LinearRegression())
])

X = pd.DataFrame(
    { "a": range(10),
      "b": range(10)
    }
)

y = range(10)

pipe.fit(X, y)

y_pred = pipe.predict(X)

Documentation

For more specific information, please refer to the notebooks:

Transformers:
- LogColumnTransformer notebook
- BoolColumnTransformer notebook

Selectors:
- DropUniqueColumnSelector notebook
- DropSkuColumnSelector notebook

Pipelines:
- Pipelines notebook

Complete documentation is available on the GitHub page.

Changelog, Releases and Roadmap

Please refer to the changelog page for more information.

Contributing

Pull requests are welcome.

For major changes, please open an issue first to discuss what you would like to change.

For more information, please refer to the contributing page.

License

GPLv3

AlexandreGazagnes/scikit-transformers

Folders and files

Latest commit

History

Repository files navigation

Scikit-transformers : Scikit-learn + Custom transformers

About

Installation

Usage

Documentation

Changelog, Releases and Roadmap

Contributing

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 7

Packages 0

Contributors 2

Languages

Packages