
imbalanced-learn: An extension of scikit-learn to handle imbalanced data problems

Read more here:

Under-sampling methods

The imblearn.under_sampling module provides methods to under-sample a dataset.


under_sampling.ClusterCentroids([ratio, ...]): Perform under-sampling by generating centroids based on clustering methods.
under_sampling.CondensedNearestNeighbour([...]): Class to perform under-sampling based on the condensed nearest neighbour method.
under_sampling.EditedNearestNeighbours([...]): Class to perform under-sampling based on the edited nearest neighbour method.
under_sampling.RepeatedEditedNearestNeighbours([...]): Class to perform under-sampling based on the repeated edited nearest neighbour method.
under_sampling.AllKNN([return_indices, ...]): Class to perform under-sampling based on the AllKNN method.
under_sampling.InstanceHardnessThreshold([...]): Class to perform under-sampling based on the instance hardness threshold.
under_sampling.NearMiss([ratio, ...]): Class to perform under-sampling based on NearMiss methods.
under_sampling.NeighbourhoodCleaningRule([...]): Class to perform under-sampling based on the neighbourhood cleaning rule.
under_sampling.OneSidedSelection([...]): Class to perform under-sampling based on the one-sided selection method.
under_sampling.RandomUnderSampler([ratio, ...]): Class to perform random under-sampling.
under_sampling.TomekLinks([return_indices, ...]): Class to perform under-sampling by removing Tomek’s links.
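Several of these samplers decide which majority samples to keep from their distances to the minority class. As an illustration, the plain-Python sketch below implements the NearMiss-1 heuristic: keep the majority samples whose mean distance to their k nearest minority samples is smallest. It is not imblearn's implementation; the function name near_miss_1, the minority-versus-rest setup, and keeping exactly as many majority samples as there are minority samples are assumptions made for this example.

```python
import math


def near_miss_1(X, y, minority_label, k=3):
    """NearMiss-1 style under-sampling for a minority-vs-rest problem (sketch).

    Keeps every minority sample, plus the majority samples whose mean distance
    to their k nearest minority samples is smallest, so that both sides end up
    with the same number of samples. Minority samples come first in the output.
    """
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]

    def mean_dist_to_nearest_minority(x):
        # Distances from x to every minority sample; average the k smallest.
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    # Majority samples closest to the minority class come first.
    majority.sort(key=lambda item: mean_dist_to_nearest_minority(item[0]))
    kept = majority[:len(minority)]

    X_res = minority + [x for x, _ in kept]
    y_res = [minority_label] * len(minority) + [lab for _, lab in kept]
    return X_res, y_res


X = [[0.0], [0.5], [1.0], [5.0], [9.0], [10.0]]
y = [1, 1, 0, 0, 0, 0]
X_res, y_res = near_miss_1(X, y, minority_label=1, k=2)
print(X_res, y_res)  # [[0.0], [0.5], [1.0], [5.0]] [1, 1, 0, 0]
```

The majority samples at 1.0 and 5.0 are kept because they lie closest to the minority samples; NearMiss variants differ mainly in how that distance criterion is defined.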

Over-sampling methods

The imblearn.over_sampling module provides a set of methods to perform over-sampling.


over_sampling.ADASYN([ratio, random_state, ...]): Perform over-sampling using ADASYN.
over_sampling.RandomOverSampler([ratio, ...]): Class to perform random over-sampling.
over_sampling.SMOTE([ratio, random_state, ...]): Class to perform over-sampling using SMOTE.
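SMOTE creates synthetic minority samples instead of duplicating existing ones: it picks a minority sample, picks one of its k nearest minority-class neighbours, and places a new point at a random position on the segment between them. The sketch below shows that interpolation step in plain Python with a brute-force neighbour search. The function name smote and its parameters are hypothetical; this is not imblearn's SMOTE class, which also handles multiple classes and target ratios.

```python
import math
import random


def smote(X_min, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    X_min: list of minority-class feature vectors (lists of floats).
    Needs at least two minority samples so that a neighbour exists.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Its k nearest minority-class neighbours, excluding itself.
        others = [j for j in range(len(X_min)) if j != i]
        others.sort(key=lambda j: math.dist(base, X_min[j]))
        neighbour = X_min[rng.choice(others[:k])]
        # New point = base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(X_min, n_new=3)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, SMOTE fills in the minority region rather than stacking exact copies, which is the main difference from RandomOverSampler.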

Combination of over- and under-sampling methods

The imblearn.combine module provides methods that combine over-sampling and under-sampling.


combine.SMOTEENN([ratio, random_state, ...]): Class to perform over-sampling using SMOTE and cleaning using ENN.
combine.SMOTETomek([ratio, random_state, ...]): Class to perform over-sampling using SMOTE and cleaning using Tomek links.
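The cleaning step in SMOTETomek relies on Tomek links: a pair of samples from different classes that are each other's nearest neighbour. Such pairs sit on the class boundary or are noise. The sketch below only detects the links (brute-force nearest neighbours, hypothetical function name tomek_links); which member of each pair the library then removes depends on the sampler's settings and is not modelled here.

```python
import math


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links:
    samples of different classes that are each other's nearest neighbour."""
    def nearest(i):
        # Brute-force nearest neighbour of sample i, excluding i itself.
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    links = []
    for i, j in enumerate(nn):
        # Mutual nearest neighbours with different labels; i < j avoids duplicates.
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, j))
    return links


X = [[0.0], [1.0], [1.1], [5.0], [6.0]]
y = [0, 0, 1, 1, 1]
print(tomek_links(X, y))  # [(1, 2)]
```

Samples 3 and 4 are also mutual nearest neighbours, but they share a label, so they do not form a link.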

Ensemble methods

The imblearn.ensemble module includes methods that generate under-sampled subsets and combine them inside an ensemble.


ensemble.BalanceCascade([ratio, ...]): Create an ensemble of balanced sets by iteratively under-sampling the imbalanced dataset using an estimator.
ensemble.EasyEnsemble([ratio, ...]): Create an ensemble of sets by iteratively applying random under-sampling.
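The idea behind EasyEnsemble can be sketched as repeated, independent random under-sampling: each round draws a different balanced subset, and a separate classifier would then be trained on each subset. The plain-Python sketch below only builds the index subsets; the function name easy_ensemble_subsets and its parameters are hypothetical, and the training and combination of the per-subset classifiers is left out.

```python
import random
from collections import Counter


def easy_ensemble_subsets(y, n_subsets=3, seed=0):
    """Return n_subsets sorted index lists, each a balanced subset built by
    independently under-sampling every class down to the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for idx in by_class.values():
            # Draw without replacement within a subset; draws differ across subsets.
            subset.extend(rng.sample(idx, n_min))
        subsets.append(sorted(subset))
    return subsets


y = [0] * 10 + [1] * 3
for s in easy_ensemble_subsets(y):
    print(Counter(y[i] for i in s))  # each subset: Counter({0: 3, 1: 3})
```

Every subset keeps all minority samples but a different slice of the majority class, so the ensemble as a whole sees more of the majority data than a single under-sampled set would. BalanceCascade differs in that it uses an estimator between rounds to decide which majority samples to drop.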


Pipeline

The imblearn.pipeline module implements utilities to build a composite estimator as a chain of transforms, samplers and estimators.


pipeline.Pipeline(steps): Pipeline of transforms and resamplers with a final estimator.
pipeline.make_pipeline(*steps): Construct a Pipeline from the given estimators.
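What distinguishes this pipeline from a plain scikit-learn one is that samplers change both X and y, so they can only be applied while fitting; at prediction time they must be skipped while ordinary transforms are still applied. The toy class below illustrates that rule. It is not imblearn's Pipeline: MiniPipeline and the three demo steps are hypothetical, and identifying samplers by a `fit_resample` method is an assumption made for the sketch.

```python
from collections import Counter


class MiniPipeline:
    """Toy pipeline: intermediate steps are samplers (fit_resample) or
    transformers (fit + transform); the last step is an estimator.
    Samplers run only during fit, never during predict."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                # Resampling changes X and y, so it is applied to training data only.
                X, y = step.fit_resample(X, y)
            else:
                X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            # Samplers are skipped at prediction time; transformers are reapplied.
            if not hasattr(step, "fit_resample"):
                X = step.transform(X)
        return self.steps[-1][1].predict(X)


class DuplicateMinority:
    """Hypothetical sampler: repeats samples of smaller classes until balanced."""

    def fit_resample(self, X, y):
        counts = Counter(y)
        target = max(counts.values())
        X_out, y_out = list(X), list(y)
        for label, n in counts.items():
            rows = [x for x, lab in zip(X, y) if lab == label]
            for k in range(target - n):
                X_out.append(rows[k % len(rows)])
                y_out.append(label)
        return X_out, y_out


class Doubler:
    """Hypothetical transformer: multiplies every feature by 2."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class ThresholdClassifier:
    """Hypothetical estimator: predicts 1 when the first feature exceeds
    the mean first feature of the (resampled) training data."""

    def fit(self, X, y):
        self.n_seen_ = len(y)
        self.threshold_ = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold_) for row in X]


pipe = MiniPipeline([("scale", Doubler()),
                     ("resample", DuplicateMinority()),
                     ("clf", ThresholdClassifier())])
pipe.fit([[0.0], [1.0], [2.0], [10.0]], [0, 0, 0, 1])
print(pipe.steps[-1][1].n_seen_)       # 6 (resampled training set)
print(pipe.predict([[1.0], [8.0]]))    # [0, 1]
```

The classifier is trained on six samples (the minority sample was duplicated twice), while predict receives the two query rows unchanged apart from the Doubler transform.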


Metrics

The imblearn.metrics module includes score functions, performance metrics, pairwise metrics, and distance computations.


metrics.sensitivity_specificity_support(...): Compute sensitivity, specificity, and support for each class.
metrics.sensitivity_score(y_true, y_pred[, ...]): Compute the sensitivity.
metrics.specificity_score(y_true, y_pred[, ...]): Compute the specificity.
metrics.geometric_mean_score(y_true, y_pred): Compute the geometric mean.
metrics.make_index_balanced_accuracy([...]): Balance any scoring function using the index balanced accuracy.
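For a binary problem these metrics reduce to simple confusion-matrix ratios: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and the geometric mean is sqrt(sensitivity * specificity). The plain-Python sketch below computes them from scratch to show what the scores measure; binary_rates is a hypothetical helper, and imblearn's multiclass and averaging options are not reproduced.

```python
import math


def binary_rates(y_true, y_pred, pos_label=1):
    """Return (sensitivity, specificity) for a binary problem.
    Assumes both classes occur in y_true (no division-by-zero guard)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    tn = sum(t != pos_label and p != pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    sensitivity = tp / (tp + fn)  # true positive rate (recall of the positive class)
    specificity = tn / (tn + fp)  # true negative rate (recall of the negative class)
    return sensitivity, specificity


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
sens, spec = binary_rates(y_true, y_pred)
gmean = math.sqrt(sens * spec)  # geometric mean of the two per-class recalls
print(sens, spec, round(gmean, 4))  # 0.5 0.75 0.6124
```

Plain accuracy here would be 4/6 ≈ 0.67; the geometric mean is lower because it penalizes the weak recall on the minority class, which is why it is preferred for imbalanced data.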


Datasets

The imblearn.datasets module provides methods to generate imbalanced data.


datasets.make_imbalance(X, y, ratio[, ...]): Turn a dataset into an imbalanced dataset at a specific ratio.
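Making a dataset imbalanced amounts to randomly subsampling some classes while leaving others intact, which is handy for testing samplers on controlled data. The sketch below does this in plain Python. Note the difference from the listed signature: instead of a `ratio`, the hypothetical function make_imbalance_sketch takes a dictionary of target sample counts per class; this is an assumption for illustration, not the library's interface.

```python
import random
from collections import Counter


def make_imbalance_sketch(X, y, target_counts, seed=0):
    """Subsample X, y so that each label in target_counts keeps
    target_counts[label] samples; other classes are kept whole."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    keep = []
    for label, idx in by_class.items():
        n = target_counts.get(label, len(idx))
        keep.extend(rng.sample(idx, n))  # raises ValueError if n > len(idx)
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[float(i)] for i in range(20)]
y = [0, 1] * 10  # balanced: 10 samples per class
X_imb, y_imb = make_imbalance_sketch(X, y, {1: 2})
print(Counter(y_imb))  # Counter({0: 10, 1: 2})
```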


Utilities

The imblearn.utils module includes various utilities.


utils.estimator_checks.check_estimator(Estimator): Check if estimator adheres to scikit-learn conventions and