Classifiers & Drift Detection Methods

Abstract

This page provides data stream ensemble classifiers designed to cope with concept drift (Learn++.NSE, Dynamic Weighted Majority, Ensemble Building, and RCD), concept drift detectors (Paired Learners, ECDD, and PHT), and data sets (Sine and Mixed). Their parameters and corresponding papers are described below.

Documentation

To use this extension, you need to download moa.jar and sizeofag.jar, available at the MOA framework website. Then, add the JAR files below to the classpath when launching MOA. For example, in Linux:

java -cp EnsembleClassifiers.jar:moa.jar:weka.jar -javaagent:sizeofag.jar moa.gui.GUI

The JAR file contains the class files that implement each classifier. Alternatively, you can decompress moa.jar, include the source files in the moa.classifiers package, and recompile MOA.

Ensemble Classifiers

RCD

Recurring Concept Drifts (RCD) is a framework developed to deal with contexts that reoccur. After a drift detection method identifies a concept drift, RCD uses a non-parametric multivariate statistical test to check whether the context is new or an old one occurring again. The parameters used in RCD are the following:

  • -l: Base learner.
  • -b: Buffer size. The number of instances sampled from the current and stored contexts, used by the statistical tests to identify recurring contexts.
  • -t: Test frequency. In the testing phase, the rate at which the statistical tests are performed to keep the classifier up to date with the current context.
  • -d: Drift detection method to use.
  • -a: Statistical test to be used.
  • -s: The minimum similarity between distributions (p-value).
  • -c: The maximum amount of classifiers to store.
  • -m: The thread pool size, indicating how many simultaneous tests are allowed.
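The recurrence check can be sketched as follows. RCD's actual tests are multivariate and non-parametric; this stand-in uses a simple univariate Kolmogorov-Smirnov distance with a hypothetical threshold, and the function names are illustrative, not MOA's API.

```python
# Simplified sketch of RCD's recurrence check: compare a buffer of recent
# instances against the buffer stored for each old concept. RCD uses
# multivariate non-parametric tests; this univariate two-sample
# Kolmogorov-Smirnov distance is an illustrative stand-in only.

def ks_distance(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    values = sorted(set(sample_a) | set(sample_b))
    d = 0.0
    for v in values:
        cdf_a = sum(x <= v for x in sample_a) / len(sample_a)
        cdf_b = sum(x <= v for x in sample_b) / len(sample_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def is_recurring(current_buffer, stored_buffers, threshold=0.2):
    """Return the index of a stored context similar to the current one, or None."""
    for i, stored in enumerate(stored_buffers):
        if ks_distance(current_buffer, stored) < threshold:
            return i
    return None
```

A match means the classifier stored for that old context can be reused instead of training a new one from scratch.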

Ensemble Building

A classifier designed to handle recurring concept drifts. Its parameters are:
  • -l: Base learner.
  • -e: Permitted error.
  • -a: Acceptance factor.
  • -c: Chunk size.
  • -r: The maximum number of classifiers to store and choose from when creating an ensemble.
  • -n: The maximum number of classifiers in an ensemble.

References

  • Sasthakumar Ramamurthy and Raj Bhatnagar. Tracking Recurrent Concept Drift in Streaming data using Ensemble Classifiers. In Sixth International Conference on Machine Learning and Applications, pp. 404-409, 2007. URL http://dx.doi.org/10.1109/ICMLA.2007.80.

Dynamic Weighted Majority (Already included in MOA)

The parameters available for this classifier are the ones indicated in the referenced papers:

  • -l: Base learner.
  • -p: Period between expert removal, creation, and weight update.
  • -b: Factor to punish mistakes of classifiers.
  • -g: Minimum fraction of weight per classifier.
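The weight update described by the parameters above can be sketched as follows. This is a simplified sketch: the creation of a new expert when the whole ensemble errs is omitted, and the names `dwm_update`, `beta`, and `gamma` are illustrative, not MOA's API.

```python
# Illustrative sketch of the Dynamic Weighted Majority update step, assuming
# parallel lists of experts and weights. beta (-b) punishes mistakes, gamma
# (-g) is the minimum weight fraction for keeping an expert, and in DWM this
# update runs every `period` (-p) examples.

def dwm_update(experts, weights, true_label, predictions, beta=0.5, gamma=0.01):
    """Punish wrong experts, normalize, and drop experts below gamma."""
    for i, pred in enumerate(predictions):
        if pred != true_label:
            weights[i] *= beta              # multiplicative penalty for a mistake
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1
    kept = [(e, w) for e, w in zip(experts, weights) if w >= gamma]
    return [e for e, _ in kept], [w for _, w in kept]
```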

References

  • Jeremy Zico Kolter and Marcus A. Maloof. Using additive expert ensembles to cope with concept drift. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 449-456, New York, NY, USA, 2005. ACM. ISBN 1-59593-180-5. URL http://doi.acm.org/10.1145/1102351.1102408
  • Jeremy Zico Kolter and Marcus A. Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. The Journal of Machine Learning Research, 8:2755-2790, December 2007. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1314498.1390333

Learn++.NSE (Already included in MOA)

The parameters available for this classifier are the ones indicated in the referenced papers:
  • -l: Base learner.
  • -p: Size of the environments, i.e., the number of examples after which a new classifier is created.
  • -a: Slope of the sigmoid function controlling the number of previous periods taken into account during weighting.
  • -b: Halfway crossing point of the sigmoid function controlling the number of previous periods taken into account during weighting.
  • -s: Classifiers pruning strategy to be used (NO: no pruning, AGE: age-based, ERROR: error-based).
  • -e: Ensemble maximum size.
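The role of `-a` and `-b` can be illustrated with the sketch below, which gives more weight to recent periods, with the halfway point at age `b` and steepness `a`. The exact sign convention and error-averaging formulation follow the referenced papers; this is only an illustrative stand-in.

```python
import math

# Sketch of sigmoid weighting in the spirit of Learn++.NSE, assuming classifier
# k was created at period k and the current period is t. The slope `a` (-a) and
# halfway crossing point `b` (-b) control how quickly the influence of older
# periods fades; weights are normalized to sum to one.

def sigmoid_weights(t, k, a=0.5, b=10):
    """Normalized weights over periods j = k..t; age (t - j) = b is the
    halfway point, and `a` controls the slope of the sigmoid."""
    raw = [1.0 / (1.0 + math.exp(a * ((t - j) - b))) for j in range(k, t + 1)]
    s = sum(raw)
    return [w / s for w in raw]
```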

References

  • Matthew Karnick, Metin Ahiskali, Michael D. Muhlbaier, and Robi Polikar. Learning concept drift in nonstationary environments using an ensemble of classifiers based approach. In IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IJCNN '08, pages 3455-3462, June 2008. URL http://dx.doi.org/10.1109/IJCNN.2008.4634290
  • Matthew Karnick, Michael D. Muhlbaier, and Robi Polikar. Incremental learning in non-stationary environments with concept drift using a multiple classifier based approach. In 19th International Conference on Pattern Recognition, ICPR '08, pages 1-4, December 2008b. URL http://dx.doi.org/10.1109/ICPR.2008.4761062
  • Michael Muhlbaier and Robi Polikar. An ensemble approach for incremental learning in nonstationary environments. In Michal Haindl, Josef Kittler, and Fabio Roli, editors, Multiple Classifier Systems, volume 4472 of Lecture Notes in Computer Science, pages 490-500. Springer Berlin / Heidelberg, 2007. ISBN 978-3-540-72481-0. URL http://dx.doi.org/10.1007/978-3-540-72523-7_49
  • Ryan Elwell and Robi Polikar. Incremental learning of concept drift in non-stationary environments. IEEE Transactions on Neural Networks,  22(10):1517-1531, October 2011. ISSN 1045-9227. URL http://dx.doi.org/10.1109/TNN.2011.2160459
  • R. Elwell and R. Polikar. Incremental learning in nonstationary environments with controlled forgetting. In IEEE International Joint Conference on Neural Networks, IJCNN '09, pages 771-778, Los Alamitos, CA, USA, June 2009. IEEE Computer Society. URL http://dx.doi.org/10.1109/IJCNN.2009.5178779

Concept Drift Detectors

VDDM

Virtual drift detection method that uses multivariate non-parametric statistical tests. Its parameters are:

  • -t: Statistical test to be used.
  • -s: Window size.
  • -w: Warning threshold.
  • -d: Change level.

PPDM

Drift detection method to identify a change in the prior probabilities of the classes. It uses multivariate non-parametric statistical tests. Its parameters are:

  • -t: Statistical test to be used.
  • -s: Window size.
  • -w: Warning threshold.
  • -d: Change level.

ECDD (Already included in MOA)

EWMA for Concept Drift Detection (ECDD) is a drift detector which uses an exponentially weighted moving average (EWMA) chart to monitor the misclassification rate of a streaming classifier. It can be used like DDM and EDDM in the SingleClassifierDrift class. Its parameters are:

  • -a: The average run length, i.e., the expected number of data points between false positive alarms.
  • -m: Controls how much weight is given to more recent data compared to older data. Smaller values mean less weight given to recent data.
  • -w: Warning threshold.
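A minimal sketch of an EWMA chart on a 0/1 error stream follows. Here `lam` plays the role of `-m` and the control limit is a fixed multiple of the estimated standard deviation; the real ECDD instead derives its limits from the desired average run length (`-a`), so the class and thresholds below are illustrative assumptions.

```python
# Minimal EWMA chart on a 0/1 error stream, in the spirit of ECDD.
# lam weights recent data (like -m); the 3-sigma limit here is a simplified
# stand-in for ECDD's limits computed from the average run length (-a).

class EWMAChart:
    def __init__(self, lam=0.2, limit=3.0):
        self.lam, self.limit = lam, limit
        self.n, self.p, self.z = 0, 0.0, 0.0  # count, mean error rate, EWMA

    def add(self, error):
        """Feed one 0/1 error; return True when drift is signalled."""
        self.n += 1
        self.p += (error - self.p) / self.n   # running mean error rate p_t
        self.z += self.lam * (error - self.z) # EWMA statistic Z_t
        # asymptotic EWMA variance for a Bernoulli(p) stream
        var = self.p * (1 - self.p) * self.lam / (2 - self.lam)
        return self.n > 30 and self.z > self.p + self.limit * var ** 0.5
```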

References

  • Gordon J. Ross, Niall M. Adams, Dimitris K. Tasoulis and David J. Hand. Exponentially weighted moving average charts for detecting concept drift. Pattern Recognition Letters, 33, pages 191-198, 2012. Elsevier. URL http://dx.doi.org/10.1016/j.patrec.2011.08.019

Paired Learners (Already included in MOA)

A combined classifier and drift detector. It creates two classifiers: a stable one and a reactive one. The stable classifier represents the current stable concept, while the reactive one is trained on the most recent data. If the accuracy of the reactive classifier is higher than that of the stable one, the concept has changed: the stable classifier is replaced by the reactive one, and the reactive classifier is reset. Its parameters are:

  • -s: Stable learner.
  • -r: Reactive learner.
  • -w: Window size for the reactive learner.
  • -t: Threshold for creating a new stable learner.
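The loop above can be sketched as follows. The learner interface (`predict`, `train`, `clone`, `reset`) is hypothetical, and the simple count-over-window comparison is a simplification of the paper's test.

```python
from collections import deque

# Sketch of the paired-learners loop with a hypothetical learner interface.
# A window (-w) of recent outcomes records where the reactive learner was
# right and the stable one wrong; when that count exceeds the threshold (-t),
# the stable learner is replaced by the reactive one.

class PairedLearners:
    def __init__(self, stable, reactive, window=12, threshold=3):
        self.stable, self.reactive = stable, reactive
        self.circ = deque(maxlen=window)  # 1 where reactive right, stable wrong
        self.threshold = threshold

    def update(self, x, y):
        """Train both learners on (x, y); return True if drift was signalled."""
        hit = self.reactive.predict(x) == y and self.stable.predict(x) != y
        self.circ.append(int(hit))
        drift = sum(self.circ) > self.threshold
        if drift:
            self.stable = self.reactive.clone()  # promote the reactive learner
            self.reactive.reset()
            self.circ.clear()
        self.stable.train(x, y)
        self.reactive.train(x, y)
        return drift
```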

PHT (Already included in MOA)

The Page-Hinkley test (PHT) is a sequential analysis technique typically used for monitoring change detection in the average of a Gaussian signal. It can be used like DDM and EDDM in the SingleClassifierDrift class. Its parameters are:

  • -d: Detection threshold.
  • -w: Warning threshold.
  • -m: Magnitude threshold.
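The test can be sketched as follows for detecting an increase in the mean (e.g., of an error rate). Here `delta` plays the role of `-m` and `lam` the role of `-d`; a warning threshold (`-w`) would simply be a second, lower limit applied the same way.

```python
# Textbook Page-Hinkley sketch for detecting an increase in the mean of a
# stream. delta is the magnitude threshold (-m): the minimum deviation that
# counts as change. lam is the detection threshold (-d).

class PageHinkley:
    def __init__(self, delta=0.005, lam=50.0):
        self.delta, self.lam = delta, lam
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def add(self, x):
        """Feed one value; return True when the test signals a change."""
        self.n += 1
        self.mean += (x - self.mean) / self.n        # running mean
        self.cum += x - self.mean - self.delta       # cumulative deviation m_T
        self.cum_min = min(self.cum_min, self.cum)   # running minimum of m_t
        return self.cum - self.cum_min > self.lam    # PH_T = m_T - min m_t
```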

References

DoF

The DoF method detects drifts by processing data chunk by chunk, computing the nearest neighbor in the previous batch for each instance in the current batch and comparing their corresponding labels. A distance map is created, associating the index of the instance in the previous batch with the label assigned by the nearest neighbor. A metric named degree of drift is computed from this distance map. The average and standard deviation of all degrees of drift are tracked and, if the current value deviates from the average by more than s standard deviations, a concept drift is raised. Its parameters are:

  • -w: Window size of each data chunk.
  • -s: Number of standard deviations to detect drifts.
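A simplified sketch of this chunk-by-chunk processing is shown below. The degree of drift here is reduced to the fraction of label mismatches between nearest neighbors; the original paper derives a more elaborate metric from its distance map, so treat these functions as an illustrative approximation.

```python
import math

# Simplified DoF-style chunk processing. degree_of_drift here is just the
# fraction of current-chunk instances whose nearest neighbor in the previous
# chunk carries a different label; detect() flags values more than s standard
# deviations away from the running average.

def degree_of_drift(prev_chunk, cur_chunk):
    """prev_chunk / cur_chunk: lists of (feature_vector, label) pairs."""
    mismatches = 0
    for x, y in cur_chunk:
        nn = min(prev_chunk, key=lambda p: math.dist(x, p[0]))
        mismatches += nn[1] != y
    return mismatches / len(cur_chunk)

def detect(history, current, s=2.0):
    """history: past degree-of-drift values; flag if current is an outlier."""
    mean = sum(history) / len(history)
    std = (sum((h - mean) ** 2 for h in history) / len(history)) ** 0.5
    return abs(current - mean) > s * std
```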

STEPD (Already included in MOA)

STEPD computes the accuracy of the base learner in the W most recent instances and compares it to its overall accuracy from the beginning of the learning process. Its parameters are:

  • -d: Significance level for drift.
  • -m: Significance level for warning.
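The comparison can be sketched as a two-proportion z-test between the recent-window accuracy and the overall accuracy. STEPD itself uses an equivalent test of equal proportions with a continuity correction, which is omitted here, and the function name is illustrative.

```python
import math

# Sketch of STEPD's comparison: a two-proportion z-test between accuracy in
# the W most recent instances and the overall accuracy since learning began.
# Large positive z means recent accuracy is significantly below overall
# accuracy; z is then compared against the warning (-m) and drift (-d) levels.

def stepd_statistic(recent_correct, recent_n, overall_correct, overall_n):
    """Return the z statistic for the drop in recent accuracy."""
    p_recent = recent_correct / recent_n
    p_overall = overall_correct / overall_n
    p_pool = (recent_correct + overall_correct) / (recent_n + overall_n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / recent_n + 1 / overall_n))
    return (p_overall - p_recent) / se
```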

Artificial data streams

Gaussian

This data set was described in Sobolewski and Wozniak (2013). It simulates a virtual concept drift by generating attributes from Gaussian data; drifts are produced by shifting the mean by 5.0. Its parameters are:
  • -i: Seed for random generation of instances.
  • -n: Number of attributes to be generated.
  • -d: Number of attributes with concept drift.
  • -c: Class to be associated with each instance.

References

  • P. Sobolewski and M. Wozniak, Comparable Study of Statistical Tests for Virtual Concept Drift Detection. Heidelberg: Springer International Publishing, 2013, pp. 329–337. [Online]. Available: http://dx.doi.org/10.1007/978-3-319-00969-8_32

Circles

This data set simulates a virtual concept drift by changing the position of four circles. Its parameters are:
  • -i: Seed for random generation of instances.
  • -f: Function that describes the position of the circles.
  • -n: Percentage of noise.
  • -s: Reduce the data to only contain 2 relevant numeric attributes.
  • -b: Number of irrelevant attributes.

Sine (Already included in MOA)

This data set can be used to create the four versions of Sine presented in Gama et al. (2004) and the two versions presented in Baena-García et al. (2006). Its parameters are based on the papers that used this data set:
  • -i: Seed for random generation of instances.
  • -f: Classification function used (1 to 4). One (1) is the reversal of two (2), and three (3) is the reversal of four (4).
  • -s: Reduce the data to only contain 2 relevant numeric attributes. Otherwise, two irrelevant attributes are created.
  • -b: Balance the number of instances of each class.
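The four classification functions can be sketched as below, assuming (as in Gama et al., 2004) that functions 1-2 use the curve y = sin(x) and functions 3-4 use y = 0.5 + 0.3 sin(3πx); the generator name and mapping are illustrative, not MOA's implementation.

```python
import math
import random

# Illustrative generator for the Sine concepts: two uniform attributes x, y in
# [0, 1], labelled by whether the point lies below the decision curve.
# Assumed mapping: functions 1-2 use y = sin(x), functions 3-4 use
# y = 0.5 + 0.3*sin(3*pi*x); functions 2 and 4 reverse the labels.

def sine_instance(rng, function=1):
    x, y = rng.random(), rng.random()
    if function in (1, 2):
        below = y < math.sin(x)
    else:
        below = y < 0.5 + 0.3 * math.sin(3 * math.pi * x)
    label = int(below)
    if function in (2, 4):   # reversed variants
        label = 1 - label
    return x, y, label
```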

References

  • João Gama, Pedro Medas, Gladys Castillo and Pedro Pereira Rodrigues. Learning with Drift Detection. In Bazzan, Ana L. C. and Labidi, Sofiane, editors, Advances in Artificial Intelligence – SBIA 2004, volume 3171 of Lecture Notes in Computer Science, pages 286-295. Springer Berlin / Heidelberg, 2004. ISBN 978-3-540-23237-7. URL http://dx.doi.org/10.1007/978-3-540-28645-5_29.
  • Manuel Baena-García, José del Campo-Ávila, Raul Fidalgo, Albert Bifet, Ricard Gavaldà and Rafael Morales-Bueno. Early Drift Detection Method. In: ECML PKDD 2006 Workshop on Knowledge Discovery from Data Streams, September 2006, Berlin, Germany. URL http://eprints.pascal-network.org/archive/00002509/

Mixed (Already included in MOA)

This data set can be used to create the versions presented in Gama et al. (2004) and Baena-García et al. (2006). Its parameters are based on the papers that used this data set:
  • -i: Seed for random generation of instances.
  • -f: Classification function used (1 or 2), where one (1) is the reversal of two (2).
  • -b: Balance the number of instances of each class.

References

  • João Gama, Pedro Medas, Gladys Castillo and Pedro Pereira Rodrigues. Learning with Drift Detection. In Bazzan, Ana L. C. and Labidi, Sofiane, editors, Advances in Artificial Intelligence – SBIA 2004, volume 3171 of Lecture Notes in Computer Science, pages 286-295. Springer Berlin / Heidelberg, 2004. ISBN 978-3-540-23237-7. URL http://dx.doi.org/10.1007/978-3-540-28645-5_29.
  • Manuel Baena-García, José del Campo-Ávila, Raul Fidalgo, Albert Bifet, Ricard Gavaldà and Rafael Morales-Bueno. Early Drift Detection Method. In: ECML PKDD 2006 Workshop on Knowledge Discovery from Data Streams, September 2006, Berlin, Germany. URL http://eprints.pascal-network.org/archive/00002509/

Contact

Comments, suggestions, enhancements, and corrections are highly appreciated. [paulomgj at gmail dot com]

Attachment: EnsembleClassifiers.jar (313k), uploaded by Paulo Gonçalves, Mar 29, 2017.