How to cite this repository?
@article{SouzaChallenges:2020,
  title={Challenges in Benchmarking Stream Learning Algorithms with Real-world Data},
  author={Souza, V. M. A. and Reis, D. M. and Maletzke, A. G. and Batista, G. E. A. P. A.},
  journal={Data Mining and Knowledge Discovery},
  pages={1805--1858},
  volume={34},
  year={2020},
  doi={10.1007/s10618-020-00698-5}
}

Paper: https://doi.org/10.1007/s10618-020-00698-5

This repository is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to:
Share — copy and redistribute the material in any medium or format;
Adapt — remix, transform, and build upon the material for any purpose, including research and commercial use.
Under the following terms:
Note: Some datasets included in this repository originate from external sources and may be subject to their own licenses and usage restrictions.
How to donate a new dataset to this repository?
If you want to donate a dataset with concept drift from a real-world streaming problem, please contact Vinicius Souza (vinicius.mourao at pucpr.br) or Gustavo Batista (g.batista at unsw.edu.au) and answer the following questions:
Where can we get the data in ARFF/CSV format?
Is there a publication related to the data?
Can you describe the classification problem related to the data, and why is it a streaming problem?
Can you explain the order of the examples and the interval between observations?
Can you describe the class labels?
Do you know which types of drift (incremental, gradual, abrupt, reoccurring) occur in the dataset, and when they occur?
- Blackard JA, Dean DJ (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Computers and Electronics in Agriculture 24(3):131–151
- Dheeru D, Karra Taniskidou E (2017) UCI machine learning repository. URL http://archive.ics.uci.edu/ml
- Ditzler G, Polikar R (2013) Incremental learning of concept drift from streaming imbalanced data. IEEE Transactions on Knowledge and Data Engineering 25(10):2283–2301
- Harries M (1999) Splice-2 comparative evaluation: Electricity pricing. Tech. Rep. 1, University of New South Wales, Sydney, Australia
- Ikonomovska E, Gama J, Džeroski S (2011) Learning model trees from evolving data streams. Data Mining and Knowledge Discovery 23(1):128–168
- Katakis I, Tsoumakas G, Banos E, Bassiliades N, Vlahavas I (2009) An adaptive personalized news dissemination system. Journal of Intelligent Information Systems 32(2):191–212
- Losing V, Hammer B, Wersing H (2015) Interactive online learning for obstacle classification on a mobile robot. In: International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8
- Losing V, Hammer B, Wersing H (2016) KNN classifier with self adjusting memory for heterogeneous concept drift. In: IEEE International Conference on Data Mining (ICDM), IEEE, pp 291–300
- Reis DM, Flach P, Matwin S, Batista G (2016) Fast unsupervised online drift detection using incremental Kolmogorov-Smirnov test. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1545–1554
- Souza VMA, Silva DF, Gama J, Batista GEAPA (2015) Data stream classification guided by clustering on nonstationary environments and extreme verification latency. In: Proceedings of the 2015 SIAM International Conference on Data Mining, SIAM, pp 873–881
- Souza VMA (2018) Asphalt pavement classification using smartphone accelerometer and complexity invariant distance. Engineering Applications of Artificial Intelligence 74:198–211
- Souza VMA, Reis DM, Maletzke AG, Batista GEAPA (2020) Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery 34:1805–1858
- Souza VM, Parmezan AR, Chowdhury FA, Mueen A (2021) Efficient unsupervised drift detector for fast and high-dimensional data streams. Knowledge and Information Systems 63(6):1497–1527
- Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications, IEEE, pp 1–6
- Ulanova L, Begum N, Shokoohi-Yekta M, Keogh E (2016) Clustering in the face of fast changing streams. In: Proceedings of the 2016 SIAM International Conference on Data Mining, pp 1–9
- Vergara A, Vembu S, Ayhan T, Ryan MA, Homer ML, Huerta R (2012) Chemical gas sensor drift compensation using classifier ensembles. Sensors and Actuators B: Chemical 166:320–329
- Zhu X (2010) Stream data mining repository. URL www.cse.fau.edu/~xqzhu/stream.html
- Zliobaite I (2011) Combining similarity in time and space for training set formation under concept drift. Intelligent Data Analysis 15(4):589–611