Paper (conference version - 2020 IEEE BigData)
Paper (journal version - Knowledge and Information Systems)
How to cite this work?
Conference version:
@inproceedings{IBDD_BigData,
  title={Unsupervised drift detection on high-speed data streams},
  author={Souza, V. M. A. and Chowdhury, F. A. and Mueen, A.},
  booktitle={Proceedings of the IEEE International Conference on Big Data (Big Data)},
  pages={102--111},
  year={2020},
  organization={IEEE}
}
Journal version:
@article{IBDD_KAIS,
  title={Efficient unsupervised drift detector for fast and high-dimensional data streams},
  author={Souza, V. M. A. and Parmezan, A. R. S. and Chowdhury, F. A. and Mueen, A.},
  journal={Knowledge and Information Systems},
  volume={63},
  number={6},
  pages={1497--1527},
  year={2021},
  publisher={Springer}
}
DATASETS (DOWNLOAD)
Benchmark real
HeartBeats [1,2]
Insects [2]
Posture [3]
StarLightCurves [4]
UWave [4]
Yoga [4]
Benchmark synthetic
Waveform [10]
4CR [11]
UG2C5D [12]
Case studies
Malaria mosquitoes prediction [5]
Twitter data 2016 US Election [6]
Skin lesion classification [7]
Customer prediction by smart meter data
Asphalt quality monitoring [8,9]
CODES:
References
[1] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. Peng, and H. E. Stanley. 2000. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, 23 (2000), 215–220.
[2] L. Ulanova, N. Begum, M. Shokoohi-Yekta, and E. Keogh. 2016. Clustering in the face of fast changing streams. In SDM. 1–9.
[3] Boštjan Kaluža, Violeta Mirchevska, Erik Dovgan, Mitja Luštrek, and Matjaž Gams. 2010. An agent-based approach to care in independent living. In AML. 177–186.
[4] H. A. Dau, E. Keogh, K. Kamgar, C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, C. Yanping, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, and Hexagon-ML. 2018. The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
[5] M. González-Jiménez, S. A. Babayan, P. Khazaeli, M. Doyle, F. Walton, E. Reddy, T. Glew, M. Viana, L. Ranford-Cartwright, and A. Niang. 2019. Prediction of mosquito species and population age structure using mid-infrared spectroscopy and supervised machine learning. Wellcome Open Research 4 (2019).
[6] Noor Abu-El-Rub and Abdullah Mueen. 2019. BotCamp: Bot-driven Interactions in Social Campaigns. In WWW. 2529–2535.
[7] P. Tschandl, C. Rosendahl, and H. Kittler. 2018. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data 5 (2018), 180161.
[8] V. M. A. Souza, R. Giusti, and A. J. Batista. 2018. Asfault: A low-cost system to evaluate pavement conditions in real-time using smartphones and machine learning. Pervasive and Mobile Computing 51 (2018), 121–137.
[9] V. M. A. Souza. 2018. Asphalt pavement classification using smartphone accelerometer and Complexity Invariant Distance. Engineering Applications of Artificial Intelligence 74 (2018), 198–211.
[10] D. Dua and C. Graff. UCI machine learning repository. http://archive.ics.uci.edu/ml
[11] V. M. A. Souza, D. F. Silva, J. Gama, and G. Batista. 2015. Data stream classification guided by clustering on nonstationary environments and extreme verification latency. In Proceedings of the 2015 SIAM International Conference on Data Mining (SDM). 121–137.
[12] K. B. Dyer, R. Capo, and R. Polikar. 2014. Compose: A semisupervised learning framework for initially labeled nonstationary streaming data. IEEE TNNLS 25, 1 (2014), 12–26.
Accuracies on real-world benchmark datasets:
Accuracies on case studies:
Distance measures (StarLightCurves):