Most existing work on data stream mining assumes that all incoming data are fully labeled and that these labels are readily available. In many applications, however, labeled data are difficult or expensive to obtain, whereas unlabeled data are relatively easy to collect. Semi-supervised algorithms address this problem by using a large number of unlabeled samples, together with a few labeled ones, to build models for prediction or classification.
At the CILab we are developing a semi-supervised method for data stream classification. The method works incrementally, applying a semi-supervised fuzzy clustering process to successive, non-overlapping chunks of data. Every time a new chunk arrives, the cluster prototypes obtained from the previous chunk are added to the new chunk as labeled data points.
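The chunk-by-chunk scheme described above can be illustrated with a minimal sketch. It is not the implementation from the publications listed below, and it makes several simplifying assumptions: there is exactly one prototype per class, partial supervision is imposed by clamping the memberships of labeled points to their class (rather than through the objective function of a specific semi-supervised fuzzy clustering algorithm), unlabeled points are marked with the label -1, and the first chunk contains at least one labeled point for every class. All function and parameter names (`fuzzy_memberships`, `cluster_chunk`, `process_stream`, `classify`, `m`, `n_iter`) are hypothetical.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    # Standard fuzzy c-means membership: u_ik is proportional to d_ik^(-2/(m-1)).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero when a point equals a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def cluster_chunk(X, y, n_classes, m=2.0, n_iter=50):
    """Cluster one chunk. X: (n, d) floats; y: (n,) ints, -1 means unlabeled.

    Assumption: every class has at least one labeled point in (X, y).
    """
    labeled = y >= 0
    onehot = np.eye(n_classes)[y[labeled]]
    # Initialize each prototype as the mean of the labeled points of its class.
    centers = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    for _ in range(n_iter):
        U = fuzzy_memberships(X, centers, m)
        U[labeled] = onehot  # simplification: labeled points keep crisp memberships
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers

def process_stream(chunks, n_classes):
    """Yield the prototypes obtained after each non-overlapping chunk."""
    centers = None
    for X, y in chunks:
        if centers is not None:
            # Prototypes from the previous chunk join the new chunk as labeled points.
            X = np.vstack([X, centers])
            y = np.concatenate([y, np.arange(n_classes)])
        centers = cluster_chunk(X, y, n_classes)
        yield centers

def classify(X, centers):
    # Assign each point to the class of its highest-membership prototype.
    return fuzzy_memberships(X, centers).argmax(axis=1)
```

In this sketch the only information carried from one chunk to the next is the set of prototypes, so memory use does not grow with the length of the stream. A new sample can be classified at any time by calling `classify` with the most recent prototypes.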
G. Casalino, G. Castellano, C. Mencar. Credit card fraud detection by dynamic incremental semi-supervised fuzzy clustering. In 2019 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology (EUSFLAT2019), Prague, Czech Republic, September 9-13 2019. Atlantis Press.
G. Casalino, G. Castellano, C. Mencar. Incremental and adaptive fuzzy clustering for virtual learning environments data analysis. In 2019 23rd International Conference on Information Visualisation (IV2019), pp. 382–387, Paris, France, July 2-5 2019. DOI: 10.1109/IV.2019.00071.
G. Casalino, G. Castellano, A.M. Fanelli, C. Mencar. Enhancing the DISSFCM algorithm for data stream classification. In Masulli, Fullér, Giove (eds.) Fuzzy Logic and Applications. Lecture Notes in Computer Science. LNAI 10614, vol. 11291, pp. 109–122, Springer, Cham, 2019. DOI:10.1007/978-3-030-12544-8_9.
G. Casalino, G. Castellano, C. Mencar. Incremental adaptive semi-supervised fuzzy clustering for data stream classification. In Proc. of the 2018 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS 2018), pp. 1–7, Rhodes, Greece, May 25-27, 2018. DOI:10.1109/EAIS.2018.8397172.
G. Castellano, A.M. Fanelli. Data stream classification by adaptive semi-supervised fuzzy clustering. In Proc. of the 26th International Conference on Artificial Neural Networks (ICANN2017), Part II. Lintas et al., eds., Lecture Notes in Computer Science. LNAI, Alghero, Italy, September 11-14, 2017. DOI: 10.1007/978-3-319-68612-7.
G. Castellano, A.M. Fanelli. Classification of data streams by incremental semi-supervised fuzzy clustering. In Fuzzy Logic and Soft Computing Applications, V. Loia, A. Petrosino, W. Pedrycz, eds., Lecture Notes in Computer Science. LNAI 10147, pp. 185–194. Springer, 2017. DOI: 10.1007/978-3-319-52962-2_16.