Work on multi-view and collaborative clustering [before 2019]
Context: My PhD thesis, some of my post-doctoral work, Denis Maurel's PhD thesis
Sources of data that partially describe the same elements have multiplied: user data split across several social networks, data coming from connected objects, medical data from multiple exams or sensors, marketing data on the same clients spread across different databases, etc. This rise of huge amounts of data available from multiple sources, which may contain different clusters, gave birth to the field of multi-view clustering, which consists in developing clustering algorithms that can run on each view locally while accounting for what happens in the other views. In a similar way, collaborative clustering can be seen as a tool for multi-view learning, since it consists in developing clustering frameworks that enable several clustering algorithms to work together and exchange their findings in order to improve their local models and the resulting partitions.
Within this context, during my PhD thesis under the supervision of Pr. Antoine Cornuéjols and Pr. Younès Bennani, I proposed new models for collaborative clustering that make it possible for clustering algorithms of different natures to work together on a clustering task (multi-view or not). Combining different clustering algorithms has two advantages: first, different algorithms can capture different types and shapes of clusters; second, in a multi-view context where the views feature different types of attributes, it may be necessary to use a different type of clustering algorithm for each view. As part of a collaboration during the PhD thesis of Pierre Alexandre Murena, a new approach based on Kolmogorov complexity was proposed that enables an even broader spectrum of clustering algorithms to collaborate.
The second problem I am currently tackling in collaborative and multi-view clustering is data quality: detecting noisy or irrelevant views, as well as weak algorithms or models, that may hinder a multi-source analysis. To this end, in collaboration with Pr. Matei and Pr. Grozavu, I have among other things studied the importance of diversity and clustering stability in collaborative clustering. In the same vein, in collaboration with Pr. Matei and Dr. Murena, we are currently studying the theoretical properties of collaborative and multi-view methods: the stability of these multi-algorithm frameworks, the novelty that multi-view methods can bring to local partitions, the consistency with mono-algorithm models, and finally the robustness of these methods. For this work, our approach is based on Shai Ben-David's study of similar properties for regular clustering. Our goal is to define stability, novelty, and consistency for multi-view and collaborative methods and to find the links between them. This work also involves the notion of pure collaborative clustering methods, and seeks to study how existing multi-view, ensemble, and collaborative clustering methods are linked and how they behave with regard to the aforementioned properties.
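As an illustration of what measuring diversity between local partitions can look like, the sketch below computes the Rand index between two label vectors and uses one minus that value as a diversity score. This is only an assumption made for illustration: the text above does not say which diversity or stability measure was used, and the function names `rand_index` and `diversity` are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    Cluster identifiers themselves do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one object: the partitions trivially agree
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def diversity(labels_a, labels_b):
    """Illustrative diversity score: 0 for identical partitions."""
    return 1.0 - rand_index(labels_a, labels_b)


# Two hypothetical local partitions of the same four objects.
view_1 = [0, 0, 1, 1]
view_2 = [0, 1, 0, 1]
score = diversity(view_1, view_2)
```

The same pairwise comparison could in principle be applied to partitions obtained from repeated runs of one algorithm, which is one simple way to approach stability, although the studies mentioned above may rely on different definitions.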
Finally, as part of the PhD thesis of Denis Maurel, we have been tackling two final but no less important issues in multi-view clustering: data privacy and missing data. All earlier works assumed that there were no missing data and that everything could be exchanged freely between views, but this is rarely the case in practice, where missing data are a major problem. As for freely sharing data between views, recent scandals involving social network data have shown that privacy is also an important issue to consider when developing Machine Learning algorithms. To this end, we have been working on reshaping most multi-view approaches using deep autoencoders and other deep neural networks so that 1) the information exchanged between the views and algorithms is anonymous and heavily encrypted, and 2) missing data can be inferred or reconstructed, provided that there is enough information in the other views.