Research interests

My research interests are in developing machine learning tools to help scientists understand data. Technological advances in medicine, engineering, the Internet, and finance have produced larger and more complex data sets. Using techniques from machine learning, statistics and optimization, I solve learning problems arising from these new technologies. In particular, my methodological and theoretical work lies in the areas of modern matrix factorization analysis, statistical machine learning, deep learning and the emerging area of data science. My applied research interests include textual and bioinformatics data.

  • Data embedding and co-clustering

  • Deep clustering

  • Model based unsupervised learning

  • Unsupervised ensemble learning

  • Tensor data and attributed graph co-clustering

  • Text mining and NLP

  • Recommender systems

Some research examples...

Simultaneous versus sequential data embedding and clustering

This research has a dual purpose: reducing the dimension of the data and clustering it. It is based on the decomposition of the Semi-NMF-PCA objective function into two terms, where the first is the objective function of PCA and the second is the Semi-NMF criterion in a low-dimensional space. This approach takes advantage of the mutual reinforcement between the data reduction and clustering tasks, so that the relaxed continuous dimension-reduction solution is better approximated by the true discrete clustering solution. We also establish theoretical connections between our method and NMF, k-means and PNMF that explain the performance improvement.
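The two-term decomposition can be written out as a worked identity. This is only a sketch under stated assumptions: it supposes the Semi-NMF-PCA criterion has the form below, and the symbols X (data matrix), Q (projection with orthonormal columns), G (nonnegative cluster-membership matrix) and S (centroids in the reduced space) are illustrative notation, not taken from the text above.

```latex
\begin{aligned}
\|X - G S Q^\top\|_F^2
  &= \|(X - X Q Q^\top) + (X Q - G S)\, Q^\top\|_F^2 \\
  &= \underbrace{\|X - X Q Q^\top\|_F^2}_{\text{PCA term}}
   + \underbrace{\|X Q - G S\|_F^2}_{\text{Semi-NMF term in the reduced space}},
  \qquad Q^\top Q = I,\quad G \ge 0 .
\end{aligned}
```

The cross term vanishes because Q^T (I - Q Q^T) = 0 when Q^T Q = I, and multiplying by Q^T on the right preserves the Frobenius norm. Under these assumptions, minimizing the left-hand side jointly over Q, G and S addresses both terms at once, instead of fixing Q by PCA first and clustering afterwards.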

Regularized Co-clustering


Attributed Graph embedding and clustering


Tensor Data Co-clustering

Relational data can be represented by an undirected graph, with vertices depicting entities and edges describing the relationships between them. These relationships are often well represented by multiple undirected graphs over the same set of vertices, with the edges of each graph capturing a different, heterogeneous relation. The vertices of those networks are often structured in unknown clusters with varying connectivity properties. These multiple graphs can be arranged as a three-way tensor, where each slice of the tensor depicts one graph, represented by a count data matrix. To extract relevant clusters, we propose a model-based co-clustering approach capable of dealing with multiple graphs. The proposed model can be seen as a tensor extension of mixture models of graphs, while the resulting co-clustering can be treated as a consensus clustering of nodes from multiple graphs. Applications on real datasets and comparisons with multi-view clustering and tensor decomposition methods show the interest of our contribution.
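The data layout described above, with several graphs over the same vertices stacked as a three-way tensor of counts, can be sketched as follows. This is a minimal illustration of the representation only, not of the co-clustering model itself; the function name `build_graph_tensor` and the toy relations are hypothetical.

```python
from collections import Counter


def build_graph_tensor(n_vertices, edge_lists):
    """Stack undirected graphs over the same vertices into a 3-way tensor.

    tensor[k][i][j] is the number of edges between vertices i and j
    in graph k, so each slice tensor[k] is a symmetric count matrix.
    """
    tensor = []
    for edges in edge_lists:
        counts = Counter()
        for i, j in edges:
            counts[(i, j)] += 1
            if i != j:
                counts[(j, i)] += 1  # undirected graph: mirror the edge
        # A Counter returns 0 for pairs that never occurred.
        slice_k = [
            [counts[(i, j)] for j in range(n_vertices)]
            for i in range(n_vertices)
        ]
        tensor.append(slice_k)
    return tensor


# Hypothetical toy data: 4 entities and two relation types.
relation_a = [(0, 1), (1, 2), (0, 1)]  # the pair (0, 1) is observed twice
relation_b = [(2, 3), (0, 3)]
tensor = build_graph_tensor(4, [relation_a, relation_b])
```

Here `tensor[0][0][1]` equals 2 because the pair (0, 1) occurs twice in the first relation. A co-clustering method would then take such a tensor as input and partition the vertices using all slices jointly.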


Unsupervised ensemble learning


Research Project