Learning Data Representation for Clustering


In conjunction with PAKDD 2019, 14-17 April, Macau, China.

Workshop Overview

Clustering, the process of organizing similar objects into meaningful clusters, is a key tool for dealing with massive data. It is essential in many fields, including data science, information retrieval, bioinformatics and computer vision. Despite their success, most existing clustering methods are severely challenged by the data generated by modern applications, which are typically high dimensional, noisy, heterogeneous and sparse. Finding a suitable representation of high-dimensional data that can enhance clustering performance is therefore a fundamental problem. This has driven many researchers to investigate new clustering models to overcome these difficulties. One promising category of such models relies on learning data representations. Although specific domain knowledge can be used to help design representations, the quest for machine learning is motivating the design of more powerful representation-learning algorithms implementing such priors.

The idea is to learn new representations of the objects of interest, e.g., images, that encode only the most relevant information characterizing the original data, which would, for example, reduce noise and sparsity. Since the representation learning process is not guaranteed to infer representations that are suitable for the clustering task, it is important to perform both tasks jointly, as recommended by several authors, so as to let clustering govern feature extraction and vice versa. Within this framework, classical dimensionality reduction approaches, e.g., Principal Component Analysis (PCA), have been widely considered for the data representation task. However, the linear nature of such techniques makes it challenging to infer faithful representations of real-world data, which typically lie on highly non-linear manifolds. This motivates the investigation of deep representation learning models (e.g., auto-encoders and convolutional neural networks), which have so far proven successful in extracting highly non-linear features from complex data, such as text, images and graphs. While promising, work on combining deep representation learning and clustering in a single joint process has only just started. The marriage between “data representation” and “clustering” will bring huge opportunities as well as challenges to the communities concerned with dimensionality reduction and clustering.
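For readers new to the topic, the following is a minimal sketch of the sequential baseline that the joint approaches above aim to improve on: reduce dimensionality with PCA first, then cluster the reduced data with k-means. It is an illustration only and is not taken from any workshop contribution. The function names (pca_project, kmeans), the farthest-first initialization and the synthetic two-group data are all assumptions made for this example.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates in the reduced space

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen center
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)            # assign each point to nearest center
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Hypothetical data: two groups that differ along one of 20 noisy dimensions.
rng = np.random.default_rng(42)
signal = np.zeros((100, 20))
signal[:50, 0] = -5.0
signal[50:, 0] = 5.0
X = signal + rng.normal(scale=1.0, size=(100, 20))

Z = pca_project(X, n_components=2)   # step 1: learn a (linear) representation
labels = kmeans(Z, k=2)              # step 2: cluster in the reduced space
```

In this sketch the two steps are decoupled, so the projection is chosen without any knowledge of the clustering objective. Joint approaches instead optimize the representation and the cluster assignments together.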

This workshop aims to explore recent advances in data representation for clustering under different approaches. The LDRC workshop is thus an opportunity to:

● present the recent advances in data representation based clustering algorithms,

● outline potential applications that could inspire new data representation approaches for clustering,

● explore benchmark data to better evaluate and study data representation based clustering models.

The workshop is co-located with the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2019).

Important Dates

  • Workshop paper submission: February 7, 2019 [23:59 PST]
  • Author notification: February 10, 2019
  • Camera-ready due: February 15, 2019
  • Conference dates: April 14-17, 2019

Workshop Chairs

General Chair

Prof. Mohamed Nadif

Department of Mathematics and Computer Science

University Paris Descartes, FR

Email: mohamed.nadif@parisdescartes.fr

Program Co-chair

Assoc. Prof. Lazhar Labiod

Department of Mathematics and Computer Science

University Paris Descartes, FR

Email: lazhar.labiod@parisdescartes.fr

Workshop Organizers

Workshop Contact

lazhar.labiod@parisdescartes.fr