Learning Data Representation for Clustering
In conjunction with the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2024), Taipei, Taiwan, May 7–10, 2024
Workshop Overview
Clustering, the process of organizing similar objects into meaningful groups, is a key tool for dealing with massive data. It is essential in many fields, including data science, information retrieval, bioinformatics and computer vision. Despite their success, most existing clustering methods are severely challenged by the data generated by modern applications, which are typically high-dimensional, noisy, heterogeneous and sparse. Finding a suitable representation of high-dimensional data that improves clustering performance is therefore a fundamental problem, and it has driven many researchers to investigate new clustering models that overcome these difficulties. One promising category of such models relies on learning data representations. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for machine learning is motivating the design of more powerful representation-learning algorithms implementing such priors.
The idea is to learn new representations of the objects of interest, e.g., images, that encode only the most relevant information characterizing the original data, which would, for example, reduce noise and sparsity. Since the representation learning process is not guaranteed to infer representations that are suitable for the clustering task, it is important to perform both tasks jointly, as recommended by several authors, so that clustering guides feature extraction and vice versa. Within this framework, classical dimensionality reduction approaches, e.g., Principal Component Analysis (PCA), have been widely considered for the data representation task. However, the linear nature of such techniques makes it difficult to infer faithful representations of real-world data, which typically lie on highly non-linear manifolds. This motivates the investigation of deep representation learning models (e.g., auto-encoders and convolutional neural networks), which have so far proven successful in extracting highly non-linear features from complex data such as text, images and graphs. While promising, research on combining deep representation learning with clustering in a single joint model has only just begun. The marriage between "data representation" and "clustering" will bring huge opportunities as well as challenges to the communities concerned with dimensionality reduction and clustering. This workshop aims to showcase recent advances in data representation for clustering under different approaches. The LDRC workshop is thus an opportunity to:
- present recent advances in clustering algorithms based on data representation,
- outline potential applications that could inspire new data representation approaches for clustering,
- explore benchmark data to better evaluate and study clustering models based on data representation.
Format
Workshops are scheduled to be held at the beginning of the conference, on May 7, 2024. Workshop papers will not be included in the conference proceedings but will be available on the PAKDD 2024 webpage.
The workshop is co-located with the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2024), held in Taipei, Taiwan, May 7–10, 2024.