CHDD 2012

International Workshop on

Clustering High-Dimensional Data

Palazzo Serra di Cassano, Via Monte di Dio 14, Naples (Italy), May 15th, 2012


One of the most serious problems affecting current machine learning techniques is dataset dimensionality. In many real-world applications, we deal with data having anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as medicine and biology, where DNA microarray technology can produce a large number of measurements at once; in the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the dictionary; and in many other areas, including data integration and management and social network analysis. In all these cases, the dimensionality of the data makes learning problems barely tractable.

In particular, the high dimensionality of data is a critical factor for the clustering task. The following problems must be faced when clustering high-dimensional data:

  • As the dimensionality grows, the volume of the space increases so fast that the available data become sparse. Since clusters are aggregations of data, they can no longer be found reliably (curse of dimensionality).

  • The concept of distance becomes less meaningful as the number of dimensions grows, since the distances between pairs of points in a given dataset tend to become nearly equal (concentration effects).

  • Different clusters might be found in different subspaces, so a global filtering of attributes is not sufficient (local feature relevance problem).

  • Given a large number of attributes, it is likely that some attributes are correlated. Hence, clusters might exist in arbitrarily oriented affine subspaces.

  • High-dimensional data are likely to include irrelevant features, which may obscure the effect of the relevant ones.
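The concentration effect listed above can be observed with a small numerical experiment. The following Python sketch is only an illustration under stated assumptions: the data are uniform random points in the unit hypercube, the distance is Euclidean, and the function name relative_contrast is hypothetical (it is not taken from any workshop material). It measures how much farther the farthest point is from a query point than the nearest one, relative to the nearest distance.

```python
import numpy as np


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^n_dims.

    Assumption: uniform synthetic data, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, n_dims))   # synthetic dataset
    query = rng.random(n_dims)              # synthetic query point
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimensionality grows.
    for n_dims in (2, 10, 100, 1000):
        print(n_dims, round(relative_contrast(1000, n_dims), 3))
```

With this setup the relative contrast should decrease markedly as the number of dimensions grows, which is one way to see why nearest and farthest neighbours become hard to tell apart in high-dimensional spaces.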

The workshop aims to study current approaches to clustering high-dimensional data. Its topic areas include, but are not limited to, approaches based on

  • relational clustering

  • data reduction using rough and fuzzy sets

  • subspace clustering

  • projected clustering

  • correlation clustering

  • biclustering/co-clustering

  • clustering ensembles

  • multi-view clustering

and to related methods, such as those for intrinsic dimension estimation and for clustering comparison. 

We solicit original or survey contributions (including work in progress) to this area of research in data clustering. Participants who wish to give a talk should submit an extended abstract (max. 2 pages, in free format, including paper title, authors and affiliations) before April 25th, 2012 through the EasyChair website.

Abstracts will be refereed as soon as they are submitted, and the decision will be sent to the authors.

Authors of presented papers will be invited to submit a full paper for a post-workshop volume to be published in Springer's LNCS series.

We ask participants to register as soon as possible, and preferably before April 30th, 2012, following this link (note that "campo obbligatorio" means "mandatory field" and "invia" means "send").

April 25th, 2012: Abstract submission (max. 2 pages, as a PDF file)
April 30th, 2012: Abstract acceptance
April 30th, 2012: Registration (no fees, but required)
May 15th, 2012: Workshop
July 20th, 2012: Full paper submission

Workshop Organizers:

Francesco Masulli - University of Genova, Genoa (Italy)

Alfredo Petrosino - University of Napoli Parthenope, Naples (Italy)


GNCS: Gruppo Nazionale per il Calcolo Scientifico
IISF: Istituto Italiano per gli Studi Filosofici
SIGBI: Special Interest Group in Bioinformatics and Intelligence of INNS
TFNN: Task Force in Neural Networks of IEEE-CIS-TCBB
DISI: Dept. of Computer and Information Sciences, Univ. Genova, Italy
DSA: Dept. of Applied Science, Univ. Naples Parthenope

Contacts: email to

Study day made possible by funding from GNCS (Gruppo Nazionale per il Calcolo Scientifico) of INdAM (Istituto Nazionale di Alta Matematica Francesco Severi)