Stability-based methods for biomolecular cluster assessment

The validation of clusters discovered by clustering algorithms is a central problem in bioinformatics: algorithms can always find clusters in biomolecular data, but we need to assess whether the discovered clusters are statistically significant and biologically meaningful.

We developed stability-based algorithms and specific statistical tests for:

  1. Assessing the overall reliability of a clustering and selecting the model order (e.g. the number of clusters) in an unsupervised setting (Bertoni and Valentini, 2006, 2007, 2008; Valentini, 2007)
  2. Assessing the reliability of individual clusters within a clustering (Bertoni and Valentini, 2005, 2006; Valentini, 2006)
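
The general idea behind stability-based assessment can be illustrated with a minimal sketch. It is not the procedure of the cited papers: it assumes random subsampling as the perturbation, a plain k-means as the clustering algorithm, and the Rand index as the agreement measure, and all function names (kmeans, rand_agreement, stability) are hypothetical. A clustering is taken to be reliable when the same items are grouped consistently across perturbed versions of the data.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_agreement(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (Rand index)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def stability(points, k, rounds=20, frac=0.8, seed=0):
    """Mean agreement between clusterings of two random subsamples,
    measured on the items that both subsamples share."""
    rng = random.Random(seed)
    n = len(points)
    size = int(frac * n)
    scores = []
    for _ in range(rounds):
        idx_a = rng.sample(range(n), size)
        idx_b = rng.sample(range(n), size)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        shared = sorted(set(idx_a) & set(idx_b))
        scores.append(
            rand_agreement([lab_a[i] for i in shared], [lab_b[i] for i in shared])
        )
    return sum(scores) / len(scores)


# Toy data (an assumption for illustration): two well-separated groups,
# one feature per item to keep the example small.
data_rng = random.Random(1)
points = [(data_rng.gauss(center, 0.2),) for center in (0.0, 10.0) for _ in range(20)]

# Model order selection: score each candidate number of clusters
# and keep the most stable one.
scores = {k: stability(points, k) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
```

In this sketch the candidate number of clusters with the highest stability score is kept. Deciding whether a score is statistically significant would require an additional test, which is outside the scope of this example.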

The new methods have been applied to the analysis and validation of subclasses of pathologies characterized at the biomolecular level, and to the discovery of multiple structures (e.g. hierarchical structures) in complex biomolecular data generated through high-throughput biotechnologies (Bertoni and Valentini, 2006, 2007; Valentini and Ruffino, 2006).

We have also worked on stability-based methods to assess the reliability of hierarchical clusterings with a large number of clusters and examples, aimed at the unsupervised search for and validation of functional classes of genes (Avogadri et al., 2008, 2009).


