Supervised ensemble methods for biomolecular data integration

Data integration plays a key role in several computational biology problems, since each data source can provide complementary biologically-relevant information, necessary to unravel the biological phenomenon of interest.

We investigated the impact of ensemble methods as "late" data fusion algorithms in gene function prediction problems: each learning machine is trained on a different data source, and their decisions are combined according to a specific "consensus" algorithm (Re and Valentini, 2009, 2010). Moreover, we showed that ensembles are less prone to errors due to noisy data (Re and Valentini, 2010).
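The late fusion scheme described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited papers: the data are synthetic, the base learner is a simple nearest-centroid scorer, and the "consensus" is an unweighted average of the per-source scores. The function names (make_source, fit_centroids, score, late_fusion) are hypothetical.

```python
import math
import random

def make_source(labels, noise, dim, rng):
    """Synthetic data source (assumption): class mean is +1 or -1 per axis, plus Gaussian noise."""
    return [[(1.0 if y else -1.0) + rng.gauss(0.0, noise) for _ in range(dim)]
            for y in labels]

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class."""
    dim = len(X[0])
    centroids = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return centroids

def score(centroids, x):
    """Probability-like score for class 1 from squared distances to the two centroids."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    z = max(-50.0, min(50.0, (d0 - d1) / 2.0))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def late_fusion(models, views):
    """Consensus step (assumption): unweighted average of the per-source scores."""
    scores = [score(m, x) for m, x in zip(models, views)]
    return sum(scores) / len(scores)

rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
# Three sources describing the same 200 items, each with a different noise level.
sources = [make_source(labels, noise, 4, rng) for noise in (1.5, 2.0, 2.5)]
train_idx, test_idx = range(0, 100), range(100, 200)

# One model per data source, each trained only on its own source.
models = [fit_centroids([X[i] for i in train_idx], [labels[i] for i in train_idx])
          for X in sources]

# Combine the per-source decisions for every held-out item.
fused = [late_fusion(models, [X[i] for X in sources]) for i in test_idx]
ensemble_acc = sum((p > 0.5) == bool(labels[i])
                   for p, i in zip(fused, test_idx)) / len(test_idx)
print(f"ensemble accuracy: {ensemble_acc:.2f}")
```

The point of the sketch is the structure: the sources are never concatenated into a single feature matrix, and only the outputs of the trained models are merged. Other consensus rules, such as weighted averaging, could be swapped in by changing late_fusion alone.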

We also applied data integration and ensemble methods to protein subcellular localization problems (Rozza et al., 2010, 2011), and we studied the management of biomolecular databases, using XML to integrate heterogeneous biological data (Mesiti et al., 2009).


A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini and P. Campadelli, A Novel Ensemble Technique for Protein Subcellular Location Prediction, In: "Ensembles in Machine Learning Applications", Studies in Computational Intelligence vol. 373, pp. 151-167, Springer, 2011.

A. Rozza, G. Lombardi, M. Re, E. Casiraghi and G. Valentini, DDAG K-TIPCAC: an ensemble method for protein subcellular localization, Proc. of the Third Edition of SUEMA, pp. 75-84, ECML, Barcelona, Spain, 2010.

M. Re, G. Valentini, Noise tolerance of Multiple Classifier Systems in data integration-based gene function prediction, Supplementary Information Journal of Integrative Bioinformatics, 7(3):139, 2010

M. Re, G. Valentini, Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction, Journal of Machine Learning Research, W&C Proceedings, vol. 8: Machine Learning in Systems Biology, pp. 98-111, 2010.

M. Re, G. Valentini, Integration of heterogeneous data sources for gene function prediction using Decision Templates and ensembles of learning machines, Neurocomputing, 73(7-9), pp. 1533-1537, 2010.

M. Re, G. Valentini, Ensemble based Data Fusion for Gene Function Prediction, In: (J. Kittler, J. Benediktsson, F. Roli, Eds.) Eighth International Workshop on Multiple Classifier Systems MCS 2009, Lecture Notes in Computer Science, vol.5519 pp.448-457, Springer 2009.

M. Mesiti, E. Jimenez-Ruiz, I. Sanz, R. Berlanga-Llavori, P. Perlasca, G. Valentini and D. Manset, XML-Based Approaches for the Integration of Heterogeneous Bio-Molecular Data, BMC Bioinformatics 10(S12):S7, 2009.