Supervised ensemble methods for biomolecular data integration

Data integration plays a key role in several computational biology problems, since each data source can provide complementary, biologically relevant information needed to unravel the biological phenomenon of interest.

We investigated the impact of ensemble methods as "late" data fusion algorithms in gene function prediction problems: each learning machine is trained on a different source of data, and their decisions are combined according to a specific "consensus" algorithm (Re and Valentini, 2009, 2010). Moreover, we showed that ensembles are less prone to errors due to noisy data (Re and Valentini, 2010).
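
The late fusion scheme described above can be illustrated with a minimal sketch. It assumes a toy nearest-centroid base learner and a plain (optionally weighted) average as the consensus rule; both are illustrative choices and not necessarily the learners or combination rules used in the cited works. The data source names, feature values, and function names are hypothetical.

```python
import math

def train_centroids(X, y):
    """Fit a toy base learner: one mean feature vector per class (0 or 1)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_score(centroids, x):
    """Map centroid distances to a positive-class score in (0, 1)."""
    d0 = math.dist(x, centroids[0])
    d1 = math.dist(x, centroids[1])
    # The closer x is to the class-1 centroid, the higher the score.
    return 1.0 / (1.0 + math.exp(d1 - d0))

def late_fusion(models, views, weights=None):
    """Consensus step: (weighted) average of the per-source scores for one gene."""
    scores = [predict_score(m, v) for m, v in zip(models, views)]
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical toy data: the same four genes seen through two data sources.
y = [0, 0, 1, 1]                      # 1 = gene annotated with the function
X_expression = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
X_interaction = [[0.1], [0.3], [0.9], [0.7]]

# Each base learner is trained on its own data source only.
models = [train_centroids(X_expression, y), train_centroids(X_interaction, y)]

# Score two unseen genes, each described by one feature vector per source.
fused_pos = late_fusion(models, [[0.9, 0.8], [0.8]])   # resembles class 1
fused_neg = late_fusion(models, [[0.1, 0.0], [0.2]])   # resembles class 0
print(fused_pos > 0.5, fused_neg > 0.5)
```

Because the sources are combined only at the decision level, each base learner can use a representation suited to its own data type, and a source can be down-weighted through the hypothetical `weights` argument if it is known to be noisy.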

We also applied data integration and ensemble methods to protein subcellular localization problems (Rozza et al., 2010, 2011), and we studied problems related to biomolecular database management, using XML to integrate heterogeneous biological data (Mesiti et al., 2009).


