Scalable vertex-centric algorithms based on database technology
Several computational problems are characterized by huge networks with millions of nodes and billions of edges.
To deal with these challenging "big data" problems, we are developing learning strategies based on:
(a) "local" (vertex-centric) implementations of existing semi-supervised network-based learning algorithms;
(b) exploitation of graph database technologies to store the graph and to efficiently handle nodes and edges in secondary memory.
This approach has been preliminarily applied to the gene function prediction problem, using classical random walks and kernelized score functions (Mesiti, Re and Valentini, 2013; Mesiti, Re and Valentini, 2014).
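The "local" formulation can be illustrated with a minimal sketch of a vertex-centric random walk. This is not the implementation used in the cited works: the names `Graph`, `vertex_update` and `random_walk` are hypothetical, the graph is assumed to be undirected with symmetric positive weights, and an in-memory dictionary stands in for the graph database, which would instead return the neighbours of one vertex at a time from secondary memory.

```python
from typing import Dict, Hashable, Iterable

# Hypothetical stand-in for the graph store: node -> {neighbour: edge weight}.
# Assumed symmetric, i.e. graph[u][v] == graph[v][u].
Graph = Dict[Hashable, Dict[Hashable, float]]


def vertex_update(graph: Graph, v: Hashable,
                  p: Dict[Hashable, float],
                  degree: Dict[Hashable, float]) -> float:
    """Next-step probability of v, computed only from v's own neighbourhood."""
    # Each neighbour u passes on its probability, split in proportion
    # to the weight of the edge (u, v) over the weighted degree of u.
    return sum(p.get(u, 0.0) * w / degree[u] for u, w in graph[v].items())


def random_walk(graph: Graph, positives: Iterable[Hashable],
                steps: int = 3) -> Dict[Hashable, float]:
    """Run a `steps`-step random walk started uniformly from `positives`."""
    positives = list(positives)
    # Weighted degree of every node (sum of incident edge weights).
    degree = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    # Initial distribution: uniform over the labelled (positive) nodes.
    p = {v: 1.0 / len(positives) for v in positives}
    for _ in range(steps):
        # One vertex at a time: no global transition matrix is ever built.
        p = {v: vertex_update(graph, v, p, degree) for v in graph}
    return p


if __name__ == "__main__":
    # Toy path graph a - b - c with unit weights.
    toy = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
    print(random_walk(toy, ["a"], steps=2))
```

Because each update touches only one vertex and its incident edges, the per-vertex work depends on the size of the neighbourhood rather than on the size of the whole network, which is what makes it possible to keep the graph in secondary memory.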
M. Re, M. Mesiti, G. Valentini. On the Automated Function Prediction of Big Multi-Species Networks. Network Biology SIG 2014, ISMB 2014, Boston, USA.
M. Mesiti, M. Re, G. Valentini. Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction. GigaScience, 3:5, 2014.
M. Mesiti, M. Re, G. Valentini. Scalable Network-based Learning Methods for Automated Function Prediction based on the Neo4j Graph-database. Automated Function Prediction SIG 2013, ISMB 2013, Berlin.