Large-Scale Graph Algorithmics: Theory and Practice
As a fundamental tool in modeling and analyzing social, and information networks, large-scale graph mining is an important component of any tool set for big data analysis. Processing graphs with hundreds of billions of edges is only possible via developing distributed algorithms under distributed graph mining frameworks such as MapReduce, Pregel, Gigraph, and alike. For these distributed algorithms to work well in practice, we need to take into account several metrics such as the number of rounds of computation and the communication complexity of each round. For example, given the popularity and ease-of-use of MapReduce framework, developing practical algorithms with good theoretical guarantees for basic graph algorithms is a problem of great importance.
In this tutorial, we first discuss how to design and implement algorithms based on traditional MapReduce architecture. In this regard, we discuss various basic graph theoretic problems such as computing connected components, maximum matching, MST, counting triangle and overlapping or non-overlapping clustering. We discuss a computation model for MapReduce and describe the sampling, filtering, and core-set techniques to develop efficient algorithms in this framework. At the end, we explore the possibility of employing other distributed graph processing frameworks. In particular, we study the effect of augmenting MapReduce with a distributed hash table (DHT) service and also discuss the use of a new graph processing framework calledASYMP based on asynchronous message-passing method. In particular, we will show that using ASYMP, one can improve the CPU usage, and achieve significantly improved running time.
Silvio Lattanzi, Google Research Zurich
Silvio Lattanzi is a Research Scientist at Google Research Zurich. He received his Phd from University La Sapienza of Rome. His main research interests are in algorithms, large-scale graph mining, information retrieval and machine learning. He has published more than 40 papers in top tier conferences and journals in information retrieval, machine learning and algorithms. He has been awarded of KDD 2015 best paper award. He also served as program committee or senior program committee for more than 40 conferences including WWW, WSDM, KDD and SODA. Previously, he has presented tutorials at WSDM conference. He currently serves as a tech lead for the large-scale graph mining project at Google, and he is currently leading a new algorithm and optimization research group in Google Zurich.
Vahab Mirrokni, Google Research New York
Vahab Mirrokni is leading the algorithms research group at Google Research New York where he is a Principal Research Scientist. He joined Google after research appointments at Microsoft Research, MIT, and Amazon.com. He received a B.Sc. from Sharif University in 2001 and a Ph.D from MIT in 2005. He has published more than 100 papers in journals and conferences and has filed more than 30 patents in these areas. He has been the co-winner of the SODA 2005 best student paper award, the ACM EC 2008 outstanding paper award and the KDD 2015 best paper award. Vahab’s main research interests are algorithms design, large-scale graph mining, and the Internet Economics. He has served as a PC, and Senior PC member of several top conferences in these areas including EC, WWW, KDD, WSDM, and SODA. Previously, he has presented tutorials at ICML, WWW, ACM EC, WSDM and SIGMETRICS conferences. At Google, he leads several efforts in optimization for online advertising and large-scale graph mining. Vahab is now the tech lead manager for the large-scale graph mining project at Google Research New York.