Large-Scale Graph Neural Networks: 

Navigating the Past and Pioneering New Horizons


AAAI 2024

 

Goal of the Tutorial

This tutorial offers a comprehensive overview of techniques for large-scale machine learning on graphs, covering both theoretical foundations and practical applications. It surveys past and recent research on improving the scalability of Graph Neural Networks (GNNs) and explores their diverse use cases. The tutorial targets a broad audience of engineers, researchers, graduate students, and industry professionals interested in applying scalable GNNs to large-scale datasets. By the end of the tutorial, the audience is expected to understand both the foundational theory behind classical and emerging model frameworks and their applications.


Abstract

Graph Neural Networks (GNNs) have gained significant attention in recent years due to their ability to model complex relationships between entities in graph-structured data such as social networks, protein structures, and knowledge graphs. However, due to the size of real-world industrial graphs and the distinctive architecture of GNNs, deploying GNNs on large-scale graphs has been a long-standing challenge for engineers and researchers, which significantly limits their adoption in real-world applications. In this tutorial, we will cover the fundamental scalability challenges of GNNs, the frontiers of large-scale GNNs including classic approaches and newly emerging techniques, the evaluation and comparison of scalable GNNs, and their large-scale real-world applications. Overall, this tutorial aims to provide a systematic and comprehensive understanding of the challenges and state-of-the-art techniques for scaling GNNs. The summary and discussion of future directions will inspire engineers and researchers to explore new ideas and developments in this rapidly evolving field.

Outline


1. Introduction to GNNs (30 minutes)


(a) Foundations of GNNs

(b) Applications of GNNs

(c) Scalability Challenges of Large-Scale GNNs

2. Classic Approaches for Scaling GNNs (60 minutes)

(a) Sampling Methods

(b) Decoupling Methods

(c) Distributed Methods


Break: 30 minutes

3. Emerging Techniques for Scaling GNNs (60 minutes)


Training:


(a) Lazy Propagation

(b) Alternating Training

(c) GNN Pre-training


Inference:

Cross-model Distillation


Data:

(a) Graph Condensation

(b) Subgraph Sketching

(c) Tabularization

4. Evaluation, Comparison, and Applications (20 minutes)


5. Summary and Future Directions (10 minutes)


Presentation Details

1. Introduction to GNNs

We first offer a brief introduction to graph neural networks (GNNs). We will illustrate several classical and basic GNN models and their real-world applications [1–3], spanning domains such as social networks [4], biological molecules [5], and recommendation systems [6]. Following these basic concepts, we will introduce the scalability challenges of large-scale GNNs through both theoretical analysis and empirical examples [7, 8].
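To ground this introduction, the following is a minimal sketch of a classical two-layer GCN for node classification, written with PyTorch Geometric; the Cora dataset and the hyperparameters are illustrative assumptions rather than part of the tutorial material.

# Minimal two-layer GCN for node classification (illustrative sketch).
# Assumes PyTorch and PyTorch Geometric are installed; Cora is used only as a small example.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]  # a single graph with node features, edges, labels, and split masks

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))   # aggregate 1-hop neighborhood information
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)        # second layer reaches 2-hop neighborhoods

model = GCN(dataset.num_features, 16, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

Note that full-batch training like this requires the entire graph in memory at every step, which is exactly the bottleneck the later parts of the tutorial address.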

2. Classic Approaches for Scaling GNNs

This section focuses on classic research on the scalability and efficiency of large-scale GNNs, covering several families of designs: sampling methods [7–9], pre/post-computing (decoupling) methods [3, 10, 11], and distributed methods [12]. For each category, we will introduce the basic methodology, how it mitigates the scalability issue, its variants, and its limitations. This part provides the audience with the necessary knowledge for working with graphs of increasing size.
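As a concrete illustration of the sampling category, the sketch below builds GraphSAGE-style mini-batches with PyTorch Geometric's NeighborLoader, reusing the data, model, and optimizer objects from the earlier GCN sketch; the fan-outs and batch size are placeholder assumptions, not recommendations from the tutorial.

# GraphSAGE-style neighbor sampling with PyTorch Geometric (illustrative sketch).
# Reuses `data`, `model`, and `optimizer` from the GCN sketch above.
import torch.nn.functional as F
from torch_geometric.loader import NeighborLoader

loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],       # at most 15 sampled 1-hop and 10 2-hop neighbors per node
    batch_size=1024,              # number of seed (target) nodes per mini-batch
    input_nodes=data.train_mask,  # draw seed nodes from the training set
    shuffle=True,
)

model.train()
for batch in loader:
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    # Only the first `batch.batch_size` rows correspond to seed nodes; the remaining
    # rows are sampled neighbors included solely to support message passing.
    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
    loss.backward()
    optimizer.step()

Because each mini-batch only materializes a small sampled subgraph, memory no longer scales with the full graph size, at the cost of some approximation in the aggregated neighborhoods.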

3. Emerging Techniques for Scaling GNNs

This part will provide an overview of emerging trends and techniques in scalable GNN research. We will discuss the latest developments from three perspectives: training, inference, and data. In terms of training strategies, we delve into innovative approaches such as lazy graph propagation [13], alternating training [14], and the growing importance of pre-training [15]. In the context of inference, we introduce the concept of cross-model distillation [16]. Within the realm of data management, we explore evolving techniques such as graph condensation [17], subgraph sketching [18], and tabularization [19]. This examination provides valuable insight into the cutting-edge developments that are shaping the future of scalable GNN research. By the end of this discussion, the audience will have a comprehensive understanding of the current state of the art in scalable GNN research and its potential for solving real-world problems.
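To make the inference-side idea concrete, the sketch below shows a simplified form of cross-model (GNN-to-MLP) distillation in the spirit of [16]: a graph-free MLP student is trained to match the softened predictions of an already trained GNN teacher, so inference no longer touches the graph structure. The temperature, loss weighting, and student architecture are assumptions made for illustration, not the exact recipe of the cited work.

# Simplified GNN-to-MLP distillation (illustrative sketch, not the exact method of [16]).
# Assumes `teacher_gnn` is a trained GNN (e.g., the GCN from the introductory sketch)
# and that `data` / `dataset` are defined as before.
import torch
import torch.nn.functional as F

student = torch.nn.Sequential(             # graph-free student: node features in, logits out
    torch.nn.Linear(dataset.num_features, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, dataset.num_classes),
)
opt = torch.optim.Adam(student.parameters(), lr=0.01)

with torch.no_grad():                      # teacher soft labels are computed once, offline
    teacher_logits = teacher_gnn(data.x, data.edge_index)

T, lam = 2.0, 0.5                          # temperature and distillation weight (placeholders)
student.train()
for epoch in range(100):
    opt.zero_grad()
    logits = student(data.x)               # no edge_index: inference needs features only
    ce = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
    kd = F.kl_div(                         # match the teacher's softened predictions
        F.log_softmax(logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss = (1 - lam) * ce + lam * kd
    loss.backward()
    opt.step()

The appeal for deployment is that the student serves predictions without fetching neighbors, which removes graph dependencies from the latency-critical path.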

4. Evaluation, Comparison, and Applications

In this section, we will evaluate the schemes introduced above, focusing on their accuracy and computational complexity. The outcome of these evaluations will inform the audience about the advantages and drawbacks of each method. Such insights will not only guide the selection of suitable techniques for scaling GNNs but also inspire researchers and practitioners to develop more effective and accurate large-scale GNNs. As the availability of large-scale graph data grows, scalable GNNs will become increasingly important for addressing complex problems and making accurate predictions across many fields. We will discuss the impact of scalable GNNs in notable real-world applications, such as web-scale recommendation systems [20], friend recommendation [21], and fraud detection. Additionally, we will introduce several platforms and packages tailored for large-scale graphs. To conclude, we will discuss some prevailing challenges and potential avenues for research in real-world applications.
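As a small, hedged example of the kind of platform support discussed here, the sketch below loads a large OGB node-classification benchmark with the ogb package (in its PyTorch Geometric form) and scores predictions with the official evaluator; the dataset choice (ogbn-products) and the random predictions are purely illustrative.

# Loading a large-scale OGB benchmark and using its official evaluator (illustrative sketch).
import torch
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

dataset = PygNodePropPredDataset(name="ogbn-products", root="data/OGB")
data = dataset[0]                      # roughly 2.4M nodes and 62M edges
split_idx = dataset.get_idx_split()    # standardized train/valid/test node indices

evaluator = Evaluator(name="ogbn-products")

# `logits` would come from any of the scalable GNNs discussed in this tutorial;
# random scores are used here only so the snippet runs end to end.
logits = torch.randn(data.num_nodes, dataset.num_classes)
y_pred = logits.argmax(dim=-1, keepdim=True)

acc = evaluator.eval({
    "y_true": data.y[split_idx["test"]],
    "y_pred": y_pred[split_idx["test"]],
})["acc"]
print(f"Test accuracy: {acc:.4f}")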

5. Summary and Future Directions

We will summarize the tutorial and provide a discussion of future directions for large-scale GNNs.

Presenter Biographies

Rui Xue is a Ph.D. student in the Department of Electrical and Computer Engineering at North Carolina State University. He received his master’s degree in Electrical and Computer Engineering from the University of Southern California. His main research interests include machine learning on graphs, scalability of machine learning, and signal processing. He has published several papers at signal processing and machine learning conferences.


Haoyu Han is currently a second-year Ph.D. candidate in the Department of Computer Science and Engineering at Michigan State University. He earned his Master’s degree in Computer Science from the University of Science and Technology of China. His primary research areas encompass graph data mining and large-scale machine learning. He has won two NeurIPS competitions, including the OGB-LSC. Additionally, he has authored several publications in the field of graph data mining.


Tong Zhao is a Research Scientist in the Computational Social Science group at Snap Research. He earned his Ph.D. in Computer Science and Engineering from the University of Notre Dame in 2022. His research focuses on graph machine learning and its applications in real-world use cases. His work has resulted in 20+ conference and journal publications in top venues such as ICML, ICLR, KDD, AAAI, WWW, and TNNLS. He also has experience organizing workshops and tutorials related to GNNs.


Neil Shah is a Lead Research Scientist and Manager at Snap Research, working on machine learning algorithms and applications on large-scale graph data. His work has resulted in 55+ conference and journal publications in top venues such as ICLR, NeurIPS, KDD, WSDM, WWW, AAAI, and more, including several best-paper awards. He has served as an organizer, chair, and senior program committee member at a number of these conferences, and has organized workshops and tutorials on graph machine learning topics at KDD, WSDM, SDM, ICDM, CIKM, and WWW. He has previous research experience at Lawrence Livermore National Laboratory, Microsoft Research, and Twitch. He earned a Ph.D. in Computer Science from Carnegie Mellon University in 2017, funded partially by the NSF Graduate Research Fellowship.

Jiliang Tang is an MSU Foundation Professor in the Department of Computer Science and Engineering at Michigan State University. He was an associate professor from 2021 to 2022 and an assistant professor from 2016 to 2021 in the same department. His research interests include data mining, machine learning, and their applications in social media, biology, and education. He is the recipient of the 2022 SDM IBM Early Career Data Mining Research Award, the 2021 ICDM Tao Li Award, the 2020 SIGKDD Rising Star Award, the 2019 NSF CAREER Award, and 8 best paper awards (or runner-ups), including at WSDM 2018 and KDD 2016. His dissertation won the 2015 KDD Best Dissertation runner-up and the Dean’s Dissertation Award. He has served as an organizer of top data science conferences (e.g., KDD, SIGIR, WSDM, and SDM) and as a journal editor (e.g., TKDD and TKDE). He has published his research in highly ranked journals and top conference proceedings, receiving more than 26,000 citations with an h-index of 77 and extensive media coverage.


Xiaorui Liu is an assistant professor in the Department of Computer Science at North Carolina State University. He received his Ph.D. degree in Computer Science from Michigan State University in 2022. His research interests include deep learning on graphs, large-scale machine learning, and trustworthy artificial intelligence. He has published innovative work in top-tier conferences such as NeurIPS, ICML, ICLR, KDD, AISTATS, and SIGIR. He has experience organizing and co-presenting multiple tutorials related to GNNs and large-scale machine learning, such as “Graph Representation Learning: Foundations, Methods, Applications, and Systems” at KDD 2021 and “Communication Efficient Distributed Learning” at IJCAI 2021.


Tutorial Slides

AAAI24_tutorial_final.pdf

REFERENCES