Speakers Code Assistant
Mila
McGill University, Mila
McGill University
DRW Holdings, LLC
Advisory Board
Introduction
The success of graph models, especially on node-level tasks, is commonly believed to be rooted in the homophily principle. Homophily is a concept in sociology and evolutionary biology, stating the tendency that individuals with similar characteristics are easier to communicate and bond with each other. In graph representation learning, it describes the networks where nodes with the same labels or similar attributes are more likely to be connected. Such relational inductive bias is thought to be a major contributor to the superiority of GNNs over traditional Neural Networks (NNs) on various tasks.
On the other hand, the lack of homophily, i.e. heterophily, is considered as the main cause of the inferiority of GNNs on heterophilic graphs, and such performance degradation is widespread. People have revisited and re-evaluated lots of existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, heterophily problem is broadly relevant to numerous graph-related applications (shown in Figure 1-6), e.g. graph fraud/anomaly detection, point cloud segmentation, urban computing, recommender systems, scene graph generation, drug discovery, brain network analysis, link prediction, privacy, graph adversarial attacks and robustness, etc. In the past few years, considerable efforts have been devoted to studying and addressing the heterophily issue.
In this tutorial, we will systematically summarize the latest progress on heterophilic graph learning and deliver them in reader-friendly way. The contents include (1) categorization of benchmark datasets and evaluation of homophily metrics on synthetic graphs, (2) meticulous classification of the most updated supervised and unsupervised learning methods, (3) theoretical analysis on homophily/heterophily, (4) heterophily-related applications and (5) challenges and future directions for heterophilic graph learning.
Figure 1: Fraud detection with heterophily relations in Uber driver-rider network. Red nodes represent fraud users, blue nodes represent legitimate users. (Source)
Figure 2: Homophily distribution of molecules in QM9. The homophilic bond links atoms of the same type, while a heterophilic bond links different types of atoms. (Source)
Example
As shown in Figure 6, for the indiscernible boundary nodes, homophilic structure provides extra useful information into their aggregated features over the original node features. Such relational inductive bias is thought to be a major contributor to the superiority of GNNs over traditional NNs on various tasks. On the other hand, in Figure 7, the boundary nodes have more heterophilic neighbors than homophilic ones. Since heterophilic edges connect nodes of different classes, they can lead to mixed and indistinguishable node embeddings, making the classification task more difficult for GNNs.
Figure 6: Message passing on homophilic graph
(indistinguishable nodes -->message passing--> distinguishable nodes)
Figure 7: Message passing on heterophilic graph
(distinguishable nodes -->message passing--> indistinguishable nodes)
Goal
The objective of this tutorial is to: (1) let people be aware of heterophily problem and understand that not all graph structures are useful or beneficial for learning; (2) help our audiences to review the latest heterophily-specific graph models quickly and comprehensively; (3) summarize real-world applications and assist our audiences on their future research about related topics.
Schedule (90min)
Introduction and Background Knowledge (Sitao Luan, 5min)
Definition of homophily/heterophily
Be careful of your graph: why do we have to consider heterophily in graph learning
Homophily metrics
Homophily Metrics (Sitao Luan, code demo by Qincheng Lu and Jiaqi Zhu, 10min)
Graph-Label consistency
Similarity based metrics
Neighborhood identifiability/informativeness
Hypothesis testing based performance metrics
Evaluation methods with synthetic graphs
Benchmark (Qincheng Lu, code demo by Qincheng Lu and Jiaqi Zhu, 15min)
Benign, malignant and ambiguous heterophily datasets
Evaluation of homophily metrics on synthetic graphs
Models for Heterophily (Sitao Luan, 15min)
Supervised learning on homogeneous graphs
Theoretical Studies (Sitao Luan, 15min)
Mid-homophily pitfall
Homophily and distribution shifts
Other interesting findings
Homophily/Heterophily Related Applications (Sitao Luan, 15min)
Fraud/anomaly detection
Graph Learning Tasks
Computer vision
Urban Computing
Brain Network Analysis
Challenges and Future Directions (Chenqing Hua, 15min)
Heterophily on heterogeneous GNNs
Heterophily on temporal GNNs
Heterophily on Hypergraphs
Fairness
LLM and foundation models
Molecule design
Targeted Audience
The beginner level of graph theory and graph representation learning knowledge is needed for our targeted audience.
Citations
If you use the code, please cite the original sources:
@article{luan2024heterophily,
title={Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks},
author={Luan, Sitao and Lu, Qincheng and Hua, Chenqing and Wang, Xinyu and Zhu, Jiaqi and Chang, Xiao-Wen and Wolf, Guy and Tang, Jian},
journal={arXiv preprint arXiv:2409.05755},
year={2024}}
If you use the contents in the tutorial, please cite the original sources:
@article{luan2024heterophilic,
title={The heterophilic graph learning handbook: Benchmarks, models, theoretical analysis, applications and challenges},
author={Luan, Sitao and Hua, Chenqing and Lu, Qincheng and Ma, Liheng and Wu, Lirong and Wang, Xinyu and Xu, Minkai and Chang, Xiao-Wen and Precup, Doina and Ying, Rex and others},
journal={arXiv preprint arXiv:2407.09618},
year={2024}}
Contact Information
Speakers
Sitao Luan: sitao.luan@mail.mcgill.ca
Chenqing Hua: chenqing.hua@mail.mcgill.ca
Qincheng Lu: qincheng.lu@mail.mcgill.ca
Code Assistant
Jiaqi Zhu: jiaqi.zhu@mail.mcgill.ca