Tutorial on Machine Learning on Graphs at WSDM 2024
Abstract
Graph neural networks (GNNs) learn from complex graph data and have been remarkably successful in a wide range of applications and industries. This presentation first introduces the fusion of textual data with heterogeneous graph structures to improve semantic and behavioral representations. It presents the Language Model GNN (LM-GNN), a framework that efficiently combines large language models and GNNs through fine-tuning, supports tasks such as node classification and link prediction, and demonstrates strong empirical performance. A related challenge it addresses is effective node representation learning on textual graphs. Next, NetInfo is introduced to help determine when graph ML can provide value. The presentation also discusses pre-training text and graph models on large, heterogeneous graphs with textual data using the Graph-Aware Language Model Pre-training (GALM) framework, whose effectiveness is demonstrated through experiments on real datasets. The Graph-Aware Distillation (Grad) framework is then proposed: it encodes graph structure into a Language Model (LM) to enable fast and scalable inference, jointly optimizing a GNN teacher and a graph-free student model to achieve superior performance on node classification tasks. Finally, the presentation introduces BioBridge, which combines the power of knowledge graphs and language models for biomedical applications.
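To make the LM-GNN idea above concrete, the following is a minimal PyTorch sketch of the general pattern it builds on (LM embeddings for node text fed into a GNN and trained end-to-end), not the authors' actual LM-GNN code. The names `LMGNNSketch` and `SimpleGNNLayer` are hypothetical, and a stub stands in for a fine-tuned language model.

```python
# Minimal sketch (illustrative, not the LM-GNN implementation): node texts
# are encoded by a language model, the embeddings are refined by a simple
# GNN layer, and the whole pipeline is trained for node classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGNNLayer(nn.Module):
    """Mean-aggregation message passing over a dense adjacency matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Row-normalize the adjacency so each node averages its neighbors.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return F.relu(self.linear((adj @ x) / deg))

class LMGNNSketch(nn.Module):
    def __init__(self, lm_encoder, lm_dim, hidden_dim, num_classes):
        super().__init__()
        self.lm_encoder = lm_encoder      # any text encoder: texts -> [N, lm_dim]
        self.gnn = SimpleGNNLayer(lm_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, node_texts, adj):
        h = self.lm_encoder(node_texts)   # per-node LM embeddings
        h = self.gnn(h, adj)              # propagate over the graph structure
        return self.classifier(h)

# Toy usage: a random stub stands in for a real language model.
lm_dim, N = 32, 5
stub_lm = lambda texts: torch.randn(len(texts), lm_dim)
model = LMGNNSketch(stub_lm, lm_dim, hidden_dim=16, num_classes=3)
adj = torch.eye(N)                        # placeholder adjacency
logits = model(["node text"] * N, adj)
print(logits.shape)                       # torch.Size([5, 3])
```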
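GALM pre-trains text and graph models with graph-based objectives before fine-tuning on downstream tasks. The sketch below shows one common objective of this kind, link prediction with negative sampling, as an assumed illustration of the family of objectives such pre-training relies on; it is not GALM's exact formulation, and `link_prediction_loss` is a hypothetical helper.

```python
# Minimal sketch of a graph-aware pre-training objective (illustrative):
# node embeddings from a text encoder are trained to score observed edges
# above randomly corrupted ones.
import torch
import torch.nn.functional as F

def link_prediction_loss(emb, pos_edges, num_neg=1):
    """emb: [N, d] node embeddings; pos_edges: [E, 2] observed edges."""
    src, dst = pos_edges[:, 0], pos_edges[:, 1]
    pos_score = (emb[src] * emb[dst]).sum(dim=-1)
    # Negative sampling: corrupt the destination node uniformly at random.
    neg_dst = torch.randint(0, emb.size(0), (pos_edges.size(0) * num_neg,))
    neg_src = src.repeat(num_neg)
    neg_score = (emb[neg_src] * emb[neg_dst]).sum(dim=-1)
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones_like(pos_score), torch.zeros_like(neg_score)])
    return F.binary_cross_entropy_with_logits(scores, labels)

# Toy usage on random embeddings and a handful of edges.
emb = torch.randn(6, 8, requires_grad=True)
edges = torch.tensor([[0, 1], [2, 3], [4, 5]])
print(link_prediction_loss(emb, edges).item())
```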
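Finally, a sketch of distillation in the spirit of Grad: a GNN teacher that sees the graph supervises a graph-free student, so inference needs only a node's own features. The temperature-scaled KL loss shown is the standard knowledge-distillation recipe and stands in for the paper's actual objective; `distillation_step` and the toy teacher are assumptions for illustration.

```python
# Minimal graph-aware distillation sketch (illustrative, not Grad's exact
# method): the student matches the softened predictions of a graph-based
# teacher while seeing only node features.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, x, adj, optimizer, T=2.0):
    with torch.no_grad():
        teacher_logits = teacher(x, adj)   # teacher uses the graph
    student_logits = student(x)            # student uses node features only
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with simple stand-ins for the teacher GNN and LM student.
N, d, C = 5, 8, 3
W = torch.randn(d, C)
teacher = lambda x, adj: (adj @ x) @ W     # frozen "GNN" teacher
student = torch.nn.Linear(d, C)            # graph-free student
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
x, adj = torch.randn(N, d), torch.eye(N)
print(distillation_step(teacher, student, x, adj, opt))
```

Because the trained student never touches the adjacency matrix, it can serve predictions for new nodes without fetching neighbors, which is the source of the fast, scalable inference the abstract mentions.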