Background image credit: the GNN blog.
Contributors: Huan Zhao (zhaohuan@4paradigm.com); Wei-Wei Tu (tuweiwei@4paradigm.com)
Graph Neural Networks (GNNs) have been the state of the art (SOTA) for various graph-based tasks. To keep this overview focused, we introduce the popular and representative datasets and models of the past six years.
Following the well-known practice of the Open Graph Benchmark (OGB), we categorize the datasets by task, then present two classic model-design pipelines: human-designed GNNs and neural architecture search (NAS). Finally, we briefly introduce two must-know open-source libraries for GNNs.
The datasets are chosen according to two criteria: a) those used earliest and most widely in existing papers; b) those provided by OGB, which maintains popular open leaderboards. We give the link to the original source of each dataset and highlight the large-scale (task-specific) ones in blue; a small loading example follows the dataset lists below.
Node-level prediction
Cora [1]
CiteSeer [1]
PubMed [1]
ogbn-products [2]
ogbn-proteins [2]
ogbn-arxiv [2]
ogbn-papers100M [2]
ogbn-mag [2]
MAG240M [3]
Link-level prediction
ogbl-ppa [2]
ogbl-collab [2]
ogbl-ddi [2]
ogbl-citation2 [2]
ogbl-wikikg2 [2]
ogbl-biokg [2]
WikiKG90Mv2 [3]
Graph-level prediction
NCI1 [4]
D&D [4]
PROTEINS [4]
IMDB-BINARY [4]
ogbg-molhiv [2]
ogbg-molpcba [2]
ogbg-ppa [2]
PCQM4Mv2 [3]
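As a quick illustration of how these benchmarks are typically accessed, here is a minimal sketch that loads ogbn-arxiv with the official ogb Python package, assuming ogb and PyTorch Geometric are installed; the printed fields are only for inspection and are not part of the original dataset descriptions.

    from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

    # Download (on first use) and load the ogbn-arxiv node-classification dataset.
    dataset = PygNodePropPredDataset(name="ogbn-arxiv")
    graph = dataset[0]                   # a single large citation graph (PyG Data object)
    split_idx = dataset.get_idx_split()  # standardized train/valid/test node indices

    print(graph.num_nodes, graph.num_edges)
    print(split_idx["train"].shape, split_idx["valid"].shape, split_idx["test"].shape)

    # Each OGB dataset ships with its own evaluator and metric (accuracy for ogbn-arxiv).
    evaluator = Evaluator(name="ogbn-arxiv")

The link- and graph-level OGB datasets are loaded analogously through ogb.linkproppred and ogb.graphproppred.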
Besides the datasets, we introduce two classic paradigms for designing effective GNN models: human design and neural architecture search (NAS). Since hundreds of GNN models have been proposed in the past six years, we list only the most basic (i.e., task-agnostic) and widely used models in existing papers. For NAS-based GNNs, we introduce the classic NAS methods, including the first reinforcement-learning-based method and a differentiable one, as well as two special methods, GraphGym and F2GNN: the former focuses on the design space of GNNs, and the latter on their topology design. A minimal model sketch follows the lists below.
Human-designed GNNs
GCN [5]
GraphSAGE [6]
GAT [7]
GIN [8]
JKNet [9]
NAS-based GNNs
GraphNAS [10]
SANE [11]
GraphGym [12]
F2GNN [13]
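To make the human-designed models above concrete, here is a minimal sketch of a two-layer GCN [5] written with PyTorch Geometric; the layer sizes and dropout rate are illustrative choices, and GCNConv can be replaced with SAGEConv or GATConv (GIN additionally requires an inner MLP) to obtain the other models in the list.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    class GCN(torch.nn.Module):
        """Two-layer GCN for node classification (illustrative hyper-parameters)."""
        def __init__(self, in_dim, hidden_dim, out_dim):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden_dim)
            self.conv2 = GCNConv(hidden_dim, out_dim)

        def forward(self, x, edge_index):
            # x: node features [num_nodes, in_dim]; edge_index: edge list [2, num_edges]
            x = F.relu(self.conv1(x, edge_index))
            x = F.dropout(x, p=0.5, training=self.training)
            return self.conv2(x, edge_index)  # per-node class logits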
We briefly list two popular GNN libraries, which implement many popular GNN models. Both repositories have more than 10,000 stars on GitHub and are de facto must-use libraries in the GNN community; a small usage sketch follows the links below.
PyTorch Geometric (PyG): https://github.com/pyg-team/pytorch_geometric
Deep Graph Library (DGL): https://github.com/dmlc/dgl
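As a small usage example, the sketch below builds a toy graph in DGL and applies one graph-convolution layer; the graph, feature dimension, and layer width are made-up values for illustration, and DGL's PyTorch backend is assumed.

    import torch
    import dgl
    from dgl.nn import GraphConv

    # Toy graph with 4 nodes and 3 directed edges given as (src, dst) tensors.
    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])), num_nodes=4)
    g = dgl.add_self_loop(g)      # GraphConv assumes no node has zero in-degree
    feat = torch.randn(4, 8)      # random 8-dimensional node features
    conv = GraphConv(8, 16)       # one GCN-style layer: 8 -> 16 dimensions
    out = conv(g, feat)           # output node embeddings, shape (4, 16)

PyTorch Geometric offers equivalent building blocks (e.g., the GCNConv layer used in the model sketch above).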
[1] Sen et al. Collective Classification in Network Data. AI Magazine 2008.
[2] Hu et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs. NeurIPS 2020.
[3] Hu et al. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs. arXiv 2021.
[4] Kersting et al. Benchmark Data Sets for Graph Kernels. 2016.
[5] Kipf et al. Semi-supervised classification with graph convolutional networks. ICLR 2017.
[6] Hamilton et al. Inductive representation learning on large graphs. NeurIPS 2017.
[7] Veličković et al. Graph attention networks. ICLR 2018.
[8] Xu et al. How Powerful are Graph Neural Networks? ICLR 2019.
[9] Xu et al. Representation learning on graphs with jumping knowledge networks. ICML 2018.
[10] Gao et al. Graph neural architecture search. IJCAI 2020.
[11] Zhao et al. Search to aggregate neighborhood for graph neural network. ICDE 2021.
[12] You et al. Design space for graph neural networks. NeurIPS 2020.
[13] Wei et al. Designing the Topology of Graph Neural Networks: A Novel Feature Fusion Perspective. WebConf 2022.