He Zhang, Bang Wu, Xingliang Yuan, Shirui Pan, Hanghang Tong, Jian Pei. Trustworthy Graph Neural Networks: Aspects, Methods, and Trends. Proceedings of the IEEE, 2024. (CCF A, SJR Q1, Impact factor 25.9, Issue cover)
Graph neural networks (GNNs) have emerged as a series of competent graph learning methods for diverse real-world scenarios, ranging from daily applications like recommendation systems and question answering to cutting-edge technologies such as drug discovery in life sciences and n-body simulation in astrophysics. However, task performance is not the only requirement for GNNs. Performance-oriented GNNs have exhibited potential adverse effects like vulnerability to adversarial attacks, unexplainable discrimination against disadvantaged groups, or excessive resource consumption in edge computing environments. To avoid these unintentional harms, it is necessary to build competent GNNs characterised by trustworthiness. To this end, we propose a comprehensive roadmap to build trustworthy GNNs from the view of the various computing technologies involved. In this survey, we introduce basic concepts and comprehensively summarise existing efforts for trustworthy GNNs from six aspects, including robustness, explainability, privacy, fairness, accountability, and environmental well-being. Additionally, we highlight the intricate cross-aspect relations between the above six aspects of trustworthy GNNs. Finally, we present a thorough overview of trending directions for facilitating the research and industrialisation of trustworthy GNNs.
He Zhang, Xingliang Yuan, Chuan Zhou, Shirui Pan. Projective Ranking-based GNN Evasion Attacks, IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022. (CORE A*)
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks. However, GNNs are at risk of adversarial attacks. Two primary limitations of the current evasion attack methods are highlighted: (1) The current GradArgmax ignores the “long-term” benefit of the perturbation. It is faced with zero-gradient and invalid benefit estimates in certain situations. (2) In reinforcement learning-based attack methods, the learned attack strategies might not be transferable when the attack budget changes. To this end, we first formulate the perturbation space and propose an evaluation framework and the projective ranking method. We aim to learn a powerful attack strategy then adapt it as little as possible to generate adversarial samples under dynamic budget settings. In our method, based on mutual information, we rank and assess the attack benefits of each perturbation for an effective attack strategy. By projecting the strategy, our method dramatically minimizes the cost of learning a new attack strategy when the attack budget changes. In the comparative assessment with GradArgmax and RL-S2V, the results show our method owns high attack performance and effective transferability. The visualization of our method also reveals various attack patterns in the generation of adversarial samples.
Luzhi Wang, Dongxiao He, He Zhang, Yixin Liu, Wenjie Wang, Shirui Pan, Di Jin, Tat-Seng Chua. GOODAT: Towards Test-time Graph Out-of-Distribution Detection. AAAI Conference on Artificial Intelligence (AAAI), 2024. (CORE A*, Oral, Acceptance rate 23.75%)
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts, they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD samples with GNNs, recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN. Despite their effectiveness, these methods come with heavy training resources and costs, as they need to optimize the GNN-based models on training data. Moreover, their reliance on modifying the original GNNs and accessing training data further restricts their universality. To this end, this paper introduces a method to detect Graph Out-of-Distribution At Test-time (GOODAT), a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture. With a lightweight graph masker, GOODAT can learn informative subgraphs from test samples, enabling the capture of distinct graph patterns between OOD and ID samples. To optimize the graph masker, we meticulously design three unsupervised objective functions based on the graph information bottleneck principle, motivating the masker to capture compact yet informative subgraphs for OOD detection.
He Zhang, Bang Wu, Shuo Wang, Xiangwen Yang, Minhui Xue, Shirui Pan, Xingliang Yuan. Demystifying Uneven Vulnerability of Link Stealing Attacks against Graph Neural Networks. International Conference on Machine Learning (ICML), 2023. (CORE A*, Acceptance rate 27.9%)
While graph neural networks (GNNs) dominate the state-of-the-art for exploring graphs in real-world applications, they have been shown to be vulnerable to a growing number of privacy attacks. For instance, link stealing is a well-known membership inference attack (MIA) on edges that infers the presence of an edge in a GNN’s training graph. Recent studies on independent and identically distributed data (eg, images) have empirically demonstrated that individuals from different groups suffer from different levels of privacy risks to MIAs, ie, uneven vulnerability. However, theoretical evidence of such uneven vulnerability is missing. In this paper, we first present theoretical evidence of the uneven vulnerability of GNNs to link stealing attacks, which lays the foundation for demystifying such uneven risks among different groups of edges. We further demonstrate a group-based attack paradigm to expose the practical privacy harm to GNN users derived from the uneven vulnerability of edges. Finally, we empirically validate the existence of obvious uneven vulnerability on nine real-world datasets (eg, about 25% AUC difference between different groups in the Credit graph).
Bang Wu, He Zhang, Xiangwen Yang, Shuo Wang, Minhui Xue, Shirui Pan, Xingliang Yuan. GraphGuard: Detecting and Counteracting Training Data Misuse in Graph Neural Networks. The Network and Distributed System Security Symposium (NDSS), 2024. (CORE A*, Acceptance rate 15.0%)
The emergence of Graph Neural Networks (GNNs) in graph data analysis and their deployment on Machine Learning as a Service platforms have raised critical concerns about data misuse during model training. This situation is further exacerbated due to the lack of transparency in local training processes, potentially leading to the unauthorized accumulation of large volumes of graph data, thereby infringing on the intellectual property rights of data owners. Existing methodologies often address either data misuse detection or mitigation, and are primarily designed for local GNN models rather than cloud-based MLaaS platforms. These limitations call for an effective and comprehensive solution that detects and mitigates data misuse without requiring exact training data while respecting the proprietary nature of such data. This paper introduces a pioneering approach called GraphGuard, to tackle these challenges. We propose a training-data-free method that not only detects graph data misuse but also mitigates its impact via targeted unlearning, all without relying on the original training data. Our innovative misuse detection technique employs membership inference with radioactive data, enhancing the distinguishability between member and nonmember data distributions. For mitigation, we utilize synthetic graphs that emulate the characteristics previously learned by the target model, enabling effective unlearning even in the absence of exact graph data.
He Zhang, Xingliang Yuan, Shirui Pan. Unraveling Privacy Risks of Individual Fairness in Graph Neural Networks. IEEE International Conference on Data Engineering (ICDE), 2024. (CORE A*, Acceptance rate 22.75%)
Graph neural networks (GNNs) have gained significant attraction due to their expansive real-world applications. To build trustworthy GNNs, two aspects - fairness and privacy - have emerged as critical considerations. Previous studies have separately examined the fairness and privacy aspects of GNNs, revealing their trade-off with GNN performance. Yet, the interplay between these two aspects remains unexplored. In this paper, we pioneer the exploration of the interaction between the privacy risks of edge leakage and the individual fairness of a GNN. Our theoretical analysis unravels that edge privacy risks unfortunately escalate when the nodes’ individual fairness improves. Such an issue hinders the accomplishment of privacy and fairness of GNNs at the same time. To balance fairness and privacy, we carefully introduce fairness-aware loss reweighting based on influence function and privacy-aware graph structure perturbation modules within a fine-tuning mechanism. Experimental results underscore the effectiveness of our approach in achieving GNN fairness with limited performance compromise and controlled privacy risks. This work contributes to the comprehensively developing trustworthy GNNs by simultaneously addressing both fairness and privacy aspects.
Yizhen Zheng, He Zhang, Vincent Lee, Yu Zheng, Xiao Wang, Shirui Pan. Finding the Missing-half: Graph Complementary Learning. International Conference on Machine Learning (ICML), 2023. (CORE A*, , Acceptance rate 27.9%)
Real-world graphs generally have only one kind of tendency in their connections. These connections are either homophily-prone or heterophilyprone. While graphs with homophily-prone edges tend to connect nodes with the same class (i.e., intra-class nodes), heterophily-prone edges tend to build relationships between nodes with different classes (i.e., inter-class nodes). Existing GNNs only take the original graph during training. The problem with this approach is that it forgets to take into consideration the “missing-half” structural information, that is, heterophily-prone topology for homophily-prone graphs and homophilyprone topology for heterophily-prone graphs. In our paper, we introduce Graph cOmplementAry Learning, namely GOAL, which consists of two components: graph complementation and complemented graph convolution. The first component finds the missing-half structural information for a given graph to complement it. The complemented graph has two sets of graphs including both homophily- and heterophily-prone topology. In the latter component, to handle complemented graphs, we design a new graph convolution from the perspective of optimisation. The experiment results show that GOAL consistently outperforms all baselines in eight real-world datasets.