Selected Projects 

(Scientific Computation & Machine Learning)

SAGMAN: Stability Analysis of Graph Neural Networks on the Manifolds

Modern GNNs can be sensitive to changes in the input graph structure and node features, leading to unpredictable behavior and degraded performance. In this collaborative work, we introduce a spectral framework (SAGMAN) for analyzing the stability of GNNs. SAGMAN quantifies the stability of each node by examining the distance mapping distortions (DMDs) between the input and output manifolds: when two nearby nodes on the input manifold are mapped (through a GNN model) to two distant nodes (data samples) on the output manifold, this implies a large DMD and thus poor GNN stability. To create low-dimensional input/output manifolds that allow meaningful DMD estimation while exploiting both the input graph topology and the node features, we propose a spectral sparsification framework for estimating probabilistic graphical models (PGMs), such that the constructed input/output graph structures well preserve pairwise distances on the manifolds. Our empirical evaluations show that SAGMAN effectively reveals each node's stability under various edge/feature perturbations, offering a scalable approach for assessing the stability of GNNs.
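
The sketch below illustrates the DMD idea on small, dense data: k-NN graphs stand in for the input/output manifolds, and effective-resistance distances (computed via the Laplacian pseudoinverse) are compared node by node. The k-NN construction and the `dmd_scores` helper are illustrative simplifications; SAGMAN's PGM-based spectral sparsification is not reproduced here.

```python
# Minimal DMD sketch, assuming input features X and GNN outputs Y are small
# enough for dense Laplacian pseudoinverses.
import numpy as np
from scipy.linalg import pinv
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(Z, k=10):
    A = kneighbors_graph(Z, k, mode="connectivity", include_self=False)
    A = 0.5 * (A + A.T)                 # symmetrize the k-NN graph
    A = A.toarray()
    return np.diag(A.sum(axis=1)) - A

def effective_resistance(L):
    Lp = pinv(L)                        # Moore-Penrose pseudoinverse (small n)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp   # R[i,j] = Lp_ii + Lp_jj - 2 Lp_ij

def dmd_scores(X, Y, k=10):
    """Per-node DMD: largest output/input distance ratio over the node's
    input-graph neighbors (a large score suggests an unstable node)."""
    A_in = kneighbors_graph(X, k, mode="connectivity").toarray()
    R_in = effective_resistance(knn_laplacian(X, k))
    R_out = effective_resistance(knn_laplacian(Y, k))
    ratio = np.where(A_in > 0, R_out / np.maximum(R_in, 1e-12), 0.0)
    return ratio.max(axis=1)
```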

SGM-PINN: Sampling Graphical Models for Faster Training of Physics-Informed Neural Networks 

Our latest work (DAC'24) introduces SGM-PINN, a graph-based importance-sampling framework that improves the training efficacy of Physics-Informed Neural Networks (PINNs) on parameterized problems. SGM-PINN exploits the conditional dependence among training samples by leveraging probabilistic graphical models (PGMs). By applying a graph decomposition scheme to an undirected PGM built from the training dataset, our method generates node clusters that encode the conditional dependence between training samples. Biasing sampling toward more important clusters allows smaller mini-batches and training datasets, improving both training speed and accuracy. We additionally fuse an efficient robustness metric with residual losses to determine regions requiring additional sampling. Experiments demonstrate the advantages of the proposed framework, achieving 3X faster convergence than prior state-of-the-art importance-sampling methods.
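
As a rough illustration of the cluster-biased sampling step, the sketch below draws mini-batches with probability proportional to a per-cluster importance score. The cluster labels and the `cluster_importance` values are assumed inputs for illustration (SGM-PINN derives them from the PGM decomposition and the robustness/residual metric).

```python
# Cluster-biased mini-batch sampling sketch; labels/importances are assumed.
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(labels, importance, batch_size):
    """Draw a mini-batch with per-point probability proportional to the
    importance of the cluster each point belongs to."""
    p = importance[labels].astype(float)      # per-point weight
    p /= p.sum()
    return rng.choice(len(labels), size=batch_size, replace=False, p=p)

# Example: 3 clusters; cluster 2 (say, high residual loss) is oversampled.
labels = rng.integers(0, 3, size=10_000)
cluster_importance = np.array([1.0, 1.0, 5.0])
batch = sample_batch(labels, cluster_importance, batch_size=256)
```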

GARNET: Reduced-Rank Topology Learning for Robust and Scalable Graph Neural Networks 

Our recent collaborative work (LoG'22, spotlight paper) introduces GARNET, a scalable spectral method for boosting the adversarial robustness of GNN models. GARNET leverages weighted spectral embedding to construct a base graph that is not only resistant to adversarial attacks but also retains the critical (clean) graph structure needed for GNN training. GARNET further refines the base graph by pruning additional non-critical edges based on a probabilistic graphical model. GARNET has been evaluated on various datasets, including large graphs with millions of nodes. Our extensive experimental results show that GARNET improves adversarial accuracy by up to 10.23% and achieves runtime speedups of up to 14.7X over state-of-the-art GNN (defense) models.
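
The following sketch shows one way the base-graph construction could look: take the low-frequency eigenpairs of the graph Laplacian, form a weighted spectral embedding, and rebuild a clean k-NN graph in that space. The heat-kernel weighting and the parameter values are assumptions for illustration, and GARNET's PGM-based edge pruning is omitted.

```python
# Reduced-rank base-graph sketch, assuming a symmetric sparse adjacency A.
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def base_graph(A, rank=50, k=10):
    L = laplacian(A.astype(float))
    # rank smallest (low-frequency) eigenpairs via shift-invert; the small
    # negative shift keeps the factorization nonsingular.
    vals, vecs = eigsh(L, k=rank, sigma=-1e-3, which="LM")
    # Heat-kernel weighting (one simple choice, not GARNET's exact scheme)
    # down-weights high frequencies, denoising against adversarial edges.
    emb = vecs * np.exp(-vals)
    # Rebuild a clean k-NN base graph in the embedding space.
    return kneighbors_graph(emb, k, mode="connectivity")
```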

SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation

Our recent collaborative work (ICML'21) introduces SPADE, the first black-box method for evaluating the adversarial robustness of a given machine-learning (ML) model. SPADE examines the bijective distance mappings between the input/output graph-based manifolds, exploiting only the input and output feature vectors. Leveraging the generalized Courant-Fischer theorem, we show that the largest generalized eigenvalue computed with the input/output graph Laplacians is a good surrogate (upper bound) for the best Lipschitz constant of the underlying mapping function. Therefore, the SPADE score of an ML model can be used directly as a black-box metric for quantifying its adversarial robustness. Moreover, the SPADE scores of individual input data samples can be exploited to improve existing adversarial training/evaluation methods.
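
A minimal sketch of the resulting score, assuming input feature vectors X and model outputs Y: build k-NN graph Laplacians for both and take the largest generalized eigenvalue. The small diagonal shift on the input Laplacian is an implementation convenience (graph Laplacians are only positive semidefinite, and the solver needs a positive definite M).

```python
# SPADE-style score sketch: largest lambda with L_out x = lambda * L_in x.
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(Z, k=10):
    A = kneighbors_graph(Z, k, mode="connectivity", include_self=False)
    return laplacian(0.5 * (A + A.T))        # symmetrized k-NN Laplacian

def spade_score(X, Y, k=10, eps=1e-6):
    L_in = knn_laplacian(X, k) + eps * sp.identity(X.shape[0])
    L_out = knn_laplacian(Y, k)
    # Largest generalized eigenvalue: surrogate for the best Lipschitz
    # constant of the input->output mapping (larger => less robust).
    lam, _ = eigsh(L_out, k=1, M=L_in, which="LA")
    return lam[0]
```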

GraphZoom: A Multi-level Spectral Approach for Accurate & Scalable Graph Embedding 

Our recent collaborative work (ICLR'20, oral presentation) introduces GraphZoom, a multi-level framework for accurate and scalable graph embedding built on a scalable spectral graph coarsening scheme. GraphZoom first performs graph fusion to generate a new graph that effectively encodes both the topology of the original graph and the node attribute information. This fused graph is then repeatedly coarsened into much smaller graphs by merging nodes with high spectral similarity. GraphZoom allows any existing embedding method to be applied to the coarsest graph, then progressively refines the resulting embeddings onto increasingly finer graphs. GraphZoom increases classification accuracy and significantly reduces runtime compared with state-of-the-art unsupervised graph embedding methods. For example, GraphZoom achieves over 120X runtime speedup for DeepWalk embedding of the Friendster graph (8 million nodes and 400 million edges), while boosting the Micro-F1 score by up to 49.9% compared to DeepWalk alone.
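
The refinement step can be sketched as follows, assuming a fine-level adjacency A, a coarse-to-fine assignment matrix P (P[i, c] = 1 when fine node i was merged into coarse node c), and a coarse embedding E_c: copy each coarse vector to its fine nodes, then low-pass filter the result. The random-walk smoother used here is one simple choice, not GraphZoom's exact filter; the fusion and coarsening stages are not reproduced.

```python
# Mapped-then-smoothed refinement sketch for one coarse->fine level.
import numpy as np
import scipy.sparse as sp

def refine(A, P, E_c, power=2):
    """Project coarse embeddings onto the fine graph, then smooth them with
    a few steps of normalized-adjacency (random-walk) filtering."""
    E = P @ E_c                                   # copy each coarse vector
    d = np.asarray(A.sum(axis=1)).ravel()
    D_inv = sp.diags(1.0 / np.maximum(d, 1e-12))  # guard isolated nodes
    S = D_inv @ A                                 # random-walk smoother
    for _ in range(power):
        E = S @ E                                 # low-pass filtering
    return E
```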

GRASPEL: Scalable Graph Topology Learning via Spectral Densification 

Figure: the first 20 Laplacian eigenvalues (panels a, c, e) and spectral embeddings (panels b, d, f) of the graphs learned by successive GRASPEL iterations (top) on the MNIST dataset (24,462 handwritten digits, 0 through 6).

Our latest work (WSDM'22) introduces GRASPEL, a highly scalable spectral densification approach for learning graph topology from data. By restricting the precision matrix to be a graph-Laplacian-like matrix, GRASPEL estimates attractive Gaussian Markov Random Fields (GMRFs) from high-dimensional input data. A unique feature of GRASPEL is that the spectral embedding (or effective-resistance) distances on the learned graph encode the similarities between the original input data points, so GRASPEL can produce a geometric spanner with bounded distortion. GRASPEL is also more scalable than state-of-the-art methods, which require at least O(N^2) time per iteration, whereas each GRASPEL iteration completes in O(N log N) time. Compared with prior state-of-the-art approaches, GRASPEL substantially improves the computational efficiency and solution quality of a variety of data mining and machine learning applications, such as spectral clustering (SC) and dimensionality reduction (DR).
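
A toy O(N^2) sketch of one densification step, assuming a data matrix X and a current sparse graph A: find the non-edge whose endpoints are close in the data but far apart in the current spectral embedding, and add it. The fast eigensolvers and spectral sparsification that give GRASPEL its O(N log N) per-iteration cost are not reproduced here.

```python
# One toy GRASPEL-style densification step (brute-force, illustration only).
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def add_most_distorted_edge(X, A, dim=8):
    L = laplacian(A.astype(float))
    vals, vecs = eigsh(L, k=dim + 1, sigma=-1e-3, which="LM")
    U = vecs[:, 1:]                              # drop the constant eigenvector
    # Spectral-embedding vs. data squared distances for all pairs.
    g = ((U[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    score = g / np.maximum(d, 1e-12)             # large => distorted pair
    score[A.toarray() > 0] = 0.0                 # skip existing edges
    np.fill_diagonal(score, 0.0)
    i, j = np.unravel_index(score.argmax(), score.shape)
    A = A.tolil(); A[i, j] = A[j, i] = 1.0       # densify: add the worst pair
    return A.tocsr()

# Example: start from a sparse 5-NN graph and add one edge.
X = np.random.default_rng(0).normal(size=(200, 10))
A0 = kneighbors_graph(X, 5, mode="connectivity")
A0 = ((A0 + A0.T) > 0).astype(float)
A1 = add_most_distorted_edge(X, A0)
```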

Spectral Methods for Multilevel Big Graph Visualization 

Our recent collaborative work (CG'20) presents a novel spectral graph coarsening/sparsification algorithm for visualizing big graph data. To enable effective visual exploration of the resulting spectrally coarsened/sparsified graphs, we implement spectral clustering and edge bundling. Our framework does not depend on a particular graph layout and can be integrated into different graph drawing algorithms. We experiment with publicly available graph data of varying sizes and characteristics to demonstrate the efficiency and effectiveness of our approach. To further validate our solution, we quantitatively compare our method against different graph simplification solutions using a proxy quality metric and statistical properties of the graphs.
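
As a small end-to-end illustration of the exploration step, assuming the graph has already been coarsened/sparsified: spectral clustering colors the nodes, and any standard layout draws them (consistent with the framework's layout independence; edge bundling is left out for brevity).

```python
# Cluster-and-draw sketch on a stand-in for a coarsened graph.
import networkx as nx
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering

G = nx.karate_club_graph()                    # stand-in for a coarsened graph
A = nx.to_numpy_array(G)                      # adjacency as affinity matrix
labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                            random_state=0).fit_predict(A)
pos = nx.spring_layout(G, seed=0)             # any layout works here
nx.draw(G, pos, node_color=labels, cmap="tab10", node_size=80, width=0.5)
plt.show()
```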