My research develops scalable Bayesian methods for complex data integration, with applications in neuroscience, security networks, and the social sciences. I focus on supervised network data, object-oriented analysis, and dimensionality reduction, motivated by experience with single- and multi-source neuroimaging and security data.
I also develop methods for causal inference, record linkage, and data fusion, inspired by social science datasets such as financial, health, and survey data.
Code for my published research is available on my GitHub page (also linked below).
Selected Publications/Preprints on Structured Data Analysis
Visualization of the dynamic relationships between Hizballah (Hi), Hamas (Ha), and the Palestinian Islamic Jihad (PIJ) over time, with dotted edges in 2010 indicating unobserved relationships that were accurately predicted.
Joint Modeling of Temporally Evolving Multiplex Graphs and Nodal Attributes Using Neural Gaussian Processes: Insights from Terrorism Network Analysis (Preprint Link)
In Revision, 2025+
Abstract: We address the dynamic co-evolution of multiplex graphs and node attributes in terrorism networks. Focusing on terrorist organizations, we integrate multiplex graph layers and node attributes using time-varying stochastic latent factor models and neural network Gaussian processes (NN-GP). This framework leverages shared latent factors to capture graph structure and attribute evolution, while accounting for uncertainty and partial observations.
Circos plot showing estimated network coefficients between influential regions of interest (ROIs) in the brain.
Bayesian Data Fusion of Network Graphs and Spatially Correlated Node Attributes (Preprint Link)
Under Review, 2025+
Abstract: We introduce a predictor-dependent joint modeling framework for network data with spatially correlated attributes. It enables concurrent inference on node-predictor associations, spatial correlations, and regression relationships between predictors, edges, and nodal attributes. Applied to multi-modal brain imaging data, the framework integrates structural and functional MRI information to analyze brain connectivity, aging-related features, and spatial relationships. The Bayesian approach provides robust, uncertainty-quantified inferences, offering superior performance, especially with limited sample sizes.
Simplified illustration of a multiplex security network.
Multiplex Regression in Security Systems (Preprint Link)
In Revision, 2025+
Abstract: We introduce a novel regression framework using multilayer networks to predict a continuous outcome. Unlike existing methods, our approach employs low-rank models and shares parameters across layers to capture complex relationships. Applied to security network data from a U.S. National Laboratory, it efficiently predicts threat detection times and identifies influential nodes, outperforming existing methods in both inference and prediction.
Connectogram showing partial correlations between different brain regions in the left hemisphere.
A Bayesian Multiplex Graph Classifier of Functional Brain Connectivity Across Diverse Tasks of Cognitive Control (Journal Link) (R Code)
Neuroinformatics, 2024
Special Issue on Data Science Methods and Neuroinformatics Applications
Abstract: We investigate the impact of aging on functional brain connectivity across cognitive control tasks, with a focus on identifying brain regions linked to early aging. Modeling functional connectivity as multiplex graphs, we address the challenge of predicting a binary outcome (aging vs. normal) using multiple graph predictors. Existing methods often struggle to fully leverage within- and across-layer information, particularly in small samples. To overcome this, we propose the Bayesian Multiplex Graph Classifier (BMGC), which models edge coefficients via bilinear interactions of node-specific latent effects and applies variable selection to identify influential nodes. BMGC offers computational efficiency and uncertainty quantification in node identification, coefficient estimation, and prediction. Simulations show superior performance over existing methods, and application to fMRI data reveals symmetric patterns in the sensory motor network and asymmetric aging-related effects in the default mode network.
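To make the bilinear construction concrete, here is a minimal sketch, not the published BMGC implementation. It assumes each node has a latent vector, each layer has its own weight vector, and an edge coefficient is the weighted inner product of the two endpoint vectors. A node that is not selected has its latent vector set to zero, which zeroes every edge coefficient touching it. The logistic link and all names (`edge_coefficient`, `class_probability`, `layer_weights`) are illustrative assumptions.

```python
import math


def edge_coefficient(u_i, u_j, weights):
    """Bilinear edge coefficient: sum_r weights[r] * u_i[r] * u_j[r]."""
    return sum(w * a * b for w, a, b in zip(weights, u_i, u_j))


def class_probability(layers, latent, include, layer_weights, intercept=0.0):
    """Probability of the positive class from multiplex graph predictors.

    layers        : list of symmetric adjacency matrices (one per layer)
    latent        : one latent vector per node (hypothetical values)
    include       : one boolean per node; False means the node is not selected
    layer_weights : one weight vector per layer (an assumed parameterization)
    """
    n = len(latent)
    # A node that is not selected contributes a zero latent vector,
    # so every edge coefficient involving it is exactly zero.
    u = [vec if keep else [0.0] * len(vec) for vec, keep in zip(latent, include)]
    eta = intercept
    for adjacency, weights in zip(layers, layer_weights):
        for i in range(n):
            for j in range(i + 1, n):  # upper triangle of an undirected graph
                eta += edge_coefficient(u[i], u[j], weights) * adjacency[i][j]
    return 1.0 / (1.0 + math.exp(-eta))


# Toy example: three nodes, one layer, node 2 not selected.
latent = [[1.0], [2.0], [3.0]]
include = [True, True, False]
layers = [[[0, 1, 1], [1, 0, 1], [1, 1, 0]]]
layer_weights = [[1.0]]
prob = class_probability(layers, latent, include, layer_weights)
```

In this toy example only the edge between nodes 0 and 1 contributes to the linear predictor, which shows how node selection translates into interpretable, node-level sparsity.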
Creative Achievement Questionnaire (CAQ) Data: Plot (a) displays clustering uncertainty; Plot (b) presents the posterior distribution of the estimated number of clusters as a barplot.
Covariate-Dependent Clustering of Undirected Networks with Brain-Imaging Data (Journal Link) (R Code)
Technometrics, 2024
Abstract: We develop a nonparametric Bayesian mixture model for clustering subjects based on shared relationships between subject-specific undirected networks and covariates, allowing these relationships to vary across clusters. The model uses low-rank, group-sparse symmetric matrix coefficients to capture associations between scalar predictors and network nodes, enabling inference on node-level effects within each cluster. The Bayesian framework supports data-driven cluster estimation, quantifies clustering uncertainty, and provides precise uncertainty measures for node-level associations. Simulations show strong performance, and application to brain connectome data reveals cluster-specific brain regions linked to creative achievement.
Bayesian Regression with Undirected Network Predictors with an Application to Brain Connectome Data (Journal Link) (R Code)
Journal of the American Statistical Association, 2021
Abstract: We study the relationship between brain networks and creativity by identifying brain regions and connections that significantly impact creative ability. Unlike traditional approaches that vectorize network matrices and lose structural information, we develop a flexible Bayesian framework that preserves network structure, enables accurate prediction, and identifies key brain regions and connections. Our method introduces novel network shrinkage priors and provides uncertainty quantification through Bayesian inference.
Bayesian Generalized Sparse Symmetric Tensor-on-Vector Regression (Journal Link) (R Code)
Technometrics, 2021
Abstract: We introduce a generalized Bayesian linear model for symmetric tensor responses and scalar predictors, incorporating low-rankness and group sparsity for efficiency and interpretability. The framework identifies key tensor nodes and cells linked to predictors while quantifying uncertainty. Theoretically, we show that the posterior predictive density converges to the true density at a near-optimal rate in Hellinger distance, depending on how the number of tensor nodes grows with the sample size.
Selected Publications on Data Integration through Record Linkage
Bayesian Causal Inference with Bipartite Record Linkage (Journal Link)
Bayesian Analysis, 2022
Abstract: In settings where causal inference relies on data split across files without error-free identifiers, standard two-stage approaches, which first link records and then analyze the linked data, fail to account for linkage uncertainty or to leverage relationships among study variables. We propose a joint Bayesian model that simultaneously performs probabilistic record linkage and causal inference, improving both linkage quality and treatment effect estimates. Simulation and theoretical results support its advantages, and we illustrate the method with a study on debit card possession and household spending.
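For readers unfamiliar with probabilistic record linkage, the sketch below shows the classical Fellegi-Sunter match weight that a first-stage linkage step typically relies on. It is not the joint Bayesian model from the paper. The field names and the m- and u-probabilities (the chance that a field agrees for true matches and for non-matches, respectively) are hypothetical values chosen for illustration.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Fellegi-Sunter weight: sum of log2 likelihood ratios over fields.

    agreements : dict mapping field name -> True if the two records agree
    m_probs    : P(field agrees | records are a true match)
    u_probs    : P(field agrees | records are not a match)
    """
    weight = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1.0 - m) / (1.0 - u))
    return weight


# Hypothetical comparison fields and probabilities.
m_probs = {"zip": 0.9, "birth_year": 0.8}
u_probs = {"zip": 0.1, "birth_year": 0.2}

w_agree = match_weight({"zip": True, "birth_year": True}, m_probs, u_probs)
w_disagree = match_weight({"zip": False, "birth_year": False}, m_probs, u_probs)
```

A two-stage analysis would threshold such weights to declare links and then treat the linked file as if it were error-free, which is exactly where linkage uncertainty is lost.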
Journal of Statistical Planning & Inference, 2024
Abstract: We study causal inference in observational studies where data are split across two files, one with treatment, outcome, and some covariates, and another with additional relevant covariates for overlapping individuals. Without unique identifiers, analysts typically use probabilistic record linkage to merge data and estimate causal effects, but this approach ignores linkage uncertainty and does not leverage variable relationships to improve matches. We propose a Bayesian method that jointly performs regression-assisted record linkage and causal inference, generating multiple linked datasets for valid multiple imputation. Simulations and analysis of Italian income and wealth data demonstrate improved treatment effect estimation using our approach.
Selected Publications on Differential Privacy
Differentially Private Estimation of Weighted Average Treatment Effects for Binary Outcomes (Journal Link)
Computational Statistics and Data Analysis, 2025
Abstract: In social and health sciences, causal inferences often rely on sensitive data, raising ethical and legal concerns about participant confidentiality. Since releasing any statistic, including causal effect estimates, can leak private information, there is growing interest in using differentially private estimators. This work develops new algorithms for estimating weighted average treatment effects with binary outcomes under differential privacy. Theoretical accuracy guarantees are provided, and empirical evaluations using simulations and real-world data (education and income) demonstrate their performance.
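As a simple point of reference, and not the algorithms developed in the paper, the sketch below applies the textbook Laplace mechanism to an unweighted difference in proportions. It assumes that treatment assignments and group sizes are public, that neighboring datasets differ in one person's binary outcome (so each group mean has sensitivity 1/n), and that the privacy budget is split evenly between the two group means. The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_difference_in_proportions(y_treated, y_control, epsilon, rng):
    """Noisy estimate of P(Y=1 | treated) - P(Y=1 | control).

    Each group mean of binary outcomes changes by at most 1/n when one
    outcome changes, so Laplace noise with scale (1/n) / (epsilon/2) is
    added to each mean. Splitting epsilon in half is a conservative choice.
    """
    n1, n0 = len(y_treated), len(y_control)
    eps_each = epsilon / 2.0
    p1 = sum(y_treated) / n1 + laplace_noise(1.0 / (n1 * eps_each), rng)
    p0 = sum(y_control) / n0 + laplace_noise(1.0 / (n0 * eps_each), rng)
    return p1 - p0


# Toy data: 80% positive outcomes among treated, 50% among controls.
rng = random.Random(0)
y_treated = [1] * 80 + [0] * 20
y_control = [1] * 50 + [0] * 50

# A very large epsilon means almost no noise, so the estimate is near 0.3.
nearly_exact = dp_difference_in_proportions(y_treated, y_control, 1e9, rng)
# A realistic budget adds visible noise; the estimate may fall outside [-1, 1].
private_estimate = dp_difference_in_proportions(y_treated, y_control, 1.0, rng)
```

Smaller values of `epsilon` give stronger privacy but noisier estimates, which is the trade-off that accuracy guarantees for private estimators aim to characterize.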