I am an Associate Professor of Statistics at North Carolina State University, where I have been working since Aug 2020. I received my Ph.D. in Statistics from the University of Illinois at Urbana-Champaign in Jul 2016. From Aug 2016 to Aug 2020, I was a tenure-track Assistant Professor of Statistics at Virginia Tech.
The primary theme of my research is developing formal inferential algorithms for network data and applying such algorithms to epidemiology, social sciences, and environmental health. I am also working on developing a statistical science of patient safety, focusing on adverse medical events due to human errors, medical devices, drug reactions, and radiation therapy. See my CV below for more on my work and background.
Email: ssengup2 [at] ncsu.edu
Google Scholar: https://scholar.google.com/citations?user=MXM2IiUAAAAJ
Twitter: @SrijanSengupta7
New arXiv pre-print: A Unified Framework for Community Detection and Model Selection in Blockmodels (w/ Subhankar Bhadra and Minh Tang).
Blockmodels are a foundational tool for modeling community structure in networks, with the stochastic blockmodel (SBM), degree-corrected blockmodel (DCBM), and popularity-adjusted blockmodel (PABM) forming a natural hierarchy of increasing generality. While community detection under these models has been extensively studied, much less attention has been paid to the model selection problem, i.e., determining which model best fits a given network. Building on recent theoretical insights about the spectral geometry of these models, we propose a unified framework for simultaneous community detection and model selection across the full blockmodel hierarchy. A key innovation is the use of loss functions that serve a dual role: they act as objective functions for community detection and as test statistics for hypothesis testing. We develop a greedy algorithm to minimize these loss functions and establish theoretical guarantees for exact label recovery and model selection consistency under each model. Extensive simulation studies demonstrate that our method achieves high accuracy in both tasks, outperforming or matching state-of-the-art alternatives. Applications to five real-world networks further illustrate the interpretability and practical utility of our approach.
Our paper, "A Bootstrap-based Method for Testing Network Similarity," (w/ Somnath Bhadra, Kaustav Chakraborty, and Soumendra Nath Lahiri) has been published in the Journal of Computational and Graphical Statistics.
In this work, we address the problem of determining whether two networks, defined on a common set of nodes, exhibit stochastic similarity. We introduce a bootstrap-based testing framework that assesses two notions of similarity: (i) equality—testing if the networks arise from the same random graph model, and (ii) scaling—testing if their probability matrices are proportional. The proposed method is versatile, accommodating various network models such as stochastic blockmodels, Chung-Lu models, and random dot product graph models. We establish the theoretical consistency of our tests and demonstrate their empirical performance through extensive simulations and a real-world application involving the Aarhus network dataset.
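For readers curious what a bootstrap test of network equality might look like in code, here is a minimal sketch in Python. This is my own illustrative version, not the statistic or algorithm from the paper: it pools the two networks, estimates a common low-rank probability matrix under the null via adjacency spectral embedding, regenerates bootstrap network pairs, and compares a Frobenius-norm discrepancy. The function names are hypothetical.

```python
import numpy as np

def ase_phat(A, d):
    """Low-rank probability-matrix estimate via adjacency spectral embedding."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-vals)[:d]
    X = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))
    return np.clip(X @ X.T, 0, 1)

def equality_test(A1, A2, d, B, rng):
    """Illustrative bootstrap test of H0: P1 = P2 for two networks on the
    same node set (not the paper's exact statistic)."""
    n = A1.shape[0]
    T_obs = np.linalg.norm(A1 - A2, "fro")
    P0 = ase_phat((A1 + A2) / 2, d)   # pooled estimate under the null
    np.fill_diagonal(P0, 0)
    T_null = np.empty(B)
    for b in range(B):
        pair = []
        for _ in range(2):            # regenerate a pair from the null fit
            U = rng.random((n, n))
            Ab = np.triu(U < P0, 1)
            pair.append((Ab + Ab.T).astype(float))
        T_null[b] = np.linalg.norm(pair[0] - pair[1], "fro")
    return (1 + np.sum(T_null >= T_obs)) / (B + 1)
```

Under the alternative, the observed discrepancy exceeds the bootstrap null draws and the p-value is small; under the null, the statistic falls inside the bootstrap distribution.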
New arXiv pre-print on Network Cross-Validation and Model Selection via Subsampling (w/ Sayan Chakrabarty and Yuguo Chen). In this work, we introduce NETCROP (NETwork CRoss-Validation using Overlapping Partitions), a novel cross-validation procedure designed for complex and large-scale networks. NETCROP enhances computational efficiency by leveraging smaller, overlapping subnetworks for training, providing accurate model selection and parameter tuning. Our numerical results demonstrate that NETCROP often surpasses existing network cross-validation methods in both speed and accuracy.
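To give a flavor of the kind of model-selection problem NETCROP targets, here is a generic edge-holdout cross-validation sketch for choosing the number of SBM communities. This is an illustration of network cross-validation in general, not the NETCROP algorithm itself, and all function names are my own.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def fit_sbm(A, K, seed=0):
    """Plain SBM fit: spectral clustering + block-probability estimation."""
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(-np.abs(vals))[:K]]
    _, z = kmeans2(X, K, minit="++", seed=seed)
    B = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            blk = A[np.ix_(z == k, z == l)]
            denom = blk.size - (np.sum(z == k) if k == l else 0)
            B[k, l] = blk.sum() / max(denom, 1)
    return z, B

def edge_cv_select_K(A, Ks, holdout_frac, rng):
    """Generic edge-holdout CV for choosing K (illustrative only, not
    NETCROP): hide a fraction of node pairs, fit on the rest, and score
    each K by held-out squared error."""
    n = A.shape[0]
    iu = np.triu_indices(n, 1)
    test = rng.random(len(iu[0])) < holdout_frac
    A_train = A.copy()
    A_train[iu[0][test], iu[1][test]] = 0   # zero out held-out pairs
    A_train[iu[1][test], iu[0][test]] = 0
    losses = []
    for K in Ks:
        z, B = fit_sbm(A_train, K)
        P_pred = B[np.ix_(z, z)]
        resid = A[iu[0][test], iu[1][test]] - P_pred[iu[0][test], iu[1][test]]
        losses.append(np.mean(resid ** 2))
    return Ks[int(np.argmin(losses))]
```

NETCROP's contribution is to make this kind of procedure scale by training on small overlapping subnetworks rather than on the full graph.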
Our work (w/ Kartik Lovekar and Subhadeep Paul) on small-world networks is now published in the Electronic Journal of Statistics. The “small-world” property—where networks have both high clustering and short paths between nodes—shows up across fields like sociology, biology, and neuroscience. But current ways of detecting small-world structure often fall short. Existing approaches mix clustering and path length into a single metric, lack statistical rigor, and rely on overly simple baseline models. In our work, we separate these two key features and define small-worldness as a formal hypothesis test. We introduce both parametric bootstrap and asymptotic tests (with theoretical guarantees) that work under flexible null models, including Erdős–Rényi. Applying these tools to real-world networks reveals a more accurate and nuanced view of the small-world phenomenon.
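To make the idea concrete, here is a parametric-bootstrap sketch in the spirit of the paper, keeping clustering and path length as separate one-sided tests against an Erdős–Rényi null with matched edge density. The published tests differ in detail (and include asymptotic versions with theoretical guarantees); this code and its function names are purely illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def clustering_coef(A):
    """Global clustering coefficient (transitivity): closed / all 2-paths."""
    A2 = A @ A
    return np.trace(A2 @ A) / max(A2.sum() - np.trace(A2), 1)

def avg_path_length(A):
    """Mean shortest-path length over connected node pairs."""
    D = shortest_path(A, unweighted=True)
    return D[np.isfinite(D) & (D > 0)].mean()

def smallworld_pvalues(A, B, rng):
    """Illustrative parametric bootstrap: compare observed clustering and
    path length to an Erdos-Renyi null with the same edge density."""
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))
    C_obs, L_obs = clustering_coef(A), avg_path_length(A)
    C_null, L_null = np.empty(B), np.empty(B)
    for b in range(B):
        Ab = np.triu(rng.random((n, n)) < p_hat, 1)
        Ab = (Ab + Ab.T).astype(float)
        C_null[b], L_null[b] = clustering_coef(Ab), avg_path_length(Ab)
    p_C = (1 + np.sum(C_null >= C_obs)) / (B + 1)   # excess clustering?
    p_L = (1 + np.sum(L_null <= L_obs)) / (B + 1)   # short paths?
    return p_C, p_L
```

Reporting the two p-values separately, rather than collapsing them into one index, is exactly the distinction the paper argues for.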
The revised version of our predictive assignment paper (w/ Subhankar Bhadra and Marianna Pensky) is now on arXiv. We propose a strategy called predictive assignment to scale up community detection in massive networks while ensuring statistical accuracy. First, community detection is carried out on a small subgraph to estimate the relevant model parameters. Next, each remaining node is assigned to a community based on these estimates. We prove that predictive assignment achieves strong consistency under the stochastic blockmodel and its degree-corrected version, even when the parent community detection algorithm is only weakly consistent.
Our work (w/ Indrila Ganguly and Sujit Ghosh) on subsampled residual bootstrap is now published in the Journal of Machine Learning Research. We propose a simple and versatile scalable algorithm called subsampled residual bootstrap (SRB) for generalized linear models (GLMs), a large class of regression models that includes the classical linear regression model as well as other widely used models such as logistic, Poisson and probit regression. We prove consistency and distributional results that establish that the SRB has the same theoretical guarantees under the GLM framework as the classical residual bootstrap, while being computationally much faster. We demonstrate the empirical performance of SRB via simulation studies and a real data analysis of the Forest Covertype data from the UCI Machine Learning Repository.
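The flavor of the method can be conveyed with a short sketch for the classical linear model (the paper's SRB covers GLMs and its exact scheme differs; this code, including the m-out-of-n rescaling, is illustrative): fit once on the full data, then refit on random size-m subsamples with responses rebuilt from resampled residuals.

```python
import numpy as np

def subsampled_residual_bootstrap(X, y, m, B, rng):
    """Illustrative subsampled residual bootstrap for linear regression."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    boot = np.empty((B, p))
    for b in range(B):
        idx = rng.choice(n, size=m, replace=False)        # size-m subsample
        e_star = rng.choice(resid, size=m, replace=True)  # resampled residuals
        y_star = X[idx] @ beta_hat + e_star               # synthetic responses
        boot[b], *_ = np.linalg.lstsq(X[idx], y_star, rcond=None)
    # m-out-of-n style rescaling of the bootstrap spread (illustrative)
    se = np.sqrt(m / n) * boot.std(axis=0, ddof=1)
    return beta_hat, se
```

Each bootstrap refit costs O(m p^2) rather than O(n p^2), which is where the speedup over the classical residual bootstrap comes from.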
Our work (w/ Kaustav Chakraborty and Yuguo Chen) on scalable inference for RDPG networks is now published in the Journal of Computational and Graphical Statistics. In this article, we propose a subsampling-based method to reduce the computational cost of estimation and two-sample hypothesis testing. The idea is to divide the network into smaller subgraphs with an overlap region, then draw inference based on each subgraph, and finally combine the results together. We first develop the subsampling method for random dot product graph models, and establish theoretical consistency of the proposed method. Then we extend the subsampling method to a more general setup and establish similar theoretical properties. We demonstrate the performance of our methods through simulation experiments and real data analysis.
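A toy version of the divide-and-conquer idea, with two overlapping subgraphs, can be sketched as follows. This is illustrative code of my own, not the paper's procedure: embed each subgraph by adjacency spectral embedding, align the second embedding to the first on the shared nodes (ASE is identified only up to rotation), and combine the latent-position estimates.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def ase(A, d):
    """Adjacency spectral embedding: top-d scaled eigenvectors."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-vals)[:d]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

def overlap_estimate(A, d, rng):
    """Illustrative two-subgraph divide-and-conquer RDPG estimate."""
    n = A.shape[0]
    perm = rng.permutation(n)
    half, ov = n // 2, n // 4
    idx1 = perm[: half + ov]           # subgraph 1
    idx2 = perm[half - ov:]            # subgraph 2; overlap of 2*ov nodes
    X1 = ase(A[np.ix_(idx1, idx1)], d)
    X2 = ase(A[np.ix_(idx2, idx2)], d)
    # align subgraph 2's embedding to subgraph 1's frame on the overlap
    R, _ = orthogonal_procrustes(X2[: 2 * ov], X1[half - ov:])
    X2 = X2 @ R
    X_hat = np.zeros((n, d))
    X_hat[idx1] = X1
    X_hat[idx2[2 * ov:]] = X2[2 * ov:]
    X_hat[idx2[: 2 * ov]] = (X1[half - ov:] + X2[: 2 * ov]) / 2  # average
    return X_hat
```

Because each eigendecomposition runs on a subgraph rather than the full network, the cost drops substantially, and the overlap region supplies the alignment needed to glue the pieces back together.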
New NSF grant: Scalable and Generalizable Inference for Network Data. This is a single PI grant for methodological work on network inference.