I am an Associate Professor of Statistics at North Carolina State University, where I have been working since Aug 2020. I received my Ph.D. in Statistics from the University of Illinois at Urbana-Champaign in Jul 2016. From Aug 2016 to Aug 2020, I was a tenure-track Assistant Professor of Statistics at Virginia Tech.
The primary theme of my research is developing formal inferential algorithms for network data and applying such algorithms to epidemiology, social sciences, and environmental health. I am also working on developing a statistical science of patient safety, focusing on adverse medical events due to human errors, medical devices, drug reactions, and radiation therapy. See my CV below for more on my work and background.
Email: ssengup2 ''at'' ncsu.edu
Google Scholar: https://scholar.google.com/citations?user=MXM2IiUAAAAJ
Twitter: @SrijanSengupta7
New arXiv preprint on Network Cross-Validation and Model Selection via Subsampling (w/ Sayan Chakrabarty and Yuguo Chen). In this work, we introduce NETCROP (NETwork CRoss-Validation using Overlapping Partitions), a novel cross-validation procedure designed for complex and large-scale networks. NETCROP enhances computational efficiency by leveraging smaller, overlapping subnetworks for training, providing accurate model selection and parameter tuning. Our numerical results demonstrate that NETCROP often surpasses existing network cross-validation methods in both speed and accuracy.
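To give a concrete picture of the overlapping-partition idea, here is a toy sketch in the spirit of NETCROP, not the paper's implementation: the node set is split into two training parts that share an overlap, a stochastic blockmodel is fit on each part, the overlap nodes are used to align the two label sets, and candidate models are scored on the cross-part node pairs that neither training subnetwork sees. The simulation settings, helper names, and the simple spectral/k-means fit are all illustrative assumptions.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

# Toy SBM with K_true = 2 communities (illustrative data, not from the paper)
n, K_true = 300, 2
z = rng.integers(0, K_true, n)
Bt = np.array([[0.30, 0.05], [0.05, 0.25]])
A = np.triu((rng.random((n, n)) < Bt[np.ix_(z, z)]).astype(float), 1)
A = A + A.T

def spectral(A, K, rng):
    # Simple adjacency spectral clustering with a few k-means iterations
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(-np.abs(vals))[:K]]
    c = X[rng.choice(len(X), K, replace=False)]
    for _ in range(25):
        lab = ((X[:, None] - c[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if (lab == k).any():
                c[k] = X[lab == k].mean(0)
    return lab

def bhat(A, lab, K):
    # Block-probability estimates from a labeled adjacency matrix
    Bh = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            blk = A[np.ix_(lab == a, lab == b)]
            m = blk.shape[0]
            if a == b:
                Bh[a, b] = blk.sum() / max(m * (m - 1), 1)
            else:
                Bh[a, b] = blk.mean() if blk.size else 0.0
    return Bh

def netcrop_score(A, K):
    # Two overlapping training parts; score squared prediction error on the
    # cross-part node pairs that appear in neither training subnetwork
    n = len(A)
    order = rng.permutation(n)
    ov, p1, p2 = order[:60], order[60:180], order[180:]
    S1, S2 = np.concatenate([ov, p1]), np.concatenate([ov, p2])
    l1 = spectral(A[np.ix_(S1, S1)], K, rng)
    l2 = spectral(A[np.ix_(S2, S2)], K, rng)
    # Align part-2 labels to part-1 labels using the shared overlap nodes
    best = max(permutations(range(K)),
               key=lambda pm: np.mean(np.array(pm)[l2[:60]] == l1[:60]))
    l2 = np.array(best)[l2]
    B1 = bhat(A[np.ix_(S1, S1)], l1, K)
    P_pred = B1[np.ix_(l1[60:], l2[60:])]   # predicted cross-part edge probs
    A_held = A[np.ix_(p1, p2)]
    return ((A_held - P_pred) ** 2).mean()

scores = {K: netcrop_score(A, K) for K in (1, 2, 3)}
print(scores)
```

In this toy run the score for the true K = 2 should beat the underfit K = 1; the actual NETCROP procedure uses more partitions and comes with the accuracy guarantees described in the paper.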
Our work (w/ Kartik Lovekar and Subhadeep Paul) on small-world networks is now published in the Electronic Journal of Statistics. The “small-world” property—where networks have both high clustering and short paths between nodes—shows up across fields like sociology, biology, and neuroscience. But current ways of detecting small-world structure often fall short. Existing approaches mix clustering and path length into a single metric, lack statistical rigor, and rely on overly simple baseline models. In our work, we separate these two key features and define small-worldness as a formal hypothesis test. We introduce both parametric bootstrap and asymptotic tests (with theoretical guarantees) that work under flexible null models, including Erdős–Rényi. Applying these tools to real-world networks reveals a more accurate and nuanced view of the small-world phenomenon.
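As a toy illustration of the parametric-bootstrap side of this idea, the sketch below tests only the clustering-coefficient component against an Erdős–Rényi null with matched density: a small-world-style ring lattice with shortcuts is the "observed" network, and the null distribution of the clustering coefficient is simulated by regenerating Erdős–Rényi graphs. The network construction and all settings are illustrative assumptions; the paper's tests also handle path length, more flexible nulls, and come with asymptotic theory.

```python
import numpy as np

rng = np.random.default_rng(1)

def er_graph(n, p, rng):
    # Erdos-Renyi graph with edge probability p
    A = np.triu((rng.random((n, n)) < p).astype(int), 1)
    return A + A.T

def clustering_coeff(A):
    # Global clustering coefficient: 3 * (# triangles) / (# connected triples)
    triangles = np.trace(A @ A @ A) / 6
    deg = A.sum(1)
    triples = (deg * (deg - 1) / 2).sum()
    return 3 * triangles / triples if triples > 0 else 0.0

# "Observed" network: ring lattice with random shortcuts (Watts-Strogatz-style toy)
n, k = 100, 4
A_obs = np.zeros((n, n), int)
for i in range(n):
    for d in range(1, k // 2 + 1):
        A_obs[i, (i + d) % n] = A_obs[(i + d) % n, i] = 1
for _ in range(20):  # a few shortcut edges
    i, j = rng.choice(n, 2, replace=False)
    A_obs[i, j] = A_obs[j, i] = 1

C_obs = clustering_coeff(A_obs)

# Parametric bootstrap under an Erdos-Renyi null with matched edge density
p_hat = A_obs.sum() / (n * (n - 1))
boot = np.array([clustering_coeff(er_graph(n, p_hat, rng)) for _ in range(200)])
p_value = (boot >= C_obs).mean()
print(round(C_obs, 3), p_value)
```

The lattice structure produces a clustering coefficient far above anything the Erdős–Rényi null generates, so the bootstrap p-value is essentially zero, flagging the clustering half of the small-world property.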
The revised version of our predictive assignment paper (w/ Subhankar Bhadra and Marianna Pensky) is now on arXiv. We propose a strategy called predictive assignment to scale up community detection in massive networks while ensuring statistical accuracy. First, community detection is carried out on a small subgraph to estimate the relevant model parameters. Next, each remaining node is assigned to a community based on these estimates. We prove that predictive assignment achieves strong consistency under the stochastic blockmodel and its degree-corrected version, even when the parent community detection algorithm is only weakly consistent.
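The two-step strategy can be sketched in a few lines for the stochastic blockmodel. This is a minimal illustration under assumed toy settings, not the paper's algorithm: step 1 runs a simple spectral clustering on a random subgraph and estimates the block probabilities; step 2 assigns each remaining node to the community whose estimated edge-probability profile best matches that node's observed edges into the subgraph.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy SBM on n nodes (illustrative data, not from the paper)
n, K = 300, 2
z = rng.integers(0, K, n)
B = np.array([[0.25, 0.04], [0.04, 0.20]])
A = np.triu((rng.random((n, n)) < B[np.ix_(z, z)]).astype(float), 1)
A = A + A.T

# Step 1: community detection on a small subgraph via spectral clustering
m = 100
sub = rng.choice(n, m, replace=False)
rest = np.setdiff1d(np.arange(n), sub)
vals, vecs = np.linalg.eigh(A[np.ix_(sub, sub)])
X = vecs[:, np.argsort(-np.abs(vals))[:K]]
centers = X[rng.choice(m, K, replace=False)]
for _ in range(25):  # a few Lloyd iterations of k-means
    lab = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    for k in range(K):
        if (lab == k).any():
            centers[k] = X[lab == k].mean(0)

# Estimate block probabilities from the labeled subgraph
Bhat = np.zeros((K, K))
for a in range(K):
    for b in range(K):
        blk = A[np.ix_(sub[lab == a], sub[lab == b])]
        mm = blk.shape[0]
        Bhat[a, b] = blk.sum() / max(mm * (mm - 1), 1) if a == b else blk.mean()

# Step 2: assign each remaining node to the community whose estimated
# edge-probability profile best matches its observed edges into the subgraph
prof = np.stack([A[np.ix_(rest, sub[lab == k])].mean(1) for k in range(K)], 1)
lab_rest = ((prof[:, None, :] - Bhat[None]) ** 2).sum(-1).argmin(1)

# Agreement with the truth, up to swapping the two labels
acc = max((lab_rest == z[rest]).mean(), (lab_rest == 1 - z[rest]).mean())
print(round(acc, 3))
```

Because step 2 is a single pass over the remaining nodes, the expensive community detection only ever touches the small subgraph; the consistency theory in the paper makes precise when this loses nothing statistically.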
Our work (w/ Indrila Ganguly and Sujit Ghosh) on subsampled residual bootstrap is now published in the Journal of Machine Learning Research. We propose a simple, versatile, and scalable algorithm called subsampled residual bootstrap (SRB) for generalized linear models (GLMs), a large class of regression models that includes the classical linear regression model as well as other widely used models such as logistic, Poisson, and probit regression. We prove consistency and distributional results establishing that the SRB has the same theoretical guarantees under the GLM framework as the classical residual bootstrap, while being computationally much faster. We demonstrate the empirical performance of SRB via simulation studies and a real data analysis of the Forest Covertype data from the UCI Machine Learning Repository.
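For the special case of the linear model, the flavor of the idea can be sketched as follows; this is a simplified assumption-laden illustration, not the paper's SRB algorithm. The full-data fit is computed once, and each bootstrap replicate then refits on only a small random subsample of rows with resampled residuals, so the per-replicate cost depends on the subsample size rather than the full sample size.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated linear-model data (toy example)
n, p = 5000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.0, size=n)

# Full-data least-squares fit and residuals (computed once)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Subsampled residual bootstrap: each replicate touches only s rows
s, Bn = 500, 300
boot = np.empty((Bn, p))
for b in range(Bn):
    idx = rng.choice(n, s, replace=False)          # row subsample
    e_star = rng.choice(resid, s, replace=True)    # resampled residuals
    y_star = X[idx] @ beta_hat + e_star
    boot[b], *_ = np.linalg.lstsq(X[idx], y_star, rcond=None)

se = boot.std(0)  # bootstrap spread at the subsample scale
print(beta_hat.round(2), se.round(3))
```

Note that the bootstrap spread here reflects subsample-size variability, so a scale adjustment (on the order of sqrt(s/n) in this simplified scheme) is needed before reading off full-sample standard errors; the paper develops the rigorous version for general GLMs.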
Our work (w/ Kaustav Chakraborty and Yuguo Chen) on scalable inference for RDPG networks is now published in the Journal of Computational and Graphical Statistics. In this article, we propose a subsampling-based method to reduce the computational cost of estimation and two-sample hypothesis testing. The idea is to divide the network into smaller subgraphs with an overlap region, then draw inference based on each subgraph, and finally combine the results. We first develop the subsampling method for random dot product graph models, and establish theoretical consistency of the proposed method. Then we extend the subsampling method to a more general setup and establish similar theoretical properties. We demonstrate the performance of our methods through simulation experiments and real data analysis.
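The divide-and-combine idea can be illustrated with a toy random dot product graph, under assumed settings that are not from the paper: adjacency spectral embeddings are computed on two overlapping subgraphs, the shared overlap nodes pin down an orthogonal (Procrustes) alignment between the two embeddings, and the aligned pieces are stitched into one estimate of the probability matrix.

```python
import numpy as np

rng = np.random.default_rng(5)

# RDPG: edge probability = inner product of latent positions (toy example)
n, d = 400, 2
Z = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)  # keeps probs in (0, 1)
P = Z @ Z.T
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T

def ase(A, d):
    # Adjacency spectral embedding: top-d scaled eigenvectors
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(-vals)[:d]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

# Two overlapping subgraphs sharing the first 100 nodes
overlap = np.arange(100)
part1 = np.arange(250)                              # contains the overlap
part2 = np.concatenate([overlap, np.arange(250, n)])
X1 = ase(A[np.ix_(part1, part1)], d)
X2 = ase(A[np.ix_(part2, part2)], d)

# Align embedding 2 to embedding 1 via Procrustes on the shared nodes
U, _, Vt = np.linalg.svd(X2[:100].T @ X1[:100])
X2r = X2 @ (U @ Vt)

# Combine: average on the overlap, keep each part's estimate elsewhere
Xhat = np.zeros((n, d))
Xhat[part1] = X1
Xhat[250:] = X2r[100:]
Xhat[:100] = (X1[:100] + X2r[:100]) / 2

err = np.abs(Xhat @ Xhat.T - P).mean()
print(round(err, 3))
```

Each eigendecomposition here touches only a subgraph, which is the source of the computational savings; the paper establishes when the stitched estimate is consistent and extends the scheme to two-sample testing.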
New NSF grant: Scalable and Generalizable Inference for Network Data. This is a single PI grant for methodological work on network inference.
The revised version of our two-sample testing paper (w/ Somnath Bhadra, Kaustav Chakraborty, and Soumendra Nath Lahiri) is now on arXiv. This paper studies the matched network inference problem, where the goal is to determine if two networks, defined on a common set of nodes, exhibit a specific form of stochastic similarity. Two notions of similarity are considered: (i) equality, i.e., testing whether the networks arise from the same random graph model, and (ii) scaling, i.e., testing whether their probability matrices are proportional for some unknown scaling constant. We develop a testing framework based on a parametric bootstrap approach and a Frobenius norm-based test statistic. The proposed approach is highly versatile as it covers both the equality and scaling problems, and ensures adaptability under various model settings, including stochastic blockmodels, Chung-Lu models, and random dot product graph models.
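A stripped-down version of the equality test can be sketched for the stochastic blockmodel, assuming for simplicity that the community labels are known; the settings and helper names are illustrative, and the paper's framework is far more general. The statistic is the Frobenius norm of the difference between the two estimated probability matrices, and its null distribution is simulated by regenerating pairs of networks from the pooled fit.

```python
import numpy as np

rng = np.random.default_rng(11)

def sample_sbm(z, B, rng):
    # Undirected SBM adjacency matrix given labels z and block matrix B
    P = B[np.ix_(z, z)]
    A = np.triu((rng.random(P.shape) < P).astype(float), 1)
    return A + A.T

def bhat_sbm(A, z, K):
    # Block-probability estimates, assuming known community labels
    Bh = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            blk = A[np.ix_(z == a, z == b)]
            m = blk.shape[0]
            Bh[a, b] = blk.sum() / max(m * (m - 1), 1) if a == b else blk.mean()
    return Bh

def frob_stat(A1, A2, z, K):
    # Frobenius norm of the difference of estimated probability matrices
    D = bhat_sbm(A1, z, K) - bhat_sbm(A2, z, K)
    return np.linalg.norm(D[np.ix_(z, z)])

# Two networks on the same node set, simulated from slightly different SBMs
n, K = 300, 2
z = rng.integers(0, K, n)
B1 = np.array([[0.30, 0.05], [0.05, 0.25]])
B2 = np.array([[0.36, 0.05], [0.05, 0.25]])  # one block differs
A1, A2 = sample_sbm(z, B1, rng), sample_sbm(z, B2, rng)
T_obs = frob_stat(A1, A2, z, K)

# Parametric bootstrap under the null of equality: pool the two fits,
# regenerate pairs of networks, and recompute the statistic
B0 = (bhat_sbm(A1, z, K) + bhat_sbm(A2, z, K)) / 2
boot = np.array([frob_stat(sample_sbm(z, B0, rng), sample_sbm(z, B0, rng), z, K)
                 for _ in range(200)])
p_value = (boot >= T_obs).mean()
print(round(T_obs, 3), p_value)
```

Because the two simulated networks genuinely differ in one block, the observed statistic lands far in the tail of the bootstrap null distribution and the test rejects equality; the scaling version of the test replaces the plain difference with a difference after estimating the proportionality constant.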