Methodological Keywords (Core Statistical Innovations)
Causal inference; High-dimensional inference; Data integration; Nonparametric Bayesian methods; Bayesian hierarchical models; MCMC methods; Clustering and classification; Survival and longitudinal data analysis; Spatial statistics; Network analysis; Semiparametric methods; Transfer and federated learning; Dimension reduction; Meta-analysis; Computational statistics; Observational studies
Application Areas (Biomedical & Clinical Impact)
Cancer genomics; Precision oncology; Biomarker discovery; Genomic medicine; Immuno-oncology; Cancer survival analysis; Image processing and medical imaging; Health disparity research; Digital health; Translational research; Population science
Interdisciplinary Connections (Collaborative Domains)
Spatial transcriptomics data; Omics data (multi-omics integration); Statistical genetics; Systems biology; Microbiome data; Neuroscience; Connectomics; Computational medicine; AI-driven healthcare; Health informatics; Epidemiology; Healthcare and public health policy; Bioinformatics
Mickle, Angela M., et al. (2025). Applying a Novel Whole Person Approach Findings Show Sex Differences in a Measure of Allostatic Load in Diverse Mid-Older Adults with Chronic Pain Associated with Risk for Knee Osteoarthritis. The Journal of Pain, 29.
Gu, C., Baladandayuthapani, V., and Guha, S. (2025). Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data. Bayesian Analysis, 20, 489–518. A preliminary version of the paper was recognized through an SBSS Student Paper Competition Award to Chiyu Gu. R code
Guha, S. and Qiu, P. (2025). Bayesian Pairwise Comparison of High-Dimensional Images. Journal of Computational and Graphical Statistics, to appear.
Guha, S. and Li, Y. (2024). Bayesian Estimation of Propensity Scores for Integrating Multiple Cohorts with High-Dimensional Covariates. Statistics in Biosciences, to appear.
Yan, D. and Guha, S. (2024). A Clustering Approach to Integrative Analyses of Multiomic Cancer Data. Journal of Applied Statistics, to appear.
Manavalan, P. et al. (2024). Analysis of the Acceptability of Multi-Level Sexual Health Interventions and Sexually Transmitted Infection Screening and Testing Among Persons With HIV Across Three Clinical Sites in Florida. Journal of Acquired Immune Deficiency Syndromes, to appear.
Shetty, S., Luo, Y., Thomas, A., Guha, S., and Lott, D. (2024). Effect of Exercise Training on Clinical and Physiological Variables in Adults with Myotonic Dystrophy Type 1: A Systematic Review Protocol. MethodsX, to appear.
Guha, S. and Li, Y. (2024). Causal Meta-Analysis by Integrating Multiple Observational Studies with Multivariate Outcomes. Biometrics, 80, ujae070.
Song, J., Guha, S., and Li, Y. (2023). Bayesian Inference for High Dimensional Cox Models with Gaussian and Diffused-Gamma Priors: A Case Study of Mortality in COVID-19 Patients Admitted to the ICU. Statistics in Biosciences, 1–29.
Banks, D. and Guha, S. (2023). Contributed Discussion: What is the probability of realizing a distribution from a stick-breaking process that falls outside an e-ball on the base measure? Bayesian Analysis, 18, 352–353.
Guha, S., Jung, R. and Dunson, D. (2022). Predicting Phenotypes from Brain Connection Structure. Journal of the Royal Statistical Society: Series C, 71, 639–668.
Datta, S. and Guha, S. (Eds.) (2021). Statistical Analysis of Microbiome Data, Frontiers in Probability and the Statistical Sciences, Springer Nature, Switzerland.
Sachdeva, A., Ahn, C., Tiwari, R. and Guha, S. (2022). A Novel Approach to Augment Single-Arm Clinical Studies with Real-World Data. Journal of Biopharmaceutical Statistics, 27, 1–17.
Anyaso-Samuel, S., Sachdeva, A., Guha, S., and Datta, S. (2021). Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier. Frontiers in Genetics, section Computational Genomics, 12:642282.
Anyaso-Samuel, S., Sachdeva, A., Guha, S., and Datta, S. (2021). Bioinformatics Pre-processing of Microbiome Data with an Application to Metagenomic Forensics. Statistical Analysis of Microbiome Data (eds. Datta, S. and Guha, S.), Springer Nature, Switzerland.
Guha, S., and Datta, S. (2021). A Bayesian Approach to Restoring the Duality between Principal Components of a Distance Matrix and Operational Taxonomic Units in Microbiome Analyses. Statistical Analysis of Microbiome Data (eds. Datta, S. and Guha, S.), Springer Nature, Switzerland.
Guha, S. and Ghosh, S. K. (2020). Probabilistic Detection and Estimation of Conic Sections from Noisy Data. Journal of Computational and Graphical Statistics, 29, 513–522.
Yan, D., Guha, S., Ahn, C., and Tiwari, R. (2020). Semiparametric Bayesian Markov Analysis of Personalized Benefit-Risk Assessment. Annals of Applied Statistics, 14, 768–788.
Drusbosky, L. M., Singh, N. K., Hawkin, K. E., et al. (2019). A Genomics-Informed Computational Biology Platform Prospectively Predicts Treatment Responses in AML and MDS Patients. Blood Advances, 3(12), 1837–1847.
Jha, C., Li, Y. and Guha, S. (2017). Semiparametric Bayesian Analysis of High-Dimensional Censored Outcome Data. Statistical Theory and Related Fields, 1, 194–204.
Guha, S. and Baladandayuthapani, V. (2016). A Nonparametric Bayesian Technique for High-Dimensional Regression. Electronic Journal of Statistics, 10, 3374–3424.
Guha, S., Banerjee, S., Gu, C. and Baladandayuthapani, V. (2015). Non-parametric Variable Selection, Clustering and Prediction for Large Biological Datasets. Nonparametric Bayesian Inference in Biostatistics (eds. Mitra, R. and Muller, P.), Springer International Publishing.
Cui, S., Guha, S., Ferreira, M. A. R. and Tegge, A. N. (2015). A Hidden Markov Model for Detecting Differentially Expressed Genes from RNA-Seq Data. Annals of Applied Statistics, 9, 901–925.
Guha, S., Ji, Y., and Baladandayuthapani, V. (2014). Bayesian Disease Classification using Copy Number Data. Cancer Informatics, 13(S2), 83–91.
Guha, S. (2011). Discussion of Sampling schemes for generalized linear Dirichlet process random effects models by Kyung, Gill and Casella. Statistical Methods & Applications, 20, 291–293.
MacEachern, S. N. and Guha, S. (2010). Parametric and Semiparametric Hypotheses in the Linear Model. The Canadian Journal of Statistics, 39, 165–180.
Guha, S. (2010). Posterior Simulation in Countable Mixture Models for Large Datasets. Journal of the American Statistical Association, 105, 775–786.
Guha, S. (2010). Bayesian Hidden Markov Modeling of Array CGH Data. Bayesian Modeling in Bioinformatics (eds. Dey, D. K., Ghosh, S. and Mallick, B.), Chapman & Hall/CRC.
Guha, S., Ryan, L. and Morara, M. (2009). Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates. Journal of Computational and Graphical Statistics, 18, 818–837.
Guha, S. (2008). Posterior Simulation in the Generalized Linear Mixed Model with Semiparametric Random Effects. Journal of Computational and Graphical Statistics, 17, 410–425.
Guha, S., Li, Y. and Neuberg, D. (2008). Bayesian Hidden Markov Modeling of Array CGH Data. Journal of the American Statistical Association, 103, 485–497.
Guha, S. and MacEachern, S. N. (2006). Generalized Post-stratification and Importance Sampling for Subsampled Markov Chain Monte Carlo Estimation. Journal of the American Statistical Association, 101, 1175–1184.
Li, Y., Tiwari, R., and Guha, S. (2006). Mixture Cure Survival Models with Dependent Censoring. Journal of the Royal Statistical Society - Series B, 69, 285–306.
Burden, S., Guha, S., Morgan, G., Ryan, L., Sparks, G. and Young, L. (2005). Spatio-temporal Analysis of Ischemic Heart Disease in NSW, Australia. Environmental and Ecological Statistics, 12, 427–448.
Guha, S., MacEachern, S. N. and Peruggia, M. (2004). Benchmark Estimation for Markov Chain Monte Carlo Samples. Journal of Computational and Graphical Statistics, 13, 683–701.
MacEachern, S. N., Peruggia, M. and Guha, S. (2003). Discussion of A theory of statistical models for Monte Carlo integration by Kong, McCullagh, Nicolae, Tan and Meng. Journal of the Royal Statistical Society - Series B, 65, 612.
National Institute of Arthritis and Musculoskeletal and Skin Diseases, Biomechanics Contributions to Symptoms and Joint Health in Individuals with Rotator Cuff Tears, R01AR084273, 2024 – 2028. Role: Co-I.
National Institute of Arthritis and Musculoskeletal and Skin Diseases, Nervous system influences on recovery from painful rotator cuff tears, R01AR080058, 2023 – 2028. Role: Co-I.
National Cancer Institute, Detecting Racial Disparities in Cancer Survival by Integrating Multiple High-Dimensional Observational Studies, R01CA269398, 2022 – 2026. Role: MPI.
National Cancer Institute, The Boston Lung Cancer Survivor Cohort, U01CA209414, 2023 – 2025. Role: Co-I.
Health Resources & Services Administration HIV/AIDS, Improving Sexually Transmitted Infection Screening and Treatment among People Living with or at Risk for HIV, U90HA32147, 2020 – 2023. Role: Co-I.
National Science Foundation, Collaborative Research: New Bayesian Nonparametric Paradigms of Personalized Medicine for Lung Cancer, DMS-1854003, 2015 – 2020. Role: PI.
National Science Foundation, Bayesian Mixture Models: Unified Theoretical Frameworks and MCMC Methods, DMS-0906734, 2009 – 2013. Role: PI.
Department of Health and Human Services, Statistical Informatics for Cancer Research, 2009 – 2013. Role: Co-I.
R package to accompany the paper “WMAP: Causal Meta-Analysis by Integrating Multiple Observational Studies” by Guha, Xu, Priyam, and Li (2025, submitted)
R package to accompany the paper “Causal Meta-Analysis by Integrating Multiple Observational Studies with Multivariate Outcomes” by Guha and Li (2024, Biometrics)
R code to accompany the paper “Bayesian Pairwise Comparison of High-Dimensional Images” by Guha and Qiu (2024)
R code for implementing the integrative covariate-balancing weighting strategy developed in Guha and Li (Biometrics; 2024)
R code for implementing the B-MSC method of Guha and Li (2024)
R code for implementing the BayesDiff method of Gu, Baladandayuthapani, and Guha (2023)
R code for generating simulated data and fitting the Bayesian Connectomics (BaCon) model class of Guha, Jung and Dunson (2022)
Code for fitting microbiome datasets using the Bayesian SVD-type decomposition technique of Guha and Datta (2021)
Code for implementing the Bayesian technique of Guha and Ghosh (2020) for inferring unknown conic sections on the basis of noisy data
R package implementing VariScan, a nonparametric Bayesian technique for clustering, variable selection, and prediction in high-throughput regression settings. The methodology is developed in Guha and Baladandayuthapani (EJS, 2016).
R package hmmSeq implementing the Bayesian technique for analyzing RNA-Seq data proposed in Cui, Guha, Ferreira and Tegge (AOAS, 2015). Joint work with the paper’s co-authors.
R package glmmGS for fitting generalized linear mixed models to massive datasets. Publicly available from CRAN. Co-developed with Michele Morara, Louise Ryan, and Christopher Paciorek.
The Bayesian hidden Markov model strategy proposed by Guha, Li and Neuberg (JASA, 2008) for array CGH genomics data is implemented in Bioinformatics Toolbox 3.2. It is available at http://www.mathworks.com
Created software for generating the sample paths of a number of common stochastic processes, verifying their theoretical properties by simulation, and visualizing abstract results like the martingale central limit theorem. Used in the Department of Biostatistics, Harvard School of Public Health, to teach the graduate-level courses Analysis of Failure Time Data and Probability Theory and Applications I.