I am organizing the Statistics Seminar in Fall 2025. Detailed schedules will be posted below. The seminar format will be in-person with a remote option on Zoom (links are available upon request).
Time and Date: 3:00 PM - 4:00 PM, Wednesday, October 22, 2025
*This is a special session of the Statistics Seminar as part of the Mathematics Colloquium of Fall 2025.
Speaker: Will Kleiber, Department of Statistics, University of Colorado Boulder
Location: PGH 646A
Title: TBA
Abstract: TBA
Time and Date: 3:00 PM - 4:00 PM, Wednesday, November 19, 2025
*This is a special session of the Statistics Seminar as part of the Mathematics Colloquium of Fall 2025.
Speaker: Huixia Judy Wang, Department of Statistics, Rice University
Location: PGH 646A
Title: TBA
Abstract: TBA
I am organizing the Statistics Seminar in Spring 2025. Detailed schedules will be posted below. The seminar format will be in-person with a remote option on Zoom (links are available upon request).
Time and Date: 11:00 AM - 12:00 PM, Monday, February 3, 2025
Speaker: Dr. Serge Guillas, Department of Statistical Science, University College London
Location: PGH 648
Title: Linked and Deep Gaussian Process emulation of simulators, with application to convection.
Abstract: We first introduce Gaussian Process (GP) emulation of computer models. These are surrogates of simulators that efficiently mimic the input-output relationship of such complex numerical models, only sampling a small set of runs. GPs crucially model uncertainties. We then present a new type of emulator of any feed-forward multi-physics system, by linking GP emulators of individual simulators, with large gains over the composite emulator of the whole system. The Deep Gaussian Process (DGP) is then presented as a surrogate that shares the structure of the linked emulator but enables the emulation of highly non-linear simulators without the knowledge of individual sub-processes. We then examine sharp changes in the outputs of a computer simulator. These often indicate bifurcations or critical transitions within the investigated system, e.g. laminar vs. turbulent behavior in fluid dynamics. An efficient approach that localizes these changes using DGPs with a minimal number of evaluations is introduced. We demonstrate the efficacy and efficiency of the proposed framework on the Rayleigh–Bénard convection.
Time and Date: 3:00 PM - 4:00 PM, Wednesday, February 19, 2025
*This is a special session of the Statistics Seminar as part of the Mathematics Colloquium of Spring 2025.
Speaker: Dr. Marina Vannucci, Department of Statistics, Rice University
Location: PGH 646A
Title: Varying-coefficients Bayesian models for inference of networks and covariate effects
Abstract: New methods for the simultaneous inference of graphical models and covariate effects in the Bayesian framework will be discussed. I will consider regression settings where the interest is in the estimation of sparse networks among a set of primary variables, and where covariates may impact the strength of edges. The proposed models utilize spike-and-slab priors to perform edge selection, and Gaussian process priors to allow for flexibility in the covariate effects. Efficient and scalable algorithms for posterior inference will be employed for the estimation of the models. Simulation studies will demonstrate how the proposed models improve on the accuracy of existing methods, in both network recovery and covariate selection. I will show applications of the proposed models to neuroimaging and genomic datasets.
Time and Date: 11:00 AM - 12:00 PM, Monday, March 3, 2025
Speaker: Dr. Christine B. Peterson, Department of Biostatistics, The University of Texas MD Anderson Cancer Center
Location: PGH 648
Title: Flexible feature aggregation for microbiome analysis
Abstract: Microbiome data sets, which capture the abundances of bacteria and other microorganisms in the human body, represent a key source of “big data” in understanding human health. In this talk, I will introduce the structure of microbiome data and unique challenges in the analysis of this high-dimensional data type. I will highlight some recent approaches we have developed for predictive modeling from microbiome data where we allow for flexible aggregation of features with shared effects. I will first discuss a factor model for integration of microbiome data with other high-dimensional data types, where we assume a known tree structure among the microbiome features. I will then discuss a Bayesian approach for feature selection with data-adaptive clustering. I will illustrate the proposed methods with applications to data sets on the role of the microbiome in colorectal cancer and insulin resistance.
Time and Date: 11:00 AM - 12:00 PM, Monday, March 24, 2025
Speaker: Dr. Peng Zhao, Department of Applied Economics and Statistics, University of Delaware
Location: PGH 648
Title: Robust high-dimensional covariate-assisted network modeling
Abstract: Modern network data analysis often involves analyzing network structures alongside covariate features to gain deeper insights into underlying patterns. However, traditional covariate-assisted statistical network models may not fully consider the cases with high-dimensional covariates, where some covariates could be uninformative or misleading, and the possible mismatch between network and covariate information. To address this issue, we introduce a novel robust high-dimensional covariate-assisted latent space model. This framework links latent vectors representing network structures with simultaneously sparse and low-rank transformations of the high-dimensional covariates, capturing their mutual dependence. To robustly integrate this dependence, we use a shrinkage prior on the discrepancy between latent network vectors and low-rank covariate approximation vectors, allowing the possibility of mismatching information from covariates for some nodes in the network. To achieve computational efficiency, we develop a mean-field variational inference algorithm to approximate the posterior distribution. We establish the posterior concentration rate within a suitable parameter space and demonstrate how the proposed model facilitates adaptive information aggregation between networks and high-dimensional covariates. Extensive simulation studies and real-world data analyses confirm the effectiveness of our approach.
I am organizing the Statistics Seminar in Fall 2024. Detailed schedules will be posted below. The seminar format will be in-person with a remote option on Zoom (links are available upon request).
Time and Date: 2:00 PM - 3:00 PM, Monday, September 30, 2024
Speaker: Dr. Xi Lu, Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston
Location: PGH 646A
Title: Robust Bayesian Methods to Sparse High-Dimensional Regression
Abstract: In high-dimensional regression problems, the demand for robust variable selection arises due to the commonly observed outliers and heavy-tailed distributions of the response variable, as well as model misspecifications when structured sparsity is ignored. The elastic net enjoys wide popularity in genomics studies as it can accommodate the strong correlations among omics features. Therefore, the robustified elastic net in both the frequentist and Bayesian frameworks has received much attention in recent years for the robust identification of important omics features. In this talk, I will present the Bayesian quantile elastic net with spike-and-slab priors that overcomes the major limitations of the existing family of elastic net methods. Specifically, we have developed a fully Bayesian method that builds on the robust likelihood function to safeguard against heterogeneity of complex diseases while accounting for the structured sparsity. Incorporation of the spike-and-slab priors in the Bayesian hierarchical model has significantly improved accuracy in shrinkage estimation and variable selection by inducing exact sparsity through posterior estimates generated from the Metropolis-within-Gibbs sampling. The advantages of the proposed method have been demonstrated through the simulation study of data with independent and identically distributed random errors as well as heterogeneous random errors over multiple versions of elastic net regularization methods and other alternatives. The analysis of SNP data with strong LDs from the Nurses' Health Study (NHS) has also revealed the superiority of the proposed method. All methods under comparison have been implemented in package Bayenet available on CRAN.
Time and Date: 2:00 PM - 3:00 PM, Monday, October 21, 2024
Speaker: Dr. Winston Liaw, M.D., M.P.H., Department of Health Systems and Population Health Sciences, Tilman J. Fertitta Family College of Medicine, University of Houston
Location: PGH 646A
Title: Reclaiming Relationships in Medicine: What is the Role of Artificial Intelligence?
Abstract: As healthcare evolves, the importance of maintaining strong patient-clinician relationships has never been more critical. However, the increasing reliance on technology, particularly electronic health records (EHRs), has disrupted these relationships in primary care, leading to a growing crisis of burnout and disconnection. In this talk, Dr. Liaw will explore how artificial intelligence (AI) offers both risks and opportunities for addressing this crisis. AI has the potential to enhance the delivery of care by helping clinicians predict patient outcomes, streamline workflows, and provide more personalized care. However, if implemented without careful consideration, it could exacerbate existing problems, increasing the depersonalization of care. This presentation will delve into the ways AI can serve as an "escape fire" for modern medicine, helping clinicians reclaim valuable face-to-face time with patients, while also addressing the ethical, practical, and relational challenges of integrating AI into healthcare. Collaboration between the fields of mathematics and medicine is essential in developing AI tools that support, rather than undermine, the patient-clinician relationship. This talk will also highlight opportunities for interdisciplinary collaboration between the math department and the medical community to improve health outcomes.
Time and Date: 2:00 PM - 3:00 PM, Monday, November 4, 2024
Speaker: Dr. Ying Lin, Department of Industrial Engineering, Cullen College of Engineering, University of Houston
Location: PGH 646A
Title: Smart and Secure Health Monitoring via Online Representation Learning
Abstract: Recent advances in information technologies have made health monitoring an efficient and cost-effective solution for the early detection and intervention of various diseases. However, the full potential of these technologies in large-scale populations remains unrealized, primarily due to several barriers: 1) the passive monitoring strategy imposes unnecessary burdens on both patients and healthcare providers, generating excessive data that must be transmitted, stored, and analyzed; 2) health progression is uncertain, heterogeneous, and interdependent among patients; 3) existing monitoring systems rely on centralized data collection and analysis, making them difficult to scale for large populations while raising concerns about data storage costs and patient privacy; 4) the increasing risk of adversarial attacks on healthcare systems poses significant security and vulnerability issues. To address these challenges, this talk will first introduce a smart health monitoring method that adaptively monitors patients with uncertain health progression. A novel online representation learning algorithm that integrates latent trajectory modeling with an upper confidence bound (UCB)-based exploration strategy is proposed. Moreover, to scale smart health monitoring to large populations and protect patients’ privacy, a federated online representation learning method that learns the latent trajectories from distributed and sequentially observed data will be presented. Lastly, to enhance the security and intelligence of the smart health monitoring method, a robust online representation learning algorithm will be introduced. It theoretically studies the impact of adversarial attacks on the online representation learning algorithm and proposes a novel mitigation strategy to reduce the impact. The efficiency of these proposed methods is demonstrated through theoretical analysis, simulation studies, and empirical studies of smart cognitive monitoring in Alzheimer’s disease.
Time and Date: 2:00 PM - 3:00 PM, Monday, November 18, 2024
Speaker: Dr. Lulu Shang, Department of Biostatistics, University of Texas MD Anderson Cancer Center
Location: PGH 648
Title: Statistical and Computational Methods in Spatial Transcriptomics
Abstract: Spatial transcriptomics is a collection of genomic technologies that enable transcriptomic profiling of tissues with spatial localization information. Analyzing spatial transcriptomic data is computationally challenging because the data collected from various spatial transcriptomic technologies are often noisy and exhibit substantial spatial correlation across tissue locations. In the first part of the talk, I will present SpatialPCA, a spatially aware dimension reduction method that extracts a low-dimensional representation of the spatial transcriptomics data with biological signal and preserved spatial correlation structure. This method unlocks many computational tools previously developed for single-cell RNAseq studies, allowing for tailored and novel analyses of spatial transcriptomics. Another essential task in spatial transcriptomics involves identifying genes with spatial expression patterns, known as spatially variable genes (SVGs). In the second part of my talk, I will present Celina, a statistical method that detects SVGs displaying diverse spatial expression patterns within a specific cell type. These cell type-specific SVGs (ct-SVGs) represent crucial transcriptomic signatures underlying cellular heterogeneity and provide insights uniquely accessible through spatial transcriptomics. Taken together, these methods open doors for novel biologically informed downstream analyses, unveiling functional cellular heterogeneity at an unprecedented scale.
I will be organizing the Statistics Seminar in Spring 2024. Detailed schedules will be posted below. The seminar format will be in-person with a remote option on Zoom (links are available upon request).
Time and Date: 3:00 PM - 4:00 PM, Wednesday, January 24, 2024
*This is a special session of the Statistics Seminar as part of the Mathematics Colloquium of Spring 2024.
Speaker: Dr. Katherine B. Ensor, Department of Statistics, Rice University
Location: PGH 646A
Title: Hierarchical Modeling for Spatial-Temporal Extremes
Abstract: Methodologies for time-varying spatial-temporal extremes play an important practical role in urban planning and risk management. We put forward a hierarchical spatial-temporal peak-over-threshold modeling framework for studying rainfall history for large geographical regions. Modeling in space uses the extended Hausdorff distance to account for the irregularly shaped and sized regions. The objective is to obtain the distribution for the 25, 100 and 500 year return levels within each hydrologic region. Working with hydrologists, we can obtain improved flood maps for a region based on these return level estimates. Our methodologies are applied to study the greater Houston area, using the large library of spatially referenced data on the Kinder Urban Data Platform (kinderudp.org). This research supported the greater Houston area’s recovery from Hurricane Harvey and long-term planning as the region learns to live with water.
Time and Date: 11:30 AM - 12:30 PM, Monday, March 4, 2024
Speaker: Dr. Anirban Bhattacharya, Department of Statistics, Texas A&M University
Location: PGH 646A
Title: Bayesian Semi-supervised Inference via a Debiased Modeling Approach
Abstract: Inference in semi-supervised (SS) settings has received a great amount of attention in recent years due to increased relevance in modern big-data problems. In a typical SS setting, there is a much larger unlabeled data set containing only observations for predictors, in addition to a moderately sized labeled data set involving observations for both an outcome and a set of predictors. Such data arises naturally from settings where the outcome, unlike the predictors, is costly to obtain. One of the primary statistical objectives in SS settings is to explore whether parameter estimation can be improved by exploiting the unlabeled data. In this talk, we discuss a novel Bayesian approach to SS inference for the population mean estimation problem. The proposed approach provides improved and optimal estimators in terms of both estimation efficiency and inference. The central idea behind our method is to model certain summary statistics of the data rather than specifying a probability model for the entire raw data itself. We establish concrete theoretical results validating all our claims and further support them through extensive numerical studies.
I will be organizing the Statistics Seminar in Fall 2023. Detailed schedules will be posted below. The seminar format will be in-person with a remote option on Zoom (links are available upon request).
Time and Date: 12:00 PM - 1:00 PM, Monday, October 2, 2023
Speaker: Dr. Quan Zhou, Department of Statistics, Texas A&M University
Location: PGH 646A
Title: Importance Tempering of Markov Chain Monte Carlo Schemes
Abstract: Informed importance tempering (IIT) is an easy-to-implement MCMC algorithm that can be seen as an extension of the familiar Metropolis-Hastings algorithm with the special feature that informed proposals are always accepted, and which was shown in Zhou and Smith (2022) to converge much more quickly in some common circumstances. This work develops a new, comprehensive guide to the use of IIT in many situations. First, we propose two IIT schemes that run faster than existing informed MCMC methods on discrete spaces by not requiring the posterior evaluation of all neighboring states. Second, we integrate IIT with other MCMC techniques, including simulated tempering, pseudo-marginal and multiple-try methods (on general state spaces), which have been conventionally implemented as Metropolis-Hastings schemes and can suffer from low acceptance rates. The use of IIT allows us to always accept proposals and brings about new opportunities for optimizing the sampler which are not possible under the Metropolis-Hastings framework. Numerical examples illustrating our findings are provided for each proposed algorithm, and a general theory on the complexity of IIT methods is developed. Joint work with G. Li and A. Smith.
Time and Date: 12:00 PM - 1:00 PM, Monday, October 9, 2023
Speaker: Dr. Debdeep Pati, Department of Statistics, Texas A&M University
Location: PGH 646A
Title: Reconciling Computational Barriers & Statistical Guarantees in Variational Inference
Abstract: As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis on the algorithmic convergence aspects of VI is still largely lacking. Furthermore, there is no systematic study of the statistical properties of the algorithmic solution. In this talk, we show how the choice of variational family is critical to good statistical performance of the algorithmic solution through a number of case studies. We will discuss some recent advances towards studying convergence of the popular coordinate ascent variational inference algorithm. We will present both positive and negative results, which serve as a useful cautionary note to the applied practitioners of variational inference.
Time and Date: 12:00 PM - 1:00 PM, Monday, November 6, 2023
Speaker: Dr. Xia "Ben" Hu, Department of Computer Science, Rice University
Location: PGH 646A
Title: ChatGPT in Action: An Experimental Investigation of Its Effectiveness in NLP Tasks
Abstract: The recent progress in large language models has resulted in highly effective models like OpenAI's ChatGPT that have demonstrated exceptional performance in various tasks, including question answering, essay writing, and code generation. This presentation will cover the evolution of LLMs from BERT to ChatGPT and showcase their use cases. Although LLMs are useful for many NLP tasks, one significant concern is the inadvertent disclosure of sensitive information, especially in the healthcare industry, where patient privacy is crucial. To address this concern, we developed a novel framework that generates high-quality synthetic data using ChatGPT and fine-tunes a local offline model for downstream tasks. The use of synthetic data improved the performance of downstream tasks, reduced the time and resources required for data collection and labeling, and addressed privacy concerns. Finally, we will discuss the regulation of LLMs, which has raised concerns about cheating in education. We will introduce our recent survey on LLM-generated text detection and discuss the opportunities and challenges it presents.