Statistics Seminar

Department of Mathematics, University of Houston

Fall 2025 Schedule

I am organizing the Statistics Seminar in Fall 2025. Detailed schedules will be posted below. The seminar will be held in person with a remote option on Zoom (links are available upon request).

Time and Date: 3:00 PM - 4:00 PM, Wednesday, October 22, 2025

*This is a special session of Statistics Seminar as part of the Mathematics Colloquium of Fall 2025.

Speaker: Will Kleiber, Department of Statistics, University of Colorado Boulder

Location: Online

Title: Estimation and prediction methods for high-frequency temporal and space-time processes

Abstract: This talk will cover two major obstacles when considering high-frequency temporal and space-time data.

As the power grid moves to a more renewable future, energy sources from weather-driven phenomena such as solar power will form an increasingly large portion of electricity generation. The variability, non-Gaussianity and intermittency of solar resources challenge current grid operation paradigms, and realistic data scenarios are required for grid planning and operational studies. However, such data are not available at the space-time resolution needed for realistic grid models. Given sparse spatial samples that are high-resolution in time, we introduce an approach for spatiotemporal prediction in a functional data analysis framework when data exhibit nonstationary phase misalignment. The approach is illustrated on a challenging irradiance dataset and compares favorably against existing methods.

In the second half of the talk, we consider high-frequency cryptocurrency data. Lévy processes are widely used in financial modeling due to their ability to capture discontinuities and heavy tails, which are common in high-frequency asset return data. However, parameter estimation remains a challenge when associated likelihoods are unavailable or costly to compute, a problem that is exacerbated for high-frequency return data. We propose a fast and accurate method for Lévy parameter estimation using the neural Bayes estimation (NBE) framework -- a simulation-based, likelihood-free approach that leverages permutation-invariant neural networks to approximate Bayes estimators. Through simulation studies we illustrate that NBE outperforms traditional methods in both accuracy and runtime. We illustrate our approach on multiple cryptocurrency return datasets, where the method captures evolving parameter dynamics and delivers reliable and interpretable inference at a fraction of the computational cost of traditional methods.

Time and Date: 3:00 PM - 4:00 PM, Wednesday, October 29, 2025

*This is a special session of Statistics Seminar as part of the Mathematics Colloquium of Fall 2025.

Speaker: Huixia Judy Wang, Department of Statistics, Rice University

Location: PGH 646A

Title: Model-Free Approaches to Constructing Prediction Regions under Target Shift

Abstract: In many real-world applications, obtaining labeled data is a significant challenge due to high costs and technical limitations. This scarcity of labeled outcomes presents a major obstacle for traditional statistical inference. To address this, we introduce a model-free approach for constructing prediction regions for new target outcomes. Our method leverages a labeled source distribution, which is different from the target but related through a distributional shift, to overcome the lack of target labels. When target data are fully unlabeled, our predictions rely entirely on the rich source data; when some labels are available, we seamlessly integrate them to boost efficiency. A key innovation in this new approach lies in how we handle the complexities of different data distributions. We tackle non-exchangeability and non-identifiability by estimating the likelihood ratio through a novel technique: matching the covariate distributions of the source and target domains using a B-spline basis. This powerful approach allows us to accommodate complex error structures, including asymmetry and multimodality. To this end, we construct the highest predictive density sets using a new weight-adjusted conditional density estimator. This estimator models the source conditional density and then transforms it through a weighting scheme to accurately approximate the target conditional density. We will discuss the theoretical guarantees of our method and demonstrate its strong performance. We validate our approach through comprehensive simulation studies and a compelling real-world application using the MIMIC-III clinical database. This is joint work with Menghan Yi and Yanlin Tang.

Spring 2025 Schedule

I am organizing the Statistics Seminar in Spring 2025. Detailed schedules will be posted below. The seminar will be held in person with a remote option on Zoom (links are available upon request).

Time and Date: 11:00 AM - 12:00 PM, Monday, February 3, 2025

Speaker: Dr. Serge Guillas, Department of Statistical Science, University College London

Location: PGH 648

Title: Linked and Deep Gaussian Process emulation of simulators, with application to convection.

Abstract: We first introduce Gaussian Process (GP) emulation of computer models. These are surrogates of simulators that efficiently mimic the input-output relationship of such complex numerical models, only sampling a small set of runs. GPs crucially model uncertainties. We then present a new type of emulator of any feed-forward multi-physics system, by linking GP emulators of individual simulators, with large gains over the composite emulator of the whole system. The Deep Gaussian Process (DGP) is then presented as a surrogate that shares the structure of the linked emulator but enables the emulation of highly non-linear simulators without the knowledge of individual sub-processes. We then examine sharp changes in the outputs of a computer simulator. These often indicate bifurcations or critical transitions within the investigated system, e.g. laminar vs. turbulent behavior in fluid dynamics. An efficient approach that localizes these changes using DGPs with a minimal number of evaluations is introduced. We demonstrate the efficacy and efficiency of the proposed framework on the Rayleigh–Bénard convection.

Time and Date: 3:00 PM - 4:00 PM, Wednesday, February 19, 2025

*This is a special session of Statistics Seminar as part of the Mathematics Colloquium of Spring 2025.

Speaker: Dr. Marina Vannucci, Department of Statistics, Rice University

Location: PGH 646A

Title: Varying-coefficients Bayesian models for inference of networks and covariate effects

Abstract: New methods for the simultaneous inference of graphical models and covariate effects in the Bayesian framework will be discussed. I will consider regression settings where the interest is in the estimation of sparse networks among a set of primary variables, and where covariates may impact the strength of edges. The proposed models utilize spike-and-slab priors to perform edge selection, and Gaussian process priors to allow for flexibility in the covariate effects. Efficient and scalable algorithms for posterior inference will be employed for the estimation of the models. Simulation studies will demonstrate how the proposed models improve on the accuracy of existing methods, in both network recovery and covariate selection. I will show applications of the proposed models to neuroimaging and genomic datasets.

Time and Date: 11:00 AM - 12:00 PM, Monday, March 3, 2025

Speaker: Dr. Christine B. Peterson, Department of Biostatistics, The University of Texas MD Anderson Cancer Center

Location: PGH 648

Title: Flexible feature aggregation for microbiome analysis

Abstract: Microbiome data sets, which capture the abundances of bacteria and other microorganisms in the human body, represent a key source of “big data” in understanding human health. In this talk, I will introduce the structure of microbiome data and unique challenges in the analysis of this high-dimensional data type. I will highlight some recent approaches we have developed for predictive modeling from microbiome data where we allow for flexible aggregation of features with shared effects. I will first discuss a factor model for integration of microbiome data with other high-dimensional data types, where we assume a known tree structure among the microbiome features. I will then discuss a Bayesian approach for feature selection with data-adaptive clustering. I will illustrate the proposed methods with applications to data sets on the role of the microbiome in colorectal cancer and insulin resistance.

Time and Date: 11:00 AM - 12:00 PM, Monday, March 24, 2025

Speaker: Dr. Peng Zhao, Department of Applied Economics and Statistics, University of Delaware

Location: PGH 648

Title: Robust high-dimensional covariate-assisted network modeling

Abstract: Modern network data analysis often involves analyzing network structures alongside covariate features to gain deeper insights into underlying patterns. However, traditional covariate-assisted statistical network models may not fully consider the cases with high-dimensional covariates, where some covariates could be uninformative or misleading, and the possible mismatch between network and covariate information. To address this issue, we introduce a novel robust high-dimensional covariate-assisted latent space model. This framework links latent vectors representing network structures with simultaneously sparse and low-rank transformations of the high-dimensional covariates, capturing their mutual dependence. To robustly integrate this dependence, we use a shrinkage prior on the discrepancy between latent network vectors and low-rank covariate approximation vectors, allowing the possibility of mismatching information from covariates for some nodes in the network. To achieve computational efficiency, we develop a mean-field variational inference algorithm to approximate the posterior distribution. We establish the posterior concentration rate within a suitable parameter space and demonstrate how the proposed model facilitates adaptive information aggregation between networks and high-dimensional covariates. Extensive simulation studies and real-world data analyses confirm the effectiveness of our approach.

Fall 2024 Schedule

I am organizing the Statistics Seminar in Fall 2024. Detailed schedules will be posted below. The seminar will be held in person with a remote option on Zoom (links are available upon request).

Time and Date: 2:00 PM - 3:00 PM, Monday, September 30, 2024

Speaker: Dr. Xi Lu, Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston

Location: PGH 646A

Title: Robust Bayesian Methods to Sparse High-Dimensional Regression

Abstract: In high-dimensional regression problems, the demand for robust variable selection arises due to the commonly observed outliers and heavy-tailed distributions of the response variable, as well as model misspecifications when structured sparsity is ignored. The elastic net enjoys wide popularity in genomics studies as it can accommodate the strong correlations among omics features. Therefore, the robustified elastic net in both the frequentist and Bayesian frameworks has received much attention in recent years for the robust identification of important omics features. In this talk, I will present the Bayesian quantile elastic net with spike-and-slab priors that overcomes the major limitations of the existing family of elastic net methods. Specifically, we have developed a fully Bayesian method that builds on the robust likelihood function to safeguard against heterogeneity of complex diseases while accounting for the structured sparsity. Incorporation of the spike-and-slab priors in the Bayesian hierarchical model has significantly improved accuracy in shrinkage estimation and variable selection by inducing exact sparsity through posterior estimates generated from Metropolis-within-Gibbs sampling. The advantages of the proposed method over multiple versions of elastic net regularization methods and other alternatives have been demonstrated through a simulation study of data with independent and identically distributed random errors as well as heterogeneous random errors. The analysis of SNP data with strong LDs from the Nurses' Health Study (NHS) has also revealed the superiority of the proposed method. All methods under comparison have been implemented in the package Bayenet, available on CRAN.

Time and Date: 2:00 PM - 3:00 PM, Monday, October 21, 2024

Speaker: Dr. Winston Liaw, M.D., M.P.H., Department of Health Systems and Population Health Sciences, Tilman J. Fertitta Family College of Medicine, University of Houston

Location: PGH 646A

Title: Reclaiming Relationships in Medicine: What is the Role of Artificial Intelligence?

Abstract: As healthcare evolves, the importance of maintaining strong patient-clinician relationships has never been more critical. However, the increasing reliance on technology, particularly electronic health records (EHRs), has disrupted these relationships in primary care, leading to a growing crisis of burnout and disconnection. In this talk, Dr. Liaw will explore how artificial intelligence (AI) offers both risks and opportunities for addressing this crisis. AI has the potential to enhance the delivery of care by helping clinicians predict patient outcomes, streamline workflows, and provide more personalized care. However, if implemented without careful consideration, it could exacerbate existing problems, increasing the depersonalization of care. This presentation will delve into the ways AI can serve as an "escape fire" for modern medicine, helping clinicians reclaim valuable face-to-face time with patients, while also addressing the ethical, practical, and relational challenges of integrating AI into healthcare. Collaboration between the fields of mathematics and medicine is essential in developing AI tools that support, rather than undermine, the patient-clinician relationship. This talk will also highlight opportunities for interdisciplinary collaboration between the math department and the medical community to improve health outcomes.

Time and Date: 2:00 PM - 3:00 PM, Monday, November 4, 2024

Speaker: Dr. Ying Lin, Department of Industrial Engineering, Cullen College of Engineering, University of Houston

Location: PGH 646A

Title: Smart and Secure Health Monitoring via Online Representation Learning

Abstract: Recent advances in information technologies have made health monitoring an efficient and cost-effective solution for the early detection and intervention of various diseases. However, the full potential of these technologies in large-scale populations remains unrealized, primarily due to several barriers: 1) the passive monitoring strategy imposes unnecessary burdens on both patients and healthcare providers, generating excessive data that must be transmitted, stored, and analyzed; 2) health progression is uncertain, heterogeneous, and interdependent among patients; 3) existing monitoring systems rely on centralized data collection and analysis, making them difficult to scale for large populations while raising concerns about data storage costs and patient privacy; 4) the increasing risk of adversarial attacks on healthcare systems poses significant security and vulnerability issues. To address these challenges, this talk will first introduce a smart health monitoring method that adaptively monitors patients with uncertain health progression, based on a novel online representation learning algorithm that integrates latent trajectory modeling with an upper confidence bound (UCB)-based exploration strategy. Moreover, to scale smart health monitoring to large populations and protect patients’ privacy, a federated online representation learning method that learns the latent trajectories from distributed and sequentially observed data will be presented. Lastly, to enhance the security and intelligence of the smart health monitoring method, a robust online representation learning algorithm will be introduced. This work theoretically studies the impact of adversarial attacks on the online representation learning algorithm and proposes a novel mitigation strategy to reduce that impact. The efficiency of the proposed methods is demonstrated through theoretical analysis, simulation studies and empirical studies of smart cognitive monitoring in Alzheimer’s disease.

Time and Date: 2:00 PM - 3:00 PM, Monday, November 18, 2024

Speaker: Dr. Lulu Shang, Department of Biostatistics, University of Texas MD Anderson Cancer Center

Location: PGH 648

Title: Statistical and Computational Methods in Spatial Transcriptomics

Abstract: Spatial transcriptomics is a collection of genomic technologies that enable transcriptomic profiling of tissues with spatial localization information. Analyzing spatial transcriptomic data is computationally challenging because the data collected from various spatial transcriptomic technologies are often noisy and exhibit substantial spatial correlation across tissue locations. In the first part of the talk, I will present SpatialPCA, a spatially aware dimension reduction method that extracts a low-dimensional representation of the spatial transcriptomics data with biological signal and preserved spatial correlation structure. This method unlocks many computational tools previously developed for single-cell RNAseq studies, allowing for tailored and novel analyses of spatial transcriptomics. Another essential task in spatial transcriptomics involves identifying genes with spatial expression patterns, known as spatially variable genes (SVGs). In the second part of my talk, I will present Celina, a statistical method that detects SVGs displaying diverse spatial expression patterns within a specific cell type. These cell type-specific SVGs (ct-SVGs) represent crucial transcriptomic signatures underlying cellular heterogeneity and provide insights uniquely accessible through spatial transcriptomics. Taken together, these methods open doors for novel biologically informed downstream analyses, unveiling functional cellular heterogeneity at an unprecedented scale.

Spring 2024 Schedule

I will be organizing the Statistics Seminar in Spring 2024. Detailed schedules will be posted below. The seminar will be held in person with a remote option on Zoom (links are available upon request).

Time and Date: 3:00 PM - 4:00 PM, Wednesday, January 24, 2024

*This is a special session of Statistics Seminar as part of the Mathematics Colloquium of Spring 2024.

Speaker: Dr. Katherine B. Ensor, Department of Statistics, Rice University

Location: PGH 646A

Title: Hierarchical Modeling for Spatial-Temporal Extremes

Abstract: Methodologies for time-varying spatial-temporal extremes play an important practical role in urban planning and risk management. We put forward a hierarchical spatial-temporal peak-over-threshold modeling framework for studying rainfall history for large geographical regions. Modeling in space uses the extended Hausdorff distance to account for the irregularly shaped and sized regions. The objective is to obtain the distribution for the 25-, 100- and 500-year return levels within each hydrologic region. Working with hydrologists, we can obtain improved flood maps for a region based on these return level estimates. Our methodologies are applied to study the greater Houston area, using the large library of spatially referenced data on the Kinder Urban Data Platform (kinderudp.org). This research supported the greater Houston area’s recovery from Hurricane Harvey and long-term planning as the region learns to live with water.

Time and Date: 11:30 AM - 12:30 PM, Monday, March 4, 2024

Speaker: Dr. Anirban Bhattacharya, Department of Statistics, Texas A&M University

Location: PGH 646A

Title: Bayesian Semi-supervised Inference via a Debiased Modeling Approach

Abstract: Inference in semi-supervised (SS) settings has received a great amount of attention in recent years due to increased relevance in modern big-data problems. In a typical SS setting, there is a much larger unlabeled data set containing only observations for predictors, in addition to a moderately sized labeled data set involving observations for both an outcome and a set of predictors. Such data arise naturally in settings where the outcome, unlike the predictors, is costly to obtain. One of the primary statistical objectives in SS settings is to explore whether parameter estimation can be improved by exploiting the unlabeled data. In this talk, we discuss a novel Bayesian approach to SS inference for the population mean estimation problem. The proposed approach provides improved and optimal estimators in terms of both estimation efficiency and inference. The central idea behind our method is to model certain summary statistics of the data rather than specifying a probability model for the entire raw data itself. We establish concrete theoretical results validating all our claims and further support them through extensive numerical studies.

Fall 2023 Schedule

I will be organizing the Statistics Seminar in Fall 2023. Detailed schedules will be posted below. The seminar will be held in person with a remote option on Zoom (links are available upon request).

Time and Date: 12:00 PM - 1:00 PM, Monday, October 2, 2023

Speaker: Dr. Quan Zhou, Department of Statistics, Texas A&M University

Location: PGH 646A

Title: Importance Tempering of Markov Chain Monte Carlo Schemes

Abstract: Informed importance tempering (IIT) is an easy-to-implement MCMC algorithm that can be seen as an extension of the familiar Metropolis-Hastings algorithm with the special feature that informed proposals are always accepted, and which was shown in Zhou and Smith (2022) to converge much more quickly in some common circumstances. This work develops a new, comprehensive guide to the use of IIT in many situations. First, we propose two IIT schemes that run faster than existing informed MCMC methods on discrete spaces by not requiring the posterior evaluation of all neighboring states. Second, we integrate IIT with other MCMC techniques, including simulated tempering, pseudo-marginal and multiple-try methods (on general state spaces), which have been conventionally implemented as Metropolis-Hastings schemes and can suffer from low acceptance rates. The use of IIT allows us to always accept proposals and brings about new opportunities for optimizing the sampler which are not possible under the Metropolis-Hastings framework. Numerical examples illustrating our findings are provided for each proposed algorithm, and a general theory on the complexity of IIT methods is developed. Joint work with G. Li and A. Smith.

Time and Date: 12:00 PM - 1:00 PM, Monday, October 9, 2023

Speaker: Dr. Debdeep Pati, Department of Statistics, Texas A&M University

Location: PGH 646A

Title: Reconciling Computational Barriers & Statistical Guarantees in Variational Inference

Abstract: As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis on the algorithmic convergence aspects of VI is still largely lacking. Furthermore, there is no systematic study of the statistical properties of the algorithmic solution. In this talk, we show how the choice of variational family is critical to good statistical performance of the algorithmic solution through a number of case studies. We will discuss some recent advances towards studying convergence of the popular coordinate ascent variational inference algorithm. We will present both positive and negative results, which serve as a useful cautionary note to the applied practitioners of variational inference.

Time and Date: 12:00 PM - 1:00 PM, Monday, November 6, 2023

Speaker: Dr. Xia "Ben" Hu, Department of Computer Science, Rice University

Location: PGH 646A

Title: ChatGPT in Action: An Experimental Investigation of Its Effectiveness in NLP Tasks

Abstract: The recent progress in large language models has resulted in highly effective models like OpenAI's ChatGPT that have demonstrated exceptional performance in various tasks, including question answering, essay writing, and code generation. This presentation will cover the evolution of LLMs from BERT to ChatGPT and showcase their use cases. Although LLMs are useful for many NLP tasks, one significant concern is the inadvertent disclosure of sensitive information, especially in the healthcare industry, where patient privacy is crucial. To address this concern, we developed a novel framework that generates high-quality synthetic data using ChatGPT and fine-tunes a local offline model for downstream tasks. The use of synthetic data improved the performance of downstream tasks, reduced the time and resources required for data collection and labeling, and addressed privacy concerns. Finally, we will discuss the regulation of LLMs, which has raised concerns about cheating in education. We will introduce our recent survey on LLM-generated text detection and discuss the opportunities and challenges it presents.
