Statistics Seminar

Department of Mathematics, University of Houston

Spring 2024 Schedule

I will be organizing the Statistics Seminar in Spring 2024. Detailed schedules will be posted below. The seminar format will be in person, with a remote option via Zoom (links are available upon request).


*This is a special session of the Statistics Seminar, held as part of the Spring 2024 Mathematics Colloquium.

Speaker: Dr. Katherine B. Ensor, Department of Statistics, Rice University

Location: PGH 646A

Title: Hierarchical Modeling for Spatial-Temporal Extremes

Abstract: Methodologies for time-varying spatial-temporal extremes play an important practical role in urban planning and risk management. We put forward a hierarchical spatial-temporal peak-over-threshold modeling framework for studying rainfall history for large geographical regions. Modeling in space uses the extended Hausdorff distance to account for the irregularly shaped and sized regions. The objective is to obtain the distribution of the 25-, 100-, and 500-year return levels within each hydrologic region. Working with hydrologists, we can obtain improved flood maps for a region based on these return level estimates. Our methodologies are applied to study the greater Houston area, using the large library of spatially referenced data on the Kinder Urban Data Platform (kinderudp.org). This research supported the greater Houston area’s recovery from Hurricane Harvey and long-term planning as the region learns to live with water.
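As a self-contained illustration of the peaks-over-threshold idea underlying the talk (not the speaker's hierarchical spatial-temporal model), the sketch below fits a generalized Pareto distribution to threshold excesses of a synthetic rainfall-like series and converts the fit into 25-, 100-, and 500-year return levels. All data and parameter choices are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily rainfall-like series (illustrative only, not the talk's data)
rain = rng.gamma(shape=0.5, scale=10.0, size=365 * 50)

u = np.quantile(rain, 0.95)           # high threshold for peaks-over-threshold
exc = rain[rain > u] - u              # excesses over the threshold
zeta = exc.size / rain.size           # empirical exceedance probability

# Method-of-moments fit of a generalized Pareto to the excesses
m1, v = exc.mean(), exc.var()
xi = 0.5 * (1.0 - m1**2 / v)          # shape parameter
sigma = 0.5 * m1 * (m1**2 / v + 1.0)  # scale parameter

def return_level(years, obs_per_year=365):
    """Level exceeded on average once every `years` years."""
    n = years * obs_per_year * zeta   # expected number of exceedances
    return u + (sigma / xi) * (n**xi - 1.0)

levels = {m: return_level(m) for m in (25, 100, 500)}
```

The hierarchical framework in the talk goes much further (spatial pooling across irregular regions via the extended Hausdorff distance); this sketch only shows the marginal return-level calculation that such models build on.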


Speaker: Dr. Anirban Bhattacharya, Department of Statistics, Texas A&M University

Location: PGH 646A

Title: Bayesian Semi-supervised Inference via a Debiased Modeling Approach

Abstract: Inference in semi-supervised (SS) settings has received a great deal of attention in recent years due to its increased relevance in modern big-data problems. In a typical SS setting, there is a much larger unlabeled dataset containing observations only for the predictors, in addition to a moderately sized labeled dataset with observations for both an outcome and a set of predictors. Such data arise naturally in settings where the outcome, unlike the predictors, is costly to obtain. One of the primary statistical objectives in SS settings is to explore whether parameter estimation can be improved by exploiting the unlabeled data. In this talk, we discuss a novel Bayesian approach to SS inference for the population mean estimation problem. The proposed approach provides estimators that are improved and optimal in terms of both estimation efficiency and inference. The central idea behind our method is to model certain summary statistics of the data rather than specifying a probability model for the entire raw data. We establish concrete theoretical results validating all our claims and further support them through extensive numerical studies.
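To convey why unlabeled predictor data can sharpen mean estimation, here is a minimal classical (non-Bayesian) analogue of the SS setup: a regression-adjusted estimator of E[Y] that borrows the predictor mean from the unlabeled sample. This is purely illustrative and is not the speaker's debiased Bayesian method; all data and parameter values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 200, 5000                      # labeled / unlabeled sample sizes
x_lab = rng.normal(size=n)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(scale=0.5, size=n)
x_unl = rng.normal(size=m)            # predictors only: the outcome is costly

naive = y_lab.mean()                  # ignores the unlabeled data entirely

# Regression-adjusted SS estimator of E[Y]:
# shift the labeled mean by the fitted slope times the predictor-mean gap
b = np.polyfit(x_lab, y_lab, 1)[0]    # slope estimated from labeled data
x_all_mean = np.concatenate([x_lab, x_unl]).mean()
ss_est = y_lab.mean() + b * (x_all_mean - x_lab.mean())
```

When the predictors explain much of the outcome's variance, the adjusted estimator has smaller variance than the naive labeled-only mean, which is the phenomenon the talk's Bayesian summary-statistics approach exploits and optimizes.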

Fall 2023 Schedule

I will be organizing the Statistics Seminar in Fall 2023. Detailed schedules will be posted below. The seminar format will be in person, with a remote option via Zoom (links are available upon request).


Speaker: Dr. Quan Zhou, Department of Statistics, Texas A&M University

Location: PGH 646A

Title: Importance Tempering of Markov Chain Monte Carlo Schemes

Abstract: Informed importance tempering (IIT) is an easy-to-implement MCMC algorithm that can be seen as an extension of the familiar Metropolis-Hastings algorithm, with the special feature that informed proposals are always accepted; Zhou and Smith (2022) showed that it converges much more quickly in some common circumstances. This work develops a new, comprehensive guide to the use of IIT in many situations. First, we propose two IIT schemes that run faster than existing informed MCMC methods on discrete spaces by not requiring posterior evaluation at all neighboring states. Second, we integrate IIT with other MCMC techniques, including simulated tempering, pseudo-marginal, and multiple-try methods (on general state spaces), which have conventionally been implemented as Metropolis-Hastings schemes and can suffer from low acceptance rates. The use of IIT allows us to always accept proposals and opens new opportunities for optimizing the sampler that are not possible under the Metropolis-Hastings framework. Numerical examples illustrating our findings are provided for each proposed algorithm, and a general theory on the complexity of IIT methods is developed. Joint work with G. Li and A. Smith.
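The always-accept mechanic of IIT can be sketched on a toy discrete target: from each state, jump to a neighbor with probability proportional to a balancing function of the target ratio, and correct the resulting bias with an importance weight. This is a bare-bones illustration under invented settings, not one of the schemes proposed in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 20
pi = np.exp(-0.5 * ((np.arange(K) - 7.0) / 2.0) ** 2)
pi /= pi.sum()                       # discretized Gaussian target on {0,...,K-1}

def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < K]

h = np.sqrt                          # balancing function h(r) = sqrt(r)

x, ws, fs = K // 2, [], []
for _ in range(20000):
    nb = neighbors(x)
    q = np.array([h(pi[y] / pi[x]) for y in nb])
    ws.append(1.0 / q.sum())         # importance weight corrects the jump chain
    fs.append(x)
    x = nb[rng.choice(len(q), p=q / q.sum())]  # informed move, always accepted

est = np.average(fs, weights=ws)     # self-normalized estimate of E[X]
true = np.dot(np.arange(K), pi)
```

The jump chain is stationary for a distribution proportional to pi(x) times the local normalizer, and the 1/normalizer weights recover expectations under pi; with h(r) = sqrt(r) the chain satisfies detailed balance, so the weighted average is consistent.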


Speaker: Dr. Debdeep Pati, Department of Statistics, Texas A&M University

Location: PGH 646A

Title: Reconciling Computational Barriers & Statistical Guarantees in Variational Inference

Abstract: As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justification for VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis of the algorithmic convergence of VI is still largely lacking. Furthermore, there is no systematic study of the statistical properties of the algorithmic solution. In this talk, we show through a number of case studies how the choice of variational family is critical to the statistical performance of the algorithmic solution. We will discuss some recent advances in studying the convergence of the popular coordinate ascent variational inference algorithm. We will present both positive and negative results, which serve as a useful cautionary note to practitioners of variational inference.
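As a concrete reminder of what coordinate ascent variational inference (CAVI) looks like, the sketch below runs the textbook CAVI updates for a normal model with unknown mean and precision under a mean-field Normal-Gamma variational family. The priors and data are synthetic, and the example is standard material rather than anything specific to the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.5, size=200)   # synthetic data
N, xbar = x.size, x.mean()

# Model: x_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Mean-field factors q(mu) = N(mN, 1/lamN), q(tau) = Gamma(aN, bN)
mN = (lam0 * mu0 + N * xbar) / (lam0 + N)      # closed-form, needs no iteration
aN = a0 + (N + 1) / 2.0                        # also fixed across iterations
E_tau = a0 / b0                                # initialize E_q[tau]
for _ in range(50):                            # coordinate ascent updates
    lamN = (lam0 + N) * E_tau
    sq = (lam0 * ((mN - mu0) ** 2 + 1 / lamN)
          + ((x - mN) ** 2).sum() + N / lamN)
    bN = b0 + 0.5 * sq
    E_tau = aN / bN                            # feeds back into lamN next pass
```

Each pass maximizes the evidence lower bound in one factor while holding the other fixed; here the updates converge in a handful of iterations, but the talk's cautionary results concern settings where the chosen family or the ascent path behaves far less benignly.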


Speaker: Dr. Xia "Ben" Hu, Department of Computer Science, Rice University

Location: PGH 646A

Title: ChatGPT in Action: An Experimental Investigation of Its Effectiveness in NLP Tasks

Abstract: The recent progress in large language models has resulted in highly effective models like OpenAI's ChatGPT that have demonstrated exceptional performance in various tasks, including question answering, essay writing, and code generation. This presentation will cover the evolution of LLMs from BERT to ChatGPT and showcase their use cases. Although LLMs are useful for many NLP tasks, one significant concern is the inadvertent disclosure of sensitive information, especially in the healthcare industry, where patient privacy is crucial. To address this concern, we developed a novel framework that generates high-quality synthetic data using ChatGPT and fine-tunes a local offline model for downstream tasks. The use of synthetic data improved the performance of downstream tasks, reduced the time and resources required for data collection and labeling, and addressed privacy concerns. Finally, we will discuss the regulation of LLMs, whose use has raised concerns about cheating in education. We will introduce our recent survey on LLM-generated text detection and discuss the opportunities and challenges it presents.