Projects

Some of my recent projects with my students and collaborators focus on developing Bayesian and optimization methods for high-dimensional statistics, machine learning, and complex spatial and network data analysis. These projects are motivated by real applications in urban planning and traffic statistics, oceanography, social sciences (e.g., election and human movement patterns), economics and business (e.g., income distribution and credit card transaction data), and biomedical studies. Below are several examples of recently completed projects. If you are interested in learning more about my other ongoing research and potentially working with me as a graduate or undergraduate student, please don't hesitate to email me (huiyan at stat.tamu.edu).

Project 1: Multivariate Horseshoe for Structured Sparsity and Smoothness

Graphs are commonly used to represent complex data structures. In models dealing with graph-structured data, multivariate parameters may exhibit structured sparsity and smoothness, in the sense that both zero and non-zero parameters tend to cluster together. We propose a new prior for high-dimensional parameters with graphical relations, referred to as the Tree-based Low-rank Horseshoe (T-LoHo) model, which generalizes the popular univariate Bayesian horseshoe shrinkage prior to the multivariate setting to detect structured sparsity and smoothness simultaneously. We apply it to regularize a Bayesian high-dimensional regression problem in which the regression coefficients are linked by a graph, so that the resulting clusters have flexible shapes and satisfy the cluster contiguity constraint with respect to the graph. The results show substantial improvements over competing methods such as the sparse fused lasso.

Lee, Luo and Sang (2021, NeurIPS)
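For readers unfamiliar with the starting point of T-LoHo, the sketch below draws coefficients from the standard univariate horseshoe prior, beta_j | lambda_j, tau ~ N(0, lambda_j^2 tau^2) with half-Cauchy local scales lambda_j and a half-Cauchy global scale tau. It illustrates only the univariate prior that T-LoHo generalizes, not T-LoHo itself; the function name `sample_horseshoe` and the `tau_scale` parameter are hypothetical choices for this illustration.

```python
import numpy as np

def sample_horseshoe(p, rng, tau_scale=1.0):
    # Hypothetical helper: draws from the univariate horseshoe prior (not T-LoHo).
    # Global scale tau ~ C+(0, tau_scale): shrinks all coefficients jointly.
    tau = tau_scale * abs(rng.standard_cauchy())
    # Local scales lambda_j ~ C+(0, 1): heavy tails let a few coefficients escape shrinkage.
    lam = np.abs(rng.standard_cauchy(size=p))
    # beta_j | lambda_j, tau ~ N(0, lambda_j^2 * tau^2)
    beta = rng.normal(0.0, lam * tau)
    return beta, lam, tau

# Usage: most draws are tiny, while a few are very large.
rng = np.random.default_rng(0)
beta, lam, tau = sample_horseshoe(1000, rng)
```

The product of a spike-like mass near zero and heavy tails is what makes the horseshoe attractive for sparse signals; the multivariate extension described above additionally ties coefficients together through the graph.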

Project 2: Bayesian Additive Spanning Trees (BAST)

Nonparametric regression on complex domains is challenging because most existing methods, such as ensemble models based on binary decision trees (e.g., RF, XGBoost, BART), rely on axis-parallel split rules and are therefore not designed to account for intrinsic geometries and domain boundaries. We propose a Bayesian additive regression spanning trees (BAST) model for predictive machine learning tasks on manifolds, with an emphasis on complex constrained domains or irregularly shaped spaces embedded in Euclidean spaces. Each weak learner in our model is a random spanning tree manifold partition model, which can capture any irregularly shaped, spatially contiguous partition while respecting intrinsic geometries and domain boundary constraints. Exploiting several useful properties of spanning tree structures, we design an efficient Bayesian inference algorithm. Equipped with a soft prediction scheme, BAST significantly outperforms competing machine learning methods in simulation experiments and real data examples, owing to its strong local adaptivity to different levels of smoothness.

Luo, Sang and Mallick (2021, NeurIPS)
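The key property behind spanning tree partitions is that deleting k edges from a spanning tree of a connected graph leaves k + 1 subtrees, each of which is connected, so every resulting cluster is contiguous in the original graph. The sketch below illustrates this on a hypothetical 4 x 4 grid graph. It is not the BAST sampler: the spanning tree here is simply a minimum spanning tree under a random edge order (Kruskal's algorithm), and the helper names are hypothetical.

```python
import random

def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def random_spanning_tree(n, edges, rng):
    # Kruskal's algorithm on a randomly shuffled edge list: an MST under random
    # weights (an illustrative choice, not a uniformly random spanning tree).
    parent = list(range(n))
    order = list(edges)
    rng.shuffle(order)
    tree = []
    for u, v in order:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree

def contiguous_partition(n, tree, n_clusters, rng):
    # Delete (n_clusters - 1) randomly chosen tree edges; the remaining subtrees
    # are connected, so each cluster label marks a contiguous region.
    cut = set(rng.sample(range(len(tree)), n_clusters - 1))
    parent = list(range(n))
    for i, (u, v) in enumerate(tree):
        if i not in cut:
            parent[_find(parent, u)] = _find(parent, v)
    roots = [_find(parent, x) for x in range(n)]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

# Usage on a hypothetical 4 x 4 grid graph (vertices numbered row by row).
side = 4
n = side * side
edges = [(r * side + c, r * side + c + 1) for r in range(side) for c in range(side - 1)]
edges += [(r * side + c, (r + 1) * side + c) for r in range(side - 1) for c in range(side)]
rng = random.Random(0)
tree = random_spanning_tree(n, edges, rng)
labels = contiguous_partition(n, tree, 3, rng)
```

Because clusters are defined through the tree rather than through coordinate-wise splits, their shapes follow the graph, which is what lets such partitions respect domain boundaries.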

Project 3: ALS Disease Spreading Pattern Analysis

Amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease) is a neurological disease that starts at a focal point and gradually spreads to other parts of the nervous system. One of the main clinical symptoms of ALS is muscle weakness. To study the spreading patterns of muscle weakness, we analyze spatiotemporal binary muscle strength data, which indicate whether the observed muscle strength is impaired or healthy. We propose two regularized network models (Shin et al., 2019, Biometrics; Shin et al., 2021, Statistics in Medicine) to study ALS spreading patterns over body locations and time.

Project 4: Human Mobility Patterns during COVID-19

The coronavirus (COVID-19) global pandemic has had a significant impact on people's social activities. Cell phone mobility data provide unique and rich information for studying this impact. We have developed a number of R Shiny dashboards to monitor COVID-19 cases, vaccinations, and mobility patterns in Texas using DSHS and SafeGraph data (https://covid19-modeltrac.shinyapps.io/TX-BV-ModelTrac/, https://huiyansang.shinyapps.io/BCS_POI_dashboard_V2/). In particular, we consider daily leaving-home index data at 2144 census block groups in Harris County, Texas, to study changes in mobility patterns and how they relate to public policy and sociodemographic variables. We found that the ethnic, education, and age compositions of census blocks have a noticeable impact on the spatial clustering patterns of people's mobility behaviors. This work will appear in Annals of Applied Statistics.