4/27/2018

Post date: May 1, 2018 2:42:27 PM

Title: Robust Clustering Methods with Subpopulation-specific Deviations

Speaker: Briana Stephenson, Department of Biostatistics, University of North Carolina, Chapel Hill

In a large, heterogeneous population, traditional clustering methods can produce a large number of clusters because of factors such as study size and regional diversity. Many of these clusters differ only by minor pattern changes, which makes the resulting patterns hard to interpret. We address these data complexities by introducing a new method, Robust Profile Clustering (RPC). Built on a local partition process framework, RPC clusters participants at two levels: (1) globally, with participants assigned to overall population-level clusters via an over-fitted mixture model, and (2) locally, with regional variations accommodated via a beta-Bernoulli process that depends on subpopulation differences. These clusters can then be linked to a probit response to form a joint predictive clustering model, Supervised Robust Profile Clustering, which clusters global and local profiles according to the outcome of interest. Using data from the National Birth Defects Prevention Study and the Hispanic Community Health Study/Study of Latinos, we discuss the application, impact, and utility of these methods, as well as other recent machine learning techniques, for improving dietary pattern analysis in a large, diverse population such as that of the United States.
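
For readers unfamiliar with the two-level idea described in the abstract, the following is a toy generative sketch, not the speaker's model or code. It assumes categorical survey-style responses, a single local profile per subpopulation, and arbitrary hyperparameters. The function name `simulate_rpc_toy` and all of its parameters are hypothetical. It only simulates data in which each variable follows either a global cluster profile or a subpopulation-specific deviation. It does not perform inference.

```python
import random


def simulate_rpc_toy(n_per_sub=50, n_sub=3, n_vars=8, n_global=4,
                     n_levels=3, seed=0):
    """Simulate toy data with global clusters and local deviations.

    All names and hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def dirichlet(alpha, k):
        # Symmetric Dirichlet draw via normalized gamma variates.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        return [d / total for d in draws]

    # Global level: mixture weights with a small concentration, so that
    # surplus components tend to receive little mass (the over-fitted idea).
    weights = dirichlet(0.5, n_global)
    # One response-probability vector per (global cluster, variable).
    global_profiles = [[dirichlet(1.0, n_levels) for _ in range(n_vars)]
                       for _ in range(n_global)]

    rows = []
    for sub in range(n_sub):
        # Local level (simplified): one deviation profile per subpopulation.
        local_profile = [dirichlet(1.0, n_levels) for _ in range(n_vars)]
        # Beta-Bernoulli step: per-variable probability of following
        # the global pattern within this subpopulation.
        nu = [rng.betavariate(2.0, 1.0) for _ in range(n_vars)]

        for _ in range(n_per_sub):
            cluster = rng.choices(range(n_global), weights=weights)[0]
            flags, responses = [], []
            for j in range(n_vars):
                follows_global = rng.random() < nu[j]
                probs = (global_profiles[cluster][j] if follows_global
                         else local_profile[j])
                flags.append(follows_global)
                responses.append(
                    rng.choices(range(n_levels), weights=probs)[0])
            rows.append({"sub": sub, "cluster": cluster,
                         "global_flags": flags, "responses": responses})
    return rows
```

Calling `simulate_rpc_toy()` returns one record per simulated participant. Its `global_flags` entry shows which variables were drawn from the global profile and which from the local deviation, the distinction the abstract describes as clustering at two levels.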