Example data set: NHANES 2003-2006

The rapid growth of wearable devices has made it possible to collect high-resolution, objectively measured physical activity data. In the National Health and Nutrition Examination Survey (NHANES) 2003-2006, approximately 14,000 participants were asked to wear a hip-worn accelerometer for seven consecutive days while going about their normal daily activities. Uniaxial (vertical-axis) accelerations were recorded and released as minute-level activity counts (AC). Here is a tutorial for obtaining a ready-to-use NHANES data set.
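Before any modeling, minute-level counts are usually arranged so that each participant contributes a matrix with one row per day and 1,440 columns, one per minute from midnight. The sketch below is a minimal illustration that assumes a hypothetical long-format input of (SEQN, day, minute, count) tuples, with day coded 1 = Sunday through 7 = Saturday and minute running from 0 to 1439. The actual NHANES release and the tutorial may use different column names and codings, and the function name `to_day_matrices` is made up for this example.

```python
import numpy as np

MINUTES_PER_DAY = 1440  # 24 h x 60 min, midnight to midnight


def to_day_matrices(records):
    """Arrange long-format minute records into one (n_days, 1440) array per participant.

    records: iterable of (seqn, day, minute, count) tuples (a hypothetical layout),
    where day is coded 1 = Sunday ... 7 = Saturday and minute runs 0..1439.
    Returns {seqn: (sorted day codes, array)}; minutes never observed stay NaN.
    """
    records = list(records)

    # First pass: collect the distinct days observed for each participant.
    days = {}
    for seqn, day, _, _ in records:
        days.setdefault(seqn, set()).add(day)

    # Allocate one NaN-filled matrix per participant, rows ordered Sunday -> Saturday.
    out = {}
    for seqn, day_set in days.items():
        codes = sorted(day_set)
        out[seqn] = (codes, np.full((len(codes), MINUTES_PER_DAY), np.nan))

    # Second pass: write each count into its (day row, minute column) cell.
    for seqn, day, minute, count in records:
        codes, mat = out[seqn]
        mat[codes.index(day), minute] = count
    return out


# Toy example with made-up IDs and counts (not NHANES data).
toy = [(1001, 2, 0, 0), (1001, 2, 1, 12), (1001, 3, 0, 5), (1002, 1, 1439, 250)]
mats = to_day_matrices(toy)
```

Keeping unobserved minutes as NaN, rather than zero, avoids confusing missing data with genuine zero activity counts.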

The statistical analysis of this data set is challenging because (1) it contains many participants (large n) and many covariates (large p); (2) each participant contributes multiple days of data, which induces a multilevel structure; (3) each day of physical activity data is high-dimensional (1,440 minutes); and (4) activity counts are non-stationary across the time of day.

Multilevel Functional Data Analysis

In NHANES 2003-2006, physical activity data were collected from each participant over seven consecutive days. The plot below shows the physical activity profiles of three NHANES participants over all of their available days. Each participant is uniquely identified by a SEQN number. Within each column, each row displays the minute-level AC for one day from midnight to midnight, labeled by day of the week from Sunday (top row) to Saturday (bottom row). Some days were excluded because of low data quality and are therefore not shown.
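The text does not say which quality criterion was used to drop days. A common heuristic in accelerometry, assumed here and not necessarily the one behind the plot, treats long runs of zero counts as non-wear and keeps a day only if the estimated wear time reaches a threshold. The sketch below uses 60 consecutive zero minutes for non-wear and 10 hours (600 minutes) of wear for a valid day; both thresholds and the function names `wear_mask` and `is_valid_day` are illustrative assumptions.

```python
import numpy as np


def wear_mask(day_counts, nonwear_run=60):
    """Return a boolean array that is True for minutes treated as worn.

    A minute is treated as non-wear if it lies in a run of at least
    `nonwear_run` consecutive zero counts (an assumed heuristic).
    Assumes a complete day with no missing (NaN) minutes.
    """
    counts = np.asarray(day_counts, dtype=float)
    worn = np.ones(counts.size, dtype=bool)
    start = None  # index where the current run of zeros began
    for i, c in enumerate(np.append(counts, 1.0)):  # non-zero sentinel closes a trailing run
        if c == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= nonwear_run:
                worn[start:i] = False
            start = None
    return worn


def is_valid_day(day_counts, min_wear_minutes=600, nonwear_run=60):
    """Keep a day only if estimated wear time reaches `min_wear_minutes` (assumed 10 h)."""
    return int(wear_mask(day_counts, nonwear_run).sum()) >= min_wear_minutes


# Toy day: 700 active minutes followed by 740 consecutive zeros.
day = np.zeros(1440)
day[:700] = 5
```

Short runs of zeros, for example during quiet sitting, still count as wear under this rule; only runs of at least `nonwear_run` minutes are removed.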

The multilevel structure of the physical activity data raises numerous questions, such as: (1) given that each participant contributes multiple days of physical activity, what is the structure of the within- and between-participant variability of physical activity? (2) what is the association between daily activity patterns and covariates (e.g., age, sex, day of the week)?
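One standard way to formalize the within- versus between-participant question is a two-level functional model, Y_ij(t) = mu(t) + X_i(t) + W_ij(t) + e_ij(t), where i indexes participants, j indexes days, X_i(t) is a participant-level deviation and W_ij(t) is a day-level deviation, as in multilevel functional principal component analysis. The minimal sketch below estimates the between covariance K_B from pairs of different days of the same participant and obtains the within covariance as K_W = K_T - K_B, where K_T is the total covariance. It assumes curves on a common grid with no missing values and skips the smoothing steps a full analysis would use, so it illustrates the idea rather than the methods developed here. The function name and the simulated data are hypothetical.

```python
import numpy as np


def multilevel_covariances(data):
    """Method-of-moments estimates for Y_ij(t) = mu(t) + X_i(t) + W_ij(t) + e_ij(t).

    data: list with one (n_days_i, T) array per participant, curves on a common grid.
    Returns (mu, K_B, K_W): mean function, between- and within-participant covariance.
    Measurement error e_ij is not separated out and ends up on the diagonal of K_W.
    """
    mu = np.vstack(data).mean(axis=0)        # overall mean curve mu(t)
    T = mu.size
    total = np.zeros((T, T))                 # sum of R_ij R_ij^T over all days
    cross = np.zeros((T, T))                 # sum of R_ij R_ik^T over pairs of different days
    n_days = n_pairs = 0
    for Y in data:
        R = Y - mu
        RtR = R.T @ R
        s = R.sum(axis=0)
        total += RtR
        cross += np.outer(s, s) - RtR        # (sum_j R_j)(sum_k R_k)^T minus the j == k terms
        n_days += R.shape[0]
        n_pairs += R.shape[0] * (R.shape[0] - 1)
    if n_pairs == 0:
        raise ValueError("need at least one participant with two or more days")
    K_T = total / n_days                     # total covariance: K_B + K_W (+ noise)
    K_B = cross / n_pairs                    # E[R_ij R_ik^T] for j != k: participant level only
    return mu, K_B, K_T - K_B


# Simulated example: participant effect along phi1, day-to-day effect along phi2.
rng = np.random.default_rng(0)
T = 50
t = np.arange(T) / T
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
data = []
for _ in range(300):
    a = rng.normal(0, 2.0)                   # participant score, variance 4
    days = [a * phi1 + rng.normal(0, 1.0) * phi2 + rng.normal(0, 0.1, T) for _ in range(5)]
    data.append(np.array(days))
mu, K_B, K_W = multilevel_covariances(data)
share_between = np.trace(K_B) / (np.trace(K_B) + np.trace(K_W))
```

The ratio `share_between` summarizes the fraction of total variability attributable to differences between participants, and the leading eigenvectors of K_B and K_W (for example via `np.linalg.eigh`) play the role of participant-level and day-level principal components.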

We develop multilevel Functional Data Analysis methods to answer these questions. If you are not familiar with Functional Data Analysis, here is a brief overview.

High-dimensional Predictors and Time-to-event Outcomes

An example research question is: what is the association between high-dimensional baseline objective measurements of physical activity and time to all-cause mortality? We combine Functional Data Analysis and Survival Analysis methods to answer questions of this kind.

Below is a plot illustrating the research question. Each row represents the data from one individual, including high-dimensional physical activity data, demographic data, and the survival outcome.
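One common way to combine the two, sketched here as an assumption rather than as the authors' method, is a linear functional Cox model, log h_i(t) = log h_0(t) + gamma * Z_i + integral of X_i(s) beta(s) ds, where X_i(s) is the activity curve, Z_i is a scalar covariate such as age, and beta(s) is a coefficient function over time of day. Restricting beta(s) to be constant on a few time-of-day bins turns the integral into a handful of Riemann-sum features, after which an ordinary Cox model is fitted by Newton-Raphson on the Breslow partial likelihood. The bin count, the simulated data, and the names `bin_features` and `cox_newton` are hypothetical; practical analyses often use penalized spline bases for beta(s) instead of coarse bins.

```python
import numpy as np


def bin_features(curves, n_bins=4):
    """Riemann sums of X_i(s) over equal time-of-day bins on [0, 1).

    With beta(s) constant on each bin, the Cox coefficient of column k
    is the value of beta(s) on bin k.
    """
    n, T = curves.shape
    edges = np.linspace(0, T, n_bins + 1).astype(int)
    return np.column_stack([curves[:, a:b].sum(axis=1) / T for a, b in zip(edges[:-1], edges[1:])])


def cox_newton(X, time, event, n_iter=50, tol=1e-8):
    """Maximize the Breslow partial likelihood by Newton-Raphson with step halving."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    d = np.asarray(event, dtype=bool)
    order = np.argsort(-time, kind="stable")          # longest follow-up first
    X, time, d = X[order], time[order], d[order]
    # Risk set of row i = all rows with time >= time_i = rows 0..r_i (ties included).
    r = len(time) - np.searchsorted(np.sort(time), time, side="left") - 1

    def loglik_grad_info(beta):
        eta = X @ beta
        shift = eta.max()
        w = np.exp(eta - shift)                       # shift avoids overflow; cancels in ratios
        S0 = np.cumsum(w)[r][d]
        S1 = np.cumsum(w[:, None] * X, axis=0)[r][d]
        S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)[r][d]
        xbar = S1 / S0[:, None]
        ll = np.sum(eta[d] - shift - np.log(S0))
        grad = np.sum(X[d] - xbar, axis=0)
        info = np.sum(S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :], axis=0)
        return ll, grad, info

    beta = np.zeros(X.shape[1])
    ll, grad, info = loglik_grad_info(beta)
    for _ in range(n_iter):
        step = np.linalg.solve(info, grad)            # Newton step: info^{-1} * score
        new = loglik_grad_info(beta + step)
        while new[0] < ll and np.max(np.abs(step)) > tol:
            step /= 2                                 # halve until the log-likelihood improves
            new = loglik_grad_info(beta + step)
        beta = beta + step
        ll, grad, info = new
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated cohort: activity curves on 96 grid points, age, right-censored survival times.
rng = np.random.default_rng(1)
n, T = 2000, 96
levels = rng.normal(0, 2.0, size=(n, 4))              # subject-specific level in each bin
curves = np.repeat(levels, T // 4, axis=1) + rng.normal(0, 1.0, size=(n, T))
age = rng.uniform(20, 80, size=n)
F = bin_features(curves)
beta_true = np.array([0.0, -2.0, -2.0, 0.0])          # beta(s) on each time-of-day bin
gamma_true = 0.03
eta = F @ beta_true + gamma_true * age
t_event = rng.exponential(1.0 / (0.01 * np.exp(eta)))  # constant baseline hazard 0.01
t_cens = rng.uniform(0, 30, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
est = cox_newton(np.column_stack([F, age]), time, event)
```

The first four entries of `est` estimate beta(s) on each bin and the last estimates the age coefficient. Binning keeps the example short; a finer or smoother basis trades more flexibility in beta(s) for the need to regularize.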