As smartphone sensor data becomes more common, the development of statistical methods for this new mode of data collection will accelerate. New statistical methods require rigorous evaluation before they can be considered acceptable for widespread use. A common tool for this evaluation is a simulation study, which involves generating synthetic data under known conditions (but still subject to stochastic noise). The new method is then applied to the synthetic data to test whether it can reliably recover those conditions.
Ten synthetic workdays generated via a simulation model
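The simulation-study logic described above can be illustrated with a minimal sketch. It is a generic textbook example, not the trip/activity framework itself: data are drawn from a normal distribution with a known mean, a standard 95% confidence interval is computed for each synthetic data set, and the study reports how often the interval contains the known truth. The function names, the parameter values, and the choice of a normal model are all assumptions made for illustration.

```python
import random
import statistics


def simulate_dataset(true_mean, sd, n, rng):
    """Generate one synthetic data set under known conditions plus noise."""
    return [rng.gauss(true_mean, sd) for _ in range(n)]


def confidence_interval(sample, z=1.96):
    """Normal-approximation 95% confidence interval for the mean."""
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * std_err, mean + z * std_err


def coverage(true_mean=9.0, sd=0.5, n=50, replicates=2000, seed=1):
    """Fraction of replicates whose interval contains the known true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        lower, upper = confidence_interval(simulate_dataset(true_mean, sd, n, rng))
        hits += lower <= true_mean <= upper
    return hits / replicates


if __name__ == "__main__":
    print(f"Empirical coverage: {coverage():.3f}")
```

If the method under evaluation behaves as intended, the empirical coverage should be close to the nominal 95%. A large departure would indicate a problem with the method or with its assumptions.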
For traditional modes of data collection, simulation is relatively straightforward, but there are currently no available procedures for simulating realistic trip/activity sequence data. These data exhibit complicated, cyclic, and often rigid temporal structures that traditional "state-change" models such as Markov processes cannot capture. An ideal model would accurately capture these structures while still allowing for the randomness necessary in a simulation study.
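To make the limitation of a simple state-change model concrete, here is a small sketch of a first-order Markov chain over three activity states in 15-minute steps. The states and transition probabilities are hypothetical values chosen only for illustration. Because the next state depends only on the current state and not on the time of day, the chain has no notion of a morning departure or a fixed-length workday.

```python
import random

STATES = ("home", "travel", "work")

# Hypothetical per-step (15-minute) transition probabilities.
TRANSITIONS = {
    "home":   {"home": 0.95, "travel": 0.05, "work": 0.00},
    "travel": {"home": 0.25, "travel": 0.50, "work": 0.25},
    "work":   {"home": 0.00, "travel": 0.05, "work": 0.95},
}


def simulate_markov_day(rng, steps=96, start="home"):
    """Simulate one day as a sequence of states, one per 15-minute slot."""
    day = [start]
    for _ in range(steps - 1):
        probs = TRANSITIONS[day[-1]]
        weights = [probs[state] for state in STATES]
        day.append(rng.choices(STATES, weights=weights)[0])
    return day


def first_departure_slot(day):
    """Index of the first slot not spent at home, or None if never left."""
    for slot, state in enumerate(day):
        if state != "home":
            return slot
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    days = [simulate_markov_day(rng) for _ in range(500)]
    slots = [s for s in map(first_departure_slot, days) if s is not None]
    print("earliest first departure slot:", min(slots))
    print("latest first departure slot:", max(slots))
```

In a chain like this, the time of first departure follows a geometric distribution, so simulated departures are scattered across the whole day rather than clustered around a typical commute time.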
I am developing a framework for simulating realistic trip/activity data for use in evaluation of my own and others' statistical methods for human activity sequence data. The figure below shows a sample of 10 synthetic workdays simulated from the same model with identical parameters. The basic temporal structure of the workday is preserved, but the model allows for randomness in start times, end times, travel modes, and selection of non-essential activities.
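As a rough illustration of the general idea (a fixed workday skeleton with random timing, travel mode, and optional activities), consider the toy generator below. It is not the framework described above: the activity labels, distributions, and probabilities are all hypothetical, chosen only so that the sketch is self-contained.

```python
import random

MINUTES_PER_DAY = 24 * 60


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def simulate_workday(rng):
    """Return one synthetic workday as (label, start_minute, end_minute) episodes."""
    mode = rng.choice(["car", "bus", "bike"])  # one travel mode per day
    leave_home = clamp(round(rng.gauss(8 * 60, 20)), 6 * 60, 10 * 60)
    work_total = clamp(round(rng.gauss(8 * 60, 30)), 6 * 60, 10 * 60)

    # Essential skeleton: home, commute, work, commute, home.
    plan = [("home", leave_home), (f"travel:{mode}", rng.randint(15, 45))]
    if rng.random() < 0.4:  # optional lunch away from the desk
        morning = work_total // 2
        plan += [
            ("work", morning),
            ("lunch_out", rng.randint(30, 60)),
            ("work", work_total - morning),
        ]
    else:
        plan.append(("work", work_total))
    plan.append((f"travel:{mode}", rng.randint(15, 45)))
    if rng.random() < 0.3:  # optional errand on the way home
        plan += [("shopping", rng.randint(20, 60)),
                 (f"travel:{mode}", rng.randint(5, 15))]

    # Convert durations into contiguous episodes that cover the whole day.
    episodes, t = [], 0
    for label, duration in plan:
        episodes.append((label, t, t + duration))
        t += duration
    episodes.append(("home", t, MINUTES_PER_DAY))
    return episodes


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(simulate_workday(rng))
```

Every simulated day shares the same home, commute, work, commute, home ordering, while the departure time, durations, travel mode, and optional activities vary from draw to draw.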
Batteries die. People forget their devices at home. Or they accidentally leave their phones in the bathroom stall after secretly perusing social media during a bathroom work break. Periods of missingness due to user error are an inevitability when analyzing wearable sensor data. Imputation is a common statistical tool that replaces missing data with substituted values in a principled manner, so that statistical methods can be applied to a complete data set while preserving the validity of the results.
Imputation of human activity data suffers from many of the same challenges as simulating synthetic activity data, but with an additional complication. Imputed activity data for an individual should ideally be more "similar" to his or her own data than to data from other individuals. By embedding iMEM individualized inference into the workday simulation model, I developed a method for imputing fully synthetic workdays based on a user's real data while borrowing information from other users. The figure shows four real (top panel) and five imputed (bottom panel) workdays from one individual.
Real and imputed workdays generated from the iMEM imputation model
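The notion of borrowing information from other users while staying close to an individual's own data can be illustrated with a generic partial-pooling estimate. This sketch is not the iMEM method: it simply shrinks an individual's average toward the population average, with less shrinkage as the individual contributes more observed days. The function name and the `prior_strength` parameter are hypothetical.

```python
import statistics


def shrunken_mean(own_values, population_values, prior_strength=5.0):
    """Weighted average of an individual's mean and the population mean.

    The individual's weight is n / (n + prior_strength), so users with
    few observed days lean more heavily on the population.
    """
    population_mean = statistics.fmean(population_values)
    n = len(own_values)
    if n == 0:
        return population_mean  # nothing observed: rely on other users
    weight = n / (n + prior_strength)
    return weight * statistics.fmean(own_values) + (1 - weight) * population_mean


if __name__ == "__main__":
    # Departure times in minutes after midnight (illustrative values only).
    own = [480, 490, 500, 510]
    others = [520, 540, 560]
    print(shrunken_mean(own, others))
```

With four observed days and a prior strength of 5, the individual's own mean receives a weight of 4/9, so the estimate sits between the individual's mean and the population mean.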