Another unique feature of human activity data is that it is sequential and temporally indexed, which makes it possible to study activity patterns over days, weeks, and potentially even months. Characterizing these patterns in various ways, for example by quantifying how two days of activity differ or by clustering individuals with similar patterns, has great potential utility in many areas. To this end, I have borrowed and augmented sequence alignment tools from genomics. Activity patterns are represented as character sequences, and the resulting pairwise sequence distances can be used in clustering algorithms, in decision trees, or even as predictors in regression models. The figure below visualizes activity pattern clusters; each plot shows the proportion of users engaged in each activity at a given time of day.
Activity pattern clusters generated via sequence distance methods
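The pipeline described above (encode each day as a character sequence, compute pairwise distances, cluster, then plot the share of users in each activity over the day) can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation behind the figure. The one-letter activity codes are hypothetical. Optimal matching, a weighted edit distance with constant indel cost 1 and substitution cost 2, stands in for the augmented alignment tools. A simple k-medoids iteration stands in for whichever clustering algorithm was actually used. The `state_distribution` helper computes the quantity shown in each cluster plot.

```python
from itertools import combinations

# Hypothetical one-letter activity codes, one character per fixed time slot:
# H = home, W = work, L = leisure, P = personal business, E = eating out.


def om_distance(a: str, b: str, indel: float = 1.0, sub: float = 2.0) -> float:
    """Optimal matching distance: minimum total cost of insertions, deletions
    and substitutions that turns sequence a into sequence b."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,     # delete a[i-1]
                          d[i][j - 1] + indel,     # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute
    return d[n][m]


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise optimal matching distances."""
    k = len(seqs)
    D = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        D[i][j] = D[j][i] = om_distance(seqs[i], seqs[j])
    return D


def assign(D, medoids):
    """Label each sequence with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D, k, max_iter=100):
    """Simple alternating k-medoids on a precomputed distance matrix (k <= n)."""
    n = len(D)
    # Start from the most central sequence, then repeatedly add the
    # sequence farthest from all medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max((i for i in range(n) if i not in medoids),
                           key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(D, medoids), medoids


def state_distribution(seqs: list[str]) -> list[dict[str, float]]:
    """Share of sequences in each activity at each time slot (equal lengths assumed)."""
    n = len(seqs)
    return [{s: col.count(s) / n for s in sorted(set(col))}
            for col in ([seq[t] for seq in seqs] for t in range(len(seqs[0])))]


if __name__ == "__main__":
    days = ["HHHHWWWWHHHH", "HHHHWWWWWHHH", "HHHLLPPLHHHH", "HHHLLPELHHHH"]
    labels, medoids = k_medoids(distance_matrix(days), k=2)
    print(labels, medoids)  # [0, 0, 1, 1] [0, 2]
```

On these toy sequences the two workdays and the two leisure-heavy days fall into separate clusters. Because the substitution cost here equals two indel costs, the distance reduces to a pure insertion/deletion distance; lowering the substitution cost, or making it depend on which activities are swapped, changes which days count as similar.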
I have also developed a method that derives covariate-defined subgroups with distinct activity patterns. The figure below illustrates an activity pattern decision tree fitted to one weekday of activity data and a set of candidate covariates for each user, collected via Daynamica in a recent Minneapolis-area study.
Activity pattern decision tree generated via sequence distance methods
Some interesting patterns emerge. The first split separates users aged 63 and younger from those older than 63. The older group spent less time at work and relatively more time in leisure, other, and personal business activities, suggesting that this split captures differences between retirees and those still in the workforce. Within the younger subgroup, a second split separates users with a bachelor's degree or higher from those with lower education levels. Most users in the more highly educated subgroup follow a typical "9-to-5" workday, with relatively few engaging in "non-fixed" activities such as leisure and recreation or eating out. Those with lower education levels have much less structured schedules.
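One way to choose splits like these, shown here as a hedged sketch rather than the method behind the tree above, is to test every covariate and threshold and keep the split that most reduces within-group dissimilarity, measured on a precomputed matrix of pairwise sequence distances. The discrepancy measure (sum of pairwise distances divided by group size), the pseudo-R² criterion, the function names, and the covariate names and values are all assumptions made for illustration.

```python
from itertools import combinations


def discrepancy(D, idx):
    """Sum of pairwise distances within a group divided by its size
    (equal to the within-group sum of squares when D holds squared
    Euclidean distances)."""
    if len(idx) < 2:
        return 0.0
    return sum(D[i][j] for i, j in combinations(idx, 2)) / len(idx)


def best_split(D, covariates, min_size=1):
    """Return (pseudo_r2, covariate, threshold) for the binary split
    "value <= threshold" vs "value > threshold" that explains the largest
    share of total discrepancy, or None if no split is allowed.

    covariates maps a covariate name to one numeric value per user, in the
    same order as the rows of the distance matrix D.
    """
    everyone = list(range(len(D)))
    total = discrepancy(D, everyone)
    best = None
    for name, values in covariates.items():
        for t in sorted(set(values))[:-1]:  # the largest value cannot split
            left = [i for i in everyone if values[i] <= t]
            right = [i for i in everyone if values[i] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            within = discrepancy(D, left) + discrepancy(D, right)
            r2 = (total - within) / total if total > 0 else 0.0
            if best is None or r2 > best[0]:
                best = (r2, name, t)
    return best


if __name__ == "__main__":
    # Toy distance matrix for four users: users 0-1 and 2-3 have similar days.
    D = [[0, 2, 10, 10],
         [2, 0, 10, 10],
         [10, 10, 0, 2],
         [10, 10, 2, 0]]
    # Hypothetical covariates; 1 = bachelor's degree or higher.
    covariates = {"age": [35, 41, 67, 70], "bachelor_or_higher": [1, 0, 0, 1]}
    print(best_split(D, covariates))  # age split at 41, pseudo-R² = 9/11
```

The toy split `age <= 41` has the same form as the 63-and-younger split described above. A full tree would apply `best_split` recursively to each child group until a stopping rule (for example a minimum group size, a minimum gain, or a permutation test) is met; that rule is likewise an assumption here.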
This characterization of heterogeneity across activity-based subgroups is a valuable tool that could inform decision-making in a wide range of applications. For example, urban planners could better understand differences in transportation usage; medical interventions could be tailored to subgroups with particular activity patterns; and individuals could be identified for targeted advertising based on their daily routines.
Slides for a presentation on this topic at the International Conference on Health Policy Statistics can be found below.