Biostatistics, Evidence, and Research Design


My work focuses on biostatistics, evidence, and research design ('BERD') in neurological rehabilitation and recovery. Recovery is a complex, dynamic process with many interacting factors at physiological, psychological, and sociological levels. I therefore specialize in longitudinal and multivariate modeling techniques that help disentangle these factors and explore the mechanisms of recovery. On the meta-scientific side, I collaborate with different teams to share data, synthesize data, and evaluate scientific evidence. I also collaborate in the planning of research studies to ensure designs that are efficient but still rigorous enough to answer the scientific questions at hand.

Ontology and Measurement in Neurological Recovery

In researching neurological injury and disease, it is imperative that we measure the right theoretical constructs in the right way. For instance, in a genome-wide association study of “recovery” following stroke, the hits we get across the genome will depend on how that recovery phenotype is defined. The NIH Stroke Scale is one of the most widely used instruments for measuring stroke severity and recovery, but when aggregated to a total score it loses specificity (i.e., two people can have the same total score for very different reasons). When taken at the individual item level, it loses sensitivity instead (e.g., the 5-level ordinal motor score for the arm has strong ceiling/floor effects and lacks resolution compared to continuous measures). Thus, we often face trade-offs between the informational value of a measure and the cost of collecting it. With that in mind, I have worked on many different projects exploring the costs and benefits of measuring behavioral phenotypes in neurology.
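As a toy illustration of the specificity problem, the sketch below sums item-level scores for two hypothetical patients. The item names and values are invented for illustration only; they are a small subset loosely modeled on stroke-scale items, not the full NIH Stroke Scale and not real patient data.

```python
# Hypothetical item-level scores (higher = more impaired).
# These names and values are assumptions made up for this example.
patient_a = {"motor_arm": 3, "motor_leg": 2, "language": 0, "inattention": 0}
patient_b = {"motor_arm": 0, "motor_leg": 0, "language": 3, "inattention": 2}


def total_score(items):
    """Collapse an item-level profile into a single total score."""
    return sum(items.values())


def differing_items(first, second):
    """Return the items on which two profiles disagree, in sorted order."""
    return sorted(name for name in first if first[name] != second[name])


total_a = total_score(patient_a)
total_b = total_score(patient_b)
mismatched = differing_items(patient_a, patient_b)

print(f"Total A = {total_a}, Total B = {total_b}")
print(f"Items that differ: {mismatched}")
```

Both profiles sum to the same total even though one reflects motor impairment and the other reflects language and attention impairment, which is the information a total score discards.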

Example Publications:



Longitudinal and Time-Series Data Analysis

Rehabilitation is fundamentally about change within a person over time. Statistically speaking, however, these temporally dependent data violate the independence assumptions of many statistical tests and require specialized tools for analysis. People often use the term “longitudinal data” to refer to a small number of data points (e.g., <10) collected over a long timescale (e.g., days or months apart). In contrast, people use the term “time-series data” to refer to large numbers of data points (e.g., hundreds to millions) sampled at a very high density (e.g., milliseconds or microseconds apart). Although these data types actually exist on a continuum, it is still useful to talk about them separately, because differences in the sampling rate and number of observations make them amenable to different types of analyses. For instance, time-series data can be transformed into the frequency domain with Fourier analysis, and future data can be predicted with various autoregressive models. Longitudinal data, in contrast, might be analyzed with linear or non-linear mixed-effect regression models, with the specifics of the model depending on the nature of the outcome (e.g., binary, ordinal, or interval/ratio) and the “shape” of the trajectory (e.g., linear, exponential, or sigmoidal). Although we do not create these mathematical tools, we apply them to rehabilitation problems and create instructional materials so that rehabilitation researchers can use them.
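The two time-series ideas mentioned above can be sketched in a few lines. This is a minimal illustration on simulated data, not an analysis from any of our projects: the signal, the sampling rate, and the function names are all assumptions chosen for the example. It uses only the Python standard library; in practice one would more likely reach for an FFT routine and a dedicated time-series package.

```python
import cmath
import math
import random


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; magnitudes for bins 0..N//2."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coef = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coef))
    return magnitudes


def dominant_frequency(signal, sampling_rate_hz):
    """Frequency (Hz) of the largest non-DC component of the signal."""
    magnitudes = dft_magnitudes(signal)
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sampling_rate_hz / len(signal)


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise.

    Assumes a zero-mean series, so no intercept is estimated.
    """
    numerator = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denominator = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return numerator / denominator


# Simulated 5 Hz oscillation sampled at 100 Hz for 2 seconds (assumed values).
sampling_rate_hz = 100
sine = [math.sin(2 * math.pi * 5 * t / sampling_rate_hz) for t in range(200)]
peak_hz = dominant_frequency(sine, sampling_rate_hz)

# Simulated first-order autoregressive process with a known coefficient.
rng = random.Random(0)
true_phi = 0.8
ar_series = [0.0]
for _ in range(1999):
    ar_series.append(true_phi * ar_series[-1] + rng.gauss(0, 1))

phi_hat = fit_ar1(ar_series)
one_step_forecast = phi_hat * ar_series[-1]  # predicted next observation

print(f"Dominant frequency: {peak_hz:.1f} Hz")
print(f"Estimated AR(1) coefficient: {phi_hat:.2f} (simulated with {true_phi})")
```

With only a handful of observations per person, neither step is feasible, which is one reason longitudinal data are usually handled with mixed-effect regression instead.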

Example Publications:


Data Use, Re-Use, and Rehabilitation Informatics

As with many fields, rehabilitation has seen astronomical growth in the amount and complexity of the data we produce. For instance, physiological recordings from EEG or accelerometry from inertial sensors yield highly structured data (e.g., voltages or forces in discrete intervals of time) in very dense samples (e.g., 250-1,000 Hz for minutes or hours of recording). In contrast, electronic health records contain loosely structured data from millions of individuals, with complex data types that have unique relationships to each other and may or may not be recorded over time. This means that researchers and their students are facing increasingly large and complex data sets. In my research group, we want to give researchers the tools and training to work with their own data effectively. More than any one project, we want to make sure that data are Findable, Accessible, Interoperable, and Reusable (FAIR) in rehabilitation science. As part of that effort, I serve on the educational leadership team for the Reproducible Rehabilitation (“ReproRehab”) program funded by NCMRR, and I collaborate with other researchers at WUSTL to harmonize and archive large research datasets.


Example Publications:



Outside of Work

I love spending time with my wife, Emma, and our two dogs, Olive and Moose! I also enjoy running, lifting weights, going on hikes, reading books, and drinking coffee. 

Saint Louis is a great city to explore, with a lot of history and fantastic places to eat in each neighborhood. (And a lot of forests and trails not too far away!) 

[my website is permanently a work in progress; last updated 2024-05-02]