I strive for simplicity in statistics, but many applications call for more complex methods. I have planned, built, and published on many types of models:
Traditional parametric regression models such as linear, logistic, mixed, and repeated measures, with a variety of functional forms
Unsupervised models such as factor analysis and principal component analysis, with rotation methods
Methods for quasi-experimental longitudinal data, such as interrupted time series, regression discontinuity, and difference-in-differences
I am familiar with other classes of models as well:
Machine learning regression models such as lasso, ridge, and elastic net, and classifiers such as KNN and SVMs
Bayesian perspectives within traditional models
Marginal structural models for non-experimental data
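To illustrate what separates lasso from ridge in that first item: the lasso's coordinate-descent update applies a soft-thresholding operator that shrinks small coefficients exactly to zero, which is why it performs variable selection while ridge only shrinks. A minimal Python sketch (an illustration of the penalty math, not code from any particular project):

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator from coordinate-descent lasso:
    shrinks rho toward zero by lam, and sets it exactly to zero
    when |rho| <= lam -- the source of lasso's variable selection."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def ridge_shrink(rho: float, lam: float) -> float:
    """Ridge shrinks proportionally but never reaches exactly zero."""
    return rho / (1.0 + lam)
```

With a penalty of 0.5, a small coefficient of 0.3 is zeroed out by the lasso operator but only shrunk to 0.2 by ridge; the elastic net blends the two penalties.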
Understanding causation is hard. I am leading the effort at Ariadne Labs to set consistent evidentiary standards for the various stages of our work, and the study designs that can build that evidence base. I think a lot about how study designs enable causal reasoning, and how we talk about that. The pitfalls are tempting -- p-hacking, endpoint switching, and post-hoc rationalization are all too human -- but the rewards of upholding rigorous standards are immense.
For five years, I worked in psychometrics as a scientist with Optum Patient Insights, the copyright holder for the SF-36 and SF-12 version 2 tools. Statistically, this work involved:
Many flavors of psychometric testing of survey tools for validity and reliability
Scale construction using conventional methods, Item Response Theory, and computer-adaptive methods
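As a sketch of the IRT machinery behind that scale-construction work: under the two-parameter logistic (2PL) model, the probability of endorsing an item depends on the respondent's latent trait and the item's discrimination and difficulty. A minimal Python illustration (the parameter values in the usage note are invented for the example):

```python
import math

def irt_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of endorsing an item,
    given latent trait theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When theta equals the item difficulty b, the endorsement probability is exactly 0.5; higher discrimination a steepens the curve around b. Computer-adaptive testing exploits this by repeatedly administering the most informative item near the current theta estimate.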
I deliver two lectures on psychometrics each year in a class for the Summer Program on Clinical Effectiveness at the Harvard T.H. Chan School of Public Health.
Statistician on the BetterBirth study, a cluster-randomized controlled trial of the WHO Safe Childbirth Checklist, involving nearly 160,000 births in Uttar Pradesh, India.
Statistical Lead on the Serious Illness Conversation Program (SICP) cluster-randomized trial, testing the effect of end-of-life conversations on patient-reported outcomes and goal-concordant care.
Statistical Lead on the Low-birthweight Infant Feeding Trial (LIFT), an adaptive randomized controlled trial in Malawi, Tanzania, and India, planned for 2021.
PI or co-PI on several secondary analyses of phase II/phase III randomized controlled trials of pharmaceutical effects on patient-reported outcomes.
Familiar with other RCT designs such as pragmatic trials, stepped wedge trials, enriched trials, and adaptive trials.
I am highly proficient in SAS 9.4, including many statistical procs, the macro language, the ODS output system, and data visualization. I use SQL commands from within SAS. I can use Stata or SPSS in a pinch.
Lately I have been dipping into RStudio, R Markdown, and R Shiny.
I also use Excel, which is familiar to many non-technical people but often overlooked by statisticians. It is much more powerful than it seems.
Co-PI on the Better Evidence Project, a study of 1,600 front-line clinicians in low-resource settings who use UpToDate, an online clinical reference, linking their longitudinal survey data with clickstream data over one year.
Statistical Lead on the Low-birthweight Infant Feeding Exploration (LIFE) study, which is capturing feeding and growth outcomes in a cohort of over 1,000 low birthweight infants over one year, with sites in Malawi, Tanzania, and India.
Statistical advisor on analysis of a cross-sectional survey of health facilities and women of reproductive age in Ghana, as part of the Performance Monitoring and Accountability (PMA) 2020 project.
Data Analyst on the USAID-funded Situation Analysis project, with cross-sectional assessments of the quality of reproductive health care in twelve countries in sub-Saharan Africa.
I have handled many, many datasets in my career: large and small, clean and messy, well-documented and totally mysterious. I am undaunted by challenges such as:
Maintaining non-identifiability (for example, by salting and hashing identifiers)
Switching units of analysis by restructuring data between "long" and "wide" formats
Joining datasets with keys, including messy keys, using SQL joins or SAS merges
Free text standardization (NLP and regular expressions)
Standardizing data flows between several institutions
Making data "tidy" as defined in the R tidyverse
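To make the first of those concrete, here is a minimal sketch of salted hashing for de-identification, written in Python for portability (the salt value below is a stand-in; in practice the salt is generated securely and stored apart from the data):

```python
import hashlib
import hmac

def pseudonymize(identifier: str, salt: bytes) -> str:
    """Replace an identifier with a keyed (salted) hash: the same id
    always maps to the same pseudonym, so joins across datasets still
    work, but the mapping cannot be reversed without the salt."""
    return hmac.new(salt, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the hash is deterministic for a given salt, two de-identified datasets can still be linked on the pseudonym, while anyone without the salt cannot recover or guess-and-check the original identifiers.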
As a scientific advisor at Ariadne Labs, I mentor many clinician researchers as they design studies of health system interventions. These designs include:
Implementation science studies, types I, II, and III (varying the balance between research and quality improvement aims)
Monitoring and evaluation studies using run charts or similar methods
Realist evaluations
In my view, mixed-method studies are typically stronger than single-method studies. I have no formal training in qualitative research, but in working with qualitative colleagues over the years, I have gained experience with:
Conducting cognitive debrief interviews for survey development
Conducting concept elicitation interviews for content validity
Developing in-depth interview guides, and conducting the interviews
Coding and codebook development (NVivo)
Establishing realistic study aims and appropriate endpoints, including endpoint classification (primary, secondary, exploratory)
Sample size and power calculations, with management of multiplicity (inflated alpha error due to multiple testing)
Methods to mitigate sampling bias in all its forms
Survey development, with attention to language biases, respondent burden, Likert scale construction, and mode of administration
Maximizing the efficiency of a study design given time and budget constraints
Grantwriting to philanthropic donors and NIH
Human subjects protection, research ethics, and IRB relations
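As a concrete example of the sample-size and multiplicity work above: a two-sided, two-sample sample-size calculation with a Bonferroni-adjusted alpha, using the standard normal-approximation formula. This Python sketch uses a made-up effect size and standard deviation; Bonferroni is shown as the simplest multiplicity adjustment, not the only one I would consider:

```python
import math
from statistics import NormalDist

def n_per_arm(delta: float, sd: float, alpha: float = 0.05,
              power: float = 0.80, n_tests: int = 1) -> int:
    """Sample size per arm for a two-sided, two-sample comparison of
    means (normal approximation), with a Bonferroni correction that
    divides alpha by the number of planned tests."""
    alpha_adj = alpha / n_tests                      # Bonferroni adjustment
    z_alpha = NormalDist().inv_cdf(1 - alpha_adj / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2   # per-arm requirement
    return math.ceil(n)
```

For a standardized effect of 0.5 at 80% power, one test requires 63 per arm; declaring five co-primary endpoints and splitting alpha raises that to 94 per arm, which is exactly the trade-off between endpoint ambition and feasibility that study planning has to resolve.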