Rigorous Validation with Proof-By-Simulation
Abstract:
Modern experiment designs can make clinical trials faster, smaller, and more likely to select the right treatment. But existing mathematical tools may not be strong enough to show how key performance metrics, such as the false positive probability, vary across different inputs, such as a range of different treatment effects. One common approach is to use simulations - but then, how many simulations are enough, and what if you've "missed a spot"? This is especially important for optimized experiment designs, which are incentivized to give false positives in places that are not checked. We take a comprehensive approach: building on methods from Sklar's PhD thesis, we extend simulations taken over a dense grid of input parameters into bounds that cover continuous volumes of parameter space.* With these tools, we aim to "automate the Type I Error proof" and other criteria for new trial designs, making regulation faster, easier, and more consistent. Speculatively, we also hope our approach will lay a foundation for new "black box" statistical methods. We have published our open-source prototype software: https://github.com/Confirm-Solutions/imprint.
Speaker email: michaelbsklar@gmail.com.
*Our bounds are non-asymptotic. We require that simulation outcomes depend on unknown parameters through an exponential family distribution [e.g. Gaussian, binomial, gamma, or Poisson outcomes]. The computational cost of a simulation grid fine enough for good performance suffers from a curse of dimensionality; our examples in 3 - 4 parameter dimensions can be run on a home computer.
Bio:
Michael Sklar is CEO of Confirm Solutions, a public-benefit corporation creating open-source statistical software with a mission to speed up medical regulation. He holds a PhD in Statistics from Stanford, where he was later a Stein Fellow. https://www.linkedin.com/in/michael-sklar/
Summary:
Goal: develop a tool to accelerate clinical trial design
E.g. overlap different trial phases, approval, manufacturing
Focus: Overlap phase 3 clinical trial and approval
Challenge: trial design is complex
Distributions of survival times are unusual
Trials can stop early if they’re going very well or poorly
If there are multiple possible treatments, we want to drop the poor performers and focus on the good performers
Must wait for a long time to observe outcomes
Different arms may start at different times
The randomization probabilities may change over time (using an algorithm)
We care both about patient welfare AND learning about treatment effects
Balance exploration and exploitation
Problems:
Design: which tests to use
Validation: check that trial design and outcomes are valid
FDA process is long
Lots of discussions
E.g. The Complex Innovative Design approval process is 240 days long (trial validation)
Complications of design validation
Definition of null space
Scope of simulations
Multiple hypothesis testing
Informative priors, where Bayesian testing is used (Bayesian designs are not common)
Resource issues
Scope of the simulation (parameter space)
Choice of clinical, statistical nuisance, operational parameters
Fineness of the grid on which they’re sampled
Goal: show that P(False Positive) < x over a volume of parameter space
Focus on the Type I (false positive) error:
Simulation:
Model of the population being studied (their physical attributes and interactions) and the process for selecting them for the study
Includes the different arms of the trial
Arms are related if selection into the various arms is multi-level
The simulated process defines a curve relating the free parameters (which describe the population being studied and whose values are unknown) to the false positive error
The curve can be any shape, and we want to make sure it doesn’t exceed some threshold for any value of these unknown parameters
Can we simulate at enough points to ensure it never exceeds the threshold?
For exponential-family distributions, we can bound the curve between simulated points using confidence intervals for its value and its first derivative
This allows us to compute the number and density of simulations needed to determine whether the false positive error stays within the threshold
The video includes a demonstration of some example studies and the complex study behavior spaces that the tool can model
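
The grid-based idea in the notes above (simulate at a finite set of parameter values, then extend to everything in between) can be illustrated with a much cruder bound than the one described in the talk. The sketch below is not the imprint method or its API. It assumes a toy one-sided z-test on Gaussian outcomes with unit variance, uses a Hoeffding confidence bound at each grid point, and extends between grid points with the Cauchy-Schwarz bound |f'(theta)| <= sqrt(n_obs)/2, which holds because f'(theta) = Cov(1{reject}, sum of outcomes) in this exponential family. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_rejection_rate(theta, n_obs, crit, n_sims, rng):
    """Monte Carlo estimate of P_theta(reject) for a one-sided z-test."""
    rejections = 0
    for _ in range(n_sims):
        # The sample mean of n_obs unit-variance Gaussian outcomes is
        # N(theta, 1/n_obs), so we draw it directly.
        xbar = rng.gauss(theta, 1.0 / math.sqrt(n_obs))
        if math.sqrt(n_obs) * xbar > crit:
            rejections += 1
    return rejections / n_sims


def certified_type1_bound(lo, hi, n_grid, n_obs, alpha, n_sims, delta, seed=0):
    """Upper bound on the false positive probability over all theta in [lo, hi].

    Holds with probability at least 1 - delta over the simulation randomness.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha)
    # Split [lo, hi] into n_grid cells and simulate at each cell's center.
    half_width = (hi - lo) / (2 * n_grid)
    # One-sided Hoeffding bound, with a union bound over the n_grid points.
    eps = math.sqrt(math.log(n_grid / delta) / (2 * n_sims))
    # Cauchy-Schwarz: |f'| <= sqrt(Var(1{reject}) * Var(sum x_i)) <= sqrt(n_obs) / 2.
    slope_bound = math.sqrt(n_obs) / 2.0
    worst = 0.0
    for i in range(n_grid):
        center = lo + (2 * i + 1) * half_width
        p_hat = simulate_rejection_rate(center, n_obs, crit, n_sims, rng)
        # Estimate + sampling slack + slack for moving away from the center.
        worst = max(worst, p_hat + eps + slope_bound * half_width)
    return worst


# Null region theta in [-0.5, 0], 25 patients, nominal level 2.5%.
bound = certified_type1_bound(
    lo=-0.5, hi=0.0, n_grid=50, n_obs=25, alpha=0.025, n_sims=20000, delta=0.01
)
print(f"Certified upper bound on false positive probability: {bound:.4f}")
```

The two slack terms show the trade-off in the notes: more simulations per point shrink the sampling slack, while a finer grid shrinks the between-point slack. The approach described in the talk also uses an estimate of the first derivative, which this simplified sketch does not attempt.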