Rigorous Validation with Proof-By-Simulation

Michael Sklar @ Confirm Solutions

Abstract:
Modern experiment designs can make clinical trials faster, smaller, and more likely to select the right treatment. But existing mathematical tools may not be strong enough to show how key performance metrics, such as the false positive probability, vary across different inputs, such as a range of different treatment effects. One common approach is to use simulations - but then, how many simulations are enough, and what if you've "missed a spot"? This is especially important for optimized experiment designs, which are incentivized to give false positives in places that are not checked. We take a comprehensive approach: building on methods from Sklar's PhD thesis, we extend simulations taken over a dense grid of input parameters into bounds that cover continuous volumes of parameter space.* With these tools, we aim to "automate the Type I Error proof" and other criteria for new trial designs, making regulation faster, easier, and more consistent. Speculatively, we also hope our approach will lay a foundation for new "black box" statistical methods. We have published our open-source prototype software: https://github.com/Confirm-Solutions/imprint.

Speaker email: michaelbsklar@gmail.com.

*Our bounds are non-asymptotic. We require that simulation outcomes depend on unknown parameters through an exponential family distribution [e.g. Gaussian, binomial, gamma, or Poisson outcomes]. The computational cost of a simulation grid fine enough for good performance suffers from a curse of dimensionality; our examples in 3 to 4 parameter dimensions can be run on a home computer.

Bio:
Michael Sklar is CEO of Confirm Solutions, a public-benefit corporation creating open-source statistical software with a mission to speed up medical regulation. He holds a PhD in Statistics from Stanford, where he was later a Stein Fellow. https://www.linkedin.com/in/michael-sklar/

Summary

Goal: develop a tool to accelerate clinical trial design
- E.g. overlap different trial phases, approval, manufacturing
Focus: Overlap phase 3 clinical trial and approval
Challenge: trial design is complex
- Distributions of survival times are unusual
- Trials can stop early if they’re going very well or poorly
- If we have multiple possible treatments, want to drop the poor performers, focus on good performers
- Must wait for a long time to observe outcomes
- Different arms may start at different times
- The randomization probabilities may change over time (set by an algorithm)
- We care both about patient welfare AND learning about treatment effects
- Balance exploration and exploitation
Problems:
- Design: which tests to use
- Validation: check that trial design and outcomes are valid
FDA process is long
- Lots of discussions
- E.g. the Complex Innovative Design approval process is 240 days long (trial validation)
Complications of design validation
- Definition of null space
- Scope of simulations
- Multiple hypothesis testing
- Informative priors, where Bayesian testing is used (Bayesian designs are not common)
- Resource issues
Scope of the simulation (parameter space)
- Choice of clinical, statistical nuisance, operational parameters
- Fineness of the grid on which they’re sampled
- Goal: show that P(False Positive) < x over a volume of parameter space
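
To make the grid idea concrete, here is a minimal sketch in Python of estimating the false positive probability at each point of a coarse grid over the null space. Everything specific in it is an assumption made for illustration and does not come from the talk or from the imprint software: the two-arm binomial trial, the pooled z-test, the sample sizes, the grid step, and the function names. Note that it only gives point estimates at the grid points, which is exactly the "missed a spot" problem: it says nothing about parameter values between them.

```python
import math
import random

def rejects(successes_c, successes_t, n, z_crit=1.645):
    """One-sided pooled z-test of p_t > p_c with n patients per arm."""
    pooled = (successes_c + successes_t) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # no variability, so no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (successes_t - successes_c) / n / se
    return z > z_crit

def simulate_type1(p_c, p_t, n, n_sims, rng):
    """Monte Carlo estimate of P(reject) at one null point (p_c, p_t)."""
    hits = 0
    for _ in range(n_sims):
        s_c = sum(rng.random() < p_c for _ in range(n))
        s_t = sum(rng.random() < p_t for _ in range(n))
        hits += rejects(s_c, s_t, n)
    return hits / n_sims

def grid_type1(step=0.2, n=50, n_sims=2000, seed=0):
    """Estimate the false positive rate on a grid over the null space p_t <= p_c."""
    rng = random.Random(seed)
    levels = [round(step * k, 10) for k in range(1, int(round(1 / step)))]
    results = {}
    for p_c in levels:
        for p_t in levels:
            if p_t <= p_c:  # keep only null-space points
                results[(p_c, p_t)] = simulate_type1(p_c, p_t, n, n_sims, rng)
    return results

results = grid_type1()
worst_point = max(results, key=results.get)
print("worst grid point:", worst_point, "estimate:", results[worst_point])
```

Halving `step` roughly quadruples the number of grid points in this two-parameter example, which hints at why the cost grows quickly with the number of parameter dimensions.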
Focus on the Type I (false positive) error:
- Simulation:
  - Model of the population being studied (their physical attributes and interactions) and the process for selecting them for the study
  - Includes the different arms of the trial
  - Arms are related if assignment to the various arms is multi-level
- The simulated process defines some curve relating the free parameters (which describe the population being studied and whose values are unknown) to the false positive error
- The curve can be any shape, and we want to make sure it doesn’t exceed some threshold for any value of these unknown parameters
- Can we simulate at enough points to ensure it never exceeds the threshold?
- For exponential-family distributions, we can bound the curve between simulated points using confidence intervals for the curve and its 1st derivative
- This allows us to compute the number and density of simulations needed to establish whether the false positive error stays within the threshold
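
The idea in the last few bullets can be sketched for a toy case. The example below is an illustration under stated assumptions, not the imprint implementation: a one-sided z-test on n Gaussian outcomes with unit variance, a Hoeffding bound for the false positive rate at the grid point, and a Cantelli bound for its derivative (both chosen here for simplicity). It relies on two standard exponential-family facts: the derivative of the rejection probability f(θ) equals E[reject × (T − nθ)], where T is the sum of the outcomes, and the second derivative is at most Var(T) = n. A Taylor expansion then extends the bound from the grid point θ0 to the whole cell [θ0 − h, θ0 + h].

```python
import math
import random
from statistics import NormalDist

Z_CRIT = 1.96  # one-sided test at nominal level 0.025 (assumed design)

def cell_upper_bound(theta0, h, n, n_sims, delta, rng):
    """Bound P(reject) over [theta0 - h, theta0 + h]; holds with prob. >= 1 - delta."""
    f_sum = 0.0
    g_sum = 0.0
    for _ in range(n_sims):
        # Sufficient statistic T = sum of n draws from N(theta0, 1).
        t = n * theta0 + math.sqrt(n) * rng.gauss(0.0, 1.0)
        reject = 1.0 if t / math.sqrt(n) > Z_CRIT else 0.0
        f_sum += reject
        g_sum += reject * (t - n * theta0)  # score-weighted rejection indicator
    f_hat = f_sum / n_sims  # estimate of f(theta0)
    g_hat = g_sum / n_sims  # estimate of f'(theta0)
    # Split delta three ways: f upper, f' upper, f' lower.
    d = delta / 3
    f_margin = math.sqrt(math.log(1 / d) / (2 * n_sims))  # Hoeffding
    g_margin = math.sqrt(n / n_sims * (1 / d - 1))        # Cantelli, per-draw Var <= n
    # Worst-case slope in either direction away from theta0.
    slope = max(g_hat + g_margin, -(g_hat - g_margin))
    # Taylor bound: f(theta0) + |slope| * h + (1/2) * sup f'' * h^2, with f'' <= n.
    return f_hat + f_margin + slope * h + 0.5 * n * h * h

def true_type1(theta, n):
    """Exact rejection probability, available only because this case is a toy."""
    return 1.0 - NormalDist().cdf(Z_CRIT - math.sqrt(n) * theta)

rng = random.Random(0)
n, h = 50, 0.005
centers = [-0.095 + 2 * h * k for k in range(10)]  # cells tile [-0.1, 0]
# delta is per cell; a simultaneous guarantee would need a union bound over cells.
bounds = [cell_upper_bound(c, h, n, 20000, 0.01, rng) for c in centers]
print("largest cell bound:", max(bounds))
```

Shrinking `h` tightens the slope and curvature terms, while raising `n_sims` tightens the two confidence margins. That trade-off is what determines how many simulations, and how dense a grid, are needed to stay under a given threshold.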
Video includes a demonstration of some example studies and the complex study behavior spaces that the tool can model
