A Tour of Survival Analysis, from Classical to Modern

This tutorial is part of the 2020 ACM Conference on Health, Inference, and Learning taking place virtually on July 23-24. [ACM tutorial site]

Presenters

George H. Chen, Assistant Professor of Information Systems, Carnegie Mellon University
(georgechen [at symbol] cmu.edu)
Jeremy C. Weiss, Assistant Professor of Health Informatics, Carnegie Mellon University
(jeremyweiss [at symbol] cmu.edu)

Abstract

Survival analysis is used for predicting time-to-event outcomes, such as how long a patient will stay in the hospital or when a tumor is likely to recur. This tutorial covers the basics of survival analysis, how it is used in healthcare, and some recent methodological advances from the ML community. We also discuss open challenges.

Tutorial Slides

Part 1: Basic survival analysis problem setup & classical estimators (presented by George)
Pre-requisite knowledge assumed: calculus, probability, and statistics
[slides*] [slides with animation*] [video] [supplemental note: "An Introduction to Survival Analysis Math"]
The supplemental note goes over the same examples as in the slides but in greater mathematical detail.
* If you're having trouble viewing the slides above, try these other versions of the PDFs (slides converted to images):
[slides (rasterized)] [slides with animation (rasterized)]

Part 2: Considerations for clinical applications (presented by Jeremy)
This part builds on Part 1 and introduces key concepts such as calibration and competing risks. It also includes an example involving the COVID-19 treatment remdesivir (predicting time until recovery for patients with treatment vs. control).
[slides] [video (starts at 29:13)]

Parts 3 & 4: Recent neural net advances, open challenges (presented by George)
This part builds mostly on Part 1, although one method discussed toward the end (DeepHit) relates to competing risks from Part 2. Importantly, we assume the audience is already familiar with neural nets (although not in the survival analysis setting).
[slides*] [slides with animation*] [video (starts at 44:47)]
A more polished & extensive version of the experiments shown in the tutorial can be found here, with code:
[paper on deep kernel survival analysis] [code]
Kvamme and Borgan have a paper that has a nice introduction to the math for continuous and discrete-time survival models as they're used with neural nets: "Continuous and Discrete-Time Survival Prediction with Neural Networks"
* If you're having trouble viewing the slides above, try these other versions of the PDFs (slides converted to images):
[slides (rasterized)] [slides with animation (rasterized)]

Some Software Packages

We highlight some software packages here. More links to code appear in the references below, next to the papers they accompany.

Python

lifelines -- Kaplan-Meier, Cox model and regularized variants, Weibull AFT, Aalen additive model
glmnet_python -- regularized Cox variants; official port of glmnet from R
pycox -- unified PyTorch implementations of DeepSurv, Cox-Time, Cox-CC, MTLR, Nnet-survival, DeepHit, and others
pysurvival -- MTLR implementation by original author, also random survival forests, survival SVMs, and others
Reference implementations by original authors: see the code links listed with the individual papers in the references below

R

survival -- Kaplan-Meier and Cox proportional hazards model
glmnet -- lasso, ridge, and elastic-net regularized Cox models
hdnom -- Cox nomograms
randomForestSRC -- random survival forest implementation by original author
cmprsk -- Fine & Gray subdistribution hazards and cumulative incidence functions
riskRegression -- cause-specific hazard models

References

A recent survey

Ping Wang, Yan Li, and Chandan K. Reddy. Machine learning for survival analysis: A survey. ACM Computing Surveys (CSUR), 51(6):1–36, 2019.
[paper (arXiv)]

Some classical survival estimators

(Kaplan-Meier estimator) Edward L. Kaplan and Paul Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481, 1958.
[paper (JSTOR)]
(Cox proportional hazards) David R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society. Series B, 34(2): 187–202, 1972.
[paper (JSTOR)]
How to estimate the baseline hazard is in the official discussion of the Cox paper:
- Norman Breslow. Discussion of the paper by D. R. Cox (1972). Journal of the Royal Statistical Society, Series B, 34(2):216–217, 1972.

An explanation for how the Cox loss (for beta) relates to ranking:

- Harald Steck, Balaji Krishnapuram, Cary Dehing-Oberije, Philippe Lambin, and Vikas C. Raykar. On ranking in survival analysis: Bounds on the concordance index. In Advances in Neural Information Processing Systems, pages 1209-1216, 2008.
  [paper (NeurIPS)]
(Logistic-hazard discrete-time model) Charles C. Brown. On the use of indicator variables for studying the time-dependence of parameters in a response-time model. Biometrics, 31(4):863–872, 1975.
[paper (JSTOR)]
(Conditional Kaplan-Meier estimators) Rudolf Beran. Nonparametric regression with randomly censored survival data. Technical report, University of California, Berkeley, 1981.
[paper (ResearchGate)]
Finite-sample error bounds are provided by:
- George H. Chen. Nearest neighbor and kernel survival analysis: Nonasymptotic error bounds and strong consistency rates. In International Conference on Machine Learning, pages 1001–1010, 2019.
  [paper (arXiv)] [code (Python)]
(Fine & Gray competing risks) Jason P. Fine and Robert J. Gray. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446):496-509, 1999.
[paper (JSTOR)]
(Random survival forests) Hemant Ishwaran, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841–860, 2008.
[paper (arXiv)] [code (R)]
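
For readers who want to see the first of these estimators in action, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimate. The function name `kaplan_meier` and the toy data are ours and purely illustrative; the sketch assumes the usual convention that, at tied times, events occur before censorings. For real analyses, use one of the packages listed above.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct time with an observed event.

    times  -- observed time per subject (event time or censoring time)
    events -- 1 if the event was observed, 0 if the subject was right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at their observed time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t are no longer at risk afterward.
        at_risk -= exits[t]
    return curve


# Toy data, made up for illustration (third and fifth subjects are censored).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) ~ {s:.2f}")
```

On this toy data the estimated survival probabilities are approximately 0.8, 0.6, and 0.3 at times 2, 3, and 5. The censored subject at time 3 still counts as at risk when the event at time 3 is processed, which is exactly the tie convention assumed above.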

Accuracy/error metrics

Concordance index (ranking-based metric):
- (Original) Frank E. Harrell, Robert M. Califf, David B. Pryor, Kerry L. Lee, and Robert A. Rosati. Evaluating the yield of medical tests. Journal of the American Medical Association, 247(18):2543–2546, 1982.
  [paper (JAMA)]
- (Time-dependent) Laura Antolini, Patrizia Boracchi, and Elia Biganzoli. A time-dependent discrimination index for survival data. Statistics in Medicine, 24(24):3927–3944, 2005.
  [doi (Wiley)]
- See also the mortality-based ranking used to compute the concordance index in the random survival forests paper: Hemant Ishwaran, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841–860, 2008.
  [paper (arXiv)] [code (R)]
Brier score (error in estimating survival function):
- Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, 18(17-18):2529–2545, 1999.
  [doi (Wiley)]
- Thomas A. Gerds and Martin Schumacher. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, 48(6):1029–1040, 2006.
  [doi (Wiley)]
Looking at average widths of subject-specific survival time prediction intervals:
- George H. Chen. Deep kernel survival analysis and subject-specific survival time prediction intervals. In Machine Learning for Healthcare Conference, 2020.
  [paper (arXiv)] [code (Python)]
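
As a rough illustration of the ranking-based metric above, here is a pure-Python sketch of a Harrell-style concordance index. The function name `concordance_index` and the toy inputs are ours; the sketch makes simplifying assumptions that the cited papers treat more carefully: pairs with tied observed times are skipped, ties in predicted risk count as half-concordant, and a higher risk score is taken to mean a shorter expected survival time.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs whose predicted risks are ordered correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under observation strictly after i's event time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, made up for illustration.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risk_scores))
```

Here there are 7 comparable pairs and 6 are ordered correctly, so the index is 6/7. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.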

Calibration

(TRIPOD) Gary S. Collins, Johannes B. Reitsma, Douglas G. Altman, and Karel G. M. Moons. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. Circulation, 131(2):211-219, 2015.
[doi (Annals of Internal Medicine)]
Olga V. Demler, Nina P. Paynter, and Nancy R. Cook. Tests of calibration and goodness-of-fit in the survival setting. Statistics in Medicine, 34(10):1659-1680, 2015.
[doi (Wiley)]
(Countdown regression) Anand Avati, Tony Duan, Sharon Zhou, Kenneth Jung, Nigam H. Shah, and Andrew Ng. Countdown regression: Sharp and calibrated survival predictions. In Uncertainty in Artificial Intelligence, 2019.
[paper (arXiv)] [code (Python)]
(D-calibration) Humza Haider, Bret Hoehn, Sarah Davis, and Russell Greiner. Effective ways to build and evaluate individual survival distributions. Journal of Machine Learning Research, 21(85):1-63, 2020.
[paper (arXiv)] [code (R)]

Some standard datasets

Note that some datasets have multiple versions floating around the web, sometimes with feature names and sometimes without. All of the following show up in some form as part of the pycox package (which also has other datasets).

(German Breast Cancer Study Group) M. Schumacher, G. Bastert, H. Bojar, K. Huebner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R. L. Neumann, and H. F. Rauschecker. Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. Journal of Clinical Oncology, 12(10):2086–2093, 1994.
[paper (Journal of Clinical Oncology)] [dataset is part of R package "TH.data"; search in documentation for "GBSG2"] [can also be found as the test data here (DeepSurv code repository)]
(SUPPORT) William A. Knaus, Frank E. Harrell, Joanne Lynn, Lee Goldman, Russell S. Phillips, Alfred F. Connors, Neal V. Dawson, William J. Fulkerson, Robert M. Califf, and Norman Desbiens. The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine, 122(3):191–203, 1995.
[paper] [dataset (scroll down to "SUPPORT")]
(METABRIC) Christina Curtis, Sohrab P Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar M Rueda, Mark J Dunning, Doug Speed, Andy G Lynch, Shamith Samarajiwa, and Yinyin Yuan. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel sub-groups. Nature, 486(7403):346–352, 2012.
[paper (Nature)] [there are different versions of the dataset floating around; here's one from the DeepSurv code repository]
(Rotterdam tumor bank) John A. Foekens, Harry A. Peters, Maxime P Look, Henk Portengen, Manfred Schmitt, Michael D Kramer, Nils Brünner, Fritz Jänicke, Marion E. Meijer-van Gelder, and Sonja C. Henzen-Logmans. The urokinase system of plasminogen activation and prognosis in 2780 breast cancer patients. Cancer Research, 60(3):636–643, 2000.
[paper (Cancer Research)] [can also be found as the training data here (DeepSurv code repository)]

Some recent neural net survival estimators (all with code)

(DeepSurv) Jared L. Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1): 24, 2018.
[paper (arXiv)] [authors' original code (Python)] [pycox code (Python)]
In fact, the approach of DeepSurv was already known decades earlier:
- David Faraggi and Richard Simon. A neural network model for survival data. Statistics in Medicine, 14(1):73-82, 1995.
(DeepHit) Changhee Lee, William R. Zame, Jinsung Yoon, and Mihaela van der Schaar. DeepHit: A deep learning approach to survival analysis with competing risks. In AAAI Conference on Artificial Intelligence, 2018.
[paper (UCLA)] [authors' original code (Python)] [pycox code (Python)]
(Nnet-survival) Michael F. Gensheimer and Balasubramanian Narasimhan. A scalable discrete-time survival model for neural networks. PeerJ, 7:e6257, 2019.
[paper (arXiv)] [authors' original code (Python)] [pycox code (Python)]
(Cox-CC, Cox-Time) Håvard Kvamme, Ørnulf Borgan, and Ida Scheel. Time-to-event prediction with neural networks and Cox regression. Journal of Machine Learning Research, 20(129):1–30, 2019.
[paper (arXiv)] [JMLR] [pycox code (Python)]

(PC-Hazard) Håvard Kvamme and Ørnulf Borgan. Continuous and discrete-time survival prediction with neural networks. arXiv preprint arXiv:1910.06724, 2019.
[paper (arXiv)] [pycox code (Python)]
(Dynamic-DeepHit) Changhee Lee, Jinsung Yoon, and Mihaela van der Schaar. Dynamic-DeepHit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Transactions on Biomedical Engineering, 67(1):122-133, 2019.
[paper (IEEE)] [code (Python)]
(Deep kernel survival analysis) George H. Chen. Deep kernel survival analysis and subject-specific survival time prediction intervals. In Machine Learning for Healthcare Conference, 2020.
[paper (arXiv)] [code (Python)]
(Topic modeling with survival analysis) Linhong Li, Ren Zuo, Amanda Coston, Jeremy C. Weiss, George H. Chen. Neural topic models with survival supervision: Jointly predicting time-to-event outcomes and learning how clinical features relate. In International Conference on Artificial Intelligence in Medicine, 2020.
[paper (arXiv)] [code (Python)]

Deep generative models for survival analysis

Rajesh Ranganath, Adler Perotte, Noémie Elhadad, and David Blei. Deep survival analysis. In Machine Learning for Healthcare Conference, pages 101-114, 2016.
[paper (arXiv)]
Chirag Nagpal, Xinyu Li, and Artur Dubrawski. Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks. In NeurIPS Machine Learning for Health Workshop, 2019.
[paper (arXiv)] [code (Python)]

A recently proposed framework for causal inference with survival analysis

Yifan Cui, Michael R. Kosorok, Stefan Wager, and Ruoqing Zhu. Estimating heterogeneous treatment effects with right-censored data via causal survival forests. arXiv preprint arXiv:2001.09887, 2020.
[paper (arXiv)]