Other guidance papers related to TG 6

Table of contents

Guidance on prediction model performance

Overall perspectives on model performance

Assessment of incremental value of markers

Classic performance measures: calibration & discrimination

Net Benefit and Decision Curve Analysis (DCA)

Interpreting performance: internal, external validity & generalizability

Guidance on prediction model development

Modeling strategies: meta-analysis, continuous predictors, small sample size

Shrinkage and penalization methods for prediction

Model updating

Links between statistics and machine learning

Guidance and reporting of prediction models

Guidance on prediction model performance

Overall perspectives on model performance

Performance assessment for binary outcome models
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW.
Assessing the performance of prediction models: a framework for traditional and novel measures.
Epidemiology. 2010 Jan;21(1):128-38. doi: 10.1097/EDE.0b013e3181c30fb2.

Key work from TG6 extends the proposed framework for assessment of performance to
survival, competing risk and other models
Performance assessment for survival models
DJ McLernon, D Giardiello, B Van Calster…
Assessing performance and clinical usefulness in prediction models with survival outcomes: practical guidance for Cox proportional hazards models
Annals of Internal Medicine 2023
Performance for competing risk models
N Van Geloven, D Giardiello, EF Bonneville, L Teece…
Validation of prediction models in the presence of competing risks: a guide through modern methods
BMJ 2022
Guidance for clinical audiences
Steyerberg EW, Vergouwe Y.
Towards better clinical prediction models: seven steps for development and an ABCD for validation.
Eur Heart J. 2014 Aug 1;35(29):1925-31. doi: 10.1093/eurheartj/ehu207

Wynants L, Collins GS, Van Calster B.
Key steps and common pitfalls in developing and validating risk models.
BJOG. 2017 Feb;124(3):423-432. doi: 10.1111/1471-0528.14170.

Bullock GS, Hughes T, Sergeant JC, Callaghan MJ, Riley RD, Collins GS.
Clinical Prediction Models in Sports Medicine: A Guide for Clinicians and Researchers.
J Orthop Sports Phys Ther. 2021 Oct;51(10):517-525. doi: 10.2519/jospt.2021.10697

Moons KG, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, Grobbee DE.
Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker.
Heart. 2012 May;98(9):683-90. doi: 10.1136/heartjnl-2011-301246

Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, Woodward M.
Risk prediction models: II. External validation, model updating, and impact assessment.
Heart. 2012 May;98(9):691-8. doi: 10.1136/heartjnl-2011-301247

Collins GS, Dhiman P, Ma J, et al.
Evaluation of clinical prediction models (part 1): from development to external validation.
BMJ. 2024 Jan 8;384:e074819. doi: 10.1136/bmj-2023-074819.

Assessment of incremental value of markers

Review on marker assessment
Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, Van Calster B.
Assessing the incremental value of diagnostic and prognostic markers: a review and illustration.
Eur J Clin Invest. 2012 Feb;42(2):216-28. doi: 10.1111/j.1365-2362.2011.02562.x
Graphical display of incremental value
Steyerberg EW, Vedder MM, Leening MJ, Postmus D, D'Agostino RB Sr, Van Calster B, Pencina MJ.
Graphical assessment of incremental value of novel markers in prediction models: From statistical to decision analytical perspectives.
Biom J. 2015 Jul;57(4):556-70. doi: 10.1002/bimj.201300260
Linking statistical thinking to decision analytic thinking
Van Calster B, Vickers AJ, Pencina MJ, Baker SG, Timmerman D, Steyerberg EW.
Evaluation of markers and risk prediction models: overview of relationships between NRI and decision-analytic measures.
Med Decis Making. 2013 May;33(4):490-501. doi: 10.1177/0272989X12470757

Baker SG, Schuit E, Steyerberg EW, Pencina MJ, Vickers A, Moons KG, Mol BW, Lindeman KS.
How to interpret a small increase in AUC with an additional risk prediction marker: decision analysis comes through.
Stat Med. 2014 Sep 28;33(22):3946-59. doi: 10.1002/sim.6195
Sensitivity and specificity are meaningless performance measures from a decision-analytic perspective
Van Calster B, Steyerberg EW, D'Agostino RB Sr, Pencina MJ.
Sensitivity and specificity can change in opposite directions when new predictive markers are added to risk models.
Med Decis Making. 2014 May;34(4):513-22. doi: 10.1177/0272989X13513654
Pros and cons of the NRI
Leening MJ, Vedder MM, Witteman JC, Pencina MJ, Steyerberg EW.
Net reclassification improvement: computation, interpretation, and controversies: a literature review and clinician's guide.
Ann Intern Med. 2014 Jan 21;160(2):122-31. doi: 10.7326/M13-1522
Overall perspectives on marker studies & systematic reviews
Riley RD, Hayden JA, Steyerberg EW, Moons KG, Abrams K, Kyzas PA, Malats N, Briggs A, Schroter S, Altman DG, Hemingway H; PROGRESS Group.
Prognosis Research Strategy (PROGRESS) 2: prognostic factor research.
PLoS Med. 2013;10(2):e1001380. doi: 10.1371/journal.pmed.1001380

Riley RD, Moons KGM, Snell KIE, Ensor J, Hooft L, Altman DG, Hayden J, Collins GS, Debray TPA.
A guide to systematic review and meta-analysis of prognostic factor studies.
BMJ. 2019 Jan 30;364:k4597. doi: 10.1136/bmj.k4597

Kempf E, de Beyer JA, Cook J, Holmes J, Mohammed S, Nguyên TL, Simera I, Trivella M, Altman DG, Hopewell S, Moons KGM, Porcher R, Reitsma JB, Sauerbrei W, Collins GS.
Overinterpretation and misreporting of prognostic factor studies in oncology: a systematic review.
Br J Cancer. 2018 Nov;119(10):1288-1296. doi: 10.1038/s41416-018-0305-5.

Classic performance measures: calibration & discrimination

Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW.
A calibration hierarchy for risk models was defined: from utopia to empirical data.
J Clin Epidemiol. 2016 Jun;74:167-76. doi: 10.1016/j.jclinepi.2015.12.005

An integrative measure for calibration
Austin PC, Steyerberg EW.
The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models.
Stat Med. 2019 Sep 20;38(21):4051-4065. doi: 10.1002/sim.8281
ROC curves should not be shown in publications; the AUC is sufficient;
if decision thresholds are relevant, a classification plot is suggested
Verbakel JY, Steyerberg EW, Uno H, De Cock B, Wynants L, Collins GS, Van Calster B.
ROC curves for clinical prediction models part 1. ROC plots showed no added value above the AUC when evaluating the performance of clinical prediction models.
J Clin Epidemiol. 2020 Oct;126:207-216. doi: 10.1016/j.jclinepi.2020.01.028
A lively debate followed in JCE on the pros and cons of publishing ROC curves in addition to the AUC: see Janssens, part 2 and part 4
External validation calls for reference values of the c-statistic
Vergouwe Y, Moons KG, Steyerberg EW.
External validity of risk models: Use of benchmark values to disentangle a case-mix effect from incorrect coefficients.
Am J Epidemiol. 2010 Oct 15;172(8):971-80. doi: 10.1093/aje/kwq223

van Klaveren D, Gönen M, Steyerberg EW, Vergouwe Y.
A new concordance measure for risk prediction models in external validation settings.
Stat Med. 2016 Oct 15;35(23):4136-52. doi: 10.1002/sim.6997

Discrimination in clustered data

van Klaveren D, Steyerberg EW, Perel P, Vergouwe Y.
Assessing discriminative ability of risk models in clustered data.
BMC Med Res Methodol. 2014 Jan 15;14:5. doi: 10.1186/1471-2288-14-5

Summarizing discrimination by the c-statistic across studies
Snell KI, Ensor J, Debray TP, Moons KG, Riley RD.
Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?
Stat Methods Med Res. 2018 Nov;27(11):3505-3522. doi: 10.1177/0962280217705678
Assessing discriminative ability for categorical outcomes: polytomous or ordinal
Van Calster B, Vergouwe Y, Looman CW, Van Belle V, Timmerman D, Steyerberg EW.
Assessing the discriminative ability of risk models for more than two outcome categories.
Eur J Epidemiol. 2012 Oct;27(10):761-70. doi: 10.1007/s10654-012-9733-3

Van Calster B, Van Belle V, Vergouwe Y, Timmerman D, Van Huffel S, Steyerberg EW.
Extending the c-statistic to nominal polytomous outcomes: the Polytomous Discrimination Index.
Stat Med. 2012 Oct 15;31(23):2610-26. doi: 10.1002/sim.5321

Van Calster B, Van Belle V, Vergouwe Y, Steyerberg EW.
Discrimination ability of prediction models for ordinal outcomes: relationships between existing measures and a new measure.
Biom J. 2012 Sep;54(5):674-85. doi: 10.1002/bimj.201200026

Assessing calibration for nominal and ordinal outcomes
K Van Hoorde, Y Vergouwe, D Timmerman, B Van Calster
Assessing calibration of multinomial risk prediction models
Statistics in Medicine, 2014

M Edlinger, M van Smeden, HF Alber, M Wanitschek, B Van Calster
Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption
Statistics in Medicine, 2022

Net Benefit and Decision Curve Analysis (DCA)

Original proposal for DCA
Vickers AJ, Elkin EB.
Decision curve analysis: a novel method for evaluating prediction models.
Med Decis Making. 2006 Nov-Dec;26(6):565-74. doi: 10.1177/0272989X06295361
Net benefit as a concept in other papers
Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JD.
Validity of prognostic models: when is a model clinically useful?
Semin Urol Oncol. 2002 May;20(2):96-107. doi: 10.1053/suro.2002.32521

Peirce CS
The numerical measure of the success of predictions
Science, 1884

Explanation and guidance
Vickers AJ, Van Calster B, Steyerberg EW.
Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests.
BMJ. 2016 Jan 25;352:i6. doi: 10.1136/bmj.i6

Vickers AJ, van Calster B, Steyerberg EW.
A simple, step-by-step guide to interpreting decision curve analysis.
Diagn Progn Res. 2019 Oct 4;3:18. doi: 10.1186/s41512-019-0064-7

Van Calster B, Wynants L, Verbeek JFM, Verbakel JY, Christodoulou E, Vickers AJ, Roobol MJ, Steyerberg EW.
Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators.
Eur Urol. 2018 Dec;74(6):796-804. doi: 10.1016/j.eururo.2018.08.038
An overview of calibration, discrimination, and Net Benefit
Riley RD, Archer L, Snell KIE, et al.
Evaluation of clinical prediction models (part 2): how to undertake an external validation study.
BMJ. 2024 Jan 15;384:e074820. doi: 10.1136/bmj-2023-074820.

Multiple estimates of Net Benefit can be summarized in a meta-analysis
Wynants L, Riley RD, Timmerman D, Van Calster B.
Random-effects meta-analysis of the clinical utility of tests and prediction models.
Stat Med. 2018 May 30;37(12):2034-2052. doi: 10.1002/sim.7653.

Interpreting performance: internal, external validity & generalizability

Perspective on internal and external validation

Steyerberg EW, Harrell FE Jr.
Prediction models need appropriate internal, internal-external, and external validation.
J Clin Epidemiol. 2016 Jan;69:245-7. doi: 10.1016/j.jclinepi.2015.04.005

Internal validation is efficient with the bootstrap; split-sample validation should be avoided

Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD.
Internal validation of predictive models: efficiency of some procedures for logistic regression analysis.
J Clin Epidemiol. 2001 Aug;54(8):774-81. doi: 10.1016/s0895-4356(01)00341-9

Steyerberg EW.
Validation in prediction research: the waste by data splitting.
J Clin Epidemiol. 2018 Nov;103:131-133. doi: 10.1016/j.jclinepi.2018.07.010.

A summary of validation approaches
Collins GS, Dhiman P, Ma J, et al.
Evaluation of clinical prediction models (part 1): from development to external validation.
BMJ. 2024 Jan 8;384:e074819. doi: 10.1136/bmj-2023-074819.

Geographic and temporal validation illustrations

Austin PC, van Klaveren D, Vergouwe Y, Nieboer D, Lee DS, Steyerberg EW.
Geographic and temporal validity of prediction models: different approaches were useful to examine model performance.
J Clin Epidemiol. 2016 Nov;79:76-85. doi: 10.1016/j.jclinepi.2016.05.007

Austin PC, van Klaveren D, Vergouwe Y, Nieboer D, Lee DS, Steyerberg EW.
Validation of prediction models: examining temporal and geographic stability of baseline risk and estimated covariate effects.
Diagn Progn Res. 2017;1:12. doi: 10.1186/s41512-017-0012-3

Differences in setting need to be considered
Debray TP, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KG.
A new framework to enhance the interpretation of external validation studies of clinical prediction models.
J Clin Epidemiol. 2015 Mar;68(3):279-89. doi: 10.1016/j.jclinepi.2014.06.018
Clustered data pose opportunities and challenges for performance assessment

Riley RD, Ensor J, Snell KI, Debray TP, Altman DG, Moons KG, Collins GS.
External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges.
BMJ. 2016 Jun 22;353:i3140. doi: 10.1136/bmj.i3140

Clustered data pose specific technical challenges for performance assessment
van Klaveren D, Steyerberg EW, Perel P, Vergouwe Y.
Assessing discriminative ability of risk models in clustered data.
BMC Med Res Methodol. 2014 Jan 15;14:5. doi: 10.1186/1471-2288-14-5

van Klaveren D, Steyerberg EW, Gönen M, Vergouwe Y.
The calibrated model-based concordance improved assessment of discriminative ability in patient clusters of limited sample size.
Diagn Progn Res. 2019 Jun 6;3:11. doi: 10.1186/s41512-019-0055-8

Measurement heterogeneity determines performance across settings

Luijken K, Groenwold RHH, Van Calster B, Steyerberg EW, van Smeden M.
Impact of predictor measurement heterogeneity across settings on the performance of prediction models: A measurement error perspective.
Stat Med. 2019 Aug 15;38(18):3444-3459. doi: 10.1002/sim.8183

Meta-analysis provides opportunities for assessment of heterogeneity
Steyerberg EW, Nieboer D, Debray TPA, van Houwelingen HC.
Assessment of heterogeneity in an individual participant data meta-analysis of prediction models: An overview and illustration.
Stat Med. 2019 Sep 30;38(22):4290-4309. doi: 10.1002/sim.8296
Sample size needs to be sufficient at external validation
Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JD.
Substantial effective sample sizes were required for external validation studies of predictive logistic regression models.
J Clin Epidemiol. 2005 May;58(5):475-83. doi: 10.1016/j.jclinepi.2004.06.017

Collins GS, Ogundimu EO, Altman DG.
Sample size considerations for the external validation of a multivariable prognostic model: a resampling study.
Stat Med. 2016 Jan 30;35(2):214-26. doi: 10.1002/sim.6787

Snell KIE, Archer L, Ensor J, Bonnett LJ, Debray TPA, Phillips B, Collins GS, Riley RD.
External validation of clinical prediction models: simulation-based sample size calculations were more reliable than rules-of-thumb.
J Clin Epidemiol. 2021 Jul;135:79-89. doi: 10.1016/j.jclinepi.2021.02.011

Riley RD, Debray TPA, Collins GS, Archer L, Ensor J, van Smeden M, Snell KIE.
Minimum sample size for external validation of a clinical prediction model with a binary outcome.
Stat Med. 2021 Aug 30;40(19):4230-4251. doi: 10.1002/sim.9025

Guidance on prediction model development

Modeling strategies: meta-analysis, continuous predictors, small sample size

A sensible general strategy
Harrell FE Jr, Lee KL, Mark DB.
Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.
Stat Med. 1996 Feb 28;15(4):361-87. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
Exploiting data from multiple studies
Debray TP, Riley RD, Rovers MM, Reitsma JB, Moons KG; Cochrane IPD Meta-analysis Methods group.
Individual participant data (IPD) meta-analyses of diagnostic and prognostic modeling studies: guidance on their use.
PLoS Med. 2015 Oct 13;12(10):e1001886. doi: 10.1371/journal.pmed.1001886

Riley RD, Steyerberg EW.
Meta-analysis of a binary outcome using individual participant data and aggregate data.
Res Synth Methods. 2010 Jan;1(1):2-19. doi: 10.1002/jrsm.4

Debray TP, Moons KG, Ahmed I, Koffijberg H, Riley RD.
A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis.
Stat Med. 2013 Aug 15;32(18):3158-80. doi: 10.1002/sim.5732
Dichotomization of predictor variables should be avoided
Altman DG, Lausen B, Sauerbrei W, Schumacher M.
Dangers of using "optimal" cutpoints in the evaluation of prognostic factors.
J Natl Cancer Inst. 1994 Jun 1;86(11):829-35. doi: 10.1093/jnci/86.11.829

Royston P, Altman DG, Sauerbrei W.
Dichotomizing continuous predictors in multiple regression: a bad idea.
Stat Med. 2006 Jan 15;25(1):127-41. doi: 10.1002/sim.2331
Restricted cubic splines are attractive
Harrell FE Jr, Lee KL, Pollock BG.
Regression models in clinical studies: determining relationships between predictors and response.
J Natl Cancer Inst. 1988 Oct 5;80(15):1198-202. doi: 10.1093/jnci/80.15.1198

Fractional polynomials can also be used
Royston P, Ambler G, Sauerbrei W.
The use of fractional polynomials to model continuous risk variables in epidemiology.
Int J Epidemiol. 1999 Oct;28(5):964-74. doi: 10.1093/ije/28.5.964
Many implementations of splines are available
Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M.
A review of spline function procedures in R.
BMC Med Res Methodol. 2019 Mar 6;19(1):46
Smooth modeling of continuous predictors is advised
Nieboer D, Vergouwe Y, Roobol MJ, Ankerst DP, Kattan MW, Vickers AJ, Steyerberg EW; Prostate Biopsy Collaborative Group.
Nonlinear modeling was applied thoughtfully for risk prediction: the Prostate Biopsy Collaborative Group.
J Clin Epidemiol. 2015 Apr;68(4):426-34. doi: 10.1016/j.jclinepi.2014.11.022
Various approaches can be followed to model continuous predictors, and some are preferable over others
Collins GS, Ogundimu EO, Cook JA, Manach YL, Altman DG.
Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model.
Stat Med. 2016 Oct 15;35(23):4124-35. doi: 10.1002/sim.6986
Small data set strategies
Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD.
Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets.
Med Decis Making. 2001 Jan-Feb;21(1):45-56. doi: 10.1177/0272989X0102100106
In small samples, external knowledge is essential to guide the selection of predictors, with estimation stabilized by shrinkage/penalization
Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD.
Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets.
Stat Med. 2000 Apr 30;19(8):1059-79. doi: 10.1002/(sici)1097-0258(20000430)19:8<1059::aid-sim412>3.0.co;2-0
Sample size in a dynamic context
Christodoulou E, van Smeden M, Edlinger M, Timmerman D, Wanitschek M, Steyerberg EW, Van Calster B.
Adaptive sample size determination for the development of clinical prediction models.
Diagn Progn Res. 2021 Mar 22;5(1):6. doi: 10.1186/s41512-021-00096-5
Many papers on sample size were written with Richard Riley; see his prognosis research website
Sample size for model development and validation
Riley RD, Ensor J, Snell KIE, Harrell FE Jr, Martin GP, Reitsma JB, Moons KGM, Collins G, van Smeden M.
Calculating the sample size required for developing a clinical prediction model.
BMJ. 2020 Mar 18;368:m441. doi: 10.1136/bmj.m441

Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE Jr, Moons KG, Collins GS.
Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes.
Stat Med. 2019 Mar 30;38(7):1276-1296. doi: 10.1002/sim.7992

Riley RD, Snell KIE, Archer L, Ensor J, Debray TPA, van Calster B, van Smeden M, Collins GS.
Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study.
BMJ. 2024 Jan 22;384:e074821. doi: 10.1136/bmj-2023-074821.

Machine learning is data hungry
van der Ploeg T, Austin PC, Steyerberg EW.
Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints.
BMC Med Res Methodol. 2014 Dec 22;14:137. doi: 10.1186/1471-2288-14-137
Stepwise selection and dichotomization of predictors plus small sample size at both development and validation destroy well-intended efforts for development of sensible prediction models
Steyerberg EW, Uno H, Ioannidis JPA, van Calster B; Collaborators.
Poor performance of clinical prediction models: the harm of commonly applied methods.
J Clin Epidemiol. 2018 Jun;98:133-143. doi: 10.1016/j.jclinepi.2017.11.013
Presentation in score charts should not be oversimplistic
Moons KG, Harrell FE, Steyerberg EW.
Should scoring rules be based on odds ratios or regression coefficients?
J Clin Epidemiol. 2002 Oct;55(10):1054-5. doi: 10.1016/s0895-4356(02)00453-5

Shrinkage and penalization methods for prediction

Overall beneficial for predictive performance
Moons KG, Donders AR, Steyerberg EW, Harrell FE.
Penalized maximum likelihood estimation to directly adjust prediction models for overoptimism: a clinical example.
J Clin Epidemiol. 2004 Dec;57(12):1262-70. doi: 10.1016/j.jclinepi.2004.01.020
But the shrinkage factor is hard to estimate when it is needed most: in small data sets
Van Calster B, van Smeden M, De Cock B, Steyerberg EW.
Regression shrinkage methods for clinical prediction models do not guarantee improved performance: Simulation study.
Stat Methods Med Res. 2020 Nov;29(11):3166-3178. doi: 10.1177/0962280220921415.

Riley RD, Snell KIE, Martin GP, Whittle R, Archer L, Sperrin M, Collins GS.
Penalization and shrinkage methods produced unreliable clinical prediction models especially when sample size was small.
J Clin Epidemiol. 2021 Apr;132:88-96. doi: 10.1016/j.jclinepi.2020.12.005
Tuning can be unstable and impacts on external performance

Martin GP, Riley RD, Collins GS, Sperrin M.
Developing clinical prediction models when adhering to minimum sample size recommendations: The importance of quantifying bootstrap variability in tuning parameters and predictive performance.
Stat Methods Med Res. 2021 Dec;30(12):2545-2561. doi: 10.1177/09622802211046388

Model updating

A perspective on model development, validation, and updating
Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, Riley RD, Hemingway H, Altman DG; PROGRESS Group.
Prognosis Research Strategy (PROGRESS) 3: prognostic model research.
PLoS Med. 2013;10(2):e1001381. doi: 10.1371/journal.pmed.1001381
Dynamic model development with continuous data collection
Strobl AN, Vickers AJ, Van Calster B, Steyerberg E, Leach RJ, Thompson IM, Ankerst DP.
Improving patient prostate cancer risk assessment: Moving from static, globally-applied to dynamic, practice-specific risk calculators.
J Biomed Inform. 2015 Aug;56:87-93. doi: 10.1016/j.jbi.2015.05.001

Siregar S, Nieboer D, Vergouwe Y, Versteegh MI, Noyez L, Vonk AB, Steyerberg EW, Takkenberg JJ.
Improved Prediction by Dynamic Modeling: An Exploratory Study in the Adult Cardiac Surgery Database
Circ Cardiovasc Qual Outcomes. 2016 Mar;9(2):171-81. doi: 10.1161/CIRCOUTCOMES.114.001645

Jenkins DA, Martin GP, Sperrin M, Riley RD, Debray TPA, Collins GS, Peek N.
Continual updating and monitoring of clinical prediction models: time for dynamic prediction systems?
Diagn Progn Res. 2021 Jan 11;5(1):1. doi: 10.1186/s41512-020-00090-3

Booth S, Riley RD, Ensor J, Lambert PC, Rutherford MJ.
Temporal recalibration for improving prognostic model development and risk predictions in settings where survival is improving over time.
Int J Epidemiol. 2020 Aug 1;49(4):1316-1325. doi: 10.1093/ije/dyaa030

Markers can be added in a dynamic modeling trajectory
Nieboer D, Vergouwe Y, Ankerst DP, Roobol MJ, Steyerberg EW.
Improving prediction models with new markers: a comparison of updating strategies.
BMC Med Res Methodol. 2016 Sep 27;16(1):128. doi: 10.1186/s12874-016-0231-2
Updating needs to respect small sample size
Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD.
Validation and updating of predictive logistic regression models: a study on sample size and shrinkage.
Stat Med. 2004 Aug 30;23(16):2567-86. doi: 10.1002/sim.1844

Vergouwe Y, Nieboer D, Oostenbrink R, Debray TPA, Murray GD, Kattan MW, Koffijberg H, Moons KGM, Steyerberg EW.
A closed testing procedure to select an appropriate method for updating prediction models.
Stat Med. 2017 Dec 10;36(28):4529-4539. doi: 10.1002/sim.7179
Combining multiple previously developed models can be valuable
Debray TP, Koffijberg H, Nieboer D, Vergouwe Y, Steyerberg EW, Moons KG.
Meta-analysis and aggregation of multiple published prediction models.
Stat Med. 2014 Jun 30;33(14):2341-62. doi: 10.1002/sim.6080
Polytomous regression models should also consider updating
Van Hoorde K, Vergouwe Y, Timmerman D, Van Huffel S, Steyerberg EW, Van Calster B.
Simple dichotomous updating methods improved the validity of polytomous prediction models.
J Clin Epidemiol. 2013 Oct;66(10):1158-65. doi: 10.1016/j.jclinepi.2013.04.014.
Updating of survival models
Ensor J, Snell KIE, Debray TPA, Lambert PC, Look MP, Mamas MA, Moons KGM, Riley RD.
Individual participant data meta-analysis for external validation, recalibration, and updating of a flexible parametric prognostic model.
Stat Med. 2021 Jun 15;40(13):3066-3084. doi: 10.1002/sim.8959

Links between statistics and machine learning

Open questions for machine learning and AI
Vollmer S, Mateen BA, Bohner G, Király FJ, Ghani R, Jonsson P, Cumbers S, Jonas A, McAllister KSL, Myles P, Granger D, Birse M, Branson R, Moons KGM, Collins GS, Ioannidis JPA, Holmes C, Hemingway H.
Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness.
BMJ. 2020 Mar 20;368:l6927. doi: 10.1136/bmj.l6927
Guidelines for quality improvement of prediction with machine learning and AI
de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, Aardoom JJ, Debray TPA, Schuit E, van Smeden M, Reitsma JB, Steyerberg EW, Chavannes NH, Moons KGM.
Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review.
NPJ Digit Med. 2022 Jan 10;5(1):2. doi: 10.1038/s41746-021-00549-7
Class imbalance correction approaches
van den Goorbergh R, van Smeden M, Timmerman D, Van Calster B.
The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression.
J Am Med Inform Assoc. 2022 Aug 16;29(9):1525-1534. doi: 10.1093/jamia/ocac093
Performance measures in machine learning literature
de Hond AAH, van Calster B, Steyerberg EW.
Commentary: Artificial Intelligence and Statistics: Just the Old Wine in New Wineskins?
Front Digit Health. 2022 May 20;4:923944. doi: 10.3389/fdgth.2022.923944
Comparisons of performance for statistical versus machine learning models
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B.
A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models.
J Clin Epidemiol. 2019 Jun;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004

Gravesteijn BY, Nieboer D, Ercole A, Lingsma HF, Nelson D, van Calster B, Steyerberg EW; CENTER-TBI collaborators.
Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury.
J Clin Epidemiol. 2020 Jun;122:95-107. doi: 10.1016/j.jclinepi.2020.03.005

van der Ploeg T, Nieboer D, Steyerberg EW.
Modern modeling techniques had limited external validity in predicting mortality from traumatic brain injury.
J Clin Epidemiol. 2016 Oct;78:83-89. doi: 10.1016/j.jclinepi.2016.03.002

De Hond A, Raven W, Schinkelshoek L, Gaakeer M, Ter Avest E, Sir O, Lameijer H, Hessels RA, Reijnen R, De Jonge E, Steyerberg E, Nickel CH, De Groot B.
Machine learning for developing a prediction model of hospital admission of emergency department patients: Hype or hope?
Int J Med Inform. 2021 Aug;152:104496. doi: 10.1016/j.ijmedinf.2021.104496

Comparisons to clinician judgement
Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, Topol EJ, Ioannidis JPA, Collins GS, Maruthappu M.
Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies.
BMJ. 2020 Mar 25;368:m689. doi: 10.1136/bmj.m689
A plea for openness of machine learning models and careful validation
Van Calster B, Wynants L, Timmerman D, Steyerberg EW, Collins GS.
Predictive analytics in health care: how can we know it works?
J Am Med Inform Assoc. 2019 Dec 1;26(12):1651-1654. doi: 10.1093/jamia/ocz130

Van Calster B, Steyerberg EW, Collins GS.
Artificial Intelligence Algorithms for Medical Prediction Should Be Nonproprietary and Readily Available.
JAMA Intern Med. 2019 May 1;179(5):731. doi: 10.1001/jamainternmed.2019.0597

Reporting of AI driven decision tools
Vasey B, Nagendran M, Campbell B, Clifton DA, Collins GS, Denaxas S, Denniston AK, Faes L, Geerts B, Ibrahim M, Liu X, Mateen BA, Mathur P, McCradden MD, Morgan L, Ordish J, Rogers C, Saria S, Ting DSW, Watkinson P, Weber W, Wheatstone P, McCulloch P; DECIDE-AI expert group.
Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI.
Nat Med. 2022 May;28(5):924-933. doi: 10.1038/s41591-022-01772-9

Guidance and reporting of prediction models

Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)
Collins GS, Reitsma JB, Altman DG, Moons KG.
Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement.
Ann Intern Med. 2015 Jan 6;162(1):55-63. doi: 10.7326/M14-0697

Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS.
Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration.
Ann Intern Med. 2015 Jan 6;162(1):W1-73. doi: 10.7326/M14-0698
Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) guideline
McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM; Statistics Subcommittee of the NCI-EORTC Working Group on Cancer Diagnostics.
Reporting recommendations for tumor marker prognostic studies.
J Clin Oncol. 2005 Dec 20;23(36):9067-72. doi: 10.1200/JCO.2004.01.0454

Altman DG, McShane LM, Sauerbrei W, Taube SE.
Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK): explanation and elaboration.
PLoS Med. 2012;9(5):e1001216. doi: 10.1371/journal.pmed.1001216

Sauerbrei W, Taube SE, McShane LM, Cavenagh MM, Altman DG.
Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK): An Abridged Explanation and Elaboration.
J Natl Cancer Inst. 2018 Aug 1;110(8):803-811. doi: 10.1093/jnci/djy088

Systematic review appraisal: CHARMS
Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, Reitsma JB, Collins GS.
Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist.
PLoS Med. 2014 Oct 14;11(10):e1001744. doi: 10.1371/journal.pmed.1001744
Prediction model Risk Of Bias ASsessment Tool (PROBAST) risk of bias assessment
Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S; PROBAST Group.
PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies.
Ann Intern Med. 2019 Jan 1;170(1):51-58. doi: 10.7326/M18-1376

Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S.
PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration.
Ann Intern Med. 2019 Jan 1;170(1):W1-W33. doi: 10.7326/M18-1377.
Reporting often poor: motivations for TRIPOD
Bouwmeester W, Zuithoff NP, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, Altman DG, Moons KG.
Reporting and methods in clinical prediction research: a systematic review.
PLoS Med. 2012;9(5):1-12. doi: 10.1371/journal.pmed.1001221

Collins GS, de Groot JA, Dutton S, Omar O, Shanyinde M, Tajar A, Voysey M, Wharton R, Yu LM, Moons KG, Altman DG.
External validation of multivariable prediction models: a systematic review of methodological conduct and reporting.
BMC Med Res Methodol. 2014 Mar 19;14:40. doi: 10.1186/1471-2288-14-40

Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L.
Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review.
BMC Med Res Methodol. 2022 Jan 13;22(1):12. doi: 10.1186/s12874-021-01469-6

Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS.
Risk of bias of prognostic models developed using machine learning: a systematic review in oncology.
Diagn Progn Res. 2022 Jul 7;6(1):13. doi: 10.1186/s41512-022-00126-w
Illustration in COVID modeling studies
Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, Bonten MMJ, Dahly DL, Damen JAA, Debray TPA, de Jong VMT, De Vos M, Dhiman P, Haller MC, Harhay MO, Henckaerts L, Heus P, Kammer M, Kreuzberger N, Lohmann A, Luijken K, Ma J, Martin GP, McLernon DJ, Andaur Navarro CL, Reitsma JB, Sergeant JC, Shi C, Skoetz N, Smits LJM, Snell KIE, Sperrin M, Spijker R, Steyerberg EW, Takada T, Tzoulaki I, van Kuijk SMJ, van Bussel B, van der Horst ICC, van Royen FS, Verbakel JY, Wallisch C, Wilkinson J, Wolff R, Hooft L, Moons KGM, van Smeden M.
Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal.
BMJ. 2020 Apr 7;369:m1328. doi: 10.1136/bmj.m1328
Illustration with a short form for pragmatic assessment with key PROBAST items
Venema E, Wessler BS, Paulus JK, Salah R, Raman G, Leung LY, Koethe BC, Nelson J, Park JG, van Klaveren D, Steyerberg EW, Kent DM.
Large-scale validation of the prediction model risk of bias assessment Tool (PROBAST) using a short form: high risk of bias models show poorer discrimination.
J Clin Epidemiol. 2021 Oct;138:32-39. doi: 10.1016/j.jclinepi.2021.06.017