Evaluation of Prediction Models

AIME 2017 tutorial on Evaluation of Prediction Models in Medicine

Tutorial Description

The reliable prediction of outcomes from disease and treatment is becoming increasingly important in the delivery and organisation of health care. The learning objective of this tutorial is to show how to quantitatively assess the performance of prediction models. In particular, I will address different categories of performance measures (including calibration and discrimination) and valid methods (including bootstrapping and cross-validation) for obtaining performance assessments. I will also provide a stepwise framework for developing, evaluating, and reporting on prediction models.

The focus of the tutorial is on conceptual frameworks. Attention will be paid to the various choices in the design of model evaluation procedures, and the relationship between model evaluation and the purposes for which a model has been built. All methods are illustrated with real-world examples from domains such as cardiac surgery and intensive care medicine.


This tutorial is meant for medical informatics and computer science researchers, health care workers, and epidemiologists who are developing, evaluating, or using prediction models. The participants should be acquainted with the general concept of a prediction model and with the concept of probability.

Expected outcome

After following the tutorial, participants should be able to assess the performance of prediction models, know how to report on them, and critique performance assessments reported in the literature.

Tutorial Speaker

Ameen Abu-Hanna is Professor and Chair of the Department of Medical Informatics at the Academic Medical Center at the University of Amsterdam. He is Principal Investigator in the research area Methodology in Medical Informatics, with interests in artificial intelligence, machine learning and decision support systems. He is a former associate editor of the Journal of Biomedical Informatics and president of the European Society of AI in Medicine.

General Organization


  • Prediction models
    • Informing patients, triage, and benchmarking
  • Reasons to evaluate
  • Model building and evaluation
  • Difference between prediction and etiological models

Performance measures

  • Accuracy Measures
    • ROC analysis
    • Generalized measures
  • Net Reclassification Improvement
  • Precision
    • Calibration
    • Indirect measures of precision
  • Sharpness
  • Proper and non-proper measures
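
To make the categories above concrete, here is a minimal sketch of three common measures: the area under the ROC curve (discrimination), the Brier score (a proper scoring rule), and the difference between mean predicted risk and observed event rate (a crude calibration check). The function names and the four-patient data set are illustrative assumptions, not material from the tutorial.

```python
def auc(y_true, y_prob):
    """Area under the ROC curve, computed as the fraction of
    (event, non-event) pairs in which the event has the higher
    predicted risk; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def mean_calibration_error(y_true, y_prob):
    """Mean predicted risk minus observed event rate.
    Positive values indicate overprediction on average."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical outcomes (1 = event) and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]

print(auc(outcomes, risks))
print(brier_score(outcomes, risks))
print(mean_calibration_error(outcomes, risks))
```

The pairwise AUC loop is quadratic in the number of patients, which is acceptable for illustration; a rank-based computation would be preferable on large data sets.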

Model selection and validation

  • Model selection
    • The bias-variance trade-off
    • Cross-validation, Information Criteria
    • Bootstrapping
  • Internal, temporal and external validation
    • A framework for understanding external validation
  • Comparing different models
  • Opportunities and threats in the Big Data era
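
As an illustration of internal validation by bootstrapping, the sketch below estimates an optimism-corrected Brier score. It assumes a deliberately simple toy model (the observed event rate within each category of a single predictor); the model, the function names, and the data are hypothetical and chosen only to keep the example self-contained.

```python
import random


def fit(rows):
    """Toy model: predicted risk is the observed event rate within each
    predictor category; unseen categories get the overall event rate."""
    totals, events = {}, {}
    for x, y in rows:
        totals[x] = totals.get(x, 0) + 1
        events[x] = events.get(x, 0) + y
    overall = sum(events.values()) / sum(totals.values())
    rates = {x: events[x] / totals[x] for x in totals}
    return lambda x: rates.get(x, overall)


def brier(model, rows):
    """Mean squared difference between predicted risk and outcome."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def optimism_corrected_brier(rows, n_boot=200, seed=0):
    """Return (apparent, corrected) Brier scores.

    For each bootstrap sample the model is refitted, then scored on the
    sample itself and on the original data; the average difference is
    the estimated optimism. Lower Brier scores are better, so the
    optimism is added to the apparent score."""
    rng = random.Random(seed)
    apparent = brier(fit(rows), rows)
    optimism = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        model = fit(sample)
        optimism += brier(model, rows) - brier(model, sample)
    return apparent, apparent + optimism / n_boot


# Hypothetical data: (predictor category, outcome).
data = [("a", 1), ("a", 0), ("b", 1), ("b", 1), ("b", 0), ("c", 0)]
apparent, corrected = optimism_corrected_brier(data)
print(apparent, corrected)
```

The same resampling loop applies to other performance measures; for measures where higher is better, such as the AUC, the estimated optimism would be subtracted rather than added.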


  • A stepwise framework for developing, evaluating and reporting on prediction models
AIME2017 Abu-Hanna All.pdf