Evaluation of Prediction Models
The reliable prediction of outcomes of disease and treatment is becoming increasingly important in the delivery and organization of health care. The learning objective of this tutorial is to understand the elements underlying predictive performance and to show how to assess the performance of prediction models quantitatively. In particular, I address different categories of performance measures (including calibration, sharpness, resolution, and discrimination) and valid methods for obtaining performance estimates (including bootstrapping and cross-validation).
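The resampling idea mentioned above can be sketched in plain Python. The rank-based AUC formula and the percentile bootstrap interval below are illustrative choices, not the tutorial's prescribed procedure:

```python
import random

def auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, y_score, n_boot=1000, seed=0):
    """Percentile bootstrap: resample cases with replacement,
    recompute the AUC, and take the 2.5th/97.5th percentiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        ys = [y_score[i] for i in idx]
        if 0 < sum(yt) < n:  # resample must contain both classes
            stats.append(auc(yt, ys))
    stats.sort()
    lo = stats[int(0.025 * len(stats))]
    hi = stats[int(0.975 * len(stats))]
    return auc(y_true, y_score), (lo, hi)
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` yields 0.75; `bootstrap_auc` returns the point estimate together with an approximate 95% interval. In practice one would apply such resampling to the model-development pipeline as a whole, which is one of the design choices the tutorial discusses.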
The focus of the tutorial is on conceptual frameworks. Attention will be paid to the various choices in the design of model evaluation procedures, and the relationship between model evaluation and the purposes for which a model has been built. All methods are illustrated with real-world examples.
This tutorial is meant for medical informatics and computer science researchers, health care workers, and epidemiologists who are developing, evaluating, or using prediction models. The participants should be acquainted with the general concept of a prediction model and with the concept of probability.
After following the tutorial, participants should be able to understand and assess the performance of prediction models, know how to report on it, and be able to critique performance assessments reported in the literature.
Ameen Abu-Hanna is full professor at the Department of Medical Informatics of the Academic Medical Center, University of Amsterdam. He is Principal Investigator in the research area Methodology in Medical Informatics, with interests in artificial intelligence, statistical machine learning, and decision support systems. He was formerly an associate editor of the Journal of Biomedical Informatics and president of the European Society of AI in Medicine. In 2017 Ameen became a founding Fellow of the International Academy of Health Sciences Informatics.
Predictive performance measures
- AUC and other ROC-based methods
- The problem of AUC with imbalanced data
- The Area under the Precision-Recall Curve
- Brier (Skill) score
- Net Reclassification Improvement
- Calibration graphs and calibration intercept and slope
Characteristics of performance measures
- Proper scoring rules
- Strictly proper scoring rules
Model selection and validation
- Comparing model performance
- Opportunities and threats in the Big Data era
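Several of the listed measures can be illustrated with a short, self-contained Python sketch. The Newton-Raphson recalibration fit used for the calibration intercept and slope is one common implementation choice (logistic recalibration of the outcome on the logit of the predictions), assumed here for illustration:

```python
import math

def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)

def brier_skill_score(y_true, p):
    """Skill relative to a reference that always predicts the event rate;
    1 is perfect, 0 matches the reference, negative is worse."""
    rate = sum(y_true) / len(y_true)
    bs_ref = brier_score(y_true, [rate] * len(y_true))
    return 1.0 - brier_score(y_true, p) / bs_ref

def calibration_intercept_slope(y_true, p, iters=50):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson.
    Perfect calibration gives a ~ 0 and b ~ 1.
    Requires all p strictly between 0 and 1."""
    lp = [math.log(pi / (1 - pi)) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [1 / (1 + math.exp(-(a + b * x))) for x in lp]
        w = [m * (1 - m) for m in mu]
        # Gradient of the log-likelihood
        ga = sum(y - m for y, m in zip(y_true, mu))
        gb = sum((y - m) * x for (y, m), x in zip(zip(y_true, mu), lp))
        # Observed information (2x2) and its inverse via the determinant
        haa = sum(w)
        hab = sum(wi * x for wi, x in zip(w, lp))
        hbb = sum(wi * x * x for wi, x in zip(w, lp))
        det = haa * hbb - hab * hab
        if det == 0:
            break
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b
```

For a model whose predictions match the observed event rates, the fitted intercept is near 0 and the slope near 1; a slope below 1 indicates predictions that are too extreme (overfitting), a pattern the tutorial examines alongside calibration graphs.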