The Impossibility in Practice: Mapping How Algorithmic Fairness Methods Redistribute Disparities Across Competing Fairness Definitions in Health Prediction
Algorithmic risk scores increasingly guide health resource allocation, yet comparative evidence on how fairness-enhancing methods perform in health prediction remains limited. We benchmarked 21 methods (3 pre-processing, 8 in-processing, 9 post-processing, plus an uncorrected baseline) on the prediction of four health outcomes — depression, quality of life, self-rated health, and life satisfaction — using data from 69,447 respondents across 28 countries in the Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 9. The true positive rate (TPR) gap and positive predictive value (PPV) gap across methods were strongly negatively correlated (r = −0.76, p < 0.001): methods that nearly eliminated sensitivity disparities inflated predictive-value disparities by up to 128%. Post-processing methods achieved this without loss of ranking discrimination, whereas in-processing methods cost 9–13 percentage points of AUROC. The dominant trade-off is therefore not between accuracy and fairness but between competing fairness definitions — a practical manifestation of the fairness impossibility theorems. Health systems must explicitly choose which fairness criterion to prioritize; there is no neutral default.
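The central tension can be illustrated numerically: when outcome base rates differ across groups, a classifier with identical TPR and FPR in both groups necessarily has unequal PPV. The following is a minimal sketch assuming NumPy; the helper names (`tpr_ppv_gaps`, `make_group`) and the synthetic counts are illustrative, not taken from the study.

```python
import numpy as np

def tpr_ppv_gaps(y_true, y_pred, group):
    """Return |TPR_A - TPR_B| and |PPV_A - PPV_B| for a binary group label."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        tp = np.sum((y_true[m] == 1) & (y_pred[m] == 1))
        fn = np.sum((y_true[m] == 1) & (y_pred[m] == 0))
        fp = np.sum((y_true[m] == 0) & (y_pred[m] == 1))
        rates[g] = (tp / (tp + fn), tp / (tp + fp))  # (TPR, PPV)
    (tpr_a, ppv_a), (tpr_b, ppv_b) = rates.values()
    return abs(tpr_a - tpr_b), abs(ppv_a - ppv_b)

def make_group(n_pos, n_neg, tpr, fpr):
    """Build deterministic labels and predictions with the given error rates."""
    y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    n_tp, n_fp = round(tpr * n_pos), round(fpr * n_neg)
    y_pred = np.concatenate([np.ones(n_tp), np.zeros(n_pos - n_tp),
                             np.ones(n_fp), np.zeros(n_neg - n_fp)])
    return y_true, y_pred

# Group A: 50% prevalence; group B: 10% prevalence.
# The classifier has the SAME TPR (0.8) and FPR (0.2) in both groups.
ya, pa = make_group(100, 100, tpr=0.8, fpr=0.2)
yb, pb = make_group(20, 180, tpr=0.8, fpr=0.2)
y_true = np.concatenate([ya, yb])
y_pred = np.concatenate([pa, pb])
group = np.concatenate([np.zeros(len(ya)), np.ones(len(yb))])

tpr_gap, ppv_gap = tpr_ppv_gaps(y_true, y_pred, group)
print(f"TPR gap = {tpr_gap:.2f}, PPV gap = {ppv_gap:.2f}")
# Equal error rates, yet the base-rate difference forces a PPV gap, since
# PPV = TPR*prev / (TPR*prev + FPR*(1-prev)) varies with prevalence.
```

Here PPV is 0.80 in group A but 16/52 ≈ 0.31 in group B, so a zero TPR gap coexists with a PPV gap of about 0.49. This is the arithmetic behind the impossibility result the abstract invokes: with unequal base rates, equalized sensitivity and equalized predictive value cannot both hold for a non-trivial classifier.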