Random Forest achieved the highest F1 score of 0.8586 with precision 0.8817
Logistic Regression collapsed to F1 = 0.1088 despite AUC 0.9696, showing that AUC alone can mislead on imbalanced data
Dashed reference lines show this study vs. published baselines (Iqbal et al., 2025)
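A minimal sketch of how a model can rank transactions well yet score a low F1 at a fixed cutoff. The toy labels, scores, and both thresholds are hypothetical and are not taken from this study; whether the same mechanism explains the Logistic Regression result here is an assumption.

```python
def f1_at_threshold(labels, scores, threshold):
    """F1 for the positive (fraud) class when scores >= threshold are flagged."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 2 frauds among 10 transactions, perfectly ranked,
# so a ranking metric such as AUC would be 1.0 for these scores.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.75, 0.70, 0.65, 0.60, 0.40, 0.30, 0.20]

f1_default = f1_at_threshold(labels, scores, 0.50)  # 2 TP, 5 FP, 0 FN
f1_tuned = f1_at_threshold(labels, scores, 0.85)    # 2 TP, 0 FP, 0 FN

print(f"F1 at 0.50: {f1_default:.3f}")
print(f"F1 at 0.85: {f1_tuned:.3f}")
```

The ranking is identical in both cases; only the cutoff changes, which is why F1 should be reported alongside AUC.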
All four models exceeded AUC 0.96, indicating strong threshold-independent discrimination
Stacking Ensemble curve hugs the upper-left corner most tightly (AUC = 0.9838)
Logistic Regression diverges at low false positive rates, exposing its linear boundary limit
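For reference, AUC can be read as the probability that a randomly chosen fraud is scored above a randomly chosen legitimate transaction. The sketch below computes it that way on hypothetical toy data; it is a quadratic-time illustration, not the implementation used to produce the figures above.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one legitimate transaction outranks one fraud.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]

auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.4f}")
```

Because no threshold appears anywhere in the calculation, AUC says nothing about how a model behaves at the operating point actually deployed.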
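PR curves matter more than ROC at 0.172% prevalence: legitimate transactions vastly outnumber frauds, so even a small false positive rate produces many false alarms, which precision exposes and the ROC curve hides
Random Forest maintained the highest precision across the full recall range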
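The prevalence effect behind the PR-versus-ROC point can be sketched with Bayes' rule: precision = TPR·π / (TPR·π + FPR·(1 − π)), where π is the fraud prevalence. The operating point used below (TPR = 0.90, FPR = 0.01) is a hypothetical illustration, not a result from this study; only the 0.172% prevalence comes from the text.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by a ROC operating point at a given class prevalence."""
    true_alerts = tpr * prevalence
    false_alerts = fpr * (1.0 - prevalence)
    total_alerts = true_alerts + false_alerts
    return true_alerts / total_alerts if total_alerts else 0.0


# Same hypothetical ROC operating point, two different prevalences.
TPR, FPR = 0.90, 0.01

precision_rare = precision_from_rates(TPR, FPR, 0.00172)  # 0.172% fraud
precision_balanced = precision_from_rates(TPR, FPR, 0.5)  # balanced classes

print(f"precision at 0.172% prevalence: {precision_rare:.3f}")
print(f"precision at 50% prevalence:    {precision_balanced:.3f}")
```

Under these assumed rates, a point that looks excellent on a ROC plot yields precision of roughly 0.13 at 0.172% prevalence, versus about 0.99 with balanced classes.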
For a transaction scored P(Fraud) = 1.0000, LIME identified V17 ≤ −5.29, V4 ∈ (1.33, 4.20), and V10 ≤ −4.59 as the three strongest fraud-driving conditions. This per-transaction output supports the individualised explanation requirements associated with EU AI Act Article 13.
Why SHAP Beat Pearson Correlation
V1 appeared in SHAP Top-10 but NOT Pearson Top-10 → evidence of a non-linear contribution that linear correlation misses
9/10 features appeared in both lists → confirms robustness of the fraud signal
Supports attribution-based (SHAP) feature selection as the better fit for non-linear ensemble models
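A small sketch of the two ideas in this comparison: Pearson correlation can be exactly zero for a purely non-linear relationship, and two importance rankings can be compared by their Top-k overlap. All data, feature names (f1 to f4), and importance values below are hypothetical; the helper names are illustrative and not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_k(importance, k):
    """Names of the k features with the largest absolute importance."""
    ranked = sorted(importance, key=lambda name: abs(importance[name]), reverse=True)
    return ranked[:k]


def compare_top_k(a, b, k):
    """Split the two Top-k lists into shared and method-specific features."""
    top_a, top_b = set(top_k(a, k)), set(top_k(b, k))
    return {
        "shared": sorted(top_a & top_b),
        "only_a": sorted(top_a - top_b),
        "only_b": sorted(top_b - top_a),
    }


# y depends entirely on x, but symmetrically, so the linear correlation is 0.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
r_quadratic = pearson_r(x, y)

# Hypothetical importances: mean |SHAP| values and Pearson r with the label.
shap_importance = {"f1": 0.30, "f2": 0.25, "f3": 0.20, "f4": 0.05}
pearson_importance = {"f1": -0.40, "f2": 0.35, "f3": 0.02, "f4": 0.30}
overlap = compare_top_k(shap_importance, pearson_importance, k=3)

print(r_quadratic)
print(overlap)
```

In this toy case f3 plays the role V1 plays above: it ranks highly by attribution but not by linear correlation.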