Results

24 Configurations. 56,962 Test Transactions. One Clear Winner.

Total Transactions: 284,807

Fraud Cases: 492 (0.172%)

Configurations Tested: 24

Test Set Size: 56,962

Table 4

Table IV: Full 24-configuration experimental design crossing Model Type, Feature Selection Strategy, and Resampling Technique.

Table 5

Table V: Phase 1 Model Performance Comparison (SMOTE + all 30 features, 56,962 test transactions).

F1 / ROC-AUC Bar Chart

Random Forest achieved the highest F1 score of 0.8586 with precision 0.8817
Logistic Regression collapsed to F1 = 0.1088 despite AUC = 0.9696, showing that AUC alone is misleading on highly imbalanced data
Dashed reference lines mark published baselines (Iqbal et al., 2025) for comparison with this study's results
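A high AUC can coexist with a low F1 because AUC only measures how well scores rank fraud above legitimate transactions, while F1 is computed at one fixed decision threshold. The toy sketch below uses hypothetical scores (not the study's data), and the helper names `roc_auc` and `f1_at` are illustrative. It shows a perfectly ranked classifier whose F1 at a 0.5 threshold is only about 0.18 because many legitimate scores exceed 0.5. Whether this calibration effect explains the study's Logistic Regression result is an assumption, not something the results above establish.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen fraud case scores
    higher than a randomly chosen legitimate one; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(labels, scores, threshold=0.5):
    """F1 after flagging every transaction whose score exceeds `threshold`."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical toy scores: 95 legitimate, 5 fraud.
# Every fraud outscores every legitimate transaction (perfect ranking),
# but 45 legitimate scores sit above the 0.5 threshold.
legit = [0.005 + 0.01 * i for i in range(95)]  # 0.005, 0.015, ..., 0.945
fraud = [0.95, 0.96, 0.97, 0.98, 0.99]
scores = legit + fraud
labels = [0] * len(legit) + [1] * len(fraud)

print(roc_auc(labels, scores))                  # 1.0
print(round(f1_at(labels, scores), 4))          # 0.1818
print(f1_at(labels, scores, threshold=0.9475))  # 1.0
```

Moving the threshold from 0.5 to 0.9475 lifts F1 from 0.18 to 1.0 without changing AUC at all, which is why the two metrics can disagree so sharply.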

ROC-AUC Curves

All four models exceeded AUC 0.96, confirming strong threshold-independent discrimination
Stacking Ensemble curve hugs the upper-left corner most tightly (AUC = 0.9838)
Logistic Regression diverges at low false positive rates, exposing the limits of its linear decision boundary

Confusion Matrix

TP: 82 fraud cases caught

FP: 11 false alarms

FN: 16 missed frauds

TN: 56,853 transactions cleared
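These four counts match the Random Forest's reported precision (0.8817) and F1 (0.8586), and they also give its recall, which is not stated above. A minimal sketch in plain Python; the helper name `basic_metrics` is illustrative.

```python
def basic_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)        # share of fraud alerts that were real fraud
    recall = tp / (tp + fn)           # share of real fraud that was caught
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return {"precision": precision, "recall": recall, "f1": f1}


metrics = basic_metrics(tp=82, fp=11, fn=16)
print({name: round(value, 4) for name, value in metrics.items()})
# {'precision': 0.8817, 'recall': 0.8367, 'f1': 0.8586}
```

A recall of 0.8367 corresponds to 82 of the 98 fraud cases in the test set being caught.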

Precision-Recall Curves

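Precision-recall curves matter more than ROC curves at 0.172% prevalence. Because legitimate transactions vastly outnumber fraud, the false positive rate on the ROC x-axis stays tiny even when false alarms outnumber the frauds caught; precision, which PR curves plot directly, exposes that cost. Random Forest maintained the highest precision across the full recall range.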
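The arithmetic is easy to check with the test-set counts: there are 56,864 legitimate test transactions (56,853 TN + 11 FP). The sketch below holds TP fixed at 82 and varies only the number of false alarms. This is a simplification, since a real threshold change would move TP as well; the helper name `fpr_and_precision` is illustrative.

```python
TP = 82             # fraud cases caught on the test set
NEGATIVES = 56_864  # legitimate test transactions: 56,853 TN + 11 FP


def fpr_and_precision(fp: int, tp: int = TP, negatives: int = NEGATIVES):
    """FPR (ROC x-axis) and precision (PR-curve y-axis) for a given false-alarm count."""
    return fp / negatives, tp / (tp + fp)


# Simplification: TP is held fixed while only the false-alarm count varies.
for fp in (11, 100, 500):
    fpr, precision = fpr_and_precision(fp)
    print(f"FP={fp:>3}  FPR={fpr:.4f}  precision={precision:.3f}")
```

Going from 11 to 500 false alarms moves FPR only from about 0.0002 to 0.0088, a shift that is nearly invisible on a ROC plot, while precision falls from 0.882 to 0.141.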

SHAP Beeswarm Plot

V14: Dominant fraud signal; anomalously low values correlate most strongly with fraud

V12: Second most influential, with strong negative SHAP values at low feature values

V4: Third most influential; together with V14 and V12, it points to multi-feature interaction patterns that single-metric models miss

LIME Explanation

At P(Fraud) = 1.0000, LIME identified V17 ≤ −5.29, V4 ∈ (1.33, 4.20), and V10 ≤ −4.59 as the three conditions contributing most strongly to the fraud prediction. This individualised, per-transaction output supports the transparency obligations of EU AI Act Article 13.

SHAP vs. Pearson Comparison

Why SHAP Beat Pearson Correlation

V1 appeared in the SHAP Top-10 but NOT the Pearson Top-10 → evidence of a non-linear contribution that linear correlation cannot detect
9/10 features appeared in both lists → confirms robustness of the fraud signal
Supports attribution-based (SHAP) feature selection over correlation-based selection for non-linear ensemble models
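The Top-10 comparison can be sketched as follows, assuming features are ranked by mean absolute SHAP value and by absolute Pearson correlation with the fraud label (the study's exact ranking criteria are not stated here). The function name `compare_top_k` and all toy data, including the SHAP values, are hypothetical; in practice `shap_values` would come from a SHAP explainer run on the trained model.

```python
import numpy as np


def compare_top_k(X, y, shap_values, feature_names, k=10):
    """Return (shared, shap_only, pearson_only) sets of top-k feature names.

    X           : (n_samples, n_features) feature matrix
    y           : (n_samples,) 0/1 fraud labels
    shap_values : (n_samples, n_features) SHAP attributions for the fraud class,
                  assumed to be computed elsewhere from the trained model
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Global SHAP importance: mean absolute attribution per feature.
    shap_imp = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    # Linear importance: |Pearson r| between each feature and the label.
    pearson_imp = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    # Indices sorted by descending importance, truncated to the top k.
    shap_top = {feature_names[j] for j in np.argsort(-shap_imp)[:k]}
    pearson_top = {feature_names[j] for j in np.argsort(-pearson_imp)[:k]}
    return shap_top & pearson_top, shap_top - pearson_top, pearson_top - shap_top


# Hypothetical toy data: fraud occurs only at BOTH extremes of "sym",
# a non-linear pattern that Pearson correlation cannot see.
names = ["lin", "sym", "noise"]
X = np.column_stack([
    [1, 0, 0, 0, 1],            # lin: identical to the label
    [-2, -1, 0, 1, 2],          # sym: fraud when |sym| == 2
    [0.3, 0.1, 0.2, 0.5, 0.4],  # noise
])
y = np.array([1, 0, 0, 0, 1])
# Invented attributions standing in for a real SHAP explainer's output.
shap_vals = np.array([
    [0.4, 0.9, 0.02],
    [-0.1, -0.3, 0.01],
    [-0.1, -0.5, -0.02],
    [-0.1, -0.3, 0.01],
    [0.4, 0.9, -0.02],
])

shared, shap_only, pearson_only = compare_top_k(X, y, shap_vals, names, k=2)
print(shared, shap_only, pearson_only)  # {'lin'} {'sym'} {'noise'}
```

Because `sym` drives fraud symmetrically at both extremes, its Pearson correlation with the label is essentially zero even though its attributions are the largest, the same kind of pattern the study reports for V1.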

Table 6

Table VI: Extended Confusion Matrix Metrics for Random Forest (MT1-FSS1-RT1)
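Extended metrics of this kind can be computed directly from the TP/FP/FN/TN counts reported in the confusion matrix above. Which metrics Table VI actually lists is not reproduced here, so the selection below and the helper name `extended_metrics` are assumptions.

```python
import math


def extended_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Common confusion-matrix derivatives for a binary fraud classifier."""
    recall = tp / (tp + fn)       # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    fpr = fp / (fp + tn)          # false positive rate = 1 - specificity
    npv = tn / (tn + fn)          # share of cleared transactions that were legitimate
    balanced_accuracy = (recall + specificity) / 2
    # Matthews correlation coefficient: uses all four cells, so it stays
    # informative under heavy class imbalance.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "specificity": specificity,
        "fpr": fpr,
        "npv": npv,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


for name, value in extended_metrics(tp=82, fp=11, fn=16, tn=56_853).items():
    print(f"{name:>17}: {value:.4f}")
```

With these counts the sketch gives specificity ≈ 0.9998, NPV ≈ 0.9997, balanced accuracy ≈ 0.9183 and MCC ≈ 0.8587. The near-perfect specificity and NPV largely reflect the 56,864 legitimate transactions, so MCC and balanced accuracy are the more discriminating summaries.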
