The Solution

A Two-Phase Explainable Ensemble Learning Framework

Framework Architecture

Phase 1 trained and compared 24 experimental configurations crossing 4 model types, 3 feature selection strategies, and 2 resampling conditions.
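The 24 configurations follow from the full cross of the three factors (4 × 3 × 2). A minimal sketch of that grid is below. The label strings are assumptions for illustration: the page does not list the four model types by name here, `FSS2` is a hypothetical label for the third strategy, and `none`/`smote` is one plausible reading of the two resampling conditions.

```python
from itertools import product

# Illustrative labels only; the page gives the counts, not these identifiers.
MODELS = ["logistic_regression", "random_forest", "xgboost", "stacking"]
FEATURE_SETS = ["FSS0", "FSS1", "FSS2"]  # "FSS2" is a hypothetical label
RESAMPLING = ["none", "smote"]           # assumed: without vs. with SMOTE


def build_grid():
    """Return every (model, feature set, resampling) combination as a dict."""
    return [
        {"model": m, "features": f, "resampling": r}
        for m, f, r in product(MODELS, FEATURE_SETS, RESAMPLING)
    ]


grid = build_grid()
print(len(grid))  # number of experimental configurations
```

Enumerating the grid up front keeps every model under identical conditions, which is what makes the Phase 1 comparison like-for-like.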

Phase 2 applied two complementary XAI methods, SHAP and LIME, exclusively to the best-performing model, generating compliance documentation that maps directly to EU AI Act obligations.

Fig. 1. System architecture of the proposed explainable ensemble framework, showing the flow from data processing to compliance generation.

Preprocessing

Log-transformation of Amount, z-score normalisation of Amount and Time. Zero missing values confirmed across 284,807 records.
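The two transformations can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: it uses `log(1 + x)` so that zero amounts stay defined (the page does not say which log variant was used), it applies the log before the z-score, and the sample values are made up.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x); assumed variant, chosen so zero amounts are valid."""
    return [math.log1p(v) for v in values]


def z_score(values):
    """Centre to mean 0 and scale by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy values for illustration, not taken from the dataset.
amounts = [0.0, 2.69, 149.62, 378.66, 1200.0]
times = [0.0, 1.0, 2.0, 7.0, 10.0]

amount_scaled = z_score(log_transform(amounts))
time_scaled = z_score(times)
```

In a real pipeline the mean and standard deviation would normally be estimated on the training split only and reused on the test split, to avoid leaking test statistics.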

SMOTE Resampling

The Synthetic Minority Over-sampling Technique (SMOTE) corrected the 578:1 class imbalance, expanding the training data to 454,902 balanced records.
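The core idea of SMOTE is to create each synthetic minority sample by interpolating between a real minority point and one of its nearest minority neighbours. The sketch below is a deliberately small, standard-library version of that idea with hypothetical names and toy points; in practice the `SMOTE` class from the imbalanced-learn package is the usual choice, and the page does not say which implementation or neighbour count was used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy example: 4 minority points against an assumed majority count of 10.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority_count = 10
synthetic = smote(minority, majority_count - len(minority))
balanced_minority = minority + synthetic
```

Resampling is applied to the training data only, as the text above states, so that the test set keeps its original class ratio.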

Stacking Ensemble

Random Forest + XGBoost as Level-1 base learners, Logistic Regression as Level-2 meta-learner via 5-fold cross-validation.
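A stacking setup of this shape can be sketched with scikit-learn's `StackingClassifier`. Two loud assumptions: `GradientBoostingClassifier` stands in for XGBoost so the example needs only scikit-learn, and the data is a synthetic toy set from `make_classification`, not the fraud dataset. Hyperparameters are placeholders, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy imbalanced data for illustration only.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        # Stand-in for XGBoost (assumption made to keep dependencies minimal).
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base-learner predictions train the meta-learner
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)
fraud_probability = stack.predict_proba(X_test)[:, 1]
```

The 5-fold cross-validation matters because the meta-learner is trained on out-of-fold predictions; feeding it in-sample base-learner outputs would let it overfit to their training error.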

SHAP + LIME

SHAP global attribution ranked features across 500 test instances. LIME constructed locally linear surrogates for individual fraud cases.
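A common way to turn per-instance SHAP values into a global ranking is to average their absolute values per feature. The sketch below assumes that convention (the page does not state the aggregation used) and works on a made-up attribution matrix; in a real run the matrix would come from a SHAP explainer applied to the test instances. Feature names other than Amount and Time are hypothetical.

```python
def global_importance(shap_values, feature_names):
    """Rank features by mean absolute attribution across instances."""
    n = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def top_k(ranking, k):
    """Keep the names of the k highest-ranked features."""
    return [name for name, _ in ranking[:k]]


# Made-up attributions: rows are instances, columns are features.
features = ["feature_a", "Time", "Amount"]
shap_values = [
    [0.5, -0.1, 0.2],
    [-0.7, 0.1, 0.1],
    [0.3, 0.0, -0.3],
]

ranking = global_importance(shap_values, features)
selected = top_k(ranking, 2)
```

Taking absolute values first prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks highly.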

Three Feature Selection Strategies Compared

FSS0: All 30 original features (baseline)

FSS1: Top 20 by SHAP global importance (attribution-based)

This design enabled the first direct empirical comparison of attribution-based vs. correlation-based feature selection under identical model configurations.
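For contrast with attribution-based ranking, a correlation-based selector can be sketched as ranking features by the absolute Pearson correlation with the class label. This is one common reading, offered as an assumption: the page does not specify how the correlation-based strategy was scored or how many features it kept. All names and values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_by_correlation(columns, target, k):
    """Return the k feature names with the largest |correlation| to target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy data: label 1 marks the minority class.
target = [0, 0, 0, 1, 1]
columns = {
    "feature_a": [0.1, 0.2, 0.1, 0.9, 0.8],  # tracks the label closely
    "feature_b": [5.0, 5.0, 5.0, 5.0, 5.0],  # constant, so uninformative
    "feature_c": [1.0, 0.0, 1.0, 0.0, 0.5],  # weak negative relationship
}

selected = top_k_by_correlation(columns, target, 2)
```

Unlike SHAP attribution, this score is model-independent and only captures linear, one-feature-at-a-time relationships, which is why comparing the two under identical model configurations is informative.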
