ADASYN (Adaptive Synthetic Sampling) is an oversampling technique for class-imbalanced datasets. It creates synthetic minority-class points by interpolating between existing minority samples and their minority-class neighbors, and it creates more of them around minority samples whose nearest neighbors are mostly majority-class, that is, in regions where the minority class is harder to learn. This balances the training data and can improve model performance on the minority class.
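The allocation idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used in this project: the function name `adasyn_sketch`, the toy points, and the parameters `n_new` and `k` are all hypothetical, and the neighbor search is brute force.

```python
import math
import random


def adasyn_sketch(X_min, X_maj, n_new, k=5, seed=0):
    """Illustrative ADASYN-style oversampling; returns (synthetic, g)."""
    rng = random.Random(seed)
    minority = [(x, 1) for x in X_min]
    points = minority + [(x, 0) for x in X_maj]

    def neighbours(x, pool):
        # Brute-force k nearest neighbours of x in pool, skipping x itself.
        others = [p for p in pool if p[0] is not x]
        return sorted(others, key=lambda p: math.dist(x, p[0]))[:k]

    # r_i: share of majority points among the k neighbours of minority point i.
    r = [sum(1 for _, label in neighbours(x, points) if label == 0) / k
         for x in X_min]
    total = sum(r)
    if total == 0:
        return [], [0] * len(X_min)  # no minority point borders the majority

    # g_i: number of synthetic points allocated to minority point i.
    g = [round(r_i / total * n_new) for r_i in r]

    synthetic = []
    for x, g_i in zip(X_min, g):
        nn = neighbours(x, minority)
        if not nn:
            continue  # a lone minority point has nothing to interpolate with
        for _ in range(g_i):
            z = rng.choice(nn)[0]   # random minority neighbour
            lam = rng.random()      # interpolation weight in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic, g


# Toy data (hypothetical): the last minority point sits inside the majority cluster.
X_min = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (2.0, 2.0)]
X_maj = [(2.1, 2.0), (2.0, 2.2), (1.9, 1.8), (2.2, 2.1), (3.0, 3.0), (3.1, 2.9)]
synthetic, g = adasyn_sketch(X_min, X_maj, n_new=6, k=3)
print(g)  # [1, 1, 1, 3]
```

The minority point surrounded by majority samples receives three of the six synthetic points, while each point in the clean minority cluster receives one. Because the per-point counts are rounded, the total generated can differ slightly from the requested `n_new`.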
1. Split
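The dataset was split 75:25 with stratification on the target, so that the test set keeps enough samples of the small stroke class. Feature selection and scaling were performed after the split, which avoids leaking test-set information into those steps when they are fit on the training data only.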
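As a dependency-free sketch of what a stratified 75:25 split with post-split scaling looks like: the helper `stratified_split`, the toy labels, and the single random feature below are hypothetical and stand in for the real data. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument is the usual choice; whether that was used here is an assumption.

```python
import random
import statistics
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 380 negatives, 20 positives.
y = [0] * 380 + [1] * 20
train_idx, test_idx = stratified_split(y)
train_counts = Counter(y[i] for i in train_idx)
test_counts = Counter(y[i] for i in test_idx)
print(train_counts, test_counts)  # 285/15 in train, 95/5 in test

# Scaling statistics come from the training rows only and are reused on the test rows.
feature_rng = random.Random(0)
x = [feature_rng.gauss(50, 10) for _ in y]  # hypothetical single feature
train_values = [x[i] for i in train_idx]
mean = statistics.fmean(train_values)
std = statistics.pstdev(train_values)
x_train = [(v - mean) / std for v in train_values]
x_test = [(x[i] - mean) / std for i in test_idx]
```

Both partitions keep the original 5% positive rate, and the test rows are standardized with the training mean and standard deviation rather than their own.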
2. ADASYN
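To address class imbalance, ADASYN (Adaptive Synthetic Sampling) was used instead of SMOTE. Unlike SMOTE, which treats all minority samples alike, ADASYN adapts the number of synthetic points to how hard each minority sample is to learn, with the aim of improving recall on the minority class. A sampling ratio of 0.9 was used, i.e. a target minority-to-majority ratio of 0.9 after resampling.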
Class distribution after ADASYN:
Counter({0: 3583, 1: 3252})
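A quick arithmetic check of the reported counts against the 0.9 ratio, reading that ratio as minority count divided by majority count after resampling (an assumption about how the setting was defined):

```python
from collections import Counter

after = Counter({0: 3583, 1: 3252})  # distribution reported above
target_ratio = 0.9                   # assumed to mean minority / majority

achieved = after[1] / after[0]
exact_target = target_ratio * after[0]
surplus = after[1] - exact_target

print(f"achieved ratio: {achieved:.4f}")                     # 0.9076
print(f"minority count at exactly 0.9: {exact_target:.1f}")  # 3224.7
print(f"surplus over target: {surplus:.1f}")                 # 27.3
```

The achieved ratio is close to 0.9 but not exact. One plausible explanation is that ADASYN allocates synthetic points per minority sample and rounds each allocation, so the total need not match the requested number precisely.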