The confusion matrices show that the minority class, Normal MMSE, makes up only 9% of the data. The precision, recall and f1 scores for the Normal class are nearly 0, which indicates that neither machine learning model can detect Normal MMSE correctly.
Thus, the data needs to be resampled, drawing repeated samples from the original data, so that the models see a more balanced class distribution.
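The effect described above can be reproduced with a small worked example. The labels below are illustrative only (not the study's data): 9 of 100 samples are Normal, and a hypothetical model always predicts the majority class. Accuracy looks high while every Normal-class score is 0.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative labels only: 9 Normal (0) and 91 non-Normal (1) samples.
y_true = [0] * 9 + [1] * 91
# A degenerate model that always predicts the majority class.
y_pred = [1] * 100

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 reports 0 instead of warning when a class is never predicted.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)

print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.91
print(f"Normal: precision={precision[0]:.2f} recall={recall[0]:.2f} f1={f1[0]:.2f}")
```

This is why per-class scores, and not accuracy alone, are used to judge the models here.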
Resampling of Data
The data is resampled using SMOTE and RandomUnderSampler, chained in a Pipeline.
Feature Selection (SelectKBest)
We select the features with score > 100, which gives the top 11 features.
X_train_f and X_test_f are updated to contain only these 11 features.
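The score-threshold selection can be sketched as follows. The scoring function (`f_classif`) is an assumption, since the source does not say which one was used, and the data is synthetic: three columns carry the class signal, so three features pass the threshold here rather than 11.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

SCORE_THRESHOLD = 100  # threshold taken from the text above


def make_data(n, rng):
    """Synthetic stand-in: 20 features, the first 3 carry the class signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 20))
    X[:, :3] += 2.0 * y[:, None]
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_data(800, rng)
X_test, y_test = make_data(200, rng)

# k="all" keeps every feature, so scores_ can be inspected before thresholding.
selector = SelectKBest(score_func=f_classif, k="all").fit(X_train, y_train)
keep = selector.scores_ > SCORE_THRESHOLD

# Apply the same boolean mask to both splits (scores are fitted on train only).
X_train_f = X_train[:, keep]
X_test_f = X_test[:, keep]
print("selected feature indices:", np.flatnonzero(keep))
```

Fitting the selector on the training split only avoids leaking information from the test split into the feature choice.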
Predictive Analysis 1.0
Test Accuracy of DecisionTrees after resampling is 52.1%
Test Accuracy of RandomForest after resampling is 56.5%
Both machine learning models achieve higher precision, recall and f1 scores for Normal MMSE. However, the accuracy is lower than it was before resampling.
Thus, we decided to try another feature selection technique, ExtraTreesClassifier.
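The evaluation step used in the predictive analysis can be sketched like this. The data is synthetic, and the hyperparameters (default trees, 100 estimators, fixed seeds) are assumptions, so the printed numbers will not match the accuracies reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected features (11 columns, 3 informative).
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.3).astype(int)
X = rng.normal(size=(1000, 11))
X[:, :3] += 1.0 * y[:, None]

X_train_f, X_test_f, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "DecisionTrees": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_f, y_train)
    y_pred = model.predict(X_test_f)
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: test accuracy = {results[name]:.3f}")
    # Per-class precision, recall and f1, which accuracy alone does not show.
    print(classification_report(y_test, y_pred, zero_division=0))
```

Printing the classification report next to the accuracy makes the trade-off visible: a model can lose overall accuracy while gaining recall on the minority class.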
Feature Selection (ExtraTreesClassifier)
We selected the features with score > 0.02, which gives the top 12 features. After that, X_train_f and X_test_f are updated with these features.
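A minimal sketch of importance-based selection, assuming the "score" is the `feature_importances_` attribute of a fitted ExtraTreesClassifier. The data is synthetic and the number of trees is an assumption, so the number of features kept will differ from the 12 reported above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

IMPORTANCE_THRESHOLD = 0.02  # threshold taken from the text above


def make_data(n, rng):
    """Synthetic stand-in: 20 features, the first 3 carry the class signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 20))
    X[:, :3] += 2.0 * y[:, None]
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_data(800, rng)
X_test, y_test = make_data(200, rng)

# Impurity-based importances sum to 1 across all features.
forest = ExtraTreesClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
keep = forest.feature_importances_ > IMPORTANCE_THRESHOLD

# Apply the same boolean mask to both splits.
X_train_f = X_train[:, keep]
X_test_f = X_test[:, keep]
print("kept", int(keep.sum()), "of", X_train.shape[1], "features")
```

Because the importances sum to 1, a fixed cutoff such as 0.02 becomes stricter as the total number of features grows.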
Predictive Analysis 2.0
Test Accuracy of DecisionTrees remained the same, which is 54%
Test Accuracy of RandomForest increased to 58.5%
In conclusion, ExtraTreesClassifier gives lower precision, recall and f1-score but better accuracy, while SelectKBest gives lower accuracy but better precision, recall and f1-score.