CLL Dataset
1. Multiple-template data augmentation addresses the limited patient sample size
4-template augmentation reached an accuracy of over 75%
Demonstrated that the approach is effective at compensating for the lack of samples
Demonstrated that the more templates are used to enrich the training dataset, the higher the testing accuracy
The method is limited by the UMAP runtime. The average runtime to generate a set of embedded images from one template is about 10 hours for a 300-sample dataset.
The robustness of the model increases: the training and testing accuracies converge as the number of templates used for augmentation increases
Hyperparameter optimization with GridSearch improved the accuracy by 4%.
Figure 6. Accuracies for experiments augmented with different numbers of templates
Figure 7. Trends of testing, training, and validation accuracy with an increasing number of templates used for augmentation.
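The augmentation idea above can be sketched as follows: every template turns each patient sample into one embedded image, so k templates give k images per sample. This is only an illustration under stated assumptions. The function names are hypothetical, and `make_template` uses a fixed random 2-D projection as a cheap stand-in for a fitted UMAP template; it is not the project's pipeline.

```python
import random


def make_template(n_features, seed):
    """Hypothetical stand-in for a fitted UMAP template: a fixed random 2-D projection."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(2)]


def embed_image(cells, template, bins=8, span=6.0):
    """Project each cell to 2-D and count cells per grid bin (a bins x bins image)."""
    image = [[0] * bins for _ in range(bins)]
    for cell in cells:
        x, y = (sum(w * v for w, v in zip(axis, cell)) for axis in template)
        # Map coordinates from [-span, span) to a bin index, clamping outliers to the border.
        col = min(bins - 1, max(0, int((x + span) / (2 * span) * bins)))
        row = min(bins - 1, max(0, int((y + span) / (2 * span) * bins)))
        image[row][col] += 1
    return image


def augment(samples, labels, templates):
    """Return one embedded image per (template, sample) pair, with the sample's label."""
    images, image_labels = [], []
    for template in templates:
        for cells, label in zip(samples, labels):
            images.append(embed_image(cells, template))
            image_labels.append(label)
    return images, image_labels


# Toy data (made up): 3 samples, 50 cells each, 4 markers per cell.
rng = random.Random(0)
samples = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)] for _ in range(3)]
labels = [0, 1, 0]
templates = [make_template(4, seed) for seed in range(4)]

images, image_labels = augment(samples, labels, templates)
print(len(samples), "samples ->", len(images), "training images")
```

With 4 templates the training set grows fourfold, which matches the trade-off noted above: each extra template adds one more full embedding run.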
2. Techniques proved to be effective
Overall, accuracy was boosted from 66% to 76%
GPU: GTX 1070, 8GB RAM, 1920 CUDA cores (cost vs. time trade-off)
Hyperparameter optimization: GridSearch improved the accuracy by 4%.
Ensemble learning algorithm: XGBoost combines multiple models for optimal performance. The accuracy improvement depends on the specific dataset.
Figure 8. GridSearch parameter ranges and optimal values
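A minimal sketch of what a grid search does: it tries every combination in a parameter grid and keeps the best-scoring one. The grid values and `toy_evaluate` are illustrative assumptions, not the ranges in Figure 8. In a real run, `evaluate` would train the model with the given parameters and return its cross-validated accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: the names follow common XGBoost hyperparameters,
# the candidate values are made up for illustration.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300],
}


def toy_evaluate(params):
    """Stand-in score (not a real accuracy) that peaks at depth 5, rate 0.1, 300 trees."""
    return -(
        abs(params["max_depth"] - 5)
        + 10 * abs(params["learning_rate"] - 0.1)
        + abs(params["n_estimators"] - 300) / 100
    )


best_params, best_score = grid_search(param_grid, toy_evaluate)
print(best_params)
```

The number of evaluations is the product of the list lengths (3 x 3 x 2 = 18 here), so the grid should stay small when each evaluation means a full training run.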
ALL Dataset
1. UMAP Construction and feature transformation, Multiple templates + Single Positive Voting
5 sets of training templates created, 5 templates per set
The testing accuracy varies when we choose different samples to build the template.
This approach improves the results: the accuracy increases from the low 80% range to the high 80% range
Figure 9. Cross-validation results for the multiple templates + single positive voting approach
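The voting step can be sketched as below. This assumes that "single positive voting" means a sample is called positive as soon as one template's classifier predicts positive. That reading, the function name, and the example predictions are assumptions for illustration only.

```python
def single_positive_vote(votes):
    """Return 1 if at least one template voted positive (1), otherwise 0."""
    return int(any(vote == 1 for vote in votes))


# Hypothetical predictions: one entry per sample, one vote per template (5 templates).
per_template_predictions = {
    "sample_a": [0, 0, 1, 0, 0],  # a single positive vote is enough
    "sample_b": [0, 0, 0, 0, 0],  # no template votes positive
    "sample_c": [1, 1, 1, 0, 1],
}

final_calls = {
    sample: single_positive_vote(votes)
    for sample, votes in per_template_predictions.items()
}
print(final_calls)
```

Under this rule, sensitivity is favored over specificity: a case missed by most templates is still caught if any one template flags it, at the cost of more false positives.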
2. UMAP Construction and feature transformation, Density-Bias Selection
3 sets of different cells selected, 3 templates in total
The testing accuracies are similar when we select different groups of cells from the same set of samples
This approach improves the results: the accuracy increases from the low 80% range to the high 80% range
Certain ALL "hard cases" remain unidentified
Figure 10. Cross-validation results for the density-bias approach
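One possible form of density-biased cell selection is sketched below, assuming the bias favors cells in sparse regions so that rare populations are not drowned out by dense ones. The direction of the bias, the radius-based density estimate, and all names are assumptions. The sampling step uses the Efraimidis-Spirakis weighted sampling scheme with weight 1/density.

```python
import math
import random


def local_density(cells, radius):
    """Count the cells within `radius` of each cell, itself included (O(n^2))."""
    r2 = radius * radius
    return [
        sum(1 for b in cells if sum((x - y) ** 2 for x, y in zip(a, b)) <= r2)
        for a in cells
    ]


def density_biased_select(cells, n_select, radius, seed):
    """Pick n_select distinct cell indices, weighting each cell by 1 / local density."""
    rng = random.Random(seed)
    density = local_density(cells, radius)
    # Weighted sampling without replacement: key = log(u) / weight = density * log(u);
    # the n_select largest keys are kept. u lies in (0, 1], so log(u) is defined.
    keys = [(d * math.log(1.0 - rng.random()), i) for i, d in enumerate(density)]
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:n_select])


# Toy data (made up): 200 cells in one tight cluster plus 20 scattered cells.
rng = random.Random(1)
dense = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(200)]
sparse = [[rng.uniform(5, 15), rng.uniform(5, 15)] for _ in range(20)]
cells = dense + sparse

selected = density_biased_select(cells, n_select=40, radius=0.5, seed=2)
n_sparse_selected = sum(1 for i in selected if i >= len(dense))
print(len(selected), "cells selected,", n_sparse_selected, "from the sparse region")
```

Changing `seed` yields a different group of cells from the same samples, which corresponds to building several templates from one sample set as described above.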
Page Leader: Shihui Zhu