Results and Conclusion

Results

F-Beta Plots

Conclusion

Overall, LightGBM performed the best, achieving the highest mean and median F-beta scores and the fewest outliers among all the models tested. Its runtimes were also significantly faster than those of SVM and Random Forest in particular. Neural networks also generally performed well during validation; however, their F-beta scores on the test set were lower than those of the other models. For pre-processing, filtering by coefficient of variation with a threshold of 3.5 had the greatest impact on the F-beta values.
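As an illustration of the pre-processing step described above, the following is a minimal sketch of filtering genes by coefficient of variation with a threshold of 3.5. The direction of the filter (keeping genes above the threshold) and the function name are assumptions for illustration; the report only states that a CV threshold of 3.5 was used.

```python
import numpy as np

def filter_by_cv(X, threshold=3.5):
    """Keep columns (genes) whose coefficient of variation exceeds the threshold.

    X: 2D array of shape (cells, genes) with expression values.
    Keeping genes *above* the threshold is an assumption made for this sketch.
    """
    means = X.mean(axis=0)
    stds = X.std(axis=0)
    # Avoid division by zero for genes that are constant across all cells
    cv = np.divide(stds, means, out=np.zeros_like(stds), where=means != 0)
    mask = cv > threshold
    return X[:, mask], mask

# Example usage with random data standing in for an expression matrix
rng = np.random.default_rng(0)
X = rng.poisson(lam=1.0, size=(100, 500)).astype(float)
X_filtered, kept = filter_by_cv(X, threshold=3.5)
print(f"Kept {kept.sum()} of {X.shape[1]} genes")
```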

Future Directions


Currently, our design has tested and run models using support vector machines, logistic regression, neural networks, and random forests, but there are many more machine learning algorithms that could be explored and tested against this type of dataset. Doing so would allow a more nuanced evaluation of which types of classifier perform better at cell type matching, as sketched below.
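The sketch below shows one way such a comparison could be set up: evaluating several classifiers with cross-validated F-beta scores. The synthetic data, beta value, and model hyperparameters are placeholders and not taken from the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, fbeta_score
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data standing in for a filtered expression matrix with cell-type labels
X, y = make_classification(n_samples=300, n_features=50, n_informative=10, random_state=0)

# beta=1 is a placeholder; the report does not state which beta was used
fbeta = make_scorer(fbeta_score, beta=1, average="weighted")

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Neural Network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}

# Cross-validated F-beta scores give a comparable summary for each classifier
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring=fbeta, cv=5)
    print(f"{name}: mean F-beta = {scores.mean():.3f} (+/- {scores.std():.3f})")
```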


To better classify the data, the filtering used to select the most important features could also be improved. Feature selection can greatly impact the performance of a machine learning model and thus can drastically improve the results obtained. Looking into other statistical methods for choosing the most significant features could prove useful in narrowing the data and yielding the marker genes that belong to particular cell clusters.
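One such statistical method, shown below as a hedged sketch, ranks genes by an ANOVA F-test against the cell-type labels and keeps the top-scoring ones. The choice of test (f_classif), the value k=50, and the synthetic data are illustrative assumptions, not methods taken from the report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a cells x genes matrix with cell-type labels
X, y = make_classification(n_samples=300, n_features=200, n_informative=15, random_state=0)

# Rank genes by ANOVA F-statistic between expression and cell-type labels,
# then keep the 50 highest-scoring genes (k=50 is illustrative)
selector = SelectKBest(score_func=f_classif, k=50)
X_selected = selector.fit_transform(X, y)

# The highest-scoring features are candidate marker genes for the cell clusters
top_indices = np.argsort(selector.scores_)[::-1][:10]
print("Top 10 candidate marker-gene indices:", top_indices)
```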

Made by Huy Le