Tendai Coady

Leveraging Biological Mechanisms in Machine Learning-Based Breast Cancer Prognostication



Mentor: Luigi Marchionni

Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University (currently Weill Cornell Medicine)


The incorporation of machine learning into personalized cancer treatment has been proposed as a promising direction in oncology; however, it has not yet been fully implemented in clinical settings for prognostication. One obstacle is overfitting, in which prediction models perform very well on the dataset used to develop them but fail on independent data. Overfitting arises from the “curse of dimensionality,” the mismatch between the large number of features (e.g., all the genes expressed in a tumor) and the small number of samples in which those features are measured (e.g., the patients). We tested the hypothesis that using prior biological knowledge to reduce the number of features available during training could help circumvent the curse of dimensionality. We trained and tested two k-Top Scoring Pair (k-TSP) classifiers, which use the relative expression levels of genes within pairs, to determine prognosis in breast cancer patients from the METABRIC dataset. One classifier was trained using all possible gene pairs (agnostic), while the other was trained using only gene pairs justifiable by knowledge of biological mechanisms (mechanistic). Two Random Forest (RF) classifiers, one agnostic and one mechanistic, were also trained for the same purpose. Testing showed little difference in performance between the mechanistic and agnostic classifiers for either classifier type. Although our findings did not show that mechanistic biological knowledge, with the mechanisms chosen, improves classification performance or reduces overfitting in this case, the approach merits further investigation.
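
To make the agnostic versus mechanistic distinction concrete, the following is a minimal sketch of a k-TSP classifier in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the function names `fit_ktsp` and `predict_ktsp`, the binary 0/1 outcome coding, and the `candidate_pairs` argument are all hypothetical. Each gene pair (i, j) is scored by how much the frequency of "gene i is expressed below gene j" differs between the two outcome classes; the k best non-overlapping pairs then vote on each new sample. Passing a restricted list of pairs stands in for the mechanistic setting, while leaving it unset searches all pairs (agnostic).

```python
from itertools import combinations


def fit_ktsp(X, y, k=3, candidate_pairs=None):
    """Select up to k disjoint top-scoring gene pairs.

    X: list of samples, each a list of expression values (one per gene).
    y: list of 0/1 labels; both classes are assumed to be present.
    candidate_pairs: optional iterable of (i, j) gene-index pairs. If None,
        every pair is considered (agnostic); a restricted list stands in
        for the mechanistic setting (hypothetical parameter).
    Returns a list of (i, j, less_means_1) tuples.
    """
    n_genes = len(X[0])
    if candidate_pairs is None:
        candidate_pairs = combinations(range(n_genes), 2)
    n1 = sum(y)
    n0 = len(y) - n1

    scored = []
    for i, j in candidate_pairs:
        # Count how often gene i < gene j within each class.
        c0 = sum(1 for row, lab in zip(X, y) if lab == 0 and row[i] < row[j])
        c1 = sum(1 for row, lab in zip(X, y) if lab == 1 and row[i] < row[j])
        p0, p1 = c0 / n0, c1 / n1
        # Score is the absolute difference in frequencies; the direction
        # records which class the event "i < j" is evidence for.
        scored.append((abs(p1 - p0), i, j, p1 > p0))

    # Highest score first; ties keep their original order (stable sort).
    scored.sort(key=lambda t: t[0], reverse=True)

    chosen, used = [], set()
    for _, i, j, less_means_1 in scored:
        if i in used or j in used:
            continue  # keep the selected pairs disjoint
        chosen.append((i, j, less_means_1))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict_ktsp(pairs, X):
    """Predict 0/1 labels by unweighted majority vote over the pairs."""
    preds = []
    for row in X:
        votes_for_1 = sum(
            1 for i, j, less_means_1 in pairs
            if (row[i] < row[j]) == less_means_1
        )
        preds.append(1 if votes_for_1 > len(pairs) / 2 else 0)
    return preds
```

An odd k avoids tied votes. Because the rule depends only on the ordering of two genes within a sample, it is unaffected by any monotone rescaling of that sample's expression values, which is one reason pair-based classifiers are attractive for expression data.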



Tendai Coady - ORAL PRESENTATION ULTIMATE GRAND FINALE