Undergraduates participating in BDSI 2019 were assigned to one of three research groups: Machine Learning, Genomics, and Data Mining. Together with 14 other students, I was part of the Data Mining Group. We were supervised by Faculty Mentors Prof. Johann Gagnon-Bartsch and Prof. Jonathan Terhorst and Graduate Student Instructors Zoe Rehnberg and Anwesha Bhattacharyya at the Department of Statistics at the University of Michigan at Ann Arbor. The goal of our research project was to build classification methods that predict the effect of a chemotherapy applied to a cell line based on its genomic information.
Below you can find the steps of our research, organized by week, and a broader project description. This information is also available in the file Research Description.
1. Bhattacharyya and Rehnberg, Introduction
2. Rehnberg, Screening Cleaning
3. Rehnberg, Expression Cleaning
1. Bhattacharyya, Classification Methods
2. Bhattacharyya, Support Vector Machines
Note: I have not included the file for copynumber imputation, as we later identified a mistake in it. We used the R package ‘missForest’. We also tried ‘MICE’, ‘Amelia’, and ‘Hmisc’.
3. Research Group, Logistic Regression LASSO
4. Research Group, Naïve Bayes
6. Research Group, PCR Logistic
7. Research Group, Random Forest
8. Research Group, SVM with Linear Kernel
9. Research Group, SVM with Polynomial Kernel
1. Research Group, Combined KNN voters
2. Research Group, Combined PCR
3. Research Group, Splitting Methylation
4. Research Group, Implicit Imputation via PCA
Note: I have not included the code for cleaning the expression and methylation datasets or for splitting the methylation dataset, as it is identical to, respectively: Rehnberg, Expression Cleaning; Rehnberg, Methylation Cleaning; Research Group, Splitting Methylation.
Note: There is a mistake in the plot for drug 1054 in the presentation and poster. The plot currently shows a perfect classifier performance.
1. Research Group, Plotting Research Results