Syllabus
CSE599S: Machine Learning in Computational Biology
Spring 2021 (March 29, 2021 - June 07, 2021)
Topics
Confounding and batch effect challenges in genomic datasets
Autoencoders for integration of expression datasets
Explainable ML for mechanistic biological discovery
Causality and gene expression
End-to-end predictive models for protein structure and interactions
Generative models for protein design
ML approaches for single cell gene expression data
Learning objectives
Gain exposure to a variety of cutting-edge research directions in Computational Biology
Critically evaluate literature, and identify strengths and weaknesses of ML approaches and models in the context of biological discovery
Identify strategies to address typical challenges faced in MLCB, including design of baseline experiments, and model evaluation with limited "gold-standard" data
Course format
The course will start off with two overview lectures to describe the landscape of computational biology, including review material covering relevant aspects of ML and biology. The rest of the classes will alternate between invited lectures and small student group presentations of research papers. The goal of the latter is to contextualize the invited lecture and enable relevant discussion. The instructor will also lead lectures with introductory material when appropriate.
Students will be asked to submit a brief summary of one research paper each week -- these papers are selected to accompany the lectures by invited speakers (see reading and schedule pages). The reviews will be due before each Monday's class. Once students have submitted their own review, they can see others' reviews. The purpose of this exercise is to facilitate student understanding and engagement.
Student evaluation
Student evaluations will be based on a combination of paper reviews (see template here), group presentations, student participation in class discussion, and a class project. Students can choose between doing a class project (Option 1) and writing additional paper reviews (Option 2).
Option 1
Paper reviews: 30% (6 short reviews x 5% each = 30% total). See template for paper review.
Student participation in class discussion: 20%
Student research paper presentation (2 group-based oral presentations): 30%
Course project: 20% (report)
Option 2
Paper reviews: 50% (10 short reviews x 5% each = 50% total)
Student participation in class discussion: 20%
Student research paper presentation (2 oral presentations per student): 30%
Pre-requisites
Have completed a formal class in statistics.
Have completed an introductory class in Machine Learning.
Have completed an introductory course in computational biology, for example CSE427 or CSE527, or an equivalent offered by another institution. To fulfill this requirement, students can also complete an online course, such as MIT's ML in genomics course, which is freely available.
Competency in programming in Python or a similar language. This will be required for implementing your course project.
Familiarity with general principles of molecular biology (e.g., a college-level introductory course or self-preparation through online lectures/tutorials).