Syllabus

CSE599S: Machine Learning in Computational Biology

Spring 2021 (March 29, 2021 - June 07, 2021)

Topics

  • Confounding and batch effect challenges in genomic datasets

  • Autoencoders for integration of expression datasets

  • Explainable ML for mechanistic biological discovery

  • Causality and gene expression

  • End-to-end predictive models for protein structure and interactions

  • Generative models for protein design

  • ML approaches for single-cell gene expression data


Learning objectives

  • Gain exposure to a variety of cutting-edge research directions in computational biology

  • Critically evaluate literature and identify the strengths and weaknesses of ML approaches and models in the context of biological discovery

  • Identify strategies to address typical challenges faced in ML in computational biology (MLCB), including the design of baseline experiments and model evaluation with limited "gold-standard" data


Course format

The course will start with two overview lectures describing the landscape of computational biology, including review material covering relevant aspects of ML and biology. The remaining classes will alternate between invited lectures and small-group student presentations of research papers. The goal of the student presentations is to contextualize the invited lecture and enable relevant discussion. The instructor will also give lectures with introductory material when appropriate.


Students will be asked to submit a brief summary of one research paper each week. These papers are selected to match the lectures by invited speakers (see reading and schedule pages). The reviews will be due before each Monday's class. Once students have submitted their own review, they can see others' reviews. The purpose of this exercise is to facilitate student understanding and engagement.


Student evaluation

Student evaluation will be based on a combination of paper reviews (see template here), group presentations, participation in class discussion, and a class project. Students can choose between doing a class project (Option 1) and writing additional paper reviews (Option 2).


Option 1

  • Paper reviews: 30% (6 short reviews x 5% each = 30% total). See template for paper review.

  • Student participation in class discussion: 20%

  • Student research paper presentation (2 group-based oral presentations): 30%

  • Course project: 20% (report)


Option 2

  • Paper reviews: 50% (10 short reviews x 5% each = 50% total)

  • Student participation in class discussion: 20%

  • Student research paper presentation (2 oral presentations per student): 30%


Prerequisites

  • Have completed a formal class in statistics.

  • Have completed an introductory class in Machine Learning.

  • Have completed an introductory course in computational biology, for example CSE427 or CSE527, or an equivalent offered by another institution. To fulfill this requirement, students can also complete an online course, such as MIT's ML in genomics course, which is freely available.

  • Competency in programming in Python or a similar language. This will be required for implementing your course project.

  • Familiarity with the general principles of molecular biology (e.g., a college-level introductory course or self-preparation through online lectures/tutorials).