  • Please subscribe to the course mailing list.
  • The first meeting will be held on Tuesday, January 7, 2014.

Course Description 

The biological sciences are becoming data-rich and information-intensive. It is now possible to obtain very detailed information about living organisms. For instance, we can obtain DNA sequence information (a string about 3 billion characters long), expression (activity) levels of more than 20,000 genes, and various clinical measurements from humans. The growing availability of such information promises a better understanding of important questions (e.g., the causes of diseases). However, the complexity of biological systems and the noisy, high-dimensional nature of the data make it difficult to infer the underlying mechanisms from data.

Machine learning (ML) techniques have become very useful tools for resolving important questions in biology by providing mathematical frameworks to analyze vast amounts of biological information. Biology is also a fascinating application area for ML because it presents new sets of computational challenges that can ultimately advance ML. In this course, we will discuss ML and statistical techniques that have been applied to exciting problems in genetics, systems biology, sequence analysis, and predictive medicine.

No background in biology is required. Students are expected to have taken undergraduate-level machine learning or statistics courses and to have programming skills in MATLAB, R, C++, Java, Perl, or Python.


  • 4 homework assignments (15% each)
  • Final project (35%)
    • Proposal (5%)
    • Midterm report (10%)
    • Presentation (5%)
    • Final report (15%)
  • Attendance/participation (5%)