• Course:    ECE 286, Algorithms for biological data analysis
  • Instructor: Siavash Mirarab (or, if you like a longer version, Mir arabbaygi)
  •                  Office Hours: Monday, 4:00-5:00pm,  ECE 6403
  • TA:            Erfan Sayyari esayyari@eng.ucsd.edu
  •                  Office Hours: Wednesday, 10:30-11:30am, ECE 6510
  • Class:       Tues/Thurs from 5:00-6:20pm. WLH 2205


There will be a make-up class in EBU1 4309 on Wednesday Feb 22nd from 2:00-3:30pm.


This course introduces a series of general algorithmic techniques but uses computational evolutionary biology as the context. The course motivates each algorithmic concept using a specific biological application related to evolution and focuses the discussion on specific types of (big) data available in modern biological studies. However, no prior knowledge of biology is needed, and all relevant concepts will be briefly described in class. The course is focused on algorithms, data, and mathematics, and we hope the techniques we learn are applicable to other domains. We assume a working knowledge of programming but do not require programming in any particular language. The goal of the course is not to teach programming to biologists, nor do we teach students how to use specific bioinformatics tools or databases. While we cover the mathematical basis of methods, proofs and heavy math are also not the focus.

A particular focus of the course will be on scalability. We compare and contrast various algorithms designed for the same problem in terms of their ability to accurately and efficiently analyze large datasets. We emphasize that big data require thinking about scalability, but also about accuracy.

The evolutionary biology covered is mostly at the species level, and hence relates to phylogenetics. Interested students are invited to learn more about this fascinating topic (resources will be added soon), but no prior knowledge is assumed or required.

The course will involve a term-long project (details to be finalized). The goal of the project is for students to either 1) develop new algorithmic techniques, 2) improve the scalability of an existing technique, or 3) perform scientific comparisons of various existing methods in terms of scalability and accuracy. Students will be given the chance to work on their datasets of choice, as long as the questions being addressed are sufficiently related to the course (we may restrict to biological data). Beyond the course project, homework, written exams, or oral exams will be used for student evaluation.

Note: this site will be regularly updated with new material and links.

Course work:

There will be four homework assignments spread throughout the course, one of which is a proposal for the project. There will also be a project. Check the homeworks and project tab on the left.


Grading is based on three regular homework assignments, the project proposal (which counts as the fourth homework), and the project. The tentative grading breakdown for this course is:

  • Homework 1:        20%
  • Homework 2:        20%
  • Homework 3:        20%
  • Project Proposal: 10%
  • Project:                 30%