Syllabus

Introduction

Basics of biological data

Role of algorithms and data structures

High-throughput DNA/RNA sequencing 

Data structures and algorithms warm-up

Exact string pattern matching

Z algorithm

Genome-scale indexing

Suffix arrays

Suffix tries and suffix trees

Burrows-Wheeler Transform, FM-Index

Approximate pattern matching

Hamming distance, edit distance, dynamic programming

Pairwise sequence alignment

Global vs. local alignment, scoring functions

Alignment-free sequence comparison

k-mer indexes

Co-linear chaining

Sketching and sublinear data structures

Genome assembly

de Bruijn graphs, overlap graphs

Haplotype assembly and phasing

Pattern discovery

Hidden Markov models, gene finding

Genome compression

General-purpose compression: Lempel–Ziv

Domain-specific compression algorithms 

Building phylogenetic trees 

The tree of life

Maximum parsimony, maximum likelihood algorithms

Other topics (guest lectures)

Variant calling & emerging deep learning-based methods

Recent advances in genome assembly

Guest lectures

In previous iterations of the DS202 course, we were fortunate to hear about cutting-edge bioinformatics research from the following presenters.

Project topics selected by students in previous years

2024

2023

2022

2021

Prerequisites