Syllabus
Introduction
Basics of biological data
Role of algorithms and data structures
High-throughput DNA/RNA sequencing
Data structures and algorithms warm-up
Exact string pattern matching
Z algorithm
Genome-scale indexing
Suffix arrays
Suffix tries and suffix trees
Burrows-Wheeler Transform, FM-Index
Approximate pattern matching
Hamming distance, edit distance, dynamic programming
Pairwise sequence alignment
Global vs. local alignment, scoring functions
Alignment-free sequence comparison
k-mer indexes
Co-linear chaining
Sketching and sublinear data structures
Genome Assembly
de Bruijn graphs, overlap graphs
Haplotype assembly and phasing
Pattern discovery
Hidden Markov models, gene finding
Genome compression
General purpose compression: Lempel–Ziv
Domain-specific compression algorithms
Building phylogenetic trees
The tree of life
Maximum parsimony, maximum likelihood algorithms
Other topics (guest lectures)
Variant calling & emerging deep learning-based methods
Recent advances in genome assembly
Guest lectures
We were fortunate to hear cutting-edge bioinformatics research from the following presenters in the previous iterations of DS202 course.
Jim Shaw (2024)
Mikko Rautiainen (2023)
Niranjan Nagarajan (2022)
Amatur Rahman (2022)
Mile Sikic (2021)
Ramesh Hariharan (2021)
Kishwar Shafin (2021)
Project topics selected by students previously
2024
Linear time suffix tree construction using Ukkonen's algorithm
Algorithms for computing tumour phylogenies
Genotype imputation using hidden Markov Models
Transformer-based deep learning model to predict sequence consensus
Reproducible feature selection in deep neural networks
2023
Likelihood-based algorithms for de novo genome assembly
Greedy algorithm for aligning DNA sequences
Near-linear time edit distance algorithm
Using stochastic context-free grammar to predict RNA structure
Building phylogenetic trees using maximum likelihood
2022
Accelerating sequence alignment using parallel prefix algorithm
Using wheeler graph as succinct index for efficient pattern matching
Construction of string graphs using FM-index for de novo genome assembly
Gene finding using hidden Markov models
2021
Succinct representation of de Bruijn graphs
Minimizer-based k-mer sampling and its applications
Distributed-memory parallel algorithms for construction of suffix array and suffix tree
Optimal algorithms for sequence to pangenome graph alignment
Prerequisites
Knowledge of basic data structures, algorithms, programming experience
DS-221 Intro to scalable systems (or) DS-201 Bioinformatics (or) E0-251 Data structures and algorithms (or) E0-225 Design and analysis of algorithms (or) consent from the Instructor