Computational Biology

Course Logistics

This twenty-week course is designed to introduce students of Biosciences to Computational Biology: the use of computers for biological data analysis.

PowerPoint slides of lectures will be available on the presentation part of the course website. The lectures build on each other, so you should view them in the order listed. The few lectures listed as "Honors" or "Optional" will not be covered on the quizzes but are provided in case you want to go into more depth.

The prescribed textbook for this course is Baxevanis, A.D. and Ouellette, B.F.F. (2004). Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley-Interscience, USA. E-Book


Short ungraded exercises will occasionally be embedded in lectures, and each lecture will be followed by ungraded exercises. Answers to these exercises will be given and explained after you answer the questions. These exercises do not introduce new material, but they enable you to check your understanding of the lectures. If you miss too many of the exercises after a lecture, you might want to watch that lecture again.

Time Commitment

You should expect to spend about 2 hours per week watching the lectures, another 2 hours per week doing the exercises, and about 1 hour on each quiz. You might want to spend additional time doing the reading and participating in the discussion forums.

Online Q&A Section

Hangouts Group Video Chat Session

The next Google Hangouts Group Video Chat Session is scheduled for August 23, 2017, 4 PM-5 PM Indian Standard Time.

How to join?
    Just click on the image above at the scheduled time. I will be available 5 minutes before the session starts. Make sure you have a working microphone and are in a well-lit place. Hangouts also works on mobile phones through the Hangouts application.

Previous schedules (2016)
August 24
Sep 7
Sep 23
Oct 5
Oct 21

Syllabus with Course Materials

Unit 1:

Biological Databases: Nucleotide Sequence Databases, GenBank, DDBJ, EMBL, Sequence Flatfile and submission process, Protein sequence databases, UniProt in detail, Mapping databases, Genomic databases, Data mining.
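
As a small illustration of the sequence flatfile format listed above, the sketch below pulls the accession, definition and sequence out of a GenBank-style record using only the Python standard library. The record text and the function name `parse_flatfile` are made up for illustration. Real records carry many more fields (features, references, source organism), and in practice you would normally use a dedicated parser rather than this simplified one.

```python
# Minimal sketch: read a few fields from a GenBank-style flatfile record.
# The record below is a made-up example, not a real database entry.
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   EXAMPLE01
ORIGIN
        1 atggcgtacg ttagcctagg ctaa
//
"""


def parse_flatfile(text):
    """Return a dict with the accession, definition and upper-case sequence."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines start with a position number, then blocks of bases.
            parts = line.split()
            record["sequence"] += "".join(parts[1:]).upper()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_sequence = True  # everything until "//" is sequence data
    return record


record = parse_flatfile(EXAMPLE_RECORD)
print(record["accession"], len(record["sequence"]))
```

The parser relies on the keyword at the start of each line, which is how the flatfile layout separates annotation from the sequence block that follows ORIGIN.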

DNA Sequence Databases

Protein Databases

Interaction and Pathway Databases


Entrez problem set
GenBank problem set
Pubmed Tutorial

The Bioinformatics Gold Rush. Scientific American 283.1 (2000): 58-63.

Principles of protein-protein interactions

S Jones, JM Thornton - Proceedings of the National …, 1996 - National Acad Sciences

Protein-Protein Databases

  1. BIND: Biomolecular Interaction Network Database
  2. STRING: STRING is a database of known and predicted protein-protein interactions. (EMBL)
  3. Relibase (needs log-in!)
Signal transduction pathway databases
  1. Cancer Cell Map
  2. NetPath - A curated resource of signal transduction pathways in humans
  3. NCI-Nature Pathway Interaction Database
  4. Reactome - Navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling.
  5. Biocarta
  6. WikiPathways
Metabolic Pathway Databases
  1. BioCyc Database Collection including EcoCyc and MetaCyc
  2. KEGG PATHWAY Database (Univ. of Kyoto)
  3. MANET database  (University of Illinois)
  4. MetaboLights: Metabolomics experiments and derived information: metabolite structures, reference spectra, biological roles, locations and concentrations. (European Bioinformatics Institute)
  5. Reactome: Navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling. (Cold Spring Harbor Laboratory, European Bioinformatics Institute, Gene Ontology Consortium)
Taxonomic Databases
Phylogenetic Databases

Unit 2:

Analysis of nucleotide sequences: Gene prediction methods and programs, Markov and Hidden Markov models in gene prediction, Promoter analysis, RNA secondary structure thermodynamics, Dynamic programming and genetic algorithms for secondary structure prediction, Refining multiple sequence alignment based on RNA secondary structure predictions, Vienna RNAfold.

Evolution and origins of sequence polymorphisms, SNP discovery methods and databases, Genotyping, International Haplotype Map Project, 1000 Genomes Project. Presentation
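
To make the dynamic programming idea for RNA secondary structure prediction concrete, here is a minimal sketch of base-pair maximization (the Nussinov algorithm). It is a deliberate simplification: it counts base pairs instead of minimizing free energy, so it is not what Vienna RNAfold computes. The function name `max_base_pairs`, the `min_loop` parameter and the equal scoring of all pair types are illustrative assumptions.

```python
# Minimal sketch of Nussinov-style base-pair maximization.
# Assumption: Watson-Crick and G-U wobble pairs all count equally (score 1).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases a pair must enclose,
    so very short hairpin loops are not allowed.
    """
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # span = j - i, shortest first
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(max_base_pairs("GGGAAAUCC"))
```

Each table cell is filled from shorter subsequences that were already solved, which is the same fill order that energy-based folding programs use with a more detailed scoring scheme.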

Unit 3:

Analysis of protein sequences: Predicting features of individual residues, Predicting function, Neural Networks, Protein structure prediction, Protein structure databases, PDB in detail, 3D visualization software, Pathway and molecular interaction databases, Prediction algorithms for pathways and molecular interactions, Integrating gene expression data with pathway information.

Unit 4:

Inferring relationships: Global vs. local sequence alignments, Dotplots, Scoring matrices, Pairwise sequence alignment, BLAST, Position-Specific scoring and PSI-BLAST, MegaBLAST, BL2SEQ, BLAT, FASTA vs. BLAST, Protein multiple sequence alignments, Multiple structural alignments, Shotgun sequencing, Sequence assembly and finishing.

Sequence Alignment
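
The difference between global and local alignment can be seen in a few lines of dynamic programming. The sketch below computes only the optimal score, not the alignment itself, using the Needleman-Wunsch recurrence for the global case and the Smith-Waterman recurrence for the local case. The function name `alignment_score` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration. Real tools use substitution matrices and affine gap penalties.

```python
# Minimal sketch: optimal pairwise alignment score by dynamic programming.
# Scoring values are illustrative assumptions, not a recommended scheme.
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best global (default) or local alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
```

The only changes between the two modes are the first row and column, the floor at zero, and where the final score is read from the table.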



Unit 5:

Phylogenetic Analysis: Basics of phylogenetics, Nucleotide substitution models and selection, Distance-matrix-based methods, Neighbor-Joining, Fitch-Margoliash, Outgroups, UPGMA, Minimum Evolution, Maximum Parsimony, Maximum Likelihood, Bayesian Inference, Searching for trees, Rooting trees, Bootstrapping, Likelihood ratio tests.
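
As a concrete starting point for substitution models and distance-based methods, the sketch below computes the proportion of differing sites (p-distance) between two aligned sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names are illustrative, and skipping gapped sites is one possible convention rather than the only one.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGT", "ACGTACGA")
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the observed proportion, because the model accounts for multiple substitutions at the same site that the raw count cannot see.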

BAST, F. 2015. Tutorial on Phylogenetic Inference Part-2. Resonance 20 (5) 445-457 PDF

BAST, F. 2015. Tutorial on Phylogenetic Inference Part-1. Resonance 20 (4) 360-367 PDF


Current Protocols in Bioinformatics (2003) 6.1.1-6.1.13
Current Protocols Essential Laboratory Techniques 11.3.1-11.3.17, June 2009

Introduction to Computational Phylogenetics (With Understanding Trees and Applications)

Substitution Models

Methods: NJ + MP

Methods: ML + BI

K Tamura, D Peterson, N Peterson, G Stecher… - Molecular biology and …, 2011

X Xia, P Lemey - The Phylogenetic Handbook, 2009 -

Unit 6:

Genomics: Comparative Genomics, Genomic alignments, Gene predictions in genomic alignments, Genome-wide association study, Phylogenetic footprinting, Gene annotation, Gene expression analysis using DNA Microarray, Annotation of array probes, Image processing, Normalizing expression measurements.

Unit 7:

Proteomics: Major proteomic approaches, Protein analysis by MALDI and SELDI methods, Time of Flight MS in protein analysis, Protein Identification by Mascot, Peptide Mass Fingerprinting, Comparative proteomics, Two-Dimensional Polyacrylamide Gel Electrophoresis.
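
Peptide mass fingerprinting compares measured peptide masses with masses predicted from database sequences. The sketch below shows the prediction side only: an in-silico tryptic digest (cleave after K or R unless the next residue is P, no missed cleavages) followed by a monoisotopic mass calculation. The function names and the example protein sequence are made up for illustration, and this is not how Mascot or any specific search engine is implemented.

```python
# Monoisotopic residue masses in daltons (standard amino acids, unmodified).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(protein):
    """Split after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def monoisotopic_mass(peptide):
    """Sum of residue masses plus one water."""
    return sum(MONO[residue] for residue in peptide) + WATER


# Hypothetical sequence, used only to exercise the functions.
for peptide in tryptic_digest("AGKRPGKCR"):
    print(peptide, round(monoisotopic_mass(peptide), 4))
```

A search engine would compare such a list of theoretical masses against the observed peak list within a mass tolerance and score the overlap.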


Suggested reading

  1. Baxevanis, A.D. and Ouellette, B.F.F. (2004). Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley-Interscience, USA. (Course textbook, multiple copies available in the University Library) E-Book
  2. Hall, B.G. (2011). Phylogenetic Trees Made Easy: A How-To Manual. Sinauer Associates, Inc. USA.
  3. Lesk, A.M. (2008). Introduction to Bioinformatics. Oxford University Press, UK.
  4. Mount, D.W. (2005). Bioinformatics: Sequence and Genome Analysis. CBS Publishers, New Delhi, India.
  5. Ramsden, J. (2010). Bioinformatics: An Introduction (Computational Biology). Springer, India.
  6. Ye, S.Q. (2008). Bioinformatics: A Practical Approach. Chapman & Hall/CRC, UK.
  7. Zvelebil, M. and Baum, J. (2007). Understanding Bioinformatics, Garland Science, New York, USA.


  1. Presentation: Phylogenetic Analysis using MEGA
  2. Presentation: Base-calling, assembly, auditing, consensus generation, annotation and GenBank submission.
    Trace files for the assembly and the assembly instruction Excel sheet are available here.


1. Databases: Problem set

(Pages 52-64)

3. Finding best evolutionary models (Goodness of fit/ML ModelTest): Problem set

4. Calculating evolutionary distances: Problem set

5. Distance methods: Problem set

6. Cladistic methods: Problem set