Molecular Biomedicine

Welcome to the (unusually long-named)

Fundamental Principles in Bioinformatics, Computational Biology and Biostatistics

A class that is part of the Molecular Biomedicine MSc Program.

9 lectures on topics related to computational and statistical approaches to molecular/biomedical problems
Readings to be performed BEFORE each class (You have been warned)
Topics and questions to be discussed in class
Final evaluation will consist of written essay(s) and an (old-fashioned) in-class oral test.

Broad Picture: To remind you of things you have chosen to forget, after convincing you that they were relevant all along. [Analytical thinking, quantitative approaches, statistical inference]
Concepts: To give you an overview of the problems solved by computational and algorithmic approaches in modern biomedicine.
Skillset: To show you how to perform (but, more importantly, how to interpret) standard bioinformatics analyses on genomic data. [Gene Expression, Functional Analysis, Modeling of biological data]

This week's class is an introduction to the concepts we will be discussing throughout the semester. Namely:

The "informatics" in Bioinformatics
- What do we need it for
- What it may refer to
- What it actually is
Remembering stuff (from high-school)
- How to measure things
- How to assess/compare/interpret measurements

All students are advised to go through the reading material listed below BEFORE class:

A. Omes and Omics.
- NGS applications in Biology
- The problems
- The solutions
B. After NGS
- Types of data
- Data retrieval
- Preparation for analysis

Reading Material:

Goodwin et al. (2016) Coming of Age. Ten years of NGS technologies. Nat Rev Genetics
Marx. (2013) The Big Challenges of Big Data. Nature
Downloading Genes from Databases (https://www.youtube.com/watch?v=EdY0Vt4xXjk) and (https://www.youtube.com/watch?v=KsgFMoTXu-g)
Lecture 2. "Omics" approaches to Biology

A. Quality Control
- Before QC
- Analysis of sequencing quality
- Resolving QC problems
B. Mapping
- The problem of sequence alignment
- Sequence alignment algorithms
- Alignment of big data

Reading Material:

A. The theory
- Transcriptome complexity
- Reconstruction of transcripts
- Assessing Expression
- Understanding trends in expression patterns
B. The practice
- Transcript reconstruction
- RPKM quantification of expression
- Differential expression statistics
- Multiple comparisons
- Clustering of gene expression profiles
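
Two of the items above lend themselves to a short worked sketch. The code below is only an illustration, not the method used in class: it computes RPKM (Reads Per Kilobase of transcript per Million mapped reads) from its usual definition and applies the Benjamini-Hochberg procedure as one common way of handling multiple comparisons. The function names and all numbers are made up for the example.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 10^9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical gene: 500 reads, 2000 bp long, library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)  # 25.0

# Hypothetical p-values from four differential expression tests.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Note that the adjusted p-values are never smaller than the raw ones, which is why fewer genes pass a 0.05 cut-off after correction than before.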

Reading Material:

A. Analysis of ChIPSeq data
- Normalization
- Creation of abundance profiles
- Localization of peaks of enrichment
B. Analysis of Peaks of enrichment
- Clustering of peaks
- Discovery/Inference of DNA-binding sites
- Assigning peaks to genomic regions of interest

Reading Material:

Genomic Variability Analysis
- Whole-Genome or Whole-Exome Sequencing
- Calling of variants
- Annotation of Variants

Reading Material:

Analysis of Genetic Variability (Computational Biology, Chapter 10). C. Nikolaou (in Greek)
Bamshad et al. (2011). Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genetics 12, 745-755
Khurana et al. (2013). Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics. Science 342
Understanding Odds-Ratios
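
As a companion to the odds-ratio reading, here is a minimal sketch of how an odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table of variant carriers among cases and controls. The counts are invented for illustration, and the interval uses the standard normal approximation on the log scale, which is only one of several possible choices.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: cases carrying the variant      b: cases without the variant
    c: controls carrying the variant   d: controls without the variant
    """
    if 0 in (a, b, c, d):
        raise ValueError("all four cells must be non-zero for this simple estimate")
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval (95% for z=1.96) on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Hypothetical study: 20 of 100 cases and 10 of 100 controls carry the variant.
example_or = odds_ratio(20, 80, 10, 90)  # 2.25
example_low, example_high = odds_ratio_ci(20, 80, 10, 90)
```

An odds ratio above 1 suggests the variant is more common among cases, but the association is only convincing if the confidence interval excludes 1.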
Lecture 6. Analysis of Genomic Variation

B. Chromatin Structure Analysis
- Epigenetics and Histone-modifications
- Open Chromatin Approaches
- Chromosomal Conformation

Reading Material:

Chromatin Structure (Computational Biology, Chapter 5). C. Nikolaou (in Greek)
Valouev et al. (2011). Determinants of Nucleosome Positioning in primary human cells. Nature 474, 516-520
Marti-Renom and Mirny. (2011). Bridging the resolution gap in structural modeling of 3D genome organization. PLoS Computational Biology, 7, 7, e1002125
Lecture 7. Analysis of Chromatin Structure

The class will focus on the application of web-based bioinformatics tools for simple, straightforward analyses of datasets.

Analyze the quality of a fastq dataset with FastQC through Galaxy (use.galaxy.org). Get the fastq file here.
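
To get a feel for what a quality report summarizes, the sketch below decodes the quality line of a single fastq record into Phred scores and error probabilities. It assumes the Sanger encoding (ASCII offset 33); the quality string is made up, and the helper names are illustrative rather than part of FastQC or Galaxy.

```python
def phred_scores(quality_line, offset=33):
    """Convert a fastq quality string into a list of Phred scores.

    offset=33 corresponds to the Sanger encoding (an assumption here;
    older Illumina files used an offset of 64).
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality_line, offset=33):
    """Average Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


# Hypothetical quality string for a 4-base read.
example_scores = phred_scores("II5!")  # [40, 40, 20, 0]
example_mean = mean_quality("II5!")    # 25.0
```

A Phred score of 20 corresponds to a 1% chance of a wrong base call and a score of 30 to 0.1%, so a drop in the scores towards the end of the reads is worth noticing.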

Map a fastq file against a reference genome using Galaxy (use.galaxy.org). The file comes from an S. cerevisiae dataset and should be mapped against the SacCer2 genome index. (Note: You will need to use FASTQ-Groomer to convert to Sanger/Illumina before mapping with BWA.)
Use MACS through Galaxy to call peaks of ChIP enrichment for a yeast dataset. Find the ChIP SacCer2 BAM files here (control) and here (condition). Use a user-defined Genome Size of 13000000 (approximate for yeast).
Create average plots of a ChIPSeq dataset around the TSS of yeast genes using Galaxy. Find the TSS coordinate files here and the ChIP SacCer2 BAM files here (control) and here (condition).
Use MEME to discover new motifs from a sequence file originating from a peak caller. Find the sequence file here.
Use a fasta file to extract a logo using Weblogo. Get the fasta file here.
Use the GREAT server to analyze preferences of any type of genomic coordinate files (it can be used for a few organisms). Get a mouse genome coordinates file here.
Use a differential gene expression profile to extract differentially expressed genes. Get two files here and here. You may use Excel or a similar program to call differential expression, in which case you may find this link useful.
Use g:Profiler to perform a functional analysis of differentially expressed genes.
Use STRING to visualize networks of Protein-Protein interactions.
Help: install Notepad++ on your computer to view and edit text data files more easily.
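
For the differential expression exercise above, the same calculation one would do in a spreadsheet can be sketched in a few lines. This is a deliberately simple illustration that flags genes by log2 fold change alone, with no statistical test; the pseudocount, the threshold and the gene names are all assumptions made for the example.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated over control expression.

    The pseudocount avoids division by zero for genes with no signal.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def call_differential(expression, threshold=1.0):
    """Return genes whose absolute log2 fold change reaches the threshold.

    expression maps a gene name to a (control, treated) pair of values.
    """
    calls = {}
    for gene, (control, treated) in expression.items():
        lfc = log2_fold_change(control, treated)
        if abs(lfc) >= threshold:
            calls[gene] = lfc
    return calls


# Hypothetical expression values (control, treated) for three genes.
example_expression = {
    "geneA": (9, 39),   # up-regulated
    "geneB": (10, 11),  # essentially unchanged
    "geneC": (31, 7),   # down-regulated
}
example_calls = call_differential(example_expression)
```

A threshold of 1 on the log2 scale corresponds to a two-fold change in either direction. With replicate measurements, a statistical test followed by a multiple-comparison correction would normally be applied on top of this filter.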
