Data Analysis for the Biosciences
03. Advanced Sequence Alignment - NGS read mapping
The problem of fast alignments
Aligning two sequences exactly: the Needleman-Wunsch (NW) algorithm
Doing the same quickly for a great number of sequences does not work: NW takes time proportional to the product of the two sequence lengths
The BLAST heuristic approach
Now imagine it for a lot of sequences.
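To make the cost of exact alignment concrete, here is a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions, not values given in these notes. The two nested loops fill one table cell per pair of positions, which is why the approach does not scale to many long sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: len(a) * len(b) cells
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Aligning one 100-base read against a 3-billion-base genome this way would already require on the order of 3 x 10^11 cell updates, before considering millions of reads.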
A real-life problem: you have a number of sequences coming from a genome and want to find their exact coordinates
You need:
The sequences
The Reference genome
A BLAST-like approach works for a small number of sequences (100-1000): BLAT
Now imagine the same problem for millions of short (or not so short) sequences: the reads from an experiment
How will you do it?
How will you manage the data?
How will you examine the output?
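The BLAST-like approach mentioned above can be sketched as "seed and extend": look up short exact words (k-mers) of the query in a table built from the reference, then check the full query only around those seed hits. The sketch below is a deliberately simplified illustration in that spirit. It is not the actual BLAST or BLAT algorithm (no gapped extension, no scoring statistics), and the function names, `k`, and `max_mismatches` are assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return sorted (start, mismatches) pairs where the read fits the reference.

    Every k-mer of the read is used as a seed; each seed hit proposes a start
    coordinate, which is then verified by an ungapped comparison.
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # where the read would begin on the reference
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(1 for x, y in zip(read, window) if x != y)
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


reference = "TTGACCAGTACGGATCCATGA"  # toy "genome" for illustration
index = build_kmer_index(reference, k=4)
print(seed_and_extend("AGTACGGA", reference, index, k=4))  # [(6, 0)]
print(seed_and_extend("AGTACGCA", reference, index, k=4))  # [(6, 1)]
```

Only candidate positions suggested by a seed are compared in full, which is what makes the heuristic fast; a read with no exact k-mer in common with its true location is missed.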
Read Mapping (http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Alignment)
The main philosophy: create a fast-to-search data structure (an index) for the reference genome.
Search the reads against this index instead of scanning the genome sequence serially.
Obtain the sequence coordinates for each read.
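The three steps above (index the reference once, search each read against the index, report coordinates) can be sketched with a sorted suffix array and binary search. The suffix array is chosen here only because it is short to write; it is an assumption of this example, and the naive construction below is far too slow and memory-hungry for a real genome. Function names are hypothetical and coordinates are 0-based.

```python
def build_suffix_array(reference):
    """Start positions of all suffixes, sorted lexicographically (naive build)."""
    return sorted(range(len(reference)), key=lambda i: reference[i:])


def map_read(read, reference, suffix_array):
    """Return sorted 0-based start coordinates of exact matches of the read."""
    n = len(read)

    def prefix(rank):
        # first n characters of the suffix at this rank in the sorted order
        start = suffix_array[rank]
        return reference[start:start + n]

    # Lower bound: first rank whose prefix is >= read
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < read:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > read
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= read:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes starting with the read are contiguous in the sorted order
    return sorted(suffix_array[first:lo])


reference = "ACGTACGTGACG"             # toy reference for illustration
sa = build_suffix_array(reference)     # built once
for read in ["ACG", "GTG", "TTT"]:     # searched many times
    print(read, map_read(read, reference, sa))
# ACG [0, 4, 9]
# GTG [6]
# TTT []
```

Each lookup costs a number of string comparisons proportional to the logarithm of the genome length, independent of how many reads have already been searched, which is the point of paying for the index up front.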
Types of structures/transformations
Suffix tries, suffix trees, Burrows-Wheeler (BW) transform
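As an illustration of the last item, the sketch below builds the Burrows-Wheeler transform of a small text and uses it for exact matching by backward search, the idea behind FM-index based mappers. It is a teaching sketch under stated assumptions: the `$` terminator is assumed not to occur in the reference, the suffix array is built naively and kept in full, and the occurrence table is stored uncompressed. Function names are hypothetical.

```python
def bwt_index(reference):
    """Build the BWT of reference + '$' and the tables needed for backward search."""
    text = reference + "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (index -1 wraps to '$')
    bwt = "".join(text[i - 1] for i in sa)

    # first[c] = number of characters in the text that are smaller than c
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, bwt, first, occ


def backward_search(read, sa, first, occ):
    """Return sorted 0-based start coordinates of exact matches of the read."""
    lo, hi = 0, len(sa)  # current range of matching suffixes, [lo, hi)
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, first, occ = bwt_index("banana")
print(bwt)                                   # annb$aa
print(backward_search("ana", sa, first, occ))  # [1, 3]
```

The search loop runs once per character of the read and never touches the original text, so its cost depends on the read length rather than the genome length.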
Alignment methods/software
BWA, GEM, split (spliced-read) aligners such as TopHat (http://avrilomics.blogspot.gr/2013/04/using-tophat-for-mapping-rna-seq-data.html)