03. Advanced Sequence Alignment - NGS read mapping

The problem of fast alignments
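Aligning two sequences exactly: the Needleman-Wunsch (NW) algorithm
Doing the same quickly for a great number of sequences does not work: NW fills a full scoring matrix for every pair, so its cost grows with the product of the two sequence lengths
The BLAST heuristic approach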
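To make the cost of exact alignment concrete, here is a minimal sketch of NW global alignment in Python. The scoring values (match=1, mismatch=-1, gap=-1) and the function name are illustrative assumptions, not parameters taken from these notes. The nested loop over both sequences is the part that becomes too slow when repeated for many sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the matrix: this double loop costs O(len(a) * len(b)).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Aligning one 100-base read against a whole chromosome this way would already mean filling a matrix with hundreds of millions of cells or more, which is why the rest of this section moves to heuristics and indexes.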
Now imagine it for a lot of sequences.
- A real-life problem: you have a number of sequences coming from a genome and want to find their exact coordinates
- You need:
  - The sequences
  - The Reference genome
  - A BLAST-like approach for a small number of sequences (100-1000), e.g. BLAT
Now imagine the same problem for millions of short (or not so short) sequences: the reads from a sequencing experiment
- How will you do it?
- How will you manage the data?
- How will you examine the output?
Read Mapping (http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Alignment)
- The main idea: create a fast-to-search data structure (index) for the reference genome.
- Search the reads against this index instead of scanning the genome sequence serially.
- Obtain sequence coordinates for each read
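The three steps above (index the reference, search reads against the index, report coordinates) can be sketched with the simplest possible index, a hash table of k-mers. This is an illustrative assumption chosen for brevity: it handles exact matches only, and the function names and the value of k are hypothetical. It is not how BWA or GEM are implemented.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return the 0-based reference coordinates where the read matches exactly."""
    if len(read) < k:
        return []  # read too short to look up in this index
    hits = []
    # Look up only the first k bases (the seed), then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    k = 4
    index = build_kmer_index(reference, k)  # built once, reused for every read
    for read in ["ACGTT", "TTAG", "GGGG"]:
        print(read, map_read(read, reference, index, k))
```

The index is built once and then reused for every read, so each read costs a lookup plus a short verification instead of a scan over the whole genome.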
Types of structures/transformations
- Suffix tries, suffix trees, the Burrows-Wheeler (BW) transform
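As an illustration of the BW transform used as a search index, the sketch below builds the transform from a naively sorted suffix array and finds exact matches with backward search (the FM-index idea). The function names are hypothetical and the construction is deliberately naive. Real mappers use compressed, sampled versions of these tables and also tolerate mismatches and gaps.

```python
def build_bwt(text):
    """Return (suffix_array, bwt) for text + '$'. Naive construction, small inputs only."""
    text = text + "$"  # '$' sorts before A, C, G, T and marks the end of the text
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    return sa, bwt


def build_fm_tables(bwt):
    """C[c]: number of characters smaller than c; occ[c][i]: count of c in bwt[:i]."""
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted 0-based positions of exact occurrences of pattern."""
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    sa, bwt = build_bwt(reference)
    C, occ = build_fm_tables(bwt)
    for read in ["ACG", "TTAG", "GGG"]:
        print(read, backward_search(read, sa, C, occ))
```

Each step of the search narrows a range of suffix array rows, so the work per read depends on the read length rather than on the genome length.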
Alignment methods/software
- BWA, GEM, split-read aligners such as TopHat (http://avrilomics.blogspot.gr/2013/04/using-tophat-for-mapping-rna-seq-data.html)
