Software

Summary

We won't have time during the workshop to look at every available package in detail, so as a group we should prioritize the ones we want to learn more about.  Help us fill out the table below, particularly noting packages that interest you because they have already been used in published work.

 Package | Application | Data types | Times cited | Hardware setup | Notes
 ALLPATHS | de novo assembly | paired end Illumina | 16 | | Original reference uses simulated data based on Illumina with paired ends.
 QSRA | de novo assembly | | none yet | |
 ABySS | | | none yet | | Parallel, distributed processing of the de Bruijn graph. The ABySS paper includes a table comparing ABySS, Velvet, EULER-SR, SSAKE and EDENA: http://genome.cshlp.org/content/19/6/1117/T4.expansion.html
 Velvet | | | 26 | |
 SHARCGS | | | 19 | |
 SSAKE | | | ? | | Based on the EDENA abstract, this is probably not worth pursuing.
 SHORTY | | ABI SOLiD, other? | | |
 EDENA | assembly | | ? | | Based on the one abstract, this is probably not worth pursuing.  Does not support paired ends.
 VCAKE | | | ? | | Based on the EDENA abstract, this is probably not worth pursuing.
 "Consensus Program" | | | | |
 Bowtie | | | none yet | | See also TopHat.
 SeqMap | | | 6 | |
 Maq | SNPs | | 22 | | Maq was used by the Mardis group for SNP identification, and by Jade Wang's group at Baylor CoM for identification of point mutations by whole genome resequencing in B. subtilis.
 MS-PET | | paired ends | 32 | |
 ABI SOLiD | | | none yet | |
 TopHat | mapping intron-exon boundaries from RNA-seq data | | | Linux or Mac | TopHat builds on Bowtie, Maq, and the SeqAn library.  In the paper, TopHat was run on a 3.0 GHz Intel Xeon 5160 processor, using <4 GB of RAM.  The run described in the paper took about 22 hours.




Applications:

For example:
Assembly onto reference scaffold
de novo Assembly
SNP detection
RNA-seq depth analysis

The following programs were developed in the laboratory of Todd Mockler. These programs are all open source, and the publications reporting them are either in the process of being published or under consideration for publication:

HashMatch (fast short read alignments):
http://mocklerlab-tools.cgrb.oregonstate.edu/HashMatch.html
==
SuperSplat (predicts splice junctions from short read data):
http://mocklerlab-tools.cgrb.oregonstate.edu/supersplat.html
==
TAU (reference guided assembly of alternatively spliced transcript structures):
http://mocklerlab-tools.cgrb.oregonstate.edu/TAU.html
==
RGA (reference guided assembly):
http://rga.cgrb.oregonstate.edu/
==


ALLPATHS

    -Summary: Assembles a genome from shotgun microreads. For a given read pair, ALLPATHS will find all sequences from
                     one read to the other that are covered by other reads. Read pairs can also be used to isolate and independently
                     assemble small regions of the genome.
    -Download site:  http://genome.cshlp.org/content/18/5/810/suppl/DC1     
    -Paper: ALLPATHS: de novo assembly of whole-genome shotgun microreads.

               Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, Nusbaum C, Jaffe DB.

               Genome Res. 2008 May;18(5):810-20. Epub 2008 Mar 13.

               PMID: 18340039 [PubMed - indexed for MEDLINE]
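
The idea in the summary above (finding every sequence that runs from one read of a pair to the other and is covered by other reads) can be illustrated with a toy sketch. This is not the ALLPATHS implementation: the function names, the k-mer graph, and the max_len bound (standing in for the expected fragment size) are all assumptions made for illustration, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Map each (k-1)-mer to the set of k-mers in the reads that start with it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer)
    return graph

def all_closures(left, right, reads, k, max_len):
    """Enumerate every sequence that starts with `left`, ends with `right`,
    is at most `max_len` bases long, and whose k-mers all occur in the reads.
    Assumes k >= 2 and that both reads are at least k bases long."""
    graph = build_graph(list(reads) + [left, right], k)
    closures = set()
    stack = [left]
    while stack:
        seq = stack.pop()
        if seq.endswith(right):
            closures.add(seq)
        if len(seq) < max_len:
            # Extend by one base along every k-mer that overlaps the current end.
            for kmer in graph.get(seq[-(k - 1):], ()):
                stack.append(seq + kmer[-1])
    return closures

# Hypothetical reads covering two variants of the same short region.
reads = ["ACGTTG", "GTTGCA", "ACGATG", "GATGCA"]
print(sorted(all_closures("ACG", "GCA", reads, k=3, max_len=10)))
# ['ACGATGCA', 'ACGTTGCA']
```

The search is exhaustive, so its cost grows quickly in repetitive regions; the length bound is what keeps this toy version finite.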


QSRA

    -Summary: Quality-value guided Short Read Assembler (QSRA) is a de novo genome assembler designed to minimize error. It is based
                     on VCAKE, but is reported to run much faster.
    -Download site: http://qsra.cgrb.oregonstate.edu/
    -Paper: QSRA: a quality-value guided de novo short read assembler.

               Bryant DW Jr, Wong WK, Mockler TC.

               BMC Bioinformatics. 2009 Feb 24;10:69.

               PMID: 19239711 [PubMed - indexed for MEDLINE]


ABySS

    -Summary: Assembly By Short Sequences (ABySS) is a short read assembler designed so that the assembly algorithm can be
                     computed in parallel across a network of computers.
    -Download site: http://www.bcgsc.ca/platform/bioinfo/software/abyss
    -Paper: ABySS: A parallel assembler for short read sequence data.

               Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I.

               Genome Res. 2009 Apr 21. [Epub ahead of print]

               PMID: 19251739 [PubMed - as supplied by publisher]
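
One common way to spread a de Bruijn graph over several machines is to let a hash of each k-mer decide which machine stores it. The sketch below shows only that partitioning step and is an illustration, not ABySS code: the function names, the use of CRC32 as the hash, and the canonical (strand-independent) k-mer form are assumptions chosen for the example.

```python
import zlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def owner(kmer, n_workers):
    """Pick the worker that stores this k-mer, using only the k-mer's sequence."""
    return zlib.crc32(canonical(kmer).encode("ascii")) % n_workers

def partition(reads, k, n_workers):
    """Distribute the canonical k-mers of all reads into one set per worker."""
    shards = [set() for _ in range(n_workers)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            shards[owner(kmer, n_workers)].add(canonical(kmer))
    return shards

shards = partition(["ACGTTGCA", "TGCAACGT"], k=4, n_workers=3)
for worker_id, shard in enumerate(shards):
    print(worker_id, sorted(shard))
```

Because the owner is computed from the sequence alone, any worker can work out where a neighbouring k-mer lives without consulting a central index.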


Velvet

    -Summary: Velvet builds and manipulates de Bruijn graphs in order to assemble genomes. The graphs are compact representations of
                     the overlaps among microreads.
    -Download site: http://www.ebi.ac.uk/~zerbino/velvet/
    -Paper: Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

               Zerbino DR, Birney E.

               Genome Res. 2008 May;18(5):821-9. Epub 2008 Mar 18.

               PMID: 18349386 [PubMed - indexed for MEDLINE]
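
As a rough illustration of the de Bruijn graph approach, the sketch below builds a graph from reads and then merges unbranched chains of nodes into contigs. It is a minimal sketch, not Velvet's code: the function names are hypothetical, and reverse complements, error removal, and isolated cycles are deliberately left out.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build the graph: each (k-1)-mer maps to the (k-1)-mers that follow it."""
    succ = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
    return succ

def contigs_from_graph(succ):
    """Merge unbranched paths into contigs. Cycles with no branch point are skipped."""
    pred = defaultdict(set)
    for node, nexts in succ.items():
        for nxt in nexts:
            pred[nxt].add(node)
    nodes = set(succ) | set(pred)

    def is_simple(node):
        # Exactly one edge in and one edge out: safe to merge through.
        return len(pred.get(node, ())) == 1 and len(succ.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # contigs start only at branch points or path ends
        for nxt in sorted(succ.get(node, ())):
            contig = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(succ[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs

graph = de_bruijn(["ACGTTG", "GTTGCA"], k=4)
print(contigs_from_graph(graph))
# ['ACGTTGCA']
```

Two overlapping reads collapse into a single path here; a sequence difference between reads would show up as a branch and split the output into several shorter contigs.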


SHARCGS

    -Summary: SHARCGS is a de novo short read genome assembler, described by its authors as fast and highly accurate.
    -Download site: http://sharcgs.molgen.mpg.de/download.shtml
    -Paper: SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing.

               Dohm JC, Lottaz C, Borodina T, Himmelbauer H.

               Genome Res. 2007 Nov;17(11):1697-706. Epub 2007 Oct 1.

               PMID: 17908823 [PubMed - indexed for MEDLINE]


SSAKE

    -Summary: SSAKE is a short read genome assembler that uses a prefix tree to search for the longest possible overlap
                     between two sequence fragments.
    -Download site: http://www.bcgsc.ca/platform/bioinfo/software/ssake
    -Paper: Assembling millions of short DNA sequences using SSAKE.

               Warren RL, Sutton GG, Jones SJ, Holt RA.

               Bioinformatics. 2007 Feb 15;23(4):500-1. Epub 2006 Dec 8.

               PMID: 17158514 [PubMed - indexed for MEDLINE]
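
The longest-overlap search described in the summary can be sketched as a greedy extension loop. This is an illustration under stated assumptions, not SSAKE's code: a plain dictionary of read prefixes stands in for the prefix tree, the function names and the min_overlap parameter are hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict

def index_prefixes(reads, min_overlap):
    """Map every proper prefix of length >= min_overlap to the reads that start with it."""
    index = defaultdict(list)
    for read in reads:
        for n in range(min_overlap, len(read)):
            index[read[:n]].append(read)
    return index

def greedy_extend(seed, reads, min_overlap=3):
    """Extend `seed` to the right, always taking the unused read with the longest overlap.
    Assumes `reads` is non-empty."""
    index = index_prefixes(reads, min_overlap)
    longest = max(len(read) for read in reads)
    used = {seed}
    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest possible suffix first, then progressively shorter ones.
        for n in range(min(len(contig), longest - 1), min_overlap - 1, -1):
            candidates = [r for r in index.get(contig[-n:], []) if r not in used]
            if candidates:
                read = candidates[0]
                contig += read[n:]  # append only the part that overhangs
                used.add(read)
                extended = True
                break
    return contig

print(greedy_extend("ACGTTG", ["ACGTTG", "GTTGCA", "TGCATT"]))
# ACGTTGCATT
```

Each read is used at most once, so the loop always terminates; the trade-off of a greedy strategy is that an early wrong choice in a repeat cannot be undone.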


SHORTY

    -Summary: SHORTY is a short-read de novo assembler particularly targeted at the ABI SOLiD sequencing technology. The program
                     creates 5-10 "seeds" of 300-500 bp with which the genome is assembled.
    -Download site: http://www.cs.sunysb.edu/~skiena/shorty/
    -Paper: Crystallizing short-read assemblies around seeds.

               Hossain MS, Azimi N, Skiena S.

               BMC Bioinformatics. 2009 Jan 30;10 Suppl 1:S16.

               PMID: 19208115 [PubMed - indexed for MEDLINE]


EDENA

    -Summary: EDENA is a de novo short read sequence assembler that is reported to give very accurate results.
    -Download site: http://www.genomic.ch/edena.php
    -Paper: De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads.

               Farrer RA, Kemen E, Jones JD, Studholme DJ.

               FEMS Microbiol Lett. 2009 Feb;291(1):103-11. Epub 2008 Dec 9.

               PMID: 19077061 [PubMed - indexed for MEDLINE]


VCAKE

    -Summary: VCAKE is a de novo short read sequence assembler, but it seems to be slower and less efficient than other
                      available programs.
    -Download site: http://152.2.15.114/~labweb/VCAKE
    -Paper: De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads.

               Farrer RA, Kemen E, Jones JD, Studholme DJ.

               FEMS Microbiol Lett. 2009 Feb;291(1):103-11. Epub 2008 Dec 9.

               PMID: 19077061 [PubMed - indexed for MEDLINE]


Consensus Program

    -Summary: A multi-read alignment algorithm for de novo or reference-guided genome assembly.
    -Download site: http://www.seqan.de/web/report/Downloads/Projects/
    -Paper: A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads.

               Rausch T, Koren S, Denisov G, Weese D, Emde AK, Döring A, Reinert K.

               Bioinformatics. 2009 May 1;25(9):1118-24. Epub 2009 Mar 5.

               PMID: 19269990 [PubMed - in process]


Bowtie

    -Summary: Bowtie is a short read aligner that maps reads to a reference genome, trading some accuracy for speed.
    -Download site: http://bowtie-bio.sourceforge.net/index.shtml
    -Paper: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

                Langmead B, Trapnell C, Pop M, Salzberg SL.

                Genome Biol. 2009 Mar 4;10(3):R25. [Epub ahead of print]

                PMID: 19261174 [PubMed - as supplied by publisher]


SeqMap

    -Summary: SeqMap is a tool for mapping large numbers of short sequences to the genome. It is configured through command line
                     options and is capable of running on a computer cluster.
    -Download site: http://biogibbs.stanford.edu/~jiangh/SeqMap/
    -Paper: SeqMap: mapping massive amount of oligonucleotides to the genome.

                Jiang H, Wong WH.

                Bioinformatics. 2008 Oct 15;24(20):2395-6. Epub 2008 Aug 12.

                PMID: 18697769 [PubMed - indexed for MEDLINE]


Maq

    -Summary: Maq maps short shotgun reads to a reference genome, builds a consensus from the mapped reads, and calls variants
                     using mapping quality scores.

    -Download site: http://maq.sourceforge.net/

    -Paper: Mapping short DNA sequencing reads and calling variants using mapping quality scores.

                Li H, Ruan J, Durbin R.

                Genome Res. 2008 Nov;18(11):1851-8. Epub 2008 Aug 19.

                PMID: 18714091 [PubMed - indexed for MEDLINE]
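
Mapping quality scores are usually expressed on the Phred scale, Q = -10 * log10(probability that the read is mapped to the wrong place). The sketch below shows that conversion together with a simplified way of estimating the error probability from the likelihoods of competing placements. The function names, the cap of 99, and the likelihood model are assumptions for illustration and are not taken from the Maq source.

```python
import math

def p_wrong_from_likelihoods(likelihoods):
    """Probability that the best placement is wrong, given the relative
    likelihood of every candidate placement of one read."""
    best = max(likelihoods)
    return 1.0 - best / sum(likelihoods)

def mapping_quality(p_wrong, cap=99):
    """Phred-scaled quality, capped so that p_wrong == 0 stays finite."""
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))

# A read with one strong placement and one weak alternative.
print(mapping_quality(p_wrong_from_likelihoods([0.999, 0.001])))  # 30
# A read that fits two places equally well is essentially unplaceable.
print(mapping_quality(p_wrong_from_likelihoods([0.5, 0.5])))      # 3
```

On this scale a quality of 30 corresponds to roughly a 1 in 1000 chance of a wrong placement, which is why low-quality reads are often filtered out before SNP calling.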


No Download Sites


MS-PET

    -Download site:
    -Paper: Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes.

               Ng P, Tan JJ, Ooi HS, Lee YL, Chiu KP, Fullwood MJ, Srinivasan KG, Perbost C, Du L, Sung WK, Wei CL, Ruan Y.

               Nucleic Acids Res. 2006 Jul 13;34(12):e84.

               PMID: 16840528 [PubMed - indexed for MEDLINE]


ABI SOLiD

    -Download site:
    -Paper: Crystallizing short-read assemblies around seeds.

               Hossain MS, Azimi N, Skiena S.

               BMC Bioinformatics. 2009 Jan 30;10 Suppl 1:S16.

               PMID: 19208115 [PubMed - indexed for MEDLINE]