Rnnotator: de novo transcriptome assembly

Comprehensive annotation and quantification of transcriptomes are outstanding problems
in functional genomics. Rnnotator is an automated software pipeline that generates
transcript models by de novo assembly of RNA-Seq data without the need for a reference
genome. The contigs produced by Rnnotator are highly accurate and reconstruct full-length
genes when transcripts are sequenced to sufficient depth, roughly 30X for a given
transcript. Rnnotator was designed to assemble Illumina single-end or paired-end reads. It
can also incorporate strand-specific RNA-Seq reads to further improve the assembly.
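
As a rough guide to what 30X means for a single transcript, average depth can be estimated
as the number of reads drawn from that transcript times the read length, divided by the
transcript length. The sketch below only illustrates that arithmetic; the function names
are hypothetical and are not part of Rnnotator.

```python
import math


def transcript_depth(num_reads: int, read_length: int, transcript_length: int) -> float:
    """Average per-base depth: total sequenced bases divided by transcript length."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    return num_reads * read_length / transcript_length


def reads_needed(target_depth: float, read_length: int, transcript_length: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * transcript_length / read_length)


# Example: a 2,000 nt transcript sequenced with 100 nt reads.
print(reads_needed(30, 100, 2000))       # 600
print(transcript_depth(600, 100, 2000))  # 30.0
```

Depth varies with expression level, so highly expressed transcripts reach this threshold
long before rare ones do.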


The Rnnotator pipeline was designed to take advantage of the strengths of existing
assemblers, while providing additional functionality to further improve transcriptome
assemblies.

Rnnotator takes short read sequences as input and outputs assembled transcript contigs. It
consists of three major components: preprocessing of reads, assembly, and postprocessing
of contigs.

The read preprocessing step can optionally perform several tasks: trimming reads and
removing low-quality reads, low-complexity reads, adapter-containing reads, duplicate
reads, reads containing rare k-mers, and rRNA-containing reads.
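
Two of the filters listed above, duplicate removal and low-complexity removal, can be
sketched as follows. This is an illustration only: the entropy measure, the threshold, and
the function names are assumptions, not Rnnotator's actual criteria or code.

```python
import math
from collections import Counter


def base_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; 0 for a homopolymer."""
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def preprocess(reads, min_entropy=1.0):
    """Drop empty reads, exact duplicates, and low-complexity reads.

    min_entropy is an assumed cutoff chosen for this example.
    """
    seen = set()
    kept = []
    for read in reads:
        if not read or read in seen:
            continue  # empty read or exact duplicate
        seen.add(read)
        if base_entropy(read) < min_entropy:
            continue  # low-complexity read
        kept.append(read)
    return kept


reads = ["ACGTACGTAC", "ACGTACGTAC", "AAAAAAAAAA", "GATTACAGGC"]
print(preprocess(reads))  # ['ACGTACGTAC', 'GATTACAGGC']
```

The remaining filters (quality, adapters, rare k-mers, rRNA) would follow the same pattern,
each as an optional predicate applied to every read.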

After read preprocessing, Rnnotator performs eight assemblies using the assembler of
your choice (Velvet, Oases, etc.). Each assembly uses a different hash length for the de
Bruijn graph. The assemblies are run either sequentially or in parallel, depending on the
-n parameter setting. Once the assemblies finish, Rnnotator removes redundant contigs and
further assembles the remaining contigs where significant overlaps are found.
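
The multi-assembly idea can be sketched as: pick eight hash lengths, pool the contigs from
all eight runs, and discard any contig that is wholly contained in a longer one. The code
below is a minimal illustration under stated assumptions. The starting hash length and step
are made up for the example (odd values are used because Velvet expects an odd hash
length), the helper names are hypothetical, and the overlap-based merging step is omitted.

```python
def hash_lengths(start=19, step=2, count=8):
    """Eight hash lengths, one per assembly. start and step are assumed values."""
    return [start + i * step for i in range(count)]


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(contigs):
    """Keep a contig only if neither it nor its reverse complement is a
    substring of a longer contig that has already been kept."""
    kept = []
    # Longest first, so any containing contig is seen before what it contains.
    for contig in sorted(set(contigs), key=lambda s: (-len(s), s)):
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


# Contigs pooled from several hypothetical assemblies.
pooled = ["ACGTTGCAAGGC", "GTTGCA", "GCCTTGCAACGT", "TTTTGGGA"]
print(hash_lengths())            # [19, 21, 23, 25, 27, 29, 31, 33]
print(remove_contained(pooled))  # ['ACGTTGCAAGGC', 'TTTTGGGA']
```

With strand-specific data the reverse-complement check would normally be skipped, since
opposite-strand contigs can represent distinct transcripts.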

More details can be found at Rnnotator Workflow.