Rnnotator: de novo transcriptome assembly

Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. Rnnotator is an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data, without the need for a reference genome. The contigs produced by Rnnotator are highly accurate and reconstruct full-length genes when transcripts are sequenced to sufficient depth, roughly 30X coverage for a given transcript. Rnnotator was designed to assemble single-end or paired-end Illumina reads. It can also incorporate strand-specific RNA-Seq reads to further improve the assembly.
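As a rough guide to what 30X means in practice, average depth for a transcript can be estimated as the number of reads covering it times the read length, divided by the transcript length. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of Rnnotator.

```python
import math


def transcript_coverage(num_reads: int, read_length: int, transcript_length: int) -> float:
    """Average per-base depth: total sequenced bases divided by transcript length."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    return num_reads * read_length / transcript_length


def reads_needed(target_coverage: float, read_length: int, transcript_length: int) -> int:
    """Smallest read count whose average depth reaches target_coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * transcript_length / read_length)


# Example: a 1500 nt transcript sequenced with 75 bp reads.
print(reads_needed(30, 75, 1500))
print(transcript_coverage(600, 75, 1500))
```

This is an average only. Real coverage is uneven along a transcript, so lowly expressed genes may need a deeper sequencing run overall to reach this depth.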

Overview

The Rnnotator pipeline was designed to take advantage of the strengths of existing assemblers, while providing additional functionality to further improve transcriptome assemblies.

Rnnotator takes short read sequences as input and outputs assembled transcript contigs. It consists of three major components: preprocessing of reads, assembly, and postprocessing of contigs.

The read preprocessing step can optionally perform several tasks: trimming reads and removing low-quality reads, low-complexity reads, adapter-containing reads, duplicate reads, reads containing rare k-mers, and rRNA-containing reads.
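Three of these filters (low quality, low complexity, and duplicates) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Rnnotator's implementation: the function names, the mean-Phred cutoff of 20, and the "one base exceeds 80% of the read" complexity rule are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Read = Tuple[str, Sequence[int]]  # (sequence, per-base Phred quality scores)


def is_low_complexity(seq: str, max_base_fraction: float = 0.8) -> bool:
    """Flag a read when a single base makes up more than max_base_fraction of it."""
    if not seq:
        return True
    most_common_count = Counter(seq.upper()).most_common(1)[0][1]
    return most_common_count / len(seq) > max_base_fraction


def mean_quality(quals: Sequence[int]) -> float:
    """Arithmetic mean of the Phred scores; empty reads score 0."""
    return sum(quals) / len(quals) if quals else 0.0


def preprocess(reads: Iterable[Read], min_mean_quality: float = 20.0) -> List[Read]:
    """Drop low-quality, low-complexity, and exact-duplicate reads, keeping input order."""
    seen = set()
    kept = []
    for seq, quals in reads:
        if mean_quality(quals) < min_mean_quality:
            continue  # low-quality read
        if is_low_complexity(seq):
            continue  # e.g. poly-A runs
        if seq in seen:
            continue  # exact duplicate of an earlier read
        seen.add(seq)
        kept.append((seq, quals))
    return kept
```

Adapter removal, rare k-mer filtering, and rRNA screening would need reference sequences or k-mer counts and are left out of this sketch.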

After read preprocessing, Rnnotator performs eight assemblies using the assembler of your choice (Velvet, Oases, etc.). Each assembly uses a different hash length (k-mer size) for the de Bruijn graph. The assemblies are run either sequentially or in parallel, depending on the -n parameter setting. After the assemblies finish, Rnnotator removes redundant contigs and further assembles the contigs where significant overlaps are found.
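To illustrate why the hash length matters, the sketch below builds a toy de Bruijn graph from the same reads at several k values, either one after another or concurrently. It is a simplified stand-in for running a real assembler such as Velvet; the function names and the `workers` argument are hypothetical, and `workers` does not claim to mirror the semantics of the -n parameter.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def build_all(reads, k_values, workers=1):
    """Build one graph per k; workers <= 1 runs sequentially, otherwise concurrently."""
    k_values = list(k_values)
    if workers <= 1:
        return {k: build_de_bruijn(reads, k) for k in k_values}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        graphs = pool.map(lambda k: build_de_bruijn(reads, k), k_values)
        return dict(zip(k_values, graphs))


graphs = build_all(["ACGTAC"], [3, 4], workers=2)
print(graphs[3])
```

Smaller k values connect reads from lowly expressed transcripts more easily, while larger k values resolve repeats better, which is the usual motivation for combining assemblies across several hash lengths. A real pipeline would launch the assembler as separate processes rather than Python threads.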

More details can be found at Rnnotator Workflow.
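The redundancy-removal step mentioned in the overview can be pictured as dropping any contig that is fully contained in a longer one. The sketch below is a naive illustration under that assumption, checking both strands; the function names are hypothetical, it is not Rnnotator's actual method, and it does not cover the merging of overlapping contigs.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(contigs):
    """Keep only contigs not found inside an already-kept contig on either strand."""
    kept = []
    # Visit longest contigs first so shorter ones are compared against them.
    for contig in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(contig)
        if any(contig in other or rc in other for other in kept):
            continue  # redundant: duplicate or substring of a kept contig
        kept.append(contig)
    return kept


contigs = ["ACGTTGCA", "GTTG", "CAAC", "TTTT", "ACGTTGCA"]
print(remove_contained(contigs))
```

For strand-specific data the reverse-complement check would likely be skipped, since contigs on opposite strands can represent different transcripts. The pairwise substring search is quadratic and only suitable for small examples.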