Inchworm RNA-Seq Assembler

Project home page:

The Inchworm RNA-Seq assembler, developed at the Broad Institute, uses a k-mer graph approach to reconstruct transcripts, in many cases full-length, from Illumina RNA-Seq reads. Inchworm is especially effective with strand-specific RNA-Seq data.
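To make the k-mer idea concrete, here is a minimal sketch that counts every overlapping k-mer in a FASTA file. It is not Inchworm's code. Inchworm builds a catalog of k-mers and greedily extends contigs from the most abundant ones, and this only illustrates that first catalog-building step. The function name `count_kmers` is hypothetical.

```bash
# count_kmers FASTA K: print "count k-mer" for every overlapping k-mer,
# most frequent first. Illustrative sketch only, not Inchworm's implementation.
count_kmers() {
  local fasta=$1 k=$2
  awk -v k="$k" '
    # Count all overlapping k-mers in one complete sequence.
    function tally(s,   i) {
      s = toupper(s)
      for (i = 1; i + k - 1 <= length(s); i++)
        counts[substr(s, i, k)]++
    }
    /^>/ { tally(seq); seq = ""; next }   # header: finish the previous record
    { seq = seq $0 }                      # join wrapped sequence lines
    END {
      tally(seq)                          # finish the last record
      for (km in counts) print counts[km], km
    }
  ' "$fasta" | LC_ALL=C sort -k1,1nr -k2,2
}
```

Once you have a FASTA file from step 4, something like `count_kmers left.fq.fa 25 | head` lists its most frequent 25-mers. Ties are broken alphabetically so the output is deterministic.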

We have installed the software on Oscar. Here is a brief step-by-step guide to running it:

# 1) Log onto Oscar, create a working folder, and load the inchworm module
ssh smp007
mkdir -p data/inchworm-test
cd data/inchworm-test
module load inchworm

# 2) Copy paired-end Illumina data to current folder
cp path/to/your/read/folder/s_1_*sequence.txt .

# 3) Take the first 2,500 reads (10,000 lines; each FASTQ record is 4 lines) to test the software
head -n 10000 s_1_1_sequence.txt > left.fq
head -n 10000 s_1_2_sequence.txt > right.fq
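Because each FASTQ record takes four lines (header, sequence, "+", quality), 10,000 lines correspond to 2,500 reads, and cutting both mate files at the same line keeps the pairs in sync. The small helper below is a sketch for checking this. Its name, `count_fastq_reads`, is hypothetical, and it assumes standard 4-line records with no wrapped sequences.

```bash
# count_fastq_reads FASTQ: number of reads, assuming the standard
# 4-line FASTQ record (header, sequence, "+", quality).
count_fastq_reads() {
  local lines
  lines=$(( $(wc -l < "$1") ))       # arithmetic strips any padding from wc
  if (( lines % 4 != 0 )); then
    echo "warning: $1 has $lines lines, not a multiple of 4" >&2
  fi
  echo $(( lines / 4 ))
}
```

Running `count_fastq_reads left.fq` and `count_fastq_reads right.fq` should both print 2500, provided the original files held at least that many reads.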

# 4) Extract FASTA sequences from the FASTQ files
-I left.fq -a 1 --rev > left.fq.fa
-I right.fq -a 2 > right.fq.fa
cat left.fq.fa right.fq.fa > both.senseOriented.fa
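If you do not have the FASTQ-to-FASTA script used in step 4, the awk function below is a hypothetical stand-in. It assumes that `-a N` appends a /N mate suffix to each read name and that `--rev` reverse-complements the sequences (so the left reads end up sense-oriented). Both are assumptions about the original script, not documented behavior, and the name `fq_to_fa` is made up for illustration.

```bash
# fq_to_fa FASTQ MATE [rev]: hypothetical stand-in for the step 4 conversion.
# Prints FASTA to stdout, naming each read NAME/MATE; with "rev" as the third
# argument, every sequence is reverse-complemented (ASSUMED meaning of --rev).
fq_to_fa() {
  local fq=$1 mate=$2 rev=${3:-}
  awk -v mate="$mate" -v rev="$rev" '
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C" }
    # Reverse-complement a DNA string; anything other than A/C/G/T becomes N.
    function revcomp(s,   i, c, rc, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        rc = (c in comp) ? comp[c] : "N"
        out = out rc
      }
      return out
    }
    NR % 4 == 1 {                      # header line, e.g. "@read42/1 ..."
      name = substr($1, 2)             # drop the leading "@"
      sub(/\/[12]$/, "", name)         # drop any existing /1 or /2 suffix
    }
    NR % 4 == 2 {                      # sequence line
      seq = toupper($0)
      if (rev == "rev") seq = revcomp(seq)
      print ">" name "/" mate
      print seq
    }
  ' "$fq"
}
```

Under those assumptions, `fq_to_fa left.fq 1 rev > left.fq.fa` and `fq_to_fa right.fq 2 > right.fq.fa` mirror the two conversion commands, after which the `cat` line combines them as before.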

# 5) Run Inchworm on Oscar

# Run in strand-specific (SS) mode if your Illumina run was strand-specific
inchworm --reads both.senseOriented.fa --run_inchworm --monitor 1 > inchworm_assemblies.SS.fasta

# Run in double-stranded (DS) mode; this works for any run, strand-specific or not
inchworm --reads both.senseOriented.fa --run_inchworm --DS --monitor 1 > inchworm_assemblies.DS.fasta
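The two commands differ only in the `--DS` flag, so they can be wrapped in one function that also refuses to start on a missing or empty reads file. This is a sketch: the name `run_inchworm_mode` is hypothetical, and it passes Inchworm only the flags shown in step 5.

```bash
# run_inchworm_mode MODE READS OUT: run Inchworm in strand-specific ("SS") or
# double-stranded ("DS") mode, using only the flags shown in step 5.
run_inchworm_mode() {
  local mode=$1 reads=$2 out=$3
  local args=(--reads "$reads" --run_inchworm)
  case $mode in
    SS) ;;                           # strand-specific: no extra flag
    DS) args+=(--DS) ;;              # double-stranded: add --DS
    *) echo "mode must be SS or DS, got '$mode'" >&2; return 1 ;;
  esac
  args+=(--monitor 1)
  if [ ! -s "$reads" ]; then         # catch a missing or empty reads file early
    echo "reads file '$reads' is missing or empty" >&2
    return 1
  fi
  inchworm "${args[@]}" > "$out"
}
```

For example, `run_inchworm_mode SS both.senseOriented.fa inchworm_assemblies.SS.fasta` is equivalent to the strand-specific command above.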

# 6) Check the results:

The assembly sequences output by inchworm are formatted like so:


Here, a1 denotes assembly 1, and 123 is the average k-mer coverage (i.e., read coverage) of that assembly.
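To get a quick overview of an output file, the sketch below prints one tab-separated line per assembly with its ID, coverage, and sequence length. It assumes each header starts with the ID and coverage joined by a semicolon, as in ">a1;123 ..."; check this against your own output and adjust the split if it differs. The function name `summarize_assemblies` is hypothetical, and the length is computed from the sequence itself rather than read from the header.

```bash
# summarize_assemblies FASTA: print "id<TAB>coverage<TAB>length" per assembly.
# ASSUMPTION: headers look like ">a1;123 ..." (ID, ";", average k-mer coverage).
summarize_assemblies() {
  awk '
    function flush() { if (id != "") printf "%s\t%s\t%d\n", id, cov, len }
    /^>/ {
      flush()                          # report the previous assembly
      split(substr($1, 2), f, ";")     # f[1] = "a1", f[2] = "123"
      id = f[1]; cov = f[2]; len = 0
      next
    }
    { len += length($0) }              # sum sequence lines (FASTA may wrap)
    END { flush() }                    # report the last assembly
  ' "$1"
}
```

For example, `summarize_assemblies inchworm_assemblies.SS.fasta | sort -t $'\t' -k2,2nr | head` would list the ten highest-coverage assemblies.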

Let me know if you have any questions, or contact the project authors via the project page.

Enjoy analyzing your data!