Bowtie2 and SAMtools

Bowtie2 & SAMtools

Bowtie2 [1] and SAMtools [2] are sequencing alignment tools. SAMtools provide various utilities for manipulating alignments in the SAM (Sequence Alignment/Map) format, including sorting, merging, indexing and generating alignments in a per-position format. Bowtie aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. It indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end)

Installed Versions

All the available versions of Bowtie and SAMtools for use can be viewed by issuing the following commands. This applies for other applications as well.

module spider bowtie

module spider samtools

The modules can be loaded as:

module load bowtie2     #for Bowtie

module load samtools    #for samtools

Running Bowtie and Samtools in HPC

Example (from the Bowtie2 Manual):

Copy the example directory from /usr/local/doc/BOWTIE/ to your home directory

cp -r /usr/local/doc/BOWTIE/example <path-in-your-home-directory>/

Change directory to the location where there is a job file "bowtie.slurm"

cd example/bowtie-test

Serial Job

The sample job file (bowtie.slurm) is showed below:

#!/bin/bash

#SBATCH --job-name="Bowtie_test"

#SBATCH --nodes=1

#SBATCH --cpus-per-task=1

#SBATCH --time=00:10:00

#SBATCH -o Bowtie_test.o%j


#Bowtie, Samtools and Bcftools

module purge

module load intel/17 openmpi

module load bowtie2

module load samtools/1.8

module load bcftools/1.8


#Indexing a reference genome

bowtie2-build ../reference/lambda_virus.fa lambda_virus


#Aligning example reads

bowtie2 -x lambda_virus -U ../reads/reads_1.fq -S eg1.sam


#Paired-end example

bowtie2 -x lambda_virus -1 ../reads/reads_1.fq -2 ../reads/reads_2.fq -S eg2.sam


#Local alignment example

bowtie2 --local -x lambda_virus -U ../reads/longreads.fq -S eg3.sam


#Convert the SAM to BAM:

samtools view -bS eg2.sam > eg2.bam


#Convert the BAM file to a sorted BAM file

samtools sort -o eg2.sorted.bam eg2.bam


#We now have a sorted BAM file called `eg2.sorted.bam`. Sorted BAM is a useful

#format because the alignments are (a) compressed, which is convenient for

#long-term storage, and (b) sorted, which is conveneint for variant discovery.

#To generate variant calls in VCF format, run

samtools mpileup -uf ../reference/lambda_virus.fa eg2.sorted.bam | bcftools call -mv -Oz > eg2.raw.vcf

Submit the job

sbatch bowtie.slurm

Check the output files in your working directory (bowtie-test). There is also job output file Bowtie_test.o<jobid> with the content:

...

Settings:

  Output files: "lambda_virus.*.bt2"

  Line rate: 6 (line is 64 bytes)

  Lines per side: 1 (side is 64 bytes)

  Offset rate: 4 (one in 16)

  FTable chars: 10

...

Calculating bucket sizes

  Binary sorting into buckets

  10%

...

94.97% overall alignment rate

6000 reads; of these:

  6000 (100.00%) were unpaired; of these:

    157 (2.62%) aligned 0 times

    5634 (93.90%) aligned exactly 1 time

    209 (3.48%) aligned >1 times

97.38% overall alignment rate

[samopen] SAM header is present: 1 sequences.

[fai_load] build FASTA index.

[mpileup] 1 samples in 1 input files

<mpileup> Set max per-file depth to 8000

[afs] 0:4254.406 1:3.562 2:113.032

View the variant

module purge

module load intel/17 openmpi

module load samtools/1.8

module load bcftools/1.8

bcftools view eg2.raw.vcf

output:

##fileformat=VCFv4.2

##FILTER=<ID=PASS,Description="All filters passed">

##samtoolsVersion=1.8+htslib-1.8

##samtoolsCommand=samtools mpileup -uf ../reference/lambda_virus.fa eg2.sorted.bam

##reference=file://../reference/lambda_virus.fa

##contig=<ID=gi|9626243|ref|NC_001416.1|,length=48502>

...

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT eg2.sorted.bam

gi|9626243|ref|NC_001416.1| 2 . GGCG GGCGCGGGGGCG 11.7169 . INDEL;IDV=1;IMF=0.5;DP=2;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=42 GT:PL 1/1:41,3,0

gi|9626243|ref|NC_001416.1| 244 . CA C 32.4168 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.816543;SGB=-0.590765;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,4,1;MQ=34 GT:PL 1/1:62,15,0

gi|9626243|ref|NC_001416.1| 245 . ATT AT 209 . INDEL;IDV=33;IMF=0.942857;DP=34;VDB=0.89541;SGB=-0.692562;MQSB=0.605517;MQ0F=0;AC=2;AN=2;DP4=0,0,13,9;MQ=35 GT:PL 1/1:239,66,0

gi|9626243|ref|NC_001416.1| 351 . ATGCTGAAATT A 149 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.00453739;SGB=-0.688148;MQSB=0.819936;MQ0F=0;AC=2;AN=2;DP4=0,1,8,7;MQ=26 GT:PL 1/1:176,16,0

gi|9626243|ref|NC_001416.1| 353 . GCTGAAATTGA G 225 . INDEL;IDV=24;IMF=0.685714;DP=34;VDB=0.00750609;SGB=-0.692914;MQSB=0.716885;MQ0F=0;AC=2;AN=2;DP4=0,0,13,12;MQ=26 GT:PL 1/1:255,75,0

...

Parallel Job

Bowtie can be run in parallel by simply adding the option -p <n> in Bowtie command where n is the number of processors. In the example slurm script (bowtie.slurm), you need to make the following changes:

You need to request more processors to match the value of n:

#SBATCH --cpus-per-task=<n>

The bowtie commands take the form:

 bowtie2 -p <n>  -x lambda_virus -U ../reads/reads_1.fq -S  eg1.sam  

References:

[1] Bowtie Home- http://bowtie-bio.sourceforge.net/index.shtml

[2] SAMtools Home - http://samtools.sourceforge.net/

[3] Bowtie tutorial - http://bowtie-bio.sourceforge.net/tutorial.shtml

[4] SAMTools Primer - http://biobits.org/samtools_primer.html