Bowtie2 and SAMtools
Bowtie2 & SAMtools
Bowtie2 [1] and SAMtools [2] are sequencing alignment tools. SAMtools provide various utilities for manipulating alignments in the SAM (Sequence Alignment/Map) format, including sorting, merging, indexing and generating alignments in a per-position format. Bowtie aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. It indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end)
Installed Versions
All the available versions of Bowtie and SAMtools for use can be viewed by issuing the following commands. This applies for other applications as well.
module spider bowtie
module spider samtools
The modules can be loaded as:
module load bowtie2 #for Bowtie
module load samtools #for samtools
Running Bowtie and Samtools in HPC
Example (from the Bowtie2 Manual):
Copy the example directory from /usr/local/doc/BOWTIE/ to your home directory
cp -r /usr/local/doc/BOWTIE/example <path-in-your-home-directory>/
Change directory to the location where there is a job file "bowtie.slurm"
cd example/bowtie-test
Serial Job
The sample job file (bowtie.slurm) is showed below:
#!/bin/bash
#SBATCH --job-name="Bowtie_test"
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH -o Bowtie_test.o%j
#Bowtie, Samtools and Bcftools
module purge
module load intel/17 openmpi
module load bowtie2
module load samtools/1.8
module load bcftools/1.8
#Indexing a reference genome
bowtie2-build ../reference/lambda_virus.fa lambda_virus
#Aligning example reads
bowtie2 -x lambda_virus -U ../reads/reads_1.fq -S eg1.sam
#Paired-end example
bowtie2 -x lambda_virus -1 ../reads/reads_1.fq -2 ../reads/reads_2.fq -S eg2.sam
#Local alignment example
bowtie2 --local -x lambda_virus -U ../reads/longreads.fq -S eg3.sam
#Convert the SAM to BAM:
samtools view -bS eg2.sam > eg2.bam
#Convert the BAM file to a sorted BAM file
samtools sort -o eg2.sorted.bam eg2.bam
#We now have a sorted BAM file called `eg2.sorted.bam`. Sorted BAM is a useful
#format because the alignments are (a) compressed, which is convenient for
#long-term storage, and (b) sorted, which is conveneint for variant discovery.
#To generate variant calls in VCF format, run
samtools mpileup -uf ../reference/lambda_virus.fa eg2.sorted.bam | bcftools call -mv -Oz > eg2.raw.vcf
Submit the job
sbatch bowtie.slurm
Check the output files in your working directory (bowtie-test). There is also job output file Bowtie_test.o<jobid> with the content:
...
Settings:
Output files: "lambda_virus.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
...
Calculating bucket sizes
Binary sorting into buckets
10%
...
94.97% overall alignment rate
6000 reads; of these:
6000 (100.00%) were unpaired; of these:
157 (2.62%) aligned 0 times
5634 (93.90%) aligned exactly 1 time
209 (3.48%) aligned >1 times
97.38% overall alignment rate
[samopen] SAM header is present: 1 sequences.
[fai_load] build FASTA index.
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
[afs] 0:4254.406 1:3.562 2:113.032
View the variant
module purge
module load intel/17 openmpi
module load samtools/1.8
module load bcftools/1.8
bcftools view eg2.raw.vcf
output:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##samtoolsVersion=1.8+htslib-1.8
##samtoolsCommand=samtools mpileup -uf ../reference/lambda_virus.fa eg2.sorted.bam
##reference=file://../reference/lambda_virus.fa
##contig=<ID=gi|9626243|ref|NC_001416.1|,length=48502>
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT eg2.sorted.bam
gi|9626243|ref|NC_001416.1| 2 . GGCG GGCGCGGGGGCG 11.7169 . INDEL;IDV=1;IMF=0.5;DP=2;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=42 GT:PL 1/1:41,3,0
gi|9626243|ref|NC_001416.1| 244 . CA C 32.4168 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.816543;SGB=-0.590765;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,4,1;MQ=34 GT:PL 1/1:62,15,0
gi|9626243|ref|NC_001416.1| 245 . ATT AT 209 . INDEL;IDV=33;IMF=0.942857;DP=34;VDB=0.89541;SGB=-0.692562;MQSB=0.605517;MQ0F=0;AC=2;AN=2;DP4=0,0,13,9;MQ=35 GT:PL 1/1:239,66,0
gi|9626243|ref|NC_001416.1| 351 . ATGCTGAAATT A 149 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.00453739;SGB=-0.688148;MQSB=0.819936;MQ0F=0;AC=2;AN=2;DP4=0,1,8,7;MQ=26 GT:PL 1/1:176,16,0
gi|9626243|ref|NC_001416.1| 353 . GCTGAAATTGA G 225 . INDEL;IDV=24;IMF=0.685714;DP=34;VDB=0.00750609;SGB=-0.692914;MQSB=0.716885;MQ0F=0;AC=2;AN=2;DP4=0,0,13,12;MQ=26 GT:PL 1/1:255,75,0
...
Parallel Job
Bowtie can be run in parallel by simply adding the option -p <n> in Bowtie command where n is the number of processors. In the example slurm script (bowtie.slurm), you need to make the following changes:
You need to request more processors to match the value of n:
#SBATCH --cpus-per-task=<n>
The bowtie commands take the form:
bowtie2 -p <n> -x lambda_virus -U ../reads/reads_1.fq -S eg1.sam
References:
[1] Bowtie Home- http://bowtie-bio.sourceforge.net/index.shtml
[2] SAMtools Home - http://samtools.sourceforge.net/
[3] Bowtie tutorial - http://bowtie-bio.sourceforge.net/tutorial.shtml
[4] SAMTools Primer - http://biobits.org/samtools_primer.html