Sequence Alignment (Bowtie)

There are two ways to run a job (program) on Oscar: interactive mode and batch mode. I will show you both.

Interactive mode lets you interact with the program, so you can see the command-line output directly or use a GUI. But if your connection is interrupted, the program is terminated.
A good way to avoid the interruption is the screen command. screen also lets you log out at any time and come back later: https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/screen-keep-linux-session-alive

# STEP 0
# Log in and start a screen session following the instructions in the link: https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/screen-keep-linux-session-alive

# STEP 1
# Check the available partitions on Oscar:
# There are a few partitions (also called queues; you can think of a queue as a waiting list). This command shows which partitions are not busy:
sinfo

# The output looks like this. Look for the rows whose STATE is idle, and note that tiny-batch and small-batch have a TIMELIMIT:
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug            up 2-00:00:00      6   idle gpu[001-002],node[001-004]
gpu              up   infinite      1  alloc gpu301
gpu              up   infinite     41   idle gpu[302-342]
batch*           up   infinite      3  down* node469,smp[005,010]
batch*           up   infinite      4  drain node[231,434],smp[003-004]
batch*           up   infinite    344  alloc node[015-082,101-120,123-184,201-224,227-230,232-274,401-433,435-468,471-474,501-524,526-532,534-538,540-541,544-551,577-580],smp[008-009]
batch*           up   infinite     33   idle node[121-122,470,525,533,539,542-543,552-576]
tiny-batch       up      15:00      3  down* node469,smp[005,010]
tiny-batch       up      15:00      4  drain node[231,434],smp[003-004]
tiny-batch       up      15:00    345  alloc gpu301,node[015-082,101-120,123-184,201-224,227-230,232-274,401-433,435-468,471-474,501-524,526-532,534-538,540-541,544-551,577-580],smp[008-009]
tiny-batch       up      15:00     80   idle gpu[001-002,302-342],node[001-004,121-122,470,525,533,539,542-543,552-576]
small-batch      up    1:00:00      3  down* node469,smp[005,010]
small-batch      up    1:00:00      4  drain node[231,434],smp[003-004]
small-batch      up    1:00:00    345  alloc gpu301,node[015-082,101-120,123-184,201-224,227-230,232-274,401-433,435-468,471-474,501-524,526-532,534-538,540-541,544-551,577-580],smp[008-009]
small-batch      up    1:00:00     52   idle gpu[302-320],node[121-122,470,525,533,539,542-543,552-576]

# I can see there are a few nodes available in the batch partition. 
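
# If the sinfo listing is long, you can filter it down to the idle rows. The snippet below is
# only a minimal sketch: idle_nodes is a hypothetical helper name (not an Oscar command), and it
# assumes the six-column sinfo layout shown above. It runs on a few rows copied from that output.

```shell
# idle_nodes: hypothetical helper; prints the partition name and node count of idle rows.
# Assumes the columns are: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
idle_nodes() {
    awk 'NR > 1 && $5 == "idle" { print $1, $4 }'
}

# A few rows copied from the sinfo output above.
idle_nodes <<'EOF'
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug            up 2-00:00:00      6   idle gpu[001-002],node[001-004]
gpu              up   infinite      1  alloc gpu301
gpu              up   infinite     41   idle gpu[302-342]
batch*           up   infinite      3  down* node469,smp[005,010]
EOF

# On Oscar, filter the live listing instead:
#   sinfo | idle_nodes
# sinfo can also filter by state itself:
#   sinfo -t idle
```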

# STEP 3a
# First, I will show you how to run bowtie in interactive mode:

# I will request resources from the "batch" partition: 4 CPU cores, 20G of memory and 2 hours of computing time.
# You may need to wait some time if someone else is using the nodes.
# You can adjust the numbers according to the size of your data set.
# It is a good idea to request a little more time than you expect to need; otherwise your job may get terminated before it finishes.
interact -n 4 -m 20g -t 2:00:00 -q batch

# Below is my screen output:
[ldong@login002 ~]$ interact -n 4 -m 20g -t 2:00:00 -q batch
Cores:    4
Walltime: 2:00:00
Memory:   20g
Queue:    batch
salloc -J interact -N 1-1 -n 4 --time=2:00:00 --mem=20g -p batch srun --pty bash

salloc: Pending job allocation 1613811
salloc: job 1613811 queued and waiting for resources
salloc: job 1613811 has been allocated resources
salloc: Granted job allocation 1613811
..
[ldong@node121 ~]$ 
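
# Once the prompt changes to a compute node, you can confirm what was granted. This is a minimal
# sketch: SLURM_JOB_ID and SLURM_NTASKS are standard SLURM environment variables, but
# show_allocation is a hypothetical helper name, and I am assuming that the interact wrapper
# passes these variables through to your shell.

```shell
# show_allocation: hypothetical helper; prints the job id, core count and node name.
show_allocation() {
    echo "job id: ${SLURM_JOB_ID:-none (not inside a SLURM job)}"
    echo "cores:  ${SLURM_NTASKS:-unknown}"
    echo "node:   $(uname -n)"
}

show_allocation
```

# If the job id shows "none", you are probably still on a login node.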

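# Run this command to see the options. Note that -h expects a hostname, so running it without one prints an error line followed by the usage message: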
interact -h

# Screen output looks like:
/gpfs/runtime/bin/interact: option requires an argument -- h

usage: interact [-n cores] [-t walltime] [-m memory] [-q queue]
                [-o outfile] [-X] [-f featurelist] [-h hostname]
Starts an interactive job by wrapping the SLURM 'salloc' and 'srun' commands.
options:
  -n cores        (default: 1)
  -t walltime     as hh:mm:ss (default: 30:00)
  -m memory       as #[k|m|g] (default: 4g)
  -q queue        (default: 'small-batch')
  -o outfile      save a copy of the session's output to outfile (default: off)
  -X              enable X forwarding (default: no)
  -f featurelist  CCV-defined node features (e.g., 'e5-2600'),
                  combined with '&' and '|' (default: none)
  -h hostname     only run on the specific node 'hostname'
                  (default: none, use any available node)

# Make a temporary directory for testing purposes, and go to it
mkdir -p data/bowtie_test && cd data/bowtie_test

# Load modules:
module load bowtie/0.12.9 samtools/0.1.18

# Take a look at the test data set
head /gpfs/data/shared/biomed/example_fastq/rna_seq/s1_r1.fastq
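
# To know how many reads to expect, you can count the records in the file. This is a minimal
# sketch, assuming the usual 4-lines-per-read FASTQ layout (no wrapped sequence lines).
# count_fastq_reads is a hypothetical helper name, and the two-read sample file is made up
# for the demonstration.

```shell
# count_fastq_reads: hypothetical helper; prints the number of reads in a FASTQ file.
count_fastq_reads() {
    # Each FASTQ record is 4 lines: @name, sequence, +, qualities.
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Tiny made-up FASTQ file with two reads, just to show the helper working.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGCATGC
+
IIIIIIII
EOF
count_fastq_reads "$sample"   # prints 2
rm -f "$sample"

# On Oscar:
#   count_fastq_reads /gpfs/data/shared/biomed/example_fastq/rna_seq/s1_r1.fastq
```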

# Run bowtie
bowtie dmel5 -p 4 /gpfs/data/shared/biomed/example_fastq/rna_seq/s1_r1.fastq   -S s1_r1.sam

# Screen output looks like:

# reads processed: 500000
# reads with at least one reported alignment: 604 (0.12%)
# reads that failed to align: 499396 (99.88%)
Reported 604 alignments to 1 output stream(s)

# Take a look at the output
head -n 100 s1_r1.sam | less
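
# The summary numbers can be re-checked from the SAM file itself. This is a minimal sketch,
# assuming the SAM file also lists the unaligned reads (they carry FLAG bit 4). sam_summary is
# a hypothetical helper name, and the three sample records are made up for the demonstration.

```shell
# sam_summary: hypothetical helper; counts mapped and unmapped records in a SAM file.
sam_summary() {
    # Skip header lines (they start with @). Column 2 is the FLAG; bit 4 set means unmapped.
    awk '!/^@/ { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
         END   { printf "mapped: %d\nunmapped: %d\n", mapped, unmapped }' "$1"
}

# Made-up SAM file: one header line, two aligned reads (FLAG 0 and 16), one unaligned (FLAG 4).
sample=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sample"
printf 'read1\t0\tchr2L\t100\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sample"
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATGC\tIIIIIIII\n' >> "$sample"
printf 'read3\t16\tchr2L\t250\t255\t8M\t*\t0\t0\tGGCCTTAA\tIIIIIIII\n' >> "$sample"
sam_summary "$sample"
rm -f "$sample"

# On Oscar:
#   sam_summary s1_r1.sam
```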

# To see all options for bowtie:
bowtie -h


# Bowtie only gives SAM format output; if you need BAM format, use samtools
samtools view -bS s1_r1.sam > s1_r1.bam

# Screen output:
[samopen] SAM header is present: 15 sequences.

# To view the BAM file
samtools view -h  s1_r1.bam | head -n 100 | less

# To see all samtools options:
samtools 

# Or, for the options of a certain samtools command:
samtools [command (such as: view, sort, ..)]

# STEP 3b
# Next, I will show you how to run bowtie through an sbatch script:
# Make a temporary directory for testing purposes, and go to it
mkdir -p data/bowtie_test && cd data/bowtie_test

# Take a look at the test data set
head /gpfs/data/shared/biomed/example_fastq/rna_seq/s1_r1.fastq

# Copy the example sbatch script to the current folder:
cp ~/shared/biomed/example_sbatch_script_bowtie_samtools.sh . 

# Take a look at the sbatch script:
less example_sbatch_script_bowtie_samtools.sh 

#!/bin/bash

#SBATCH -J bowtie_testing

#SBATCH -n 4
#SBATCH -t 1:00:00

# submit job to batch
#SBATCH --partition=batch

#SBATCH --mail-type=ALL
#SBATCH --mail-user=xyz@brown.edu

module load bowtie/0.12.9 samtools/0.1.18

# Run bowtie
bowtie dmel5 -p 4 /gpfs/data/shared/biomed/example_fastq/rna_seq/s1_r1.fastq   -S s1_r1.sam

# Bowtie only gives SAM format output; if you need BAM format, use samtools
samtools view -bS s1_r1.sam > s1_r1.bam

echo "bowtie and samtools finished successfully"
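
# A note on the script above (this is not part of the example script): the final echo runs even
# if bowtie or samtools failed, because bash keeps going after a failed command by default.
# Below is a minimal sketch of one way to print the message only on success. align and convert
# are hypothetical stand-ins for the bowtie and samtools lines, so the sketch runs without
# either tool, and ALIGN_STATUS is a made-up variable used only to simulate a failure.

```shell
align()   { return "${ALIGN_STATUS:-0}"; }   # stand-in for the bowtie line
convert() { return 0; }                      # stand-in for the samtools line

run_pipeline() {
    # convert only runs if align succeeded; the message depends on both.
    if align && convert; then
        echo "bowtie and samtools finished successfully"
    else
        echo "a step failed, see the messages above" >&2
        return 1
    fi
}

run_pipeline                                        # prints the success message
( ALIGN_STATUS=1; run_pipeline ) || echo "failure detected"
```

# In a real script the same idea is: bowtie ... && samtools ... && echo "finished".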


# Edit the script to change the email to your own address, then submit the job:
sbatch  example_sbatch_script_bowtie_samtools.sh 

# Use the myq command to check the job status. You can also check your email for job status.
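
# The echo message from the script does not appear on your screen. By default SLURM writes a
# batch job's output to slurm-<jobid>.out in the directory you submitted from, and sbatch prints
# "Submitted batch job <jobid>" when it accepts the job. The sketch below pulls the id out of
# that line; job_id_from is a hypothetical helper name and the sample job id is made up.

```shell
# job_id_from: hypothetical helper; extracts the numeric id from sbatch's confirmation line.
job_id_from() {
    echo "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Sample confirmation line (the job id is made up for the demonstration).
msg="Submitted batch job 1613900"
jobid=$(job_id_from "$msg")
echo "output file: slurm-${jobid}.out"

# On Oscar:
#   msg=$(sbatch example_sbatch_script_bowtie_samtools.sh)
#   less "slurm-$(job_id_from "$msg").out"
```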

# After the job is finished, take a look at the output
head -n 100 s1_r1.sam | less

# To see all options for bowtie:
bowtie -h

# To see all samtools options:
samtools 

# Or, for the options of a certain samtools command:
samtools [command (such as: view, sort, ..)]


# Let me know if you have any questions.