PowerPoint Presentations

    Ingenuity Pathway Analysis (IPA) Guide

    Summer Bioinformatics Course

    Recent Publications

    MicroRNA Seq

    Gene Ontology Analysis

    Phylogeny Analysis

    NCBI SRA Download

    Recent site activity

    blast by qsub

    This tutorial will show you how to divide your sequences into smaller files, then submit the blast jobs to Oscar computer cluster, so that the mapping will finish faster than running it on single machine. After making sure all blast jobs finishes successfully, we will merge the output files together for the down-stream data analysis. 
    Here, I will run blastx and map my contigs (saved as  /gpfs/data/blastdb/example_contig.fasta on Oscar) from Illumina sequencing assembling onto nr protein database. 

    Note: this script is based on the original script by Mark: /gpfs/scratch/mhowison/Patrick/multiblastx.py

    !!! Note: pay attention to use the correct genetic code (option: -query_gencode) when running blastx, the default is 1.

    # first login to oscar, and go to scratch
    ssh oscar
    cd scratch/

    # load biomed module. This will set the path for the python script
    module load biomed

    # Take a look the usage information:
    blast-by-qsub-v1.1.py
    usage: blastx-by-qsub-v1.1.py           query.fa              num_seq_per_job            blast_program                db                number-of-nodes-to-use             wall_time             email_address

    # divide my contigs into smaller files, each one contains 200 contigs, and create a pbs qsub script for blastx on to nr protein database
    blast-by-qsub-v1.1.py    /gpfs/data/blastdb/example_contig.fasta            200                   blastx           nr     12       3:00:00          xxx@brown.edu

    # Here is the output:
    To submit the jobs, run the following command:
    qsub example_contig.fasta_nr.pbs

    # Take a look at the pbs qsub script Certainly, you can edit the script on how to run the blast. Please refer blast manual for the options.
    less example_contig.fasta_nr.pbs

    #!/bin/bash
    #PBS -N example_contig.fasta
    #PBS -t 0-4%12
    #PBS -l nodes=1
    #PBS -l walltime=3:00:00
    #PBS -j oe
    #PBS -M xxx@brown.edu
    module load blast

    cd $PBS_O_WORKDIR
    echo done > tmp/example_contig.fasta_nr_$PBS_ARRAYID.out.start
    blastx -db nr \
           -query tmp/example_contig.fasta_$PBS_ARRAYID.fa \
           -out tmp/example_contig.fasta_nr_$PBS_ARRAYID.out \
           -outfmt 6 \
           -num_threads 8 \
           -num_alignments 10
    echo done > tmp/example_contig.fasta_nr_$PBS_ARRAYID.out.start.done

    # It is a good idea to run 1 or 2 of them before submit the whole thing to the cluster, in case something wrong, such as proper walltime, blast output format, filter and so on. 
    # So, I make a copy of the script and work on the copy, and edit it to work only on job 0 and 1 to see what happens. Note: changed parts are highlighted red.
    cp example_contig.fasta_nr.pbs example_contig.fasta_nr.pbs1
    vi  example_contig.fasta_nr.pbs1
    #!/bin/bash
    #PBS -N example_contig.fasta
    #PBS -t 0-1%12
    #PBS -l nodes=1
    #PBS -l walltime=3:00:00
    #PBS -j oe
    #PBS -M xxx@brown.edu
    module load blast

    cd $PBS_O_WORKDIR
    echo done > tmp/example_contig.fasta_nr_$PBS_ARRAYID.out.start
    blastx -db nr \
           -query tmp/example_contig.fasta_$PBS_ARRAYID.fa \
           -out tmp/example_contig.fasta_nr_$PBS_ARRAYID.out \
           -outfmt 6 \
           -num_threads 8 \
           -num_alignments 10
    echo done > tmp/example_contig.fasta_nr_$PBS_ARRAYID.out.start.done

    # After edit the script, press Exc and :wq to save it. The submit the two job to cluster:
    qsub example_contig.fasta_nr.pbs1
    1372422[].mgt

    # Check the job status:
    showq -u $USER

    # You should receive a email when the jobs finish. 
    # Check if the jobs finish successfully
    ls -l tmp/*done
    -rw-r----- 1 ldong ccvstaff 5 Dec  2 14:20 tmp/example_contig.fasta_nr_0.out.start.done
    -rw-r----- 1 ldong ccvstaff 5 Dec  2 14:20 tmp/example_contig.fasta_nr_1.out.start.done

    # The two jobs are finished. If you have a lot jobs, causing hard to count by eye, you can find the number of finished job by:
    ls -l tmp/*done | wc -l
    2

    # You can see the actual time used for the job from the job output:
    less example_contig.fasta.o1372422-0
    ----------------------------------------
    Begin PBS Prologue Fri Dec  2 15:07:24 EST 2011 1322856444
    Job ID:         1372422[0].mgt
    Username:       ldong
    Group:          illumina
    Nodes:          node154
    End PBS Prologue Fri Dec  2 15:07:25 EST 2011 1322856445
    ----------------------------------------
    ----------------------------------------
    Begin PBS Epilogue Fri Dec  2 15:43:41 EST 2011 1322858621
    Job ID:         1372422[0].mgt
    Username:       ldong
    Group:          illumina
    Job Name:       example_contig.fasta-0
    Session:        23863
    Limits:         neednodes=1,nodes=1,walltime=03:00:00
    Resources:      cput=03:59:08,mem=5216848kb,vmem=11583308kb,walltime=00:36:12
    Queue:          dq
    Account:                
    Nodes:  node154
    Killing leftovers...

    End PBS Epilogue Fri Dec  2 15:43:41 EST 2011 1322858621

    # So we only used about 40 minutes for the fist job. So 1:30:00 should enough instead of 3 hours.
    # Now, I will change the script and submit other jobs. Note: changed parts are highlighted red.
    cp example_contig.fasta_nr.pbs example_contig.fasta_nr.pbs2
    vi  example_contig.fasta_nr.pbs2
    #!/bin/bash
    #PBS -N example_contig.fasta
    #PBS -t 2-4%12
    #PBS -l nodes=1
    #PBS -l walltime=1:30:00
    #PBS -j oe
    #PBS -M xxx@brown.edu
    module load blast

    cd $PBS_O_WORKDIR
    echo done > tmp/example_contig.fasta_nr_$PBS_ARRAYID.out.start
    blastx -db nr \
           -query tmp/example_contig.fasta_$PBS_ARRAYID.fa \
           -out tmp/example_contig.fasta_nr_$PBS_ARRAYID.out \
           -outfmt 6 \
           -num_threads 8 \
           -num_alignments 10
    echo done > tmp/example_contig.fasta_nr_$PBS_ARRAYID.out.start.done

    # now, submit the jobs:
    qsub example_contig.fasta_nr.pbs2

    # Check if the jobs finished
    ls -l tmp/*done | wc -l
    5

    # Combine the blast results into a single file
    cat tmp/*.out > example_contig_blastx_nr.blx

    Let me know if you have any question.   



    Comments