    Casava 1.7 for RNA-Seq (cluster version)

       

    Casava 1.7 for RNA-Seq

    The CASAVA 1.7 pipeline software provides two options for running alignments with the ELAND aligner:

    • GERALD.pl (pink part in the figure above): aligns non-bar-coded reads to the reference sequence.
    • demultiplexer.pl (green part in the figure above; note that in CASAVA 1.7 this involves two steps, demultiplexer.pl and demultiplexedGERALD.pl): separates bar-coded reads into different bins, then aligns them.

    GERALD.pl: You can run GERALD.pl through the PBS scheduler, or you can run the software on an SMP interactive node.

    # 1) Run GERALD.pl on example data on the Oscar computer cluster:


    #log in to oscar
    ssh oscar

    #make and go to a temporary directory for testing. Note that files in the scratch folder that are more than 4 weeks old are deleted automatically
    mkdir -p scratch/tem_test && cd scratch/tem_test
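Because old files in the scratch folder are removed automatically, it can help to see which files are already past the four-week mark. The following is a minimal sketch; the helper name `list_old_files` is hypothetical, and it assumes the cleanup is based on file modification time, which should be confirmed with the cluster administrators.

```shell
#!/bin/sh
# list_old_files: hypothetical helper, not a cluster-provided tool.
# Prints regular files under <dir> that were last modified more than <days>
# days ago (default 28, i.e. four weeks). find counts age in whole 24-hour
# periods, so "+28" matches files that are at least 29 days old.
# ASSUMPTION: the scratch cleanup looks at modification time.
list_old_files() {
    dir=${1:-.}
    days=${2:-28}
    find "$dir" -type f -mtime +"$days"
}
```

For example, `list_old_files ~/scratch` lists candidates for deletion so they can be copied elsewhere first.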

    #take a look at the test data set
    ls /gpfs/runtime/bioinfo/casava1.7_data_script/Illumina_Genome_Analyzer_Validation_Dataset_v_1_5_0/071112_EAS1_0089_FC20120_R1/Data/C2-37,39-74_Firecrest1.5.0_07-10-2009_craczy/Bustard1.5.0_07-10-2009_craczy


    #copy the PBS job script
    cp /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_gerald_batch_rna-seq.script .

    #copy the config.txt file
    cp /gpfs/runtime/bioinfo/casava1.7_data_script/gerald_config_rna-seq.txt .

    #submit the job
    qsub pbs_gerald_batch_rna-seq.script

    #check job status
    showq -w user=$USER

    #When the job finishes, it creates a combined output and error file "pbs_gerald.out" (the name is set by the "#PBS -o" line in the job script; use that name in the command below if it differs). Check it for errors.
    tail -n 30 gerald.out
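Reading only the last 30 lines can miss an error reported earlier in the log. A minimal sketch of a whole-file check follows; the helper name `scan_job_log` and the keywords it searches for are illustrative assumptions, since the exact wording of CASAVA error messages is not documented here.

```shell
#!/bin/sh
# scan_job_log: hypothetical helper, not part of CASAVA or PBS.
# Prints every log line containing "error" or "fatal" (case-insensitive),
# prefixed with its line number.
# Returns 0 when no such line is found, 1 when at least one is found,
# and 2 when the log file does not exist.
# ASSUMPTION: failures are reported with one of these two words.
scan_job_log() {
    log=$1
    if [ ! -f "$log" ]; then
        echo "log file not found: $log" >&2
        return 2
    fi
    if grep -n -i -E 'error|fatal' "$log"; then
        return 1
    fi
    return 0
}
```

For example, `scan_job_log gerald.out && echo "no errors reported"` prints the confirmation only when nothing suspicious was found.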

    #Note: two programs, "gnuplot" and "xsltproc", are not available on our cluster nodes, so some plots and html files will be missing from the folder. You will need to create them yourself.
    #If the plots and html files are not essential for your application, you can skip this step.
    #The folder name contains the date the job ran; the command below assumes that is today.
    cd GERALD_$(date '+%d-%m-%Y')_$USER
    /users/ldong/bio/bin/make_error_plot_and_rerun_make_for_gerald.py


    #review the results:
    The PBS cluster script creates a folder named "GERALD_today's-date_your-user-id", which contains the alignment results. Open the master summary file "summary.htm" in your browser for an overview of the results.
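Because the folder name contains the run date, a command that rebuilds the name from today's date stops working once the day changes. As a sketch of an alternative, the hypothetical helper `latest_run_dir` below picks the most recently modified folder that matches the naming pattern; it assumes folder names contain no whitespace, which holds for names of the form GERALD_dd-mm-yyyy_userid.

```shell
#!/bin/sh
# latest_run_dir: hypothetical helper, not part of CASAVA.
# Prints the most recently modified directory in the current folder whose
# name has the form <prefix>_<anything>_<user>, followed by a trailing slash.
# Prints nothing when no directory matches.
# ASSUMPTION: directory names contain no whitespace or newlines.
latest_run_dir() {
    prefix=$1
    user=${2:-$USER}
    # The trailing slash in the pattern restricts matches to directories;
    # ls -t sorts newest first and -d lists the directories themselves.
    ls -td "${prefix}"_*_"${user}"/ 2>/dev/null | head -n 1
}
```

For example, `cd "$(latest_run_dir GERALD)"` enters the newest GERALD folder of the current user.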
     
    #to run alignment on your own data:
    To run the script on your own data, edit the file gerald_config_rna-seq.txt and change the values of "read_folder" and "walltime" in the job script "pbs_gerald_batch_rna-seq.script" accordingly.
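The walltime change can be made in any text editor. As a sketch of doing it from the command line, the hypothetical helper `set_walltime` below rewrites the walltime directive and prints the modified script, leaving the original untouched. It assumes the job script sets the limit with a line of the form `#PBS -l walltime=HH:MM:SS`. The form of the "read_folder" setting is not shown on this page, so that value is best edited by hand.

```shell
#!/bin/sh
# set_walltime: hypothetical helper, not part of PBS or CASAVA.
# Prints a copy of <script> in which the "#PBS -l walltime=..." directive
# is replaced by the given HH:MM:SS value. The input file is not modified.
# ASSUMPTION: the directive starts at the beginning of a line.
set_walltime() {
    script=$1
    walltime=$2
    sed "s/^\(#PBS[[:space:]]*-l[[:space:]]*walltime=\).*/\1${walltime}/" "$script"
}
```

For example, `set_walltime pbs_gerald_batch_rna-seq.script 12:00:00 > my_gerald.script` writes a copy with a 12-hour limit that can then be submitted with qsub.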



    Run demultiplexer.pl on test data on the Oscar computer cluster (not working yet):


    #log in to oscar
    ssh oscar

    #make a temporary directory for testing
    mkdir -p data/tem_test

    #go to the temporary directory
    cd data/tem_test

    #take a look at the test data set we are going to work on
    ls -l /gpfs/runtime/bioinfo/casava1.7_data_script/TestData/Demultiplexer/PE/Bustard1.5.1_11-11-2009_craczy

    #copy the config file
    cp  /gpfs/runtime/bioinfo/casava1.7_data_script/demultiplex_gerald_config.txt .

    #copy the PBS job script

    cp  /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_demultiplex_gerald_batch.script .

    #submit the job
    qsub pbs_demultiplex_gerald_batch.script

    #check job status
    showq -w user=$USER


    #When the job finishes, it creates a combined output and error file "demultiplexer_gerald.out". Check it to make sure there are no errors.
    tail -n 30 demultiplexer_gerald.out

    #review the results:
    The PBS cluster script creates a folder named "demultiplexed", which contains one folder per bar-code. Each bar-code folder contains a folder called "GERALD_today's-date_your-user-id", which holds the alignment results. Open the master summary file "summary.htm" in your browser for an overview of the results.
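With one GERALD folder per bar-code, finding every summary file by hand becomes tedious. The sketch below lists them in one pass. The helper name `list_barcode_summaries` is hypothetical, and the layout it assumes (demultiplexed/bar-code/GERALD_*/summary.htm, with either a lower-case or upper-case first letter in the file name) is taken from the description above and should be checked against real output.

```shell
#!/bin/sh
# list_barcode_summaries: hypothetical helper, not part of CASAVA.
# Prints one line per summary file: "<bar-code folder><TAB><path>".
# ASSUMPTION: results are laid out as <root>/<bar-code>/GERALD_*/summary.htm
# (or Summary.htm); <root> defaults to "demultiplexed".
list_barcode_summaries() {
    root=${1:-demultiplexed}
    for summary in "$root"/*/GERALD_*/[Ss]ummary.htm; do
        [ -f "$summary" ] || continue   # skip the pattern when nothing matches
        barcode=${summary#"$root"/}     # drop the leading root folder
        barcode=${barcode%%/*}          # keep only the bar-code folder name
        printf '%s\t%s\n' "$barcode" "$summary"
    done
}
```

Running `list_barcode_summaries` from the folder that contains "demultiplexed" prints a two-column list that can be pasted into a browser or piped to other tools.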

    #to run alignment on your own data:
    To run the script on your own data, change the files "sampleSheet.csv" and "config.template.txt" and the values of "-input_dir" and "walltime" in the job script "pbs_demultiplex_gerald_batch.script" accordingly.




    Post Alignment Analysis Using CASAVA1.7

    This section explains the steps needed to process data produced by the CASAVA aligners. The post-alignment analysis tools (green part in the figure above) can be used for SNP calling, indel detection, genomic copy number analysis and DGE counting.

    The CASAVA 1.7 pipeline software provides "run.pl" and "runRNA.pl" to prepare the task file, and "taskServer.pl" to execute the tasks.

    Run "run.pl" (or "runRNA.pl") to create the task file on test data, and then run "taskServer.pl" to execute the tasks on the IBM HP cluster:

    #log in to oscar
    ssh oscar

    #make a temporary directory for testing
    mkdir -p data/tem_test

    #go to the temporary directory
    cd data/tem_test

    #take a look at the test data set
    ls /gpfs/runtime/opt/casava/1.7.0/share/CASAVA-1.7.0/examples/GERALD

    #copy the PBS job script
    cp  /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_post_align_batch.script .

    #submit the job
    qsub pbs_post_align_batch.script

    #check job status
    showq -w user=$USER

    #When the job finishes, it creates a combined output and error file "pbs_post_align.out". Check it to make sure there are no errors.
    tail -n 30 pbs_post_align.out
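Instead of rerunning showq by hand, a script can wait for the output file to appear. The sketch below uses a hypothetical helper, `wait_for_file`. It assumes the scheduler delivers the file named by `#PBS -o` to the working directory only when the job ends, which is common PBS behaviour but should be confirmed on Oscar.

```shell
#!/bin/sh
# wait_for_file: hypothetical helper, not part of PBS.
# Polls until <file> exists, checking every <interval> seconds (default 60)
# and giving up after roughly <timeout> seconds (default 7200).
# Returns 0 if the file appeared and 1 on timeout.
# ASSUMPTION: the job's output file only appears once the job has ended.
wait_for_file() {
    file=$1
    interval=${2:-60}
    timeout=${3:-7200}
    waited=0
    while [ ! -e "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, `wait_for_file pbs_post_align.out && tail -n 30 pbs_post_align.out` shows the end of the log as soon as it is available.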

    #review the results:
    The PBS cluster script creates a folder named "POST_ALIGN_PE_today's-date_your-user-id" (for the default paired-end run), which contains the analysis results.

    #to run the analysis on your own data:
    The following is the PBS cluster job submission script. To run it on your own data, change the values of "CASAVA_FEATURES", "CASAVA_EXAMPLES" (the data location) and "walltime" in the job script "pbs_post_align_batch.script" accordingly:


    #!/bin/sh
    # '#PBS' is the prefix for PBS directives - see "man qsub" for additional options...
    # submit this job with "qsub this_file_name"
    # check on the queue with "showq"
    # delete queued jobs with "qdel job_name" (use the name found with "showq")

    # name the job
    #PBS -N pbs_post_align_run
    #PBS -r n

    # set up output file for stdout and combine stdout and stderr streams
    #PBS -o pbs_post_align.out
    #PBS -j oe

    #get an email notice when the job is done
    #PBS -m e
    #PBS -M your_email@brown.edu

    # request one node (implicitly 8 processors)
    #PBS -l nodes=1

    # specify a maximum wall clock execution limit - running over will kill job...
    #PBS -l walltime=02:00:00

    cd $PBS_O_WORKDIR

    echo work dir is $PBS_O_WORKDIR

    #PBS_O_WORKDIR=.
    CASAVA_PATH=/gpfs/runtime/opt/casava/1.7.0/bin
    CASAVA_FEATURES=/gpfs/runtime/opt/casava/1.7.0/share/CASAVA-1.7.0/examples/features
    CASAVA_EXAMPLES=/gpfs/runtime/opt/casava/1.7.0/share/CASAVA-1.7.0/examples

    out_folder=$PBS_O_WORKDIR/POST_ALIGN_PE_$(date '+%d-%m-%Y')_$(whoami)

    #Run default paired DNA analysis targets on test E_coli data run TestEColiPE lane 4
    ${CASAVA_PATH}/run.pl --runId=TestEColiPE --projectDir=$out_folder \
    -e ${CASAVA_EXAMPLES}/GERALD -l 4 \
    --refSequences=${CASAVA_EXAMPLES}/genomes/E_coli --snpCovCutoff=-1 --indelsCovCutoff=-1

    #out_folder=$PBS_O_WORKDIR/POST_ALIGN_SE_$(date '+%d-%m-%Y')_$(whoami)

    #Run default single-ended DNA analysis targets on E_coli data run TestEColiSE lane 4
    #${CASAVA_PATH}/run.pl --runId=TestEColiSE --projectDir=$out_folder \
    #-e ${CASAVA_EXAMPLES}/GERALD -l 4 \
    #--refSequences=${CASAVA_EXAMPLES}/genomes/E_coli --snpCovCutoff=-1 --readMode=single

    #out_folder=$PBS_O_WORKDIR/POST_ALIGN_RNA_UHR_$(date '+%d-%m-%Y')_$(whoami)

    #Run default RNA analysis targets on Human_UHR chromosome 22 data run TestRNAUHR lane 2
    #${CASAVA_PATH}/runRNA.pl --runId=TestRNAUHR --projectDir=$out_folder \
    #--seqGeneMdFile=${CASAVA_FEATURES}/human/NCBI/Build36.3/seq_gene.md.gz \
    #-e ${CASAVA_EXAMPLES}/RNA_UHR_GERALD -l 2 \
    #--refSequences ${CASAVA_EXAMPLES}/genomes/human

    echo ready to run the task $(ls $out_folder/tasks*)
    ${CASAVA_PATH}/taskServer.pl --tasksFile=$(ls $out_folder/tasks*) --host=localhost --jobs=8

    echo Finished execution at `date`
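The last step of the script above finds the tasks file with `ls $out_folder/tasks*`, which passes a wrong value to taskServer.pl if run.pl failed and wrote no file, or if more than one file matches. A minimal sketch of a stricter lookup follows; the helper name `find_tasks_file` is hypothetical, and the only thing assumed about the tasks file is that its name starts with "tasks", as in the script.

```shell
#!/bin/sh
# find_tasks_file: hypothetical helper, not part of CASAVA.
# Prints the single file matching <project dir>/tasks* and returns 0.
# Returns 1, with a message on stderr, when no file or more than one
# file matches.
find_tasks_file() {
    project_dir=$1
    # Replace the function's own positional parameters with the matches.
    set -- "$project_dir"/tasks*
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one tasks file in $project_dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

In the job script this could be used as `tasks_file=$(find_tasks_file "$out_folder") && ${CASAVA_PATH}/taskServer.pl --tasksFile="$tasks_file" --host=localhost --jobs=8`, so that taskServer.pl only starts when the task file is unambiguous.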

