    Casava 1.7 for DNA-Seq


    Run alignment using CASAVA1.7

    This page explains the steps needed to process data produced by the Illumina Genome Analyzer, using example data. The Genome Analyzer is hosted in the Brown University Center for Genomics and Proteomics, and we are notified when a new run has completed.

    The CASAVA1.7 pipeline software provides two options for running alignments with the ELAND aligner:

    • GERALD.pl (pink part in above figure): for aligning non-bar-coded reads against a reference sequence.
    • demultiplexer.pl (green part in above figure; note that in CASAVA1.7 this includes two steps, demultiplexer.pl and demultiplexedGERALD.pl): for separating bar-coded reads into different bins and then aligning each bin.

    Run GERALD.pl on testing data using IBM HP cluster:

    #log in oscar
    ssh oscar

    #make and go to a temporary directory for testing purposes. Note that files in the scratch folder that are more than 4 weeks old are deleted automatically
    mkdir scratch/tem_test && cd scratch/tem_test

    #take a look at the test data set
    ls /gpfs/runtime/bioinfo/casava1.7_data_script/Illumina_Genome_Analyzer_Validation_Dataset_v_1_5_0/071112_EAS1_0089_FC20120_R1/Data/C2-37,39-74_Firecrest1.5.0_07-10-2009_craczy/Bustard1.5.0_07-10-2009_craczy


    #copy the PBS job script
    cp /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_gerald_batch.script .

    #copy the config.txt file
    cp /gpfs/runtime/bioinfo/casava1.7_data_script/gerald_config.txt .

    #submit the job
    qsub pbs_gerald_batch.script

    #check job status
    showq -w user=$USER

    #When the job finishes, it creates an output+error file "pbs_gerald.out". Take a look at it and check for errors.
    tail -n 30 gerald.out
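Reading the last 30 lines by eye can miss a problem reported earlier in the log. The following is a minimal sketch of a scan over the whole file. The function name `scan_job_log` and the search pattern `error|fail` are assumptions chosen for illustration, not a list of messages that CASAVA is known to print.

```shell
# Print the lines of a job log that look like problems.
# Return codes: 0 = nothing suspicious, 1 = matches printed, 2 = log missing or empty.
# The "error|fail" pattern is an assumption; adjust it to the messages you actually see.
scan_job_log() {
    log_file=$1
    if [ ! -s "$log_file" ]; then
        echo "log file missing or empty: $log_file" >&2
        return 2
    fi
    # -n adds line numbers, -i ignores case, -E enables the "|" alternation.
    if grep -n -i -E 'error|fail' "$log_file"; then
        return 1
    fi
    return 0
}
```

For example, `scan_job_log gerald.out && echo "no obvious errors"` prints the message only when the scan finds nothing. A clean scan does not prove that the run succeeded, so the summary file is still worth checking.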

    #Note: two programs, "gnuplot" and "xsltproc", are not available on our cluster nodes, so some plots and html files will be missing from the folder. You will need to create them yourself as shown below.
    #If the plots and html files are not essential for your application, this step can be skipped.
    cd GERALD_$(date '+%d-%m-%Y')_$USER
    /users/ldong/bio/bin/make_error_plot_and_rerun_make_for_gerald.py
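The cd command above builds the folder name from today's date, so it presumably matches only when it is run on the same day the folder was created. As a hedged alternative, the sketch below picks the most recently modified folder that follows the GERALD_date_user naming pattern. The function name `latest_gerald_dir` is hypothetical.

```shell
# Print the most recently modified GERALD_<date>_<user> folder inside a directory.
# Prints nothing if no such folder exists.
latest_gerald_dir() {
    parent=$1
    user=$2
    # ls -d lists the folders themselves rather than their contents;
    # -t sorts by modification time, newest first, so head keeps the latest one.
    ls -dt "$parent"/GERALD_*_"$user" 2>/dev/null | head -n 1
}
```

A possible use is `cd "$(latest_gerald_dir . "$USER")"`, run from the directory where the job was submitted. Sorting by modification time is used here because the day-month-year date in the folder name does not sort chronologically as text.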


    #review the results:
    The PBS cluster script creates a folder named "GERALD_today's-date_your-user-id", which contains the alignment results. Open the master summary file "summary.htm" in your browser for an overview of the results.
     
    #to run alignment on your own data:
    To run the PBS cluster job submission script on your own data, edit the file gerald_config.txt and change the values of "read_folder" and "walltime" in the job script "pbs_gerald_batch.script" accordingly.
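The two values in the job script can be changed in any text editor. For repeated runs, the sketch below does the same with sed. It assumes that the job script sets the read folder on a line of the form `read_folder=...` and requests time with a standard PBS directive `#PBS -l walltime=HH:MM:SS`. Both forms are assumptions about a script that is not reproduced here, so check your copy before relying on this. The function name `set_job_params` is hypothetical.

```shell
# Rewrite the read_folder and walltime lines of a PBS job script in place.
# Assumes lines of the form:   read_folder=/some/path
#                              #PBS -l walltime=HH:MM:SS
# Values containing "|" or "&" would need escaping and are not handled here.
set_job_params() {
    script=$1
    new_read_folder=$2
    new_walltime=$3
    # "|" is used as the sed delimiter so that "/" in the path needs no escaping.
    sed -e "s|^read_folder=.*|read_folder=$new_read_folder|" \
        -e "s|^\(#PBS -l walltime=\).*|\1$new_walltime|" \
        "$script" > "$script.new" && mv "$script.new" "$script"
}
```

For example, `set_job_params pbs_gerald_batch.script /path/to/your/Bustard_folder 12:00:00`, where the path is a placeholder for your own run folder. Lines that do not match either pattern are left unchanged.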



    Run Demultiplexer.pl on testing data using IBM HP cluster:


    #log in oscar
    ssh oscar

    #make a temporary directory for testing purposes
    mkdir data/tem_test

    #go to the temporary directory
    cd data/tem_test

    #take a look at the test data set we are going to work on
    ls -l /gpfs/runtime/bioinfo/casava1.7_data_script/TestData/Demultiplexer/PE/Bustard1.5.1_11-11-2009_craczy

    #copy the config file
    cp  /gpfs/runtime/bioinfo/casava1.7_data_script/demultiplex_gerald_config.txt .

    #copy the PBS job script

    cp  /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_demultiplex_gerald_batch.script .

    #submit the job
    qsub pbs_demultiplex_gerald_batch.script

    #check job status
    showq -w user=$USER


    #When the job finishes, it creates an output+error file "demultiplexer_gerald.out". Take a look at it and make sure there are no errors.
    tail -n 30 demultiplexer_gerald.out

    #review the results:
    The PBS cluster script creates a folder named "demultiplexed", which contains one folder per bar-code. Each bar-code folder contains a folder called "GERALD_today's-date_your-user-id", which holds the alignment results. Open the master summary file "summary.htm" in your browser for an overview of the results.
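With several bar-codes there is one summary file per bar-code folder, and opening each by hand is tedious. The sketch below lists them all, relying only on the folder layout described above. The function name `list_summaries` is hypothetical.

```shell
# List every summary.htm that sits under a GERALD_* folder, one path per line.
list_summaries() {
    # -path restricts matches to files below a GERALD_* folder;
    # sort gives a stable, bar-code-ordered listing.
    find "$1" -type f -name summary.htm -path '*/GERALD_*' | sort
}
```

For example, `list_summaries demultiplexed` prints the path of each summary file, which can then be opened in a browser one at a time.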

    #to run alignment on your own data:
    To run the script on your own data, edit the files "sampleSheet.csv" and "config.template.txt" and change the values of "-input_dir" and "walltime" in the job script "pbs_demultiplex_gerald_batch.script" accordingly.