CASAVA 1.7 for DNA-Seq


Run alignments using CASAVA 1.7

The following page explains the steps needed to process data produced by the Illumina Genome Analyzer using example data. The Genome Analyzer machine is hosted in the Brown University Center for Genomics and Proteomics and we are notified when a new run has completed.

The CASAVA 1.7 pipeline software provides two ways to run alignments with the ELAND aligner:

  • GERALD.pl (pink part in the figure above): aligns non-bar-coded reads to the reference sequence.
  • demultiplexer.pl (green part in the figure above; in CASAVA 1.7 this involves two steps, demultiplexer.pl and demultiplexedGERALD.pl): separates bar-coded reads into different bins, then aligns the reads in each bin.

Run GERALD.pl on the test data using the IBM HP cluster:

#log in to Oscar
ssh oscar

#make and go to a temporary directory for testing purposes. Note that files in the scratch folder older than 4 weeks are automatically deleted.
mkdir -p scratch/tem_test && cd scratch/tem_test
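Because scratch files are purged after 4 weeks, it can help to check which of your files are getting close to that limit. The function below is a hypothetical helper, not part of CASAVA or Oscar. It is a minimal sketch that assumes the purge is based on file modification time, which these notes do not state.

```shell
#!/bin/bash
# Hypothetical helper (not part of CASAVA/Oscar): list files under a directory
# that were last modified more than N whole days ago.
# Assumption: the scratch purge uses modification time.
list_old_files() {
    local dir="$1"
    local days="${2:-21}"   # default: warn about a week before the 28-day limit
    # -mtime +N matches files whose age, in whole days, is greater than N
    find "$dir" -type f -mtime +"$days" -print
}

# Usage on Oscar:
#   list_old_files ~/scratch/tem_test 21
```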

#take a look at the test data set
ls /gpfs/runtime/bioinfo/casava1.7_data_script/Illumina_Genome_Analyzer_Validation_Dataset_v_1_5_0/071112_EAS1_0089_FC20120_R1\
/Data/C2-37,39-74_Firecrest1.5.0_07-10-2009_craczy/Bustard1.5.0_07-10-2009_craczy


#copy the PBS job script
cp /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_gerald_batch.script .

#copy the GERALD config file
cp /gpfs/runtime/bioinfo/casava1.7_data_script/gerald_config.txt .

#submit the job
qsub pbs_gerald_batch.script

#check job status
showq -w user=$USER
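Instead of re-running showq by hand, you can poll the queue until the job disappears. This is a hypothetical helper sketched under the assumption that `qstat <job_id>` exits with a non-zero status once the PBS server no longer knows the job; servers configured to keep completed jobs (state "C") will keep listing it for a while, so check the state column in that case.

```shell
#!/bin/bash
# Hypothetical helper: wait until a PBS job is no longer listed by qstat.
# Assumption: qstat exits non-zero for a job ID the server no longer knows.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-60}"   # seconds between checks
    while qstat "$job_id" > /dev/null 2>&1; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Usage: qsub prints the new job ID, so it can be captured directly:
#   job_id=$(qsub pbs_gerald_batch.script)
#   wait_for_job "$job_id" 120
```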

#When the job finishes, it creates a combined output/error file ("pbs_gerald.out"). Check the end of it for errors.
tail -n 30 *gerald.out
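The last 30 lines may not show an error reported earlier in the run. A rough sketch of a whole-file check follows; `check_log` is a hypothetical helper, and the pattern list (error, fail, abort) is a guess that may also match harmless lines, so adjust it to the messages you actually see.

```shell
#!/bin/bash
# Hypothetical helper: print suspicious lines from a PBS output file.
# Returns 0 if nothing matched, 1 if matches were found, 2 if the file is missing.
check_log() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo "Log file $log not found" >&2
        return 2
    fi
    # grep prints matching lines with line numbers and succeeds if any matched
    if grep -n -i -E 'error|fail|abort' "$log"; then
        return 1
    fi
    echo "No error lines found in $log"
}

# Usage:
#   check_log pbs_gerald.out || echo "Inspect the lines above"
```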

#Note: the programs "gnuplot" and "xsltproc" are not available on the cluster nodes, so some plots and HTML files will be missing from the GERALD folder. You can create them yourself as follows.
#If the plots and HTML files are not essential for your application, you can skip this step.
#The cd command below assumes the job finished today; otherwise use the date in the folder's actual name.
cd GERALD_$(date '+%d-%m-%Y')_$USER
/users/ldong/bio/bin/make_error_plot_and_rerun_make_for_gerald.py
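If the job finished on a different day, the date-based folder name will not match. A small sketch that picks the most recently modified GERALD folder instead; `latest_gerald_dir` is a hypothetical helper and assumes the folder follows the "GERALD_date_user-id" naming used on this page.

```shell
#!/bin/bash
# Hypothetical helper: print the newest GERALD_*_<user> folder in the current directory.
latest_gerald_dir() {
    local user="${1:-$USER}"
    # -d lists the directories themselves; -t sorts newest first by modification time
    ls -dt GERALD_*_"$user" 2>/dev/null | head -n 1
}

# Usage:
#   cd "$(latest_gerald_dir)"
```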


#review the results:
The PBS cluster script creates a folder named "GERALD_today's-date_your-user-id", which contains the alignment results. Open the master summary file "summary.htm" in your browser for an overview of the results.

#to run alignment on your own data:
To run the alignment on your own data, edit gerald_config.txt and change the values of "read_folder" and "walltime" in the PBS cluster job submission script "pbs_gerald_batch.script" accordingly.
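The two job-script values can also be changed with sed rather than a text editor. This is a sketch under assumptions not confirmed by these notes: that the script requests time with a standard "#PBS -l walltime=HH:MM:SS" directive and sets the input folder on a line of the form "read_folder=...". `set_job_params` is a hypothetical helper; open the script and check its actual lines before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: set walltime and read_folder in a PBS job script in place.
# Assumes "#PBS -l walltime=..." and "read_folder=..." each start their own line.
set_job_params() {
    local script="$1" folder="$2" walltime="$3"
    # "|" is used as the sed delimiter because folder paths contain "/".
    # Limitation: a path containing "|" or "&" would need escaping.
    sed -i \
        -e "s|^#PBS -l walltime=.*|#PBS -l walltime=${walltime}|" \
        -e "s|^read_folder=.*|read_folder=${folder}|" \
        "$script"
}

# Usage:
#   set_job_params pbs_gerald_batch.script /path/to/your/Bustard_folder 12:00:00
```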



Run demultiplexer.pl on the test data using the IBM HP cluster:


#log in to Oscar
ssh oscar

#make a temporary directory for testing purposes
mkdir data/tem_test

#go to the temporary directory
cd data/tem_test

#take a look at the test data set we are going to work on
ls -l /gpfs/runtime/bioinfo/casava1.7_data_script/TestData/Demultiplexer/PE/Bustard1.5.1_11-11-2009_craczy

#copy the config file
cp  /gpfs/runtime/bioinfo/casava1.7_data_script/demultiplex_gerald_config.txt .

#copy the PBS job script
cp /gpfs/runtime/bioinfo/casava1.7_data_script/pbs_demultiplex_gerald_batch.script .

#submit the job
qsub pbs_demultiplex_gerald_batch.script

#check job status
showq -w user=$USER


#When the job finishes, it creates a combined output/error file ("demultiplexer_gerald.out"). Check the end of it to make sure there are no errors.
tail -n 30 demultiplexer_gerald.out

#review the results:
The PBS cluster script creates a folder named "demultiplexed" that contains one folder per bar code. Each bar-code folder contains a "GERALD_today's-date_your-user-id" folder with the alignment results. Open the master summary file "summary.htm" in your browser for an overview of the results.
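With many bar codes, it helps to list every summary file at once. A minimal sketch, assuming the layout described above (demultiplexed/bar-code-folder/GERALD_date_user-id/summary.htm); `list_barcode_summaries` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print the path of each bar-code folder's summary.htm.
# Assumed layout: <root>/<bar-code folder>/GERALD_<date>_<user>/summary.htm
list_barcode_summaries() {
    local root="${1:-demultiplexed}"
    local summary
    for summary in "$root"/*/GERALD_*/summary.htm; do
        # When nothing matches, bash leaves the pattern unexpanded; skip it.
        [ -e "$summary" ] || continue
        echo "$summary"
    done
}

# Usage:
#   list_barcode_summaries demultiplexed
```

A bar-code folder that does not appear in the output has no summary.htm, which usually means its alignment did not finish.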

#to run alignment on your own data:
To run the script on your own data, edit "sampleSheet.csv" and "config.template.txt", and change the values of "-input_dir" and "walltime" in the job script "pbs_demultiplex_gerald_batch.script" accordingly.
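Before submitting, it is worth checking that no index is listed twice within the same lane of your sample sheet, because demultiplexing cannot separate identical bar codes. The sketch below assumes a comma-separated sheet with one header row, the lane in column 2 and the index in column 5 (the usual Illumina FCID,Lane,SampleID,SampleRef,Index,... layout); confirm this against your own sampleSheet.csv and pass other column numbers if it differs. `find_duplicate_indexes` is a hypothetical helper, and it does not handle quoted fields that contain commas.

```shell
#!/bin/bash
# Hypothetical helper: report lanes where the same index appears more than once.
# Assumptions: comma-separated, one header row, lane in column 2, index in column 5.
find_duplicate_indexes() {
    local sheet="$1" lane_col="${2:-2}" index_col="${3:-5}"
    awk -F',' -v l="$lane_col" -v i="$index_col" '
        { sub(/\r$/, "") }   # tolerate Windows (CRLF) line endings
        NR > 1 && NF >= i {
            key = $l "," $i
            # report each duplicated lane/index pair once, on its second occurrence
            if (seen[key]++ == 1)
                print "lane " $l ": index " $i " used more than once"
        }
    ' "$sheet"
}

# Usage:
#   find_duplicate_indexes sampleSheet.csv
```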