For this project we aligned our reads to our now indexed Glycine Max Williams 82 ISU01 Assembly v2 reference genome again with the STAR software.
We have 15 soybean samples to align to our Glycine Max Williams 82 ISU01 Assembly v2 reference genome. Each of these samples has two paired reads, and after alignment, each sample had one output alignment.bam file for alignment both to the transcriptome *_Aligned.toTranscriptome.out.bam, and the genome *_Aligned.out.bam. Here's the code for our alignment job file.
#!/bin/tcsh
#BSUB -J staralign_Portfolio #job name
#BSUB -n 12 #number of threads
#BSUB -W 10:0 #time for job to complete
#BSUB -R span[hosts=1] #to keep tasks on one node
#BSUB -R "rusage[mem=0.2]" #to request a node with 20MB of memory
#BSUB -o staralign_Portfolio_%J.out #output file
#BSUB -e staralign_Portfolio_%J.err #error file
#to align RNA-seq reads to indexed genome using STAR
#STAR cannot make use of HPC MPI, must have -R options to set 1 node & memory
#set threads under 12 on Henry2
#input of indexed genome path is /share/bitcpt/S23/UNITYID/Portfolio/starindices
#input of sequence reads path is /share/bitcpt/S23/cleandata/Glycine_max/
#output of aligned reads will go into AlignedToTranscriptome subdirectory in working directory
set STAR=/usr/local/usrapps/bitcpt/star/bin/STAR
# SET IN VARIABLES
set IN=/share/bitcpt/S23/RawData/Glycine_max
set index=starindices
set out=AlignedToTranscriptome
################################
## Old Leaf Rep 1
################################
set S=Gm_OldLeaf_Rep1
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Old Leaf Rep 2
################################
set S=Gm_OldLeaf_Rep2
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Old Leaf Rep 3
################################
set S=Gm_OldLeaf_Rep3
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Old Leaf Rep 4
################################
set S=Gm_OldLeaf_Rep4
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Old Leaf Rep 5
################################
set S=Gm_OldLeaf_Rep5
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## SAM Rep 1
################################
set S=Gm_SA_Rep1
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## SAM Rep 2
################################
set S=Gm_SA_Rep2
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## SAM Rep 3
################################
set S=Gm_SA_Rep3
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## SAM Rep 4
################################
set S=Gm_SA_Rep4
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## SAM Rep 5
################################
set S=Gm_SA_Rep5
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Young Leaf Rep 1
################################
set S=Gm_YoungLeaf_Rep1
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Young Leaf Rep 2
################################
set S=Gm_YoungLeaf_Rep2
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Young Leaf Rep 3
################################
set S=Gm_YoungLeaf_Rep3
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Young Leaf Rep 4
################################
set S=Gm_YoungLeaf_Rep4
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
################################
## Young Leaf Rep 5
################################
set S=Gm_YoungLeaf_Rep5
set EN=fq.gz
${STAR} --runThreadN 12 --runMode alignReads --genomeDir ${index} --outFileNamePrefix ${out}/${S}_ --readFilesIn ${IN}/${S}_1.${EN} ${IN}/${S}_2.${EN} --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic --quantMode TranscriptomeSAM
For each sample, we had to align both of the paired reads for one sample, and put the output from that alignment to a directory that was named after that sample. This required the use of variable substitution. This can be seen after the comments that indicate each sample. For example for the SAM Rep 3 sample above, se set the variable S equal to the sample name and the variable EN equal to the file extension of our reads. Variables for the path to our genome index (index) and general output directory (out) were set earlier in the file, after the instructions to the HPC scheduler. In this way, we modified the otherwise same alignment command to the STAR executable to produce alignments separately for each of our soybean samples.
First In the directory where you want to make the script type
cat > jobname.sh
Your file for this step can be called staralign_portfolio.sh
Hit enter. There should appear a blank space.
You can now paste the script from your clipboard with right click.
Once the script is pasted in, hit the 'ctrl' and 'c' keys at the same time to save and exit.
Second you can submit a job to the LSF
bsub < job.sh
example: bsub <staralign_portfolio.sh
bsub without <job.sh will take you into a loop where you will need to "kill" your job with bkill job#
You can check the status of the job with bjobs
You must wait until this job has finished and check for any error by look into the .out and .err files that are created after the jobs is submitted.
Next we can move on to the SALMON quantification step.