STAR is used to index the chosen reference genome assembly, Lee Assembly v1. The resulting index serves as the reference to which the raw Glycine max RNA-seq reads are aligned.
#!/bin/tcsh
#BSUB -J starindices_Portfolio #job name
#BSUB -n 10 #number of cores requested (matches --runThreadN 10)
#BSUB -W 2:0 #wall-clock time limit for the job (2 hours)
#BSUB -o starindices.out.%J #output file
#BSUB -e starindices.err.%J #error file
# For running star to generate genome index
# Run in working directory /share/bitcpt/S23/UnityID/Soy
# Must run this in working directory with subdirectory named starindices/
set STAR=/usr/local/usrapps/bitcpt/star/bin/STAR
set IN=/share/bitcpt/S23/referenceGenomes/Portfolios/Glycine_max_Lee_v1/
${STAR} --runThreadN 10 --runMode genomeGenerate --genomeSAindexNbases 12 \
    --genomeDir starindices/ \
    --genomeFastaFiles ${IN}/glyma.Lee.gnm1.BXNC.genome_main.fna \
    --sjdbGTFfile ${IN}/glyma.Lee.gnm1.ann1.6NZV.gene_models_main.AGAT.gtf \
    --sjdbOverhang 100
The job script should be placed in the working directory at the path:
/share/bitcpt/S23/UnityID/Portfolio
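To run the job, submit the script to LSF (replace the quoted placeholder with the file name of the job script):
bsub < "job script file name"
To check that the job is running: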
bjobs
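Once bjobs no longer lists the job, one way to confirm that the index was written is to check for a few of the files STAR normally produces in the genome directory. The sketch below is a standalone /bin/sh helper, not part of the tcsh job script; the function name check_star_index is hypothetical, and the file list (Genome, SA, SAindex, chrName.txt) is an assumption about the core outputs rather than a complete listing.

```shell
#!/bin/sh
# Sketch: report whether a STAR genome directory contains a few core index files.
# The file list is an assumed subset of what genomeGenerate writes.
check_star_index() {
    dir="$1"
    for f in Genome SA SAindex chrName.txt; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            return 1
        fi
    done
    echo "index files present in $dir"
}

# Example call, run from the working directory after the job finishes:
#   check_star_index starindices
```

If a file is reported missing, the error file starindices.err.<jobID> named in the job script is the first place to look.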
The output files of the code will look like:
This command defines the variable STAR to be the path to the STAR software:
set STAR=/usr/local/usrapps/bitcpt/star/bin/STAR
This command defines the variable IN to be the path to the reference genome chosen:
set IN=/share/bitcpt/S23/referenceGenomes/Portfolios/Glycine_max_Lee_v1/
This sets the number of threads:
--runThreadN 10
This tells STAR to generate the genome index:
--runMode genomeGenerate
This sets the length (in bases) of the suffix array (SA) pre-indexing string:
--genomeSAindexNbases 12
(For small genomes, --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).)
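The formula can be evaluated directly from a genome length. The following is a minimal sketch as a standalone /bin/sh script (separate from the tcsh job script); the function names genome_length and sa_index_nbases are hypothetical, and the 1,000,000,000-base length in the example call is an illustrative round number, not the measured length of the Lee v1 assembly.

```shell
#!/bin/sh
# Sketch: evaluate min(14, log2(GenomeLength)/2 - 1) for --genomeSAindexNbases.

# Total number of bases in a FASTA file (all lines that are not headers).
genome_length() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# Upper bound for --genomeSAindexNbases, rounded down to an integer.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = log(len) / log(2) / 2 - 1   # log2(len)/2 - 1
        if (v > 14) v = 14              # cap at 14
        printf "%d\n", int(v)
    }'
}

# Illustrative call with an assumed genome length of 1e9 bases:
sa_index_nbases 1000000000   # prints 13

# To use the real assembly instead (path taken from the job script):
#   sa_index_nbases "$(genome_length /share/bitcpt/S23/referenceGenomes/Portfolios/Glycine_max_Lee_v1/glyma.Lee.gnm1.BXNC.genome_main.fna)"
```

For a genome of roughly this size the bound works out to 13, so the value of 12 used in the job script stays within it.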
This tells STAR where to write the genome index (the starindices/ subdirectory must already exist in the working directory):
--genomeDir starindices/
This tells STAR where the genome assembly file is:
--genomeFastaFiles ${IN}/glyma.Lee.gnm1.BXNC.genome_main.fna
This tells STAR where the annotation file is:
--sjdbGTFfile ${IN}/glyma.Lee.gnm1.ann1.6NZV.gene_models_main.AGAT.gtf
Ideally, this is the length of the RNA-seq reads minus 1. In most cases, however, the default value of 100 works as well as the ideal value:
--sjdbOverhang 100
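To see what the ideal value would be for a given data set, the read length can be taken from the FASTQ file itself. This is a minimal sketch as a standalone /bin/sh script; the function name sjdb_overhang and the file name reads.fastq are hypothetical, and it assumes a standard four-line-per-record FASTQ. It samples the first 10,000 reads and uses the longest one, which covers reads of varying length.

```shell
#!/bin/sh
# Sketch: estimate the ideal --sjdbOverhang (longest read length - 1)
# from the first 10,000 reads of a FASTQ file or of standard input.
sjdb_overhang() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }  # sequence lines
         NR >= 40000 { exit }                                    # stop after 10,000 reads
         END { print max - 1 }' "$@"
}

# Example calls (reads.fastq is a placeholder file name):
#   sjdb_overhang reads.fastq
#   gzip -dc reads.fastq.gz | sjdb_overhang
```

If the result is close to 100, the value used in the job script is already a good fit.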