Log in to the HPC cluster
>>>: ssh mcescalo@login.hpc.ncsu.edu
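Change the working directory to my user directory under the bitcpt group, then create the starindices subdirectory that the script writes to
>>>: cd /share/bitcpt/Fall2022/mcescalo/Portafolio
>>>: mkdir -p starindices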
Create the shell script "Sl.starindex.sh" with vi and write the STAR genome-index script for Sl (Solanum lycopersicum, tomato)
>>>: vi Sl.starindex.sh
#!/bin/tcsh
#BSUB -J starindices_Sl_Caroindex #job name
#BSUB -n 10 #number of cores, must equal --runThreadN
#BSUB -R "span[hosts=1]" #keep all cores on one node
#BSUB -W 2:00 #wall-clock time limit in hh:mm
#BSUB -o starindices.out.%J #output file, %J is the job ID
#BSUB -e starindices.err.%J #error file
# Run STAR to generate the genome index
# Run in working directory /share/bitcpt/Fall2022/UnityID/Portofolio
# The working directory must contain a subdirectory named starindices
module load conda
conda activate /usr/local/usrapps/bitcpt/star
set IN=/gpfs_common/share03/bitcpt/Fall2022/referenceGenomes/Solanum_lycopersicum/Portfolio/Tom-Heinz1706
STAR --runThreadN 10 --runMode genomeGenerate --genomeSAindexNbases 13 \
     --genomeDir starindices \
     --genomeFastaFiles ${IN}/Tom-Heinz_assembly.fasta \
     --sjdbGTFfile ${IN}/Tom-Heinz.agat.gtf --sjdbOverhang 58
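Because --runThreadN should equal the #BSUB -n core count, a quick check before submitting can catch a mismatch. This is a minimal bash sketch (the job script itself is tcsh, so start bash first). The function name check_threads is hypothetical, and it assumes the script has exactly one "#BSUB -n" line and one "--runThreadN" option.

```shell
# Hypothetical helper: compare the LSF core request with STAR's thread count.
# Assumes one "#BSUB -n N" line and one "--runThreadN N" option in the script.
check_threads() {
  local script="$1" cores threads
  # Third field of the "#BSUB -n N" line is the core count.
  cores=$(awk '$1 == "#BSUB" && $2 == "-n" {print $3; exit}' "$script")
  # Pull the number that follows --runThreadN anywhere in the script.
  threads=$(grep -o -- '--runThreadN [0-9]*' "$script" | awk '{print $2; exit}')
  if [ -n "$cores" ] && [ "$cores" = "$threads" ]; then
    echo "OK: $cores cores requested, $threads STAR threads"
  else
    echo "MISMATCH: #BSUB -n '${cores}' vs --runThreadN '${threads}'" >&2
    return 1
  fi
}
```

Running check_threads Sl.starindex.sh on the script above should report 10 cores and 10 STAR threads.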
Submit the STAR indexing job to LSF
>>>: bsub < Sl.starindex.sh
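While the job is pending or running, bjobs lists it. Once it leaves the queue, the LSF output file can be checked for STAR's completion message. A minimal bash sketch, assuming the starindices.out.<jobID> naming from the #BSUB -o line and that STAR's stdout contains its usual "..... finished successfully" line; check_star_run is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the newest STAR indexing run finished.
# Assumes output files named starindices.out.<jobID> (from #BSUB -o) and that
# STAR prints "..... finished successfully" when genomeGenerate completes.
check_star_run() {
  local dir="${1:-.}" f out=""
  # Pick the most recently modified starindices.out.* file, if any.
  for f in "$dir"/starindices.out.*; do
    [ -e "$f" ] || continue
    if [ -z "$out" ] || [ "$f" -nt "$out" ]; then out="$f"; fi
  done
  if [ -z "$out" ]; then
    echo "no starindices.out.* file yet (job still queued or running? try: bjobs)"
    return 2
  fi
  if grep -q "finished successfully" "$out"; then
    echo "done: $out"
  else
    echo "not finished or failed: check $out and the matching starindices.err.* file" >&2
    return 1
  fi
}
```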
After the STAR indexing job completes successfully, change to the starindices directory (the STAR index output directory)
>>>: cd starindices
List the files in the starindices directory with the tree command
>>>: tree
.
├── chrLength.txt
├── chrNameLength.txt
├── chrName.txt
├── chrStart.txt
├── exonGeTrInfo.tab
├── exonInfo.tab
├── geneInfo.tab
├── Genome
├── genomeParameters.txt
├── Log.out
├── SA
├── SAindex
├── sjdbInfo.txt
├── sjdbList.fromGTF.out.tab
├── sjdbList.out.tab
└── transcriptInfo.tab
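To confirm the index is complete without reading the tree output by eye, a small bash sketch can test that each index file listed above exists and is non-empty. check_star_index is a hypothetical name; the file list is copied from this run's tree output (Log.out is left out because it is a log, not part of the index), and other STAR versions or options may write a slightly different set.

```shell
# Hypothetical helper: check that a STAR genome directory has every index file
# seen in this run's tree output, and that none of them is empty.
check_star_index() {
  local dir="${1:-starindices}" f missing=0
  for f in chrLength.txt chrNameLength.txt chrName.txt chrStart.txt \
           exonGeTrInfo.tab exonInfo.tab geneInfo.tab Genome genomeParameters.txt \
           SA SAindex sjdbInfo.txt sjdbList.fromGTF.out.tab sjdbList.out.tab \
           transcriptInfo.tab; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when nothing is missing.
  [ "$missing" -eq 0 ] && echo "all expected index files present in $dir"
}
```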
Interpretation of the Script
Option Breakdown
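--runThreadN 10 ##Number of threads; must match the #BSUB -n value
--runMode genomeGenerate ##STAR can both index a genome and align reads; genomeGenerate tells STAR to build the genome index
--genomeSAindexNbases 13 ##Length of the SA pre-indexing string; the STAR manual recommends min(14, log2(GenomeLength)/2 - 1), so smaller genomes need smaller values
--genomeDir starindices ##Directory where STAR writes the index (create it before submitting)
--genomeFastaFiles ${IN}/Tom-Heinz_assembly.fasta ##Genome assembly FASTA file(s)
--sjdbGTFfile ${IN}/Tom-Heinz.agat.gtf ##Annotation file (GTF) used to build the splice-junction database
--sjdbOverhang 58 ##Ideally the maximum RNA-seq read length minus 1. After cleaning, our reads are 58 bp, so the ideal value would be 57; the STAR manual notes that in most cases even the default of 100 works about as well as the ideal value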
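The two size-dependent options can be derived from the input files. A minimal bash/awk sketch: sa_index_nbases applies the STAR manual's min(14, log2(GenomeLength)/2 - 1) rule and rounds down (the rounding is an assumption), and sjdb_overhang returns the longest read length minus 1 from a plain-text, 4-line-per-record FASTQ. Both function names are hypothetical.

```shell
# Hypothetical helper: suggested --genomeSAindexNbases for a plain-text FASTA.
# GenomeLength = total bases on non-header lines; result = min(14, log2(L)/2 - 1), rounded down.
sa_index_nbases() {
  awk '!/^>/ {len += length($0)} END {
    n = log(len) / log(2) / 2 - 1
    if (n > 14) n = 14
    print int(n)
  }' "$1"
}

# Hypothetical helper: suggested --sjdbOverhang (max read length - 1).
# Assumes 4-line FASTQ records, so sequence lines are lines 2, 6, 10, ...
sjdb_overhang() {
  awk 'NR % 4 == 2 && length($0) > max {max = length($0)} END {print max - 1}' "$1"
}
```

For gzipped reads, bash process substitution can feed the decompressed stream, e.g. sjdb_overhang <(zcat reads.fastq.gz), where reads.fastq.gz is a placeholder file name.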