In this section, I use STAR to build a genome index from the Tom-Cherry reference assembly (FASTA) and its annotation (GTF). A reference assembly is built up from reads, into contigs, and ultimately into scaffolds. Indexing it lets STAR quickly find where each read matches the reference genome, which speeds up the alignment process.
Spliced Transcripts Alignment to a Reference (STAR) is an aligner designed specifically to address many of the challenges of RNA-seq read mapping, using a splice-aware strategy.
#Copy the index script into the Portfolio directory
#From here on, my script files use the prefix Cher.
#directory: /share/bitcpt/Fall2022/UnityID/Portfolio
cp /share/bitcpt/Fall2022/mmohamm8/Tom/Tom.starindex.sh Cher.starindex.sh
ll #to make sure 'Cher.starindex.sh' is located in the Portfolio directory
#To look at the content of the index script
more Cher.starindex.sh
#To edit the script
vi Cher.starindex.sh #press "i" to enter insert mode, "esc" to leave it, then type ":wq" to save and exit the file.
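The index script writes into a subdirectory named starindices, so that directory has to exist before the job runs. A minimal sketch of that setup step, assuming it is run from the same working directory the job will run in:

```shell
# Create the directory STAR will write the index into (no error if it already exists)
mkdir -p starindices
# Confirm that it exists and is a directory
test -d starindices && echo "starindices is ready"
```

On an LSF cluster the job would then typically be submitted with `bsub < Cher.starindex.sh`, since that is how the #BSUB lines in the script get read. The exact submission command is an assumption here, as it is not shown in these notes.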
#Cherry tomato index script; make sure the starindices directory exists in the working directory before running it
#!/bin/tcsh
#BSUB -J starindices_Cher_AfiqTom #job name
#BSUB -n 10 #number of cores (matches --runThreadN 10 below)
#BSUB -W 2:0 #wall-clock time limit for the job (2 hours)
#BSUB -o starindices.out.%J #output file
#BSUB -e starindices.err.%J #error file
# Runs STAR to generate the genome index
# Run in working directory /share/bitcpt/Fall2022/UnityID/Portfolio/Tom-Cherry
# The working directory must already contain a subdirectory named starindices
module load conda
conda activate /usr/local/usrapps/bitcpt/star
set IN=/share/bitcpt/Fall2022/referenceGenomes/Solanum_lycopersicum/Portfolio/Tom-Cherry
STAR --runThreadN 10 --runMode genomeGenerate --genomeSAindexNbases 13 --genomeDir starindices --genomeFastaFiles ${IN}/Tom-Cherry_assembly.fasta --sjdbGTFfile ${IN}/Tom-Cherry.agat.gtf --sjdbOverhang 100
# I used the default --sjdbOverhang of 100 (the ideal value is the maximum read length minus 1) and it worked!
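The two tuning values in the STAR command can be sanity-checked with a little arithmetic. The STAR manual suggests scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1) for smaller genomes, and setting --sjdbOverhang to the maximum read length minus 1. The sketch below is written for bash, not the tcsh used in the job script. The three helper names are hypothetical, rounding down is my own assumption, and the genome length in the last line is purely illustrative (it was not measured from Tom-Cherry_assembly.fasta).

```shell
# Hypothetical helper: total number of bases in a FASTA file (header lines skipped)
fasta_length() {
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Hypothetical helper: min(14, log2(GenomeLength)/2 - 1), rounded down (assumption)
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)   # awk log() is the natural log
        if (n > 14) n = 14
        print n
    }'
}

# Hypothetical helper: longest read in an uncompressed FASTQ file, minus 1
sjdb_overhang() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max - 1 }' "$1"
}

# Illustrative genome length only (900 Mb); prints 13
sa_index_nbases 900000000
```

With the real files, the calls would look like `sa_index_nbases "$(fasta_length Tom-Cherry_assembly.fasta)"` and `sjdb_overhang reads.fastq`, where reads.fastq is a placeholder name for one of the trimmed read files.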
The first picture shows the starindices output for my Tom script. The second picture shows the output for my Cher script.
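The same check can be done from the command line. The sketch below assumes a finished genomeGenerate run leaves at least the files Genome, SA, SAindex, genomeParameters.txt, chrName.txt and chrLength.txt in the index directory. The helper name check_star_index is hypothetical, and the code is written for bash, not tcsh.

```shell
# Hypothetical helper: succeed only if the core STAR index files exist and are non-empty
check_star_index() {
    dir="$1"
    status=0
    for f in Genome SA SAindex genomeParameters.txt chrName.txt chrLength.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return $status
}

# Report on the index directory used in the script above
if check_star_index starindices; then
    echo "starindices looks complete"
else
    echo "starindices is incomplete; check the starindices.err.* file from the job"
fi
```

This only checks that the files are present. It does not verify that the index was built from the intended FASTA and GTF.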