Genome Indexing

A genome index lets the aligner quickly narrow down where in the genome a query sequence (read) could have originated, instead of searching the whole genome for every read.

Log in to the HPC cluster

mcescalo@login.hpc.ncsu.edu

Change the working directory to my user directory under the bitcpt group

>>>: cd /share/bitcpt/Fall2022/mcescalo/Portafolio

Create the shell script "Sl.starindex.sh" with vi and write the STAR indexing script for Sl (Solanum lycopersicum) in it

>>>: vi Sl.starindex.sh

#!/bin/tcsh

#BSUB -J starindices_Sl_Caroindex #job name

#BSUB -n 10 #number of cores (must match --runThreadN)

#BSUB -W 2:0 #wall-clock time limit (2 hours, 0 minutes)

#BSUB -o starindices.out.%J #output file

#BSUB -e starindices.err.%J #error file

# For running star to generate genome index

# Run in working directory /share/bitcpt/Fall2022/UnityID/Portofolio

# The working directory must contain a subdirectory named starindices

module load conda

conda activate /usr/local/usrapps/bitcpt/star

set IN=/gpfs_common/share03/bitcpt/Fall2022/referenceGenomes/Solanum_lycopersicum/Portfolio/Tom-Heinz1706

STAR --runThreadN 10 --runMode genomeGenerate --genomeSAindexNbases 13 --genomeDir starindices --genomeFastaFiles ${IN}/Tom-Heinz_assembly.fasta --sjdbGTFfile ${IN}/Tom-Heinz.agat.gtf --sjdbOverhang 58
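
The script sets --genomeSAindexNbases 13. The STAR manual suggests scaling this value as min(14, log2(GenomeLength)/2 - 1). The sketch below is only an illustration of that arithmetic: it is written for a POSIX shell (sh or bash, not tcsh), the function name is hypothetical, rounding down to an integer is an assumption, and the 800 Mb genome length is a placeholder rather than a measured value for this assembly.

```shell
#!/bin/sh
# Suggest a --genomeSAindexNbases value from a genome length in bases,
# following the STAR manual's rule: min(14, log2(GenomeLength)/2 - 1).
# Rounding down to an integer is an assumption made for this sketch.
suggest_sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = log(len) / log(2) / 2 - 1   # log2(len)/2 - 1
        if (v > 14) v = 14              # cap at the STAR default of 14
        print int(v)                    # truncate to an integer
    }'
}

# Hypothetical genome length of 800 Mb, used only for illustration.
suggest_sa_index_nbases 800000000
```

For the placeholder length the function prints 13, which is consistent with the value used in the script.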

Submit the job to run the STAR indexing

>>>: bsub < Sl.starindex.sh

After the STAR indexing job completes successfully, change the directory to starindices (the output directory for STAR indexing)

>>>: cd starindices

List the files inside the starindices directory using the tree command

>>>: tree

├── chrLength.txt

├── chrNameLength.txt

├── chrName.txt

├── chrStart.txt

├── exonGeTrInfo.tab

├── exonInfo.tab

├── geneInfo.tab

├── Genome

├── genomeParameters.txt

├── Log.out

├── SA

├── SAindex

├── sjdbInfo.txt

├── sjdbList.fromGTF.out.tab

├── sjdbList.out.tab

└── transcriptInfo.tab
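
Instead of comparing the listing by eye, the presence of the core index files can be checked with a small script. This is a minimal sketch for a POSIX shell (sh or bash, not tcsh); the function name is hypothetical, and the list of files it checks is taken from the tree output above rather than from any official STAR checklist.

```shell
#!/bin/sh
# List any expected STAR index files that are missing (or empty) in a directory.
# Prints one file name per missing file; prints nothing if all are present.
missing_index_files() {
    dir="$1"
    for f in Genome SA SAindex genomeParameters.txt chrName.txt \
             chrLength.txt chrStart.txt chrNameLength.txt; do
        # -s is true only if the file exists and has a size greater than zero
        [ -s "$dir/$f" ] || echo "$f"
    done
}

# Example: check the starindices directory produced by the job above.
missing_index_files starindices
```

If the function prints nothing, all of the files it checks are present and non-empty. Any printed name is a reason to look at the starindices.err.%J and Log.out files.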

Interpretation of the Script

Option Breakdown

--runThreadN 10 ##Set the number of threads, must match #BSUB -n value

--runMode genomeGenerate ##STAR can do genome indexing and alignment. Here we're telling STAR that we want to index the genome.

--genomeSAindexNbases 13 ##Length of the suffix array pre-indexing string. Reduced from the STAR default of 14, as recommended for smaller genomes.

--genomeDir starindices/ ##Tell STAR where to write the output

--genomeFastaFiles ${IN}/Tom-Heinz_assembly.fasta ##Tell STAR where the genome assembly file(s) are

--sjdbGTFfile ${IN}/Tom-Heinz.agat.gtf ##Tell STAR where the annotation file is

--sjdbOverhang 58 ##Length of the RNA-seq reads minus 1. The script above sets 58; if the cleaned reads are 58 bp, read length minus 1 would be 57.
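
The STAR manual describes the ideal --sjdbOverhang as the maximum read length minus 1. The value can be derived from the cleaned reads rather than typed in by hand. The following is a minimal sketch for a POSIX shell (sh or bash, not tcsh); it assumes an uncompressed FASTQ file with standard 4-line records, and both the function name and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print the longest read length in an uncompressed FASTQ file, minus 1.
# In FASTQ, the sequence is the 2nd line of every 4-line record.
suggest_sjdb_overhang() {
    awk 'NR % 4 == 2 { n = length($0); if (n > max) max = n }
         END { print max - 1 }' "$1"
}

# Hypothetical file name, for illustration only:
# suggest_sjdb_overhang cleaned_reads.fastq
```

For compressed reads, the file would first need to be decompressed (for example with gunzip -c into a temporary file) before passing it to the function.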

Get in touch at mcescalo@ncsu.edu