First, working directories will be set up for the project on the HPC cluster. Then, quality control checks will be performed with FastQC to spot potential problems in the raw sequence data from high-throughput sequencing. The FastQC reports will then be downloaded and analyzed.
Make the needed directories (UnityID stands in for the personal Unity ID):
cd /share/bitcpt/Fall2022/UnityID
mkdir At Tom Heinz
cd Heinz
mkdir AlignedToTranscriptome fastqc starindices transcriptome salmon_align_quant starOutputfiles
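The same layout can also be created in one step. The sketch below is a hedged alternative, assuming a bash-compatible shell; make_project_dirs is a hypothetical helper name and not part of the course scripts. Because mkdir -p creates missing parents and does not fail on existing directories, the function is safe to re-run.

```shell
# Hypothetical helper: create the species directories and the Heinz
# subdirectories under a project root passed as the first argument.
make_project_dirs() {
    root="$1"
    mkdir -p "$root/At" "$root/Tom" || return 1
    for sub in AlignedToTranscriptome fastqc starindices transcriptome \
               salmon_align_quant starOutputfiles; do
        mkdir -p "$root/Heinz/$sub" || return 1
    done
}

# Example call on the cluster (UnityID is a placeholder):
# make_project_dirs /share/bitcpt/Fall2022/UnityID
```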
Copy the At FastQC script to the Heinz working directory and edit it:
cd ../Heinz
cp /share/bitcpt/Fall2022/scripts/At.fastqc.sh /share/bitcpt/Fall2022/UnityID/Heinz/Heinz.fastqc.sh
vi Heinz.fastqc.sh
##edit to change path to Solanum_lycopersicum
Submit job:
bsub < Heinz.fastqc.sh
##check that it's submitted/running
bjobs
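Once bjobs no longer lists the job, it can help to confirm that every raw file produced a report before downloading anything. The sketch below is one way to do that, assuming a bash-compatible shell and FastQC's default output naming (sample.fq or sample.fq.gz gives sample_fastqc.html); missing_fastqc_reports is a hypothetical helper, not part of the course scripts.

```shell
# Hypothetical helper: print every input read file that has no matching
# FastQC HTML report in the output directory.
# $1 = directory of raw reads, $2 = FastQC output directory
missing_fastqc_reports() {
    raw_dir="$1"
    out_dir="$2"
    for f in "$raw_dir"/*; do
        [ -f "$f" ] || continue
        base=$(basename "$f")
        # Strip the extensions FastQC removes when naming its reports
        base=${base%.gz}
        base=${base%.fq}
        base=${base%.fastq}
        if [ ! -f "$out_dir/${base}_fastqc.html" ]; then
            echo "$f"
        fi
    done
}

# Example call from the Heinz working directory:
# missing_fastqc_reports /share/bitcpt/Fall2022/RawData/Solanum_lycopersicum ./fastqc
```

No output means every input file has a report. Any path that is printed is a file to look for in fastqc.err.%J.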
Download the FastQC files to the local computer (run from the local machine; scp is shown here):
scp 'UnityID@login.hpc.ncsu.edu:/share/bitcpt/Fall2022/UnityID/Heinz/fastqc/*' /Users/ComputerName/Documents/School/BIT_CPT
Contents of Heinz.fastqc.sh after editing:
#!/bin/tcsh
#BSUB -J fastqc_Heinz_UnityID #job name
#BSUB -n 20 #number of cores
#BSUB -W 2:0 #wall-clock time limit (hours:minutes)
#BSUB -o fastqc.out.%J #output file
#BSUB -e fastqc.err.%J #error file
# For running fastqc on all my Heinz samples
# Run in working directory /share/bitcpt/Fall2022/UnityID/Heinz
# Must run this in working directory with subdirectory named /fastqc
module load conda
conda activate /usr/local/usrapps/bitcpt/fastqc
# -t specifies number of threads
fastqc /share/bitcpt/Fall2022/RawData/Solanum_lycopersicum/* -t 20 -o ./fastqc
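FastQC's -t option sets how many files are processed at the same time, so threads beyond the number of input files go unused. The sketch below picks the smaller of the file count and the requested cores. It assumes a bash-compatible shell at the login prompt (the job script itself is tcsh, which has no functions), and pick_threads is a hypothetical helper used only to decide which value to write after -t and -n.

```shell
# Hypothetical helper: choose a FastQC thread count.
# $1 = directory of raw read files, $2 = cores requested with #BSUB -n
pick_threads() {
    n_files=0
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            n_files=$((n_files + 1))
        fi
    done
    n_cores="$2"
    # One thread per file is the most FastQC can use
    if [ "$n_files" -lt "$n_cores" ]; then
        echo "$n_files"
    else
        echo "$n_cores"
    fi
}

# Example call:
# pick_threads /share/bitcpt/Fall2022/RawData/Solanum_lycopersicum 20
```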
Figure 1. Heinz Leaf Rep1_3X_1.fq Per Base Sequence Quality showing overall high quality scores.
Figure 2. Heinz Leaf Rep1_3X_1.fq Per Base Sequence Content. The module failed because base composition is uneven in the first positions of the reads, which is typical of RNA-Seq data.
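The PASS/WARN/FAIL call for each module shown in figures like these is also stored in the summary.txt file inside each FastQC zip archive, as tab-separated status, module name, and file name. The sketch below lists the modules that did not pass, which can be quicker than opening every HTML report. It assumes a bash-compatible shell; flagged_modules and the archive name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the modules that did not PASS, given a
# FastQC summary.txt (tab-separated: status, module, file name).
flagged_modules() {
    awk -F '\t' '$1 != "PASS" { print $1 ": " $2 }' "$1"
}

# Example (sample_fastqc.zip is a placeholder archive name):
# unzip -p sample_fastqc.zip '*/summary.txt' > summary.txt
# flagged_modules summary.txt
```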
Overall, these files appeared adequate for use but will still require trimming. The FastQC reports flagged abnormal per-base sequence content at the start of each read, which is expected for RNA-Seq data. Because of this, the first ~12 nucleotides will be trimmed.
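To make the planned trimming concrete, the sketch below removes the first 12 bases and the matching quality characters from each read. It only illustrates the operation on an uncompressed, four-line-per-record FASTQ stream and assumes a bash-compatible shell with awk. The trimming tool actually used is not named here, trim_5prime is a hypothetical helper, and a dedicated trimmer would normally be preferred for paired-end data.

```shell
# Hypothetical helper: remove the first N bases (default 12) from every
# read in an uncompressed FASTQ stream read from standard input.
# Line 2 of each record is the sequence and line 4 is the quality string;
# both are shortened so they stay the same length.
trim_5prime() {
    awk -v n="${1:-12}" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }
        { print }
    '
}

# Example (file names are placeholders):
# trim_5prime 12 < reads.fq > reads.trimmed.fq
```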