Quality Control: FastQC

Objective

First, working directories will be set up for the project on the HPC cluster. Then, quality control checks will be performed with FastQC to spot potential problems in the raw sequence data from high-throughput sequencing. The FastQC reports will then be downloaded and analyzed.

FASTQC

CODE

Make the needed directories (UnityID stands in for the personal Unity ID):

cd /share/bitcpt/Fall2022/UnityID

mkdir At Tom Heinz

cd Heinz

mkdir AlignedToTranscriptome fastqc starindices transcriptome salmon_align_quant starOutputfiles
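The same layout can be created in a single pass. This is only a sketch, written for bash/sh rather than tcsh; ROOT is a hypothetical variable standing in for /share/bitcpt/Fall2022/UnityID and defaults to the current directory.

```shell
# Sketch only: ROOT is a hypothetical stand-in for /share/bitcpt/Fall2022/UnityID.
# It defaults to the current directory if not set.
ROOT="${ROOT:-.}"

# -p creates missing parent directories and does not fail if one already exists.
mkdir -p "$ROOT/At" "$ROOT/Tom"

# Subdirectories used by the Heinz analysis.
for sub in AlignedToTranscriptome fastqc starindices transcriptome salmon_align_quant starOutputfiles; do
    mkdir -p "$ROOT/Heinz/$sub"
done
```

Because of -p, rerunning the block is harmless if some of the directories are already there.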

Copy At script to working directory and edit:

cd ../Heinz

cp /share/bitcpt/Fall2022/scripts/At.fastqc.sh /share/bitcpt/Fall2022/UnityID/Heinz/Heinz.fastqc.sh

vi Heinz.fastqc.sh

##edit to change path to Solanum_lycopersicum
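The path edit could also be made without opening vi. The sketch below assumes the input path in the template has the form /RawData/SPECIES_DIR/, as in the edited script; swap_species is a hypothetical helper name, and the old directory name is passed in as an argument because it is not recorded here. It changes only the path, so the job name would still need to be edited by hand.

```shell
# swap_species is a hypothetical helper, not part of the course scripts.
# Usage: swap_species OLD_DIR NEW_DIR template.sh > new_script.sh
swap_species() {
    # Rewrite /RawData/OLD_DIR/ to /RawData/NEW_DIR/ and print the result to stdout.
    sed "s|/RawData/$1/|/RawData/$2/|g" "$3"
}
```

Afterwards, grep -n Solanum_lycopersicum Heinz.fastqc.sh should show the fastqc line with the new path.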

Submit job:

bsub < Heinz.fastqc.sh

##check that it's submitted/running

bjobs
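Once bjobs no longer lists the job, one way to confirm that it finished is to count the reports. FastQC writes one NAME_fastqc.html and one NAME_fastqc.zip per input file, so the number of HTML reports should match the number of raw data files. This is a bash/sh sketch; count_reports is a hypothetical helper name.

```shell
# count_reports is a hypothetical helper: prints the number of FastQC HTML
# reports found directly inside the directory given as $1.
count_reports() {
    find "$1" -maxdepth 1 -name '*_fastqc.html' | wc -l | tr -d ' '
}
```

For example, count_reports ./fastqc can be compared with the number of files in the Solanum_lycopersicum raw data directory.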

Download FastQC files (via Globus, or with scp as below):

scp 'UnityID@login.hpc.ncsu.edu:/share/bitcpt/Fall2022/UnityID/Heinz/fastqc/*' /Users/ComputerName/Documents/School/BIT_CPT

Edited Tomato FASTQC script

#!/bin/tcsh

#BSUB -J fastqc_Heinz_UnityID #job name

#BSUB -n 20 #number of cores (job slots)

#BSUB -W 2:0 #wall-clock time limit (2 hours)

#BSUB -o fastqc.out.%J #output file

#BSUB -e fastqc.err.%J #error file

# For running fastqc on all my Heinz samples

# Run in working directory /share/bitcpt/Fall2022/UnityID/Heinz

# The working directory must already contain a subdirectory named fastqc

module load conda

conda activate /usr/local/usrapps/bitcpt/fastqc

# -t specifies number of threads

fastqc /share/bitcpt/Fall2022/RawData/Solanum_lycopersicum/* -t 20 -o ./fastqc

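A note on the -t value: FastQC processes one file per thread, so requesting more threads than there are input files gains nothing. The sketch below (bash/sh, with pick_threads as a hypothetical helper name) picks the smaller of the two numbers.

```shell
# pick_threads is a hypothetical helper.
# $1 = number of input files, $2 = cores requested with #BSUB -n
pick_threads() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"   # fewer files than cores: one thread per file is enough
    else
        echo "$2"   # otherwise use every requested core
    fi
}
```

With 8 input files and 20 requested cores, pick_threads 8 20 prints 8.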

ANALYSIS FASTQC

BIT_CPT_Heinz_FastQC

FastQC Results Examples

Figure 1. Heinz Leaf Rep1_3X_1.fq Per Base Sequence Quality showing overall high quality scores.

Figure 2. Heinz Leaf Rep1_3X_1.fq Per base sequence content. The module failed because of uneven base composition at the start of the reads, which is typical of RNA-Seq data.

CONCLUSIONS

Overall, these files appeared to be adequate for use but will still require trimming. The FastQC reports flagged abnormal per base sequence content at the start of the reads in each of our files, which is expected for RNA-Seq data. Because of this, the first ~12 nucleotides of each read will be trimmed.
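To show what removing the first 12 positions does to a read, here is a minimal bash/sh sketch. It is an illustration only, not the trimming tool that will be used for the analysis. It assumes uncompressed FASTQ with exactly four lines per record, and trim_head is a hypothetical helper name.

```shell
# trim_head is a hypothetical helper: reads FASTQ on stdin and drops the first
# N characters ($1) from every sequence line and every quality line.
# Assumes exactly four lines per record (header, sequence, "+", quality).
trim_head() {
    awk -v n="$1" '
        NR % 2 == 0 { print substr($0, n + 1); next }  # lines 2 and 4 of each record
        { print }                                      # header and "+" lines unchanged
    '
}
```

For example, trim_head 12 < reads.fq > reads.trimmed.fq, where reads.fq is a placeholder file name. The sequence and quality lines are cut by the same amount, so they stay the same length.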