FastQC


FastQC was run on all of the sequencing reads in the RawData/Glycine_max folder by submitting a job script on the HPC. The job was run from the directory /share/bitcpt/S23/cdthieke/Soy. The following job script, provided by the instructors, was used to generate the FastQC reports for each read file.

Code for FastQC

#!/bin/tcsh

#BSUB -J fastqc_Soy #job name

#BSUB -n 50 #number of cores (job slots)

#BSUB -W 2:0 #time for job to complete

#BSUB -o fastqc.%J.out #output file

#BSUB -e fastqc.%J.err #error file

# For running fastqc on all my Soy samples

# Run in working directory /share/bitcpt/S23/UnityID/Soy

# Must run this in working directory with subdirectory named /fastqc

# -t specifies number of threads

/usr/local/usrapps/bitcpt/fastqc/bin/fastqc /share/bitcpt/S23/RawData/Glycine_max/* -t 20 -outdir ./fastqc

Work Flow and Description of Code

Work Flow

The job script should be placed in the working directory at the following path:

/share/bitcpt/S23/UnityID/Soy

To run the job:

bsub < "job script file name"

To check that the job is running:

bjobs

After the job has finished running, the .html files are transferred to the local machine with Globus. The .html files can then be opened in a web browser to review the FastQC results.
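The submission steps above could be wrapped in a small helper, sketched here in bash under a few assumptions. The job script file name used in the usage line (fastqc_job.csh) is hypothetical, and the helper only submits when bsub is available on the current machine. It creates the fastqc subdirectory first, because FastQC does not create a missing output directory.

```shell
#!/bin/bash
# Sketch only: prepare the working directory and submit the FastQC job.
# The job script name passed in is a placeholder, not the instructors' file name.

prepare_and_submit() {
    local job_script="$1"

    # The output directory must exist before the job runs.
    mkdir -p fastqc || return 1

    # Only attempt submission where LSF is installed (for example, on the HPC).
    if ! command -v bsub >/dev/null 2>&1; then
        echo "bsub not found; run this on the HPC" >&2
        return 1
    fi

    bsub < "$job_script" || return 1   # submit the job script to LSF
    bjobs                              # list pending and running jobs
}
```

From the working directory, this would be called as: prepare_and_submit fastqc_job.csh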

Description

This denotes the number of cores (LSF job slots) requested for the job; -n counts cores rather than nodes:

#BSUB -n 50 #number of cores (job slots)

This is the path to the fastqc software:

/usr/local/usrapps/bitcpt/fastqc/bin/fastqc

This is the input directory; the trailing * in the job script matches every file in it:

/share/bitcpt/S23/RawData/Glycine_max/

This denotes the option to use 20 threads:

-t 20

This denotes the output directory:

-outdir ./fastqc

The output files in the fastqc subdirectory will be a .zip file and an .html file for each sample replicate.
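One way to confirm that every input file produced both reports is sketched below in bash. It assumes FastQC's usual naming, where an input such as sample.fastq.gz yields sample_fastqc.html and sample_fastqc.zip; the function names are hypothetical and only the .gz, .fastq, and .fq suffixes are handled.

```shell
#!/bin/bash
# Sketch only: list expected FastQC reports that are missing from the output directory.

# Derive the report stem from an input file name (assumed naming convention).
report_stem() {
    local name
    name="$(basename "$1")"
    name="${name%.gz}"      # drop compression suffix, if any
    name="${name%.fastq}"   # drop FASTQ suffix, if any
    name="${name%.fq}"
    printf '%s_fastqc\n' "$name"
}

# $1 = input directory, $2 = FastQC output directory.
# Prints one line per missing report; prints nothing when all are present.
missing_reports() {
    local in_dir="$1" out_dir="$2" f stem ext
    for f in "$in_dir"/*; do
        [ -f "$f" ] || continue
        stem="$(report_stem "$f")"
        for ext in html zip; do
            [ -f "$out_dir/$stem.$ext" ] || printf '%s\n' "$stem.$ext"
        done
    done
}
```

For this project the call would look like: missing_reports /share/bitcpt/S23/RawData/Glycine_max ./fastqc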

FastQC Analysis

Young Leaf FastQC data

Basic Statistics

This category does not assess the quality of the data. Instead, it reports summary information about the file, such as the file name, the total number of sequences, the sequence length, and the %GC content.

Per Base Sequence Quality

This category includes a graph whose y-axis shows the quality scores of the base calls at each position in the read. The green background indicates good quality calls, yellow indicates acceptable quality calls, and red indicates poor quality calls.

Above is an example from one of the Young Leaf Glycine max samples. In every Young Leaf Glycine max sample analyzed, all base calls fell within the green zone.
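For context on what the green zone implies, quality scores are Phred values, where a score Q corresponds to a base-call error probability of 10^(-Q/10). The bash sketch below converts a score to that probability and labels it by zone. The zone thresholds (28 and 20) are an assumption about where the plot background changes color, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: relate Phred quality scores to error probabilities and plot zones.

# Convert a Phred score Q to the error probability 10^(-Q/10).
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

# Label a score by zone; the 28 and 20 cutoffs are assumed, not taken from the page.
quality_zone() {
    awk -v q="$1" 'BEGIN {
        if (q >= 28)      print "green"
        else if (q >= 20) print "yellow"
        else              print "red"
    }'
}
```

For example, phred_to_error 30 gives 0.0010, meaning about one incorrect call per thousand bases at that score.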

After reviewing the FastQC data for all of the Young Leaf Glycine max samples, it was concluded that the data were of high quality and did not need to be trimmed.
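A conclusion like this can be cross-checked against the summary.txt file that FastQC stores inside each .zip report, which lists a PASS, WARN, or FAIL status for every module. The bash sketch below prints any module that is not PASS. The function name is hypothetical, and a WARN or FAIL does not by itself mean trimming is required.

```shell
#!/bin/bash
# Sketch only: flag FastQC modules whose status is not PASS.
# Input on stdin: the contents of a summary.txt file
# (tab-separated columns: status, module name, input file name).
flag_modules() {
    awk -F '\t' '$1 != "PASS" { print $1 ": " $2 " (" $3 ")" }'
}
```

Assuming a report named sample_fastqc.zip (a placeholder name), the summary could be piped in with: unzip -p fastqc/sample_fastqc.zip '*/summary.txt' | flag_modules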

Old Leaf FastQC Data

After reviewing the Old Leaf Glycine max samples, it was determined that the FastQC data were of high quality and that none of the sequences needed to be trimmed.

Reference:

https://sites.google.com/ncsu.edu/big-boy-leaf/home#h.9hsrhoj1gpjj

Meristem FastQC Data

MultiQC was run to aggregate the FastQC results for the meristem samples. It was determined that the meristem samples' FastQC data were of high quality and that none of the sequences required trimming.

Reference:

https://sites.google.com/ncsu.edu/bitcpt-spring2023-meristem/fastqcmultiqc?authuser=0
