FastQC was run on all of the sequencing reads in the RawData/Glycine_max folder by submitting a job script on the HPC. The job was run from the directory /share/bitcpt/S23/cdthieke/Soy on the HPC. The following job script, provided by the instructors, was used to generate the FastQC reports for each sample.
#!/bin/tcsh
#BSUB -J fastqc_Soy #job name
#BSUB -n 50 #number of cores
#BSUB -W 2:0 #wall-clock time limit (hours:minutes)
#BSUB -o fastqc.%J.out #output file
#BSUB -e fastqc.%J.err #error file
# For running fastqc on all my Soy samples
# Run in working directory /share/bitcpt/S23/UnityID/Soy
# Must run this in working directory with subdirectory named /fastqc
# -t specifies number of threads
/usr/local/usrapps/bitcpt/fastqc/bin/fastqc /share/bitcpt/S23/RawData/Glycine_max/* -t 20 -outdir ./fastqc
The job script should be placed in the working directory at the path:
/share/bitcpt/S23/UnityID/Soy
To run the job:
bsub < "job script filename"
To check that the job is running:
bjobs
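The submission steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function name prepare_fastqc_dir and the script filename fastqc_soy.csh are hypothetical and do not come from the instructors' materials. It creates the fastqc subdirectory that the job script expects to find in the working directory before the job is submitted.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): make sure the output
# subdirectory that the job script writes to (./fastqc) exists inside
# the given working directory.
prepare_fastqc_dir() {
    workdir="$1"
    # mkdir -p succeeds whether or not the directory already exists
    mkdir -p "$workdir/fastqc"
}

# Example usage on the HPC (commented out; fastqc_soy.csh is an assumed filename):
#   cd /share/bitcpt/S23/UnityID/Soy
#   prepare_fastqc_dir .
#   bsub < fastqc_soy.csh
#   bjobs
```

Creating the directory first matters because the job script writes its reports to ./fastqc relative to the directory the job is submitted from.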
After the job has finished running, the .html files are transferred to the local machine with Globus. The .html files can then be opened to review the FastQC results.
This denotes the number of cores requested for the job:
#BSUB -n 50 #number of cores
This is the path to the fastqc software:
/usr/local/usrapps/bitcpt/fastqc/bin/fastqc
This is the input file path:
/share/bitcpt/S23/RawData/Glycine_max/
This denotes the option to use 20 threads:
-t 20
This denotes the output directory:
-outdir ./fastqc
The output files in the fastqc subdirectory will be a .zip file and an .html file for each sample replicate.
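One way to confirm that every input produced both output files is sketched below. It relies on two assumptions: that the input files end in .fastq.gz (the actual file names are not listed here), and that FastQC names its reports by appending _fastqc.html and _fastqc.zip to the input name with that extension removed. The function name missing_fastqc_reports is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the base name of every input file that lacks
# a matching FastQC report pair in the output directory.
# Assumes inputs end in .fastq.gz; adjust the pattern to the real names.
missing_fastqc_reports() {
    indir="$1"
    outdir="$2"
    for f in "$indir"/*.fastq.gz; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        base=$(basename "$f" .fastq.gz)
        if [ ! -f "$outdir/${base}_fastqc.html" ] || [ ! -f "$outdir/${base}_fastqc.zip" ]; then
            echo "$base"
        fi
    done
}

# Example usage (commented out):
#   missing_fastqc_reports /share/bitcpt/S23/RawData/Glycine_max ./fastqc
```

If the function prints nothing, every input that matched the pattern has both a .html and a .zip report.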
Basic Statistics
This category does not assess the quality of the data. Instead, it reports summary information about the file, such as the file name, the total number of sequences, the sequence length, and the %GC content.
Per Base Sequence Quality
This category includes a graph in which the y-axis shows the quality scores of the base calls at each position in the read. The green region indicates good quality calls, yellow indicates acceptable quality calls, and red indicates poor quality calls.
Above is an example for one of the Young Leaf Glycine max samples. For all of the Young Leaf Glycine max samples analyzed, all base calls were found to be within the green zone.
After analyzing the FastQC data for all the Young Leaf Glycine max samples, it was concluded that the data was of high quality and did not need to be trimmed.
After analyzing the Old Leaf Glycine max samples, it was determined that the FastQC data was of high quality and none of the sequences needed to be trimmed.
MultiQC was run to analyze the meristem samples. It was determined that the meristem samples' FastQC data was of high quality and none of the sequences required trimming.
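The per-sample conclusions above could also be cross-checked from the command line. The sketch below assumes that each FastQC .zip archive contains a tab-separated summary.txt file with one line per module (status, module name, file name), and that the file has already been extracted. The function name flagged_modules and the sample name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every line of an extracted FastQC summary.txt
# whose status (first tab-separated field) is not PASS.
flagged_modules() {
    awk -F '\t' '$1 != "PASS"' "$1"
}

# Example usage (commented out; sample_fastqc.zip is an assumed name):
#   unzip -p fastqc/sample_fastqc.zip 'sample_fastqc/summary.txt' > summary.txt
#   flagged_modules summary.txt
```

A WARN or FAIL line does not by itself mean that trimming is required; the corresponding plot in the .html report should still be inspected.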