This section is for informational purposes only. The instructors will run the fastp trimming.
Raw sequence data often needs to be cleaned and trimmed before use. To find out what kind of cleaning and trimming is needed, we run FastQC and read its reports.
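As a minimal sketch, FastQC could be run on both read files of one sample as shown here. It is written for bash rather than the tcsh used in the job script, the directory names are placeholders rather than the course paths, and the command is only printed unless RUN=1 is set.

```shell
#!/bin/bash
# Placeholder locations (assumptions); point these at your own directories.
IN=/path/to/raw_data
QC_OUT=fastqc_reports
S1=At-Leaf1_L02

# Both mates of the pair go on one command line; FastQC writes one report per file.
set -- fastqc --threads 2 --outdir "$QC_OUT" "$IN/${S1}_1.fq.gz" "$IN/${S1}_2.fq.gz"
FASTQC_CMD="$*"

if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$QC_OUT"   # FastQC expects the output directory to exist already
    "$@"
else
    echo "$FASTQC_CMD"   # dry run: print the command without executing it
fi
```

Running it with RUN=1 should leave one HTML report and one zip archive per input file in the output directory.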
Our FastQC reports flagged several quality issues that need attention:
Per-base sequence content at the front (beginning) of reads
Unidentified over-represented sequences
Poly-X tails
Sequence duplication
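The fastp command in the script below does the following:
1) Hard trims the first 14 bases of each read to address the per-base sequence content issues (--trim_front1 14 --trim_front2 14)
2) Trims poly-X tails, including poly-G tails (--trim_poly_x --trim_poly_g)
3) Automatically detects adapter sequences and trims them (--detect_adapter_for_pe)
4) Trims poor quality base calls
5) Removes poor quality reads, i.e. reads with more than 40% of bases below Q15 (--qualified_quality_phred 15 --unqualified_percent_limit 40), and reads shorter than 50 bases after trimming (--length_required 50)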
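To make step 1 concrete, the following bash sketch hard trims a made-up FASTQ record with awk. It only illustrates the effect of removing the first 14 bases from a read. It is not how fastp is implemented, and the helper name hard_trim and the read itself are invented for the example.

```shell
#!/bin/bash
# hard_trim N: drop the first N characters from the sequence line (2nd line of
# each 4-line FASTQ record) and from the quality line (4th line), so the two
# stay the same length. Header and "+" lines pass through unchanged.
hard_trim() {
    awk -v n="$1" 'NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next } { print }'
}

# Invented 20-base read, for illustration only.
trimmed=$(printf '%s\n' '@read1' 'ACGTACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIIIIIII' | hard_trim 14)
echo "$trimmed"
```

The 20-base read comes out as a 6-base read (GTACGT) with a 6-character quality string, because the first 14 positions are removed from both lines.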
#!/bin/tcsh
#BSUB -J fastp_At-Leaf #job name
#BSUB -n 20 #number of cores (tasks), not nodes
#BSUB -W 2:00 #wall-clock time limit (hours:minutes)
#BSUB -o fastp_At-Leaf_%J.out #output file
#BSUB -e fastp_At-Leaf_%J.err #error file
module load conda
conda activate /usr/local/usrapps/bitcpt/fastp
#File name format: At-Leaf1_L02_1.fq.gz (sample_lane_mate)
set S1=At-Leaf1_L02
set IN=/share/bitcpt/Fall2022/RawData/Arabidopsis_thaliana
set OUT=/share/bitcpt/Fall2022/RawData/Arabidopsis_thaliana/TrimData_At
fastp \
-i ${IN}/${S1}_1.fq.gz -I ${IN}/${S1}_2.fq.gz \
-o ${OUT}/${S1}_1.fp.fq.gz -O ${OUT}/${S1}_2.fp.fq.gz \
--json ${OUT}/${S1}.json --html ${OUT}/${S1}.html \
--length_required 50 \
--detect_adapter_for_pe \
--trim_poly_g --trim_poly_x \
--trim_front1 14 --trim_front2 14 \
--qualified_quality_phred 15 \
--unqualified_percent_limit 40
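The last two options set the read-level quality filter: a base counts as unqualified when its Phred score is below 15, and a read is discarded when more than 40% of its bases are unqualified. The bash sketch below applies that rule to two invented reads with awk, assuming Phred+33 quality encoding. It is an illustration of the rule only, not fastp's own code, and it ignores fastp's other filters such as N content and length.

```shell
#!/bin/bash
# quality_filter Q LIMIT: for each FASTQ record, report the read name, whether
# it passes, and the percentage of bases with Phred quality below Q.
quality_filter() {
    awk -v q="$1" -v limit="$2" '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 1 { name = substr($0, 2) }                             # header without "@"
    NR % 4 == 0 {
        low = 0; n = length($0)
        for (i = 1; i <= n; i++)
            if (ord[substr($0, i, 1)] - 33 < q) low++                # Phred+33 decoding
        pct = 100 * low / n
        print name, (pct > limit ? "fail" : "pass"), pct
    }'
}

# Invented reads: "I" encodes Q40 and "#" encodes Q2.
result=$(printf '%s\n' \
    '@readA' 'ACGTACGTAC' '+' 'IIIIII####' \
    '@readB' 'ACGTACGTAC' '+' 'IIIII#####' | quality_filter 15 40)
echo "$result"
```

Here readA has exactly 40% low-quality bases, which does not exceed the limit, so it is kept. readB has 50% and would be removed.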