fastp

Introduction

This page is for informational purposes only. The instructors will run the fastp trimming.

Sometimes raw sequence data needs to be cleaned and trimmed before use. To find out what kind of cleaning and trimming is needed, we run FastQC and read its reports.

Our FastQC reports flagged several quality issues that need attention:

Per-base sequence content at the front (beginning) of reads
Unidentified over-represented sequences
Poly-X tails
Sequence duplication

Examples of sequence quality aspects

Per-base sequence content at the front (beginning) of reads

fastp script for trimming

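1) Hard-trims the first 14 bases from the front of each read to address the per-base sequence content issues (--trim_front1 14 --trim_front2 14)

2) Trims poly-X tails, including poly-G tails (--trim_poly_x --trim_poly_g)

3) Automatically detects adapter sequences in the paired-end reads and trims them (--detect_adapter_for_pe)

4) Treats base calls with a Phred quality below 15 as poor quality (--qualified_quality_phred 15)

5) Removes poor-quality reads: reads in which more than 40% of the bases are poor quality (--unqualified_percent_limit 40) and reads shorter than 50 bases after trimming (--length_required 50)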
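To make step 1 concrete, here is a minimal sketch of what hard-trimming does to a FASTQ record. It is not how fastp is implemented. hard_trim_front is a hypothetical helper name, the function is written in POSIX sh with awk (not the tcsh used for the job script), and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
#!/bin/sh
# Illustration only: drop the first N characters from the sequence line and
# the quality line of every FASTQ record read from standard input.
# hard_trim_front is a hypothetical helper, not a fastp command.
# Assumes uncompressed FASTQ with exactly four lines per record.
hard_trim_front() {
    awk -v n="$1" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }  # sequence and quality lines
        { print }                                                     # header and "+" lines
    '
}

# Made-up 18-base read: trimming 14 bases leaves the last 4 bases and
# their 4 quality characters.
printf '@read1\nACGTACGTACGTACGTAA\n+\nIIIIIIIIIIIIIIII##\n' | hard_trim_front 14
```

The sequence and quality lines are trimmed by the same amount, so every remaining base keeps its own quality score. fastp does this for both reads of a pair and reads the .fq.gz files directly, so no helper like this is needed in practice.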

#!/bin/tcsh

#BSUB -J fastp_At-Leaf #job name

#BSUB -n 20 #number of cores

#BSUB -W 2:0 #wall-clock time limit (hours:minutes)

#BSUB -o fastp_At-Leaf_%J.out #output file

#BSUB -e fastp_At-Leaf_%J.err #error file

module load conda

conda activate /usr/local/usrapps/bitcpt/fastp

#File name structure: At-Leaf1_L02_1.fq.gz

set S1=At-Leaf1_L02

set IN=/share/bitcpt/Fall2022/RawData/Arabidopsis_thaliana

set OUT=/share/bitcpt/Fall2022/RawData/Arabidopsis_thaliana/TrimData_At

fastp \
    -i ${IN}/${S1}_1.fq.gz -I ${IN}/${S1}_2.fq.gz \
    -o ${OUT}/${S1}_1.fp.fq.gz -O ${OUT}/${S1}_2.fp.fq.gz \
    --json ${OUT}/${S1}.json --html ${OUT}/${S1}.html \
    --length_required 50 \
    --detect_adapter_for_pe \
    --trim_poly_g --trim_poly_x \
    --trim_front1 14 --trim_front2 14 \
    --qualified_quality_phred 15 \
    --unqualified_percent_limit 40
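The last two options work together as a read-level filter. --qualified_quality_phred 15 sets the Phred score at or above which a base counts as qualified, and --unqualified_percent_limit 40 sets the percentage of unqualified bases a read may contain before it is discarded. The sketch below mimics that rule for a single quality string. It is an illustration under stated assumptions, not fastp's own code: passes_quality_filter is a hypothetical helper, the quality string is assumed to be Phred+33 encoded, and the function is written in POSIX sh with awk rather than tcsh.

```shell
#!/bin/sh
# Sketch of the read-level quality rule for ONE FASTQ quality string.
# passes_quality_filter is a hypothetical helper, not a fastp command.
# Arguments: quality string, minimum qualified Phred score (default 15),
# maximum percent of unqualified bases (default 40).
# Assumes Phred+33 encoding: Phred score = ASCII code - 33.
passes_quality_filter() {
    printf '%s\n' "$1" | awk -v min_phred="${2:-15}" -v max_percent="${3:-40}" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
        {
            low = 0
            n = length($0)
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - 33 < min_phred) low++
            # Fail only when MORE than max_percent of the bases are unqualified.
            status = (100 * low > max_percent * n) ? 1 : 0
            exit status
        }'
}

# Made-up quality strings: "I" is Phred 40, "#" is Phred 2.
for q in 'IIIIIIII##' 'IIIII#####'; do
    if passes_quality_filter "$q"; then
        echo "$q keep"
    else
        echo "$q discard"
    fi
done
```

The first string has 20% poor-quality bases and is kept. The second has 50% and is discarded. Note that a failing read is removed whole: this rule does not trim individual poor-quality bases off the read.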
