DADA2 LongRead
DATA:
Technology = PacBio SeqWell2
Amplicon = 16S rRNA gene long read with Primers(27F/1492R)
- F27:5'-AGRGTTYGATYATGGCTCAG-3'
- R1492:5'-RGTTACCTTGTTACGACTT-3'(R1492-Revcom: AAGTCGTAACAAGGTAACY)
16S_Forward:/5Phos/GCATC-[PacBio barcode sequences]-AGRGTTYGATYMTGGCTCAG
16S_Reverse:/5Phos/GCATC-[PacBio barcode sequences]-RGYTACCTTGTTACGACTT
input file = [Unziped.fastq]
ref. benjjneb.github
まず、プライマーを除く必要がある!!
R環境設定
R@ library(dada2); library(Biostrings); library(ShortRead); library(ggplot2); library(reshape2); library(gridExtra)
Dereplicate:
R@ drp2 <- derepFastq('../working/', verbose=T)
- Dereplicating sequence entries in Fastq file: ../working//MAO6_LR16S_keep75.fastq
- Encountered 36055 unique sequences from 61498 total sequences read.
Learning errors
R@ err2 <- learnErrors(drp2, errorEstimationFunction=PacBioErrfun, BAND_SIZE=32, multithread=T)
- 93761581 total bases in 61498 reads from 1 samples will be used for learning the error rates.
R@ saveRDS(err2, "./MAO6_err2.rds")
R@ plotErrors(err2)
Denoising
R@ dd2 <- dada(drp2, err=err2, BAND_SIZE=32, multithread=T)
- Sample 1 - 61498 reads in 36055 unique sequences.
R@ saveRDS(dd2, "MAO6_dd2.rds")
Writeout sequence table
R@ st2 <- makeSequenceTable(dd2); dim(st2)
-----
Fasta形式へ変換
% cat dada_keep75v2.txt | sed "s/^/>ASV_/" | perl -pe 's/\s/\n/g' > MAO6kp75_dada2.fas
アラインメント(F/Rの方向を一方へ統一: --adjustdirection)
% mafft --thread 4 --adjustdirection MAO6kp75_dada2.fas > MAO6kp75_dada2_ali.fas