DADA2 LongRead

DATA:

Technology = PacBio SeqWell2

Amplicon = 16S rRNA gene long read with Primers(27F/1492R)

  • F27:5'-AGRGTTYGATYATGGCTCAG-3'
  • R1492:5'-RGTTACCTTGTTACGACTT-3'(R1492-Revcom: AAGTCGTAACAAGGTAACY)

16S_Forward:/5Phos/GCATC-[PacBio barcode sequences]-AGRGTTYGATYMTGGCTCAG

16S_Reverse:/5Phos/GCATC-[PacBio barcode sequences]-RGYTACCTTGTTACGACTT


input file = [Unziped.fastq]

ref.  benjjneb.github

まず、プライマーを除く必要がある!!

R環境設定

R@ library(dada2); library(Biostrings); library(ShortRead); library(ggplot2); library(reshape2); library(gridExtra)

Dereplicate:

R@ drp2 <- derepFastq('../working/', verbose=T)

  • Dereplicating sequence entries in Fastq file: ../working//MAO6_LR16S_keep75.fastq
  • Encountered 36055 unique sequences from 61498 total sequences read.

Learning errors

R@ err2 <- learnErrors(drp2, errorEstimationFunction=PacBioErrfun, BAND_SIZE=32, multithread=T)

  • 93761581 total bases in 61498 reads from 1 samples will be used for learning the error rates.

R@ saveRDS(err2, "./MAO6_err2.rds")

R@ plotErrors(err2)

Denoising

R@ dd2 <- dada(drp2, err=err2, BAND_SIZE=32, multithread=T)

  • Sample 1 - 61498 reads in 36055 unique sequences.

R@ saveRDS(dd2, "MAO6_dd2.rds")

Writeout sequence table

R@ st2 <- makeSequenceTable(dd2); dim(st2)

-----

Fasta形式へ変換

% cat dada_keep75v2.txt | sed "s/^/>ASV_/" | perl -pe 's/\s/\n/g' > MAO6kp75_dada2.fas

アラインメント(F/Rの方向を一方へ統一: --adjustdirection)

% mafft --thread 4 --adjustdirection MAO6kp75_dada2.fas > MAO6kp75_dada2_ali.fas