Why should I use DADA?

DADA has several advantages for denoising amplicon sequencing data:

    • We have shown DADA to be more accurate than other approaches, and in particular more sensitive in detecting genuine diversity among very closely related sequences. This is a consequence of DADA's explicit use of both read abundance and the distance between sequences.
    • It is based on a clear probabilistic model, so the parameters that determine how conservatively or liberally it behaves have a simple interpretation. Further, DADA tells you not only which sequences it thinks are real, but also which sequences you would have missed even if they were present (as a function of their read abundance and distance from other sequences).
    • The error parameters of your data set will be inferred by DADA, so there is no need to sequence control data or trust that someone else's control data is representative of your setup.
    • DADA is designed to be as fast as possible and to minimize the number of (computationally taxing) sequence alignments that must be performed.
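
To make the interplay of read abundance and sequence distance concrete, here is a minimal Python sketch. It is an illustration only, not DADA's actual code or its exact statistics: the function names, the uniform per-base error rate, and the Poisson approximation are all assumptions chosen for simplicity. It asks how surprising it would be to see a candidate sequence at a given abundance if all of its reads were errors from a more abundant parent sequence.

```python
import math


def per_read_error_prob(per_base_error, distance, length):
    """Chance that one parent read becomes one specific sequence that
    differs at `distance` positions (assumes independent, uniform errors)."""
    # Each mismatched position must change to one specific base out of the
    # three alternatives; every other position must be read correctly.
    wrong = (per_base_error / 3.0) ** distance
    right = (1.0 - per_base_error) ** (length - distance)
    return wrong * right


def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), summed directly for accuracy."""
    if k <= 0:
        return 1.0
    if mean <= 0:
        return 0.0
    total = 0.0
    i = k
    while True:
        # Work in log space so large factorials do not overflow.
        term = math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        total += term
        # Past the mode the terms shrink; stop once they no longer matter.
        if i > mean and term <= 1e-16 * total:
            break
        i += 1
    return min(1.0, total)


def abundance_p_value(parent_reads, candidate_reads, distance, length,
                      per_base_error):
    """Probability of seeing at least `candidate_reads` copies of the
    candidate if every copy were an error derived from the parent."""
    expected = parent_reads * per_read_error_prob(per_base_error,
                                                  distance, length)
    return poisson_tail(candidate_reads, expected)


# Hypothetical numbers: a parent with 1000 reads of length 250 and a
# 0.5% per-base error rate; two candidates, each observed twice.
p_near = abundance_p_value(1000, 2, distance=1, length=250,
                           per_base_error=0.005)
p_far = abundance_p_value(1000, 2, distance=3, length=250,
                          per_base_error=0.005)
print("distance 1:", p_near)
print("distance 3:", p_far)
```

Under these assumed numbers, two reads at distance 1 from the parent are plausibly errors, while two reads at distance 3 are very unlikely to be. Neither abundance nor distance alone would separate the two cases.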

When might DADA not work?

    • DADA assumes that each error is statistically independent of all other errors. In reality, some data sets do not live up to this assumption. In particular, when many reads are generated from each DNA molecule in your sample that seeds PCR, there will be some amount of non-independence between errors on different reads. We discuss this problem in the manuscript, and have done (unpublished) work to correct our statistics for this effect, but for now these corrections are not part of DADA. Starting with a larger amount of sample, where possible, is the best way to reduce this problem.
    • DADA does not model indel (insertion and deletion) errors, handling them entirely through alignment. For very indel-rich regions such as the ITS, DADA is still somewhat untested.
    • DADA has not yet been tested on the Illumina platform, although it should in principle be effective. If you attempt to use it to denoise Illumina data, let us know what you find!