Estimation of the number of DNA molecules prior to ultra-deep sequencing

Gallet et al. (2017), Journal of Virology

Until recently, the study of natural viral populations was technically limited. The advent of next-generation sequencing (NGS) technologies has opened up this field of research by vastly improving the detection and quantification of genetic variability in viral populations. Ultra-deep sequencing (UDS) has also made it possible to investigate intraspecific and intrahost variation in viral populations: its high sequencing coverage enables the detection of low-frequency mutants and the precise quantification of their frequencies. It is therefore tempting to believe that the coverage of an NGS analysis reflects the number of individuals present in the sample. For most methods, however, the first step in the NGS pipeline is the amplification of sequences from the collected samples by PCR, reverse transcription-PCR (RT-PCR), or rolling-circle amplification (RCA). In a recent study, McCrone and Lauring (2016) showed that the sensitivity of single-nucleotide variant detection was limited at low nucleic acid concentrations, probably because of the small number of template molecules, a situation analogous to a population bottleneck occurring at the amplification and/or library preparation step. Depending on their severity, such bottlenecks can seriously reduce the detection of genetic variability (at both the intraspecific and interspecific levels).

Illustration of the population of genome molecules present in the sample (left) and of the population actually sequenced after NGS (right)

Thus, NGS coverage does not correspond to the size of the population sample. Estimating the sensitivity and the precision of our detection method therefore requires estimating the number of molecules that were amplified during the NGS pipeline. We did so by tagging genomic segments of the multipartite Faba bean necrotic stunt virus. We introduced 20-base-long marker sequences into the N and S segments in order to obtain two distinct N and S mutants. These markers were always introduced in the same region of the segments, and also served as primer sites for qPCR detection. We estimated the relative frequencies of these markers before and after a regular rolling-circle amplification (RCA) and used the frequency variations to estimate the bottlenecks undergone by these molecule populations during the amplification.
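The idea of inferring a bottleneck from frequency variations can be sketched as follows. This is only an illustration under a simple binomial sampling assumption, not necessarily the estimator used in the paper: if N template molecules are drawn at random from a pool in which a marker has frequency p, the variance of the marker frequency after sampling is p(1 - p)/N, so N can be approximated from the spread of the post-amplification frequencies. The function name, argument names and example values below are hypothetical.

```python
from statistics import mean


def estimate_bottleneck_size(freq_before, freqs_after):
    """Approximate the number of amplified template molecules.

    ASSUMPTION (not taken from the paper): pure binomial sampling, so that
    Var(freq_after) = p * (1 - p) / N, with p the marker frequency before
    amplification. Measurement error in the frequencies is ignored.
    """
    if not 0.0 < freq_before < 1.0:
        raise ValueError("freq_before must be strictly between 0 and 1")
    if not freqs_after:
        raise ValueError("at least one post-amplification frequency is required")

    # Mean squared deviation of the replicate frequencies around the input frequency.
    msd = mean((f - freq_before) ** 2 for f in freqs_after)
    if msd == 0:
        return float("inf")  # no detectable drift: bottleneck too wide to estimate

    return freq_before * (1.0 - freq_before) / msd


# Hypothetical data: marker at 50% before RCA, four replicate measurements after RCA.
n_templates = estimate_bottleneck_size(0.5, [0.52, 0.48, 0.51, 0.49])
print(round(n_templates))
```

Because any measurement noise in the frequencies (for instance from qPCR) inflates the observed variance, an estimate obtained this way would tend to understate the true number of templates.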

Our results showed that RCA amplified 600 S segments and 3000 N segments. These numbers are not very large compared to the coverage of our UDS, and clearly show the limited detection sensitivity of this technique. Improving the detection level would probably involve optimizing the RCA step by using higher concentrations of the Phi29 polymerase (the polymerase used in the TempliPhi kit).
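To give a sense of what these template numbers mean for detection, the short sketch below computes the probability that a variant is present at least once among the amplified molecules. It assumes that the templates are a random sample of the population, which is a simplification, and the variant frequency of 0.1% is a hypothetical value chosen for illustration; the function name is ours.

```python
def prob_variant_sampled(freq, n_templates):
    """Probability that a variant of frequency `freq` is carried by at least
    one of `n_templates` molecules drawn at random (assumed sampling model)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_templates < 0:
        raise ValueError("n_templates must be non-negative")
    # Complement of the probability that none of the sampled molecules is the variant.
    return 1.0 - (1.0 - freq) ** n_templates


# Template numbers reported above; the 0.1% variant frequency is hypothetical.
for segment, n in (("S", 600), ("N", 3000)):
    print(segment, round(prob_variant_sampled(0.001, n), 2))
```

Under these assumptions, a variant at 0.1% would be among the amplified molecules in roughly 45% of cases with 600 templates and roughly 95% of cases with 3000 templates, whatever the sequencing coverage obtained afterwards.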