What are these COVID-19 PCR tests everyone's buying?

28th March 2020

Rohan T. Ranasinghe


*Note* This page is about the nucleic acid swab tests that have been used in the vast majority of COVID-19 diagnoses so far (as at 28th March 2020). These tests are neither the antigen nor the antibody tests that have received a lot of UK media attention in the last few days.

Health services the world over are stockpiling “RT-PCR tests” like a panic buyer in a supermarket toilet paper aisle. The US Government spirited half a million swabs out of Brescia, while Mossad procured a hundred thousand kits by… some means. But what are these tests and how do they work? The answer involves strange bacteria that live in conditions that would scald your skin, a Nobel Laureate who said he made his major breakthrough thanks to tripping, and tiny molecular probes that work like a 1980s arcade game.

Short version

At the moment, nearly all health services are diagnosing COVID-19 with something called an RT-PCR test. “RT-PCR” stands for Reverse Transcription Polymerase Chain Reaction. It’s a type of genetic test, which looks for certain nucleic acid (DNA or RNA) sequences. These types of test detect species-specific genes (stretches of nucleic acid that encode the sequence of a protein). We can use them to look for mutations in people’s genomes, diagnose infections like Chlamydia and hepatitis C, or detect bioterrorism agents in the post. The COVID-19 tests look for RNA that appears only in the genome of the particular coronavirus (SARS-CoV-2) that causes the disease. Because the coronavirus hijacks your cells to make copies of itself, finding its RNA in someone's throat tells you they have an active infection.

How does it work?

There are really two ways to work out whether a particular DNA or RNA sequence is present in a swab of cells from someone’s throat. We can read every sequence in the sample, one letter at a time (this is DNA sequencing), or we can just search the sample for the specific order of letters we’re interested in (a sort of molecular Googling exercise). We need both to keep on top of a pandemic.

The RT-PCR tests happening in hospitals take the Googling approach to tracking down the coronavirus. This is faster and cheaper than sequencing, but we can only do the search if we already know what to look for, and we only know that by sequencing the virus’s RNA. We need sequencing to find a series of letters that appears in the coronavirus genome, but not in the genome of anything else you're likely to find in someone's throat.
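To make the "what to look for" step concrete, here is a minimal sketch in Python of picking out stretches of letters that occur in a target genome but in none of the other sequences you might expect in the same swab. The sequences are invented toy strings, not real genomes, and the function name `unique_kmers` is just an illustrative label; real assay design also weighs things this sketch ignores, such as near-matches and how well a sequence behaves in the reaction.

```python
def unique_kmers(target: str, background: list[str], k: int = 20) -> list[str]:
    """Return every k-letter stretch of `target` that appears in no background sequence."""
    hits = []
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        # Keep the stretch only if an exact copy is absent from every other sequence.
        if not any(kmer in other for other in background):
            hits.append(kmer)
    return hits


# Toy sequences, invented for illustration (not real genomes).
virus = "AUGGCUAGCUUACGGAUCC"
throat_flora = ["GGCUAGCUUAAAA", "CCCAUGGCUAGGG"]

# k is shortened to 6 here only because the toy sequences are so short.
candidates = unique_kmers(virus, throat_flora, k=6)
print(candidates)
```

Any stretch that survives this filter is a candidate search term: find it in a swab and, in this toy world at least, it can only have come from the virus.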

Sequencing is now incredibly fast: the first draft of the coronavirus genome was uploaded on 5th January, less than two weeks after the first (recognised) hospitalisation. It had 30,473 letters (about a hundred thousand times fewer than our genome) and you can read the whole thing if you want. A week later, bespoke RT-PCR kits were arriving in Wuhan.

Genome centres don’t hoard their data: as of 28th March, 2,084 sequences (from different patients, hospitals and countries) were publicly available. Scientists are constantly using these, to track mutations of the virus, work out whether the RT-PCR tests are still looking for the right letters, and even analyse whether it might be an escaped lab strain (current consensus: hard “no”). Once we know the sequence of letters to look for, we need a way to find them. Even the most in-demand swabs might extract only a few thousand virus genomes from a patient, and the equipment in hospitals isn’t sensitive enough to detect so few molecules.

Sequencing and hybridisation both rely on what is called Watson-Crick base pairing. DNA has four bases (A, G, C and T), but usually only two base pairs: A prefers to pair up with T, while G likes C best. Watson and Crick figured this out and it solved the mystery of the structure of DNA. The double helix is made of two strands of DNA; where you find an A in one strand, you’ll find a T in the other (likewise for G and C). Sequencing asks each base in order which its preferred partner letter is, but RT-PCR uses a chemically synthesised DNA strand (whose sequence we therefore know) and only asks whether its partner sequence (of ~20 letters) is somewhere in the sample: this is called hybridisation.
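The pairing rules are simple enough to write down in a few lines of Python. This is only a sketch: it treats hybridisation as an exact text match, whereas real strands can tolerate the odd mismatch, and the names `reverse_complement` and `hybridises` are illustrative. The one extra fact it leans on is that the two strands of a double helix run in opposite directions, so the partner of a sequence is its complement read backwards.

```python
# Watson-Crick pairs: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction."""
    return "".join(PAIR[base] for base in reversed(strand))


def hybridises(probe: str, sample: str) -> bool:
    """True if the probe's exact partner sequence occurs somewhere in the sample strand."""
    return reverse_complement(probe) in sample


print(reverse_complement("ATGC"))      # GCAT
print(hybridises("ATGC", "TTGCATTT"))  # True
print(hybridises("ATGC", "TTTTTTTT"))  # False
```

In a real test the probe is around 20 letters long, which is what makes a chance match elsewhere in the sample so unlikely.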

Once sequencing tells us which letters to look for, we make the partner sequence using some of the most efficient organic chemistry ever developed. A series of chemical reactions (pioneered by Marv Caruthers and Serge Beaucage at the University of Colorado in the early 1980s) grow a DNA strand on a tiny glass bead. Once assembled, we chop the strand off the bead and purify it. Today, it’s totally automated (you just type in a sequence you want and a machine mixes the chemicals in the right order to make it) and each reaction is usually over 99% efficient. Perhaps a victim of its own success, the importance of DNA synthesis in genetic analysis is often overlooked as 'routine', precisely because it works so well.
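Why does "over 99%" matter so much? Because the losses compound with every letter added. The Python sketch below assumes, for simplicity, one coupling reaction per letter and the same efficiency at every step; `full_length_yield` is an illustrative name, not part of any synthesiser's software.

```python
def full_length_yield(couplings: int, efficiency: float = 0.99) -> float:
    """Fraction of strands that come through every coupling step intact."""
    return efficiency ** couplings


for length in (20, 50, 100):
    print(f"{length} couplings at 99%: {full_length_yield(length):.1%} full-length")

# The same 20 couplings at a (hypothetical) 90% per step, for comparison.
print(f"20 couplings at 90%: {full_length_yield(20, 0.90):.1%} full-length")
```

Under these assumptions, roughly four in five strands of a 20-letter probe come out complete at 99% per step, while a chemistry that managed only 90% per step would leave barely one in eight.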

PCR is a way to copy a specific piece of DNA lots of times. It runs in cycles and, like a virus spreading through a population lacking immunity, it amplifies exponentially. This means one cycle makes two copies of the original DNA, and 8 cycles give 2x2x2x2x2x2x2x2 = 256 copies. Usually, clinical tests run for around 40 cycles, which in principle generates just over a trillion copies of each DNA that was present at the start (1,099,511,627,776, to be precise). PCR only works on DNA, so it’s good for amplifying bits of our genome (say, if you want to find out whether someone has a genetic disease like cystic fibrosis, or carries one of the BRCA mutations that raises their risk of getting breast cancer). But the coronavirus genome is made of the less-stable nucleic acid, RNA, which we humans only use for temporary storage of bits of our genetic code (usually when we need to export information out of the nucleus in the middle of our cells, where we cloister our genome). This is where the “RT” bit comes in.
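The doubling arithmetic is easy to check for yourself. In this Python sketch, `ideal_copies` assumes every strand is copied in every cycle; `copies_with_efficiency` adds a per-cycle efficiency, which is my own illustrative assumption (real reactions fall somewhat short of perfect doubling, and the 0.9 used below is a made-up figure).

```python
def ideal_copies(cycles: int, starting_copies: int = 1) -> int:
    """Copies after perfect doubling in every cycle."""
    return starting_copies * 2 ** cycles


def copies_with_efficiency(cycles: int, efficiency: float, starting_copies: int = 1) -> float:
    """Copies when only a fraction `efficiency` of strands is duplicated per cycle."""
    return starting_copies * (1 + efficiency) ** cycles


print(ideal_copies(8))    # 256
print(ideal_copies(30))   # 1073741824
print(ideal_copies(40))   # 1099511627776

# Hypothetical 90% efficiency per cycle, for comparison with perfect doubling.
print(f"{copies_with_efficiency(40, 0.9):.3g}")
```

Either way, the point stands: a handful of starting molecules becomes an enormous number within an hour or two of cycling.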

In RT-PCR, an enzyme (a protein that speeds up chemical reactions) called Reverse Transcriptase copies a section of RNA into its complementary DNA (cDNA). These enzymes were originally discovered in retroviruses like HIV (but not coronaviruses), which use reverse transcriptases to hide their RNA genome inside their host’s DNA. Now that we know what they’re made of, we can make these enzymes to order and use them for our own ends, like tracking down viruses. In the COVID-19 tests, the reverse transcriptase takes a bit of information in the coronavirus’s RNA genome and converts it into DNA that PCR can run its own stockpiling program on, stacking up billions of identical copies.
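As a text operation, reverse transcription is just base pairing with one change of alphabet: RNA uses U where DNA uses T. A minimal Python sketch, with an illustrative function name and a made-up RNA fragment:

```python
# Each RNA letter and the DNA letter that pairs with it (U in RNA pairs with A).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA (cDNA) strand, read in its own direction."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


print(reverse_transcribe("AUGC"))  # GCAT
```

The cDNA carries the same information as the RNA it was copied from, but in a form that PCR can work with.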

Kary Mullis, PCR’s inventor, was one of the more idiosyncratic Chemistry Nobel laureates. A Californian surfer, he became a proponent of conspiracy theories around HIV/AIDS and climate change, and was hired as an expert witness by OJ Simpson’s defence team (though never called to the stand). The most powerful amplification technology for producing large quantities of DNA was invented on a moonlit drive through the Californian mountains and, according to its inventor, thanks to LSD-driven mind expansion. Bored of doing manual DNA synthesis at the Cetus corporation in the US (in the days before Caruthers' chemistry was automated), Mullis designed an incredibly simple way of making large quantities of DNA from a few initial copies (see above).

Mullis may have been the architect of PCR, but at the molecular level, the enzyme Taq Polymerase is the star. PCR relies on a dance of DNA strands constantly exchanging partners in each round of the reaction, without which copying wouldn’t happen. Once a template strand couples up with a new primer, a polymerase enzyme copies the template by sticking building blocks called dNTPs (one for each of A, G, C and T) on the end of the primer in the order dictated by the template. But long DNA strands need heat to persuade them to split from their partners, which gave Mullis a problem in the early incarnations of PCR. While DNA can handle being in boiling water, most enzymes can’t; the key player, polymerase, kept conking out in every cycle. PCR didn’t take off.

A couple of years after first telling the world about PCR, Mullis and his co-workers reported a minor modification: they still used a polymerase, but this time from a bacterium called Thermus aquaticus, which had been discovered in Yellowstone National Park in the 1960s. Thermus aquaticus (Taq for short) lives in hot springs, and grows best at temperatures of ~65 °C. It’s made of the same sorts of molecules as us, but they’re engineered to withstand high temperatures much better than ours. Taq’s polymerase enzyme can copy DNA without succumbing to high temperatures, which made PCR much more practical. PCR revolutionised biology, Taq Polymerase won Science magazine’s first award for Molecule of the Year in 1989 and Mullis got his Nobel in 1993.

As far as molecules go, DNA (even the short sections usually made by PCR) is big, but even a billion of them comprise far too little matter to see with the naked eye. We need a way of keeping track of the amplification process. RT-PCR uses small DNA probes that are made by chemical synthesis and modified to fluoresce; that is, they absorb light of one colour and emit light of a different colour, which is what tonic water does (whether mixed with gin or not) under UV light. The most widely used probes in RT-PCR are called TaqMan probes. They get chewed up each time a DNA strand is copied, which releases a fluorescence signal like a molecular flare. The probe is specific to the SARS-CoV-2 genome, so if the signal builds over the PCR, it tells you that the virus is there, and how quickly this happens tells you how much virus.

Most of the technology used in RT-PCR is quite old: the fluorescent dyes we use are close relatives of molecules made in the early 20th century, PCR was invented in the mid-1980s, and the DNA probes - conceptually the newest component - were invented nearly 30 years ago. It takes time to make RT-PCR kits for a new pandemic not because new technology needs to be invented, but because the components need to be tailor-made for the new contagion, and on a massive scale.


About me: I’m a postdoc researcher in the Department of Chemistry, University of Cambridge and Bye Fellow and Director of Studies in Natural Sciences at Sidney Sussex College, Cambridge. I did my PhD with Prof. Tom Brown at the University of Southampton on new probes for PCR. None of them were quite as good as TaqMan. Tom co-founded Primerdesign, one of the companies that is supplying Public Health England with RT-PCR kits. In Cambridge, I work with Prof. Sir David Klenerman. Dave is a co-inventor of the very fast DNA sequencing method that now belongs to Illumina, which is the most widely used method in Genome Centres. The first SARS-CoV-2 genome (along with many, many others) was sequenced using Illumina sequencing.

https://twitter.com/RTRanasinghe

https://www.researchgate.net/profile/Rohan_Ranasinghe/