The Idea

Rare disease detection so low-cost, we can't afford not to do it.

In one of Jules Verne's Extraordinary Voyages, the children face a challenge. Their father, Captain Grant, is shipwrecked and they have only the latitude of his location. This forces them on an expedition around the world, sailing along the 37th parallel south, in a long, arduous search. The search for rare disease is much the same.

Tests to diagnose most rare diseases exist. However, finding the disease before irreversible symptoms set in is a challenge: there are no signs and no phenotype to guide you. Brute force is one solution: test everyone, one by one. This works, but it is laborious and, at least for now, very expensive.

What if you combine samples instead, testing groups of 100 individuals at a time to cut costs? This sort of works, but it does not directly reveal carriers. This "Captain Grant Problem" (knowing only one coordinate) forces retesting of every individual in each positive group. Of course, what the children of Captain Grant were missing was longitude, an extra coordinate that would pinpoint the location of their father. That's exactly what our method provides.

Double-Batch Sequencing (DoBSeq) begins by arranging DNA samples from many individuals into a grid. The samples are organized much like a Microsoft Excel spreadsheet: columns have letters, rows have numbers.

Here's a 48 by 48 grid, covering 2,304 individuals:

Step one is batching all samples according to the rows, creating 48 "row batches". Once this is done, each batch is a single tube containing DNA from 48 individuals.

The process looks like this:

Step two batches again, using DNA from the same 2,304 individuals. Only this time, batches are made according to the columns, making 48 "column batches". Again, each batch contains DNA from 48 individuals.
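The two batching steps can be sketched in a few lines of Python. This is only an illustration of the layout described above, not the project's actual tooling: the well IDs (such as "A1"), the function names `column_label` and `build_batches`, and the use of spreadsheet-style labels beyond "Z" are all assumptions made for the example.

```python
from string import ascii_uppercase

GRID_SIZE = 48  # 48 x 48 = 2,304 individuals, as in the example grid


def column_label(index: int) -> str:
    """Spreadsheet-style label for a 0-based column index: A..Z, AA, AB, ..."""
    label = ""
    index += 1  # work 1-based for the letter conversion
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = ascii_uppercase[remainder] + label
    return label


def build_batches(grid_size: int = GRID_SIZE):
    """Return (row_batches, column_batches), each mapping a batch name to its wells."""
    columns = [column_label(c) for c in range(grid_size)]
    rows = list(range(1, grid_size + 1))
    # Step one: one batch per row, pooling every well in that row.
    row_batches = {r: [f"{c}{r}" for c in columns] for r in rows}
    # Step two: one batch per column, pooling every well in that column.
    column_batches = {c: [f"{c}{r}" for r in rows] for c in columns}
    return row_batches, column_batches


row_batches, column_batches = build_batches()
```

In this sketch every individual ends up in exactly one row batch and one column batch, which is what later allows a variant to be traced back to a single well.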

Now that the 2,304 DNA samples are double batched, step three is DNA sequencing. Each of the 96 batches is deeply sequenced using conventional next-generation sequencing.

Rare genetic variants associated with severe genetic diseases are found: once in a row batch and once in a column batch. This takes us to step four, the main innovation of our method. Bioinformatically, the variants are cross-referenced, tracing a specific variant back to a unique individual at the intersection of that row and that column.

Here's what that looks like for the red variant, an RB1 single-basepair deletion at the 489th nucleotide of the gene:
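In code terms, the cross-referencing idea amounts to intersecting the positive row batches with the positive column batches. The following is a minimal sketch under stated assumptions: the function name `pinpoint`, the well-ID format, and the example coordinates (row 17, column "K") are hypothetical and are not taken from the actual data.

```python
def pinpoint(row_hits, column_hits):
    """Return candidate wells: every intersection of a positive column and row."""
    return [f"{c}{r}" for c in sorted(column_hits) for r in sorted(row_hits)]


# Hypothetical observation: the variant is seen in row batch 17 and column batch "K".
candidates = pinpoint(row_hits={17}, column_hits={"K"})
print(candidates)  # ['K17']
```

If the same variant turned up in more than one row batch or column batch, this sketch would return several candidate wells rather than one. How such cases are resolved in practice is not covered here.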

To sum up, among a large group of 2,304 individuals, we've now found a pathogenic RB1 variant that causes childhood cancer. Instead of using 2,304 tests to achieve this, we used just 96 tests (a 24-fold reduction). This takes screening for pathogenic RB1 variants (present in 1 out of 15,000 newborns) from cost-prohibitive to highly cost-effective.

In the end, the true power of our method is that hundreds of diseases can be screened for simultaneously, with rare variants across hundreds of genes detected in the whole group at once.

Here, shown for the RB1 variant as well as for pathogenic HNF1A and GLA variants:

Finally, because many different variants in the same gene lead to the same disease, multiple variants causing the same disease can be detected at once:
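Screening many genes and many variants at once uses the same cross-referencing logic, applied per variant. The sketch below is illustrative only: the function name `cross_reference`, the input format (batch name mapped to a set of `(gene, variant)` pairs), the placeholder variant labels and every coordinate are hypothetical, and the code is not the project's bioinformatics pipeline.

```python
from collections import defaultdict


def cross_reference(row_calls, column_calls):
    """Map each variant seen in both batch types to its candidate wells."""
    rows_by_variant = defaultdict(set)
    cols_by_variant = defaultdict(set)
    for row, variants in row_calls.items():
        for variant in variants:
            rows_by_variant[variant].add(row)
    for col, variants in column_calls.items():
        for variant in variants:
            cols_by_variant[variant].add(col)
    # Only variants found in both a row batch and a column batch can be placed.
    shared = rows_by_variant.keys() & cols_by_variant.keys()
    return {
        v: sorted(f"{c}{r}" for c in cols_by_variant[v] for r in rows_by_variant[v])
        for v in sorted(shared)
    }


# Hypothetical calls: placeholder variant labels, made-up coordinates.
row_calls = {
    3: {("RB1", "variant_a")},
    17: {("HNF1A", "variant_a")},
    40: {("GLA", "variant_a"), ("RB1", "variant_b")},
}
column_calls = {
    "C": {("GLA", "variant_a")},
    "K": {("RB1", "variant_a")},
    "Z": {("RB1", "variant_b")},
    "AV": {("HNF1A", "variant_a")},
}

hits = cross_reference(row_calls, column_calls)

# Different variants in the same gene point to the same disease, so group by gene.
carriers_by_gene = defaultdict(list)
for (gene, _variant), wells in hits.items():
    carriers_by_gene[gene].extend(wells)
```

Grouping by gene at the end reflects the point above: two different RB1 variants in two different individuals both count as findings for the same disease.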

Using this idea, thousands, perhaps even millions, of children and adults could be screened for variants we already know cause severe, preventable disease. For now, we have proved that the method works at medium scale (see scientific publication). The PREDiSPOSED project's plan is to scale fully.