GGR Newsletter
March 2025
Brendan Gallagher, Ph.D.
The last few years have seen massive leaps in computational power and storage capacity—advances that would have been unfathomable just a decade ago. The idea of going online and purchasing a 1TB microSD card for under $100 would have absolutely broken my brain as a child. However, these advances still don’t solve one major challenge: existing storage technologies are inherently impermanent.
Solid-state drives (SSDs), hard drives, and magnetic tape, the media most often used for archival data storage, all have finite lifespans. Over long enough time scales, SSDs degrade as charge leaks from their flash memory cells, while magnetic tape and hard drives suffer from demagnetization, material decay, and mechanical failure. For truly archival storage of information, there is a push to ditch electricity altogether in favor of one of the oldest known information storage mechanisms: DNA.
DNA is one of the most stable information storage media known, remaining intact for thousands of years under the right conditions¹. Once you decide to use DNA to store information, the “how”, at least at a basic conceptual level, is fairly straightforward. The simplest scheme maps binary code onto DNA sequences, with two of the four nucleotides representing 1s and the other two representing 0s. More sophisticated encoding schemes exist, and approaches like these have been used successfully to store everything from the works of Shakespeare to a GIF of a running horse.
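As a rough illustration, here is a minimal Python sketch of that simplest scheme, one bit per nucleotide. Which pair of bases stands for 0 (A or C here) and which for 1 (G or T here) is an assumption, as is the rule of picking whichever option differs from the previous base so the strand never repeats a nucleotide back-to-back. Real systems use more elaborate codes.

```python
# Assumed mapping: A or C encodes a 0 bit, G or T encodes a 1 bit.
ZERO_BASES = "AC"
ONE_BASES = "GT"


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, one bit per nucleotide."""
    strand: list[str] = []
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            choices = ONE_BASES if bit else ZERO_BASES
            # Use the first option unless it would repeat the previous
            # base; the second option always differs from the first.
            base = choices[0]
            if strand and strand[-1] == base:
                base = choices[1]
            strand.append(base)
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Invert encode(): read each base back as a bit, eight bits per byte."""
    bits = [1 if base in ONE_BASES else 0 for base in strand]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


print(encode(b"Hi"))          # AGACGACACGTAGACG
print(decode(encode(b"Hi")))  # b'Hi'
```

Two bytes become 16 nucleotides. Having two interchangeable bases for each bit is what makes it possible to avoid runs of the same letter, which are harder to synthesize and sequence accurately.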
The practical limitation, as ever, is cost. DNA synthesis is expensive, and traditional encoding methods require sequencing an entire strand to retrieve even a small piece of information. In 2013, one DNA storage method was estimated to cost $12,400 per megabyte to write and $220 per megabyte to read. Costs have since improved but remain a major barrier to widespread adoption.
Catalog, a Boston-based biotech company, is tackling this problem by relying on pre-made “component” DNA chunks. Instead of synthesizing every strand from scratch, these components can be mass-produced and assembled dynamically through ligation. In this system, a terabyte of data, for example, could be represented using just 120 pre-synthesized DNA components². This modular “combinatorial assembly” approach significantly reduces costs and enables scalable data storage.
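The arithmetic behind footnote 2 can be sketched in Python. This assumes, for illustration only, that each assembled molecule takes exactly one component from each of 12 groups of 10; the component labels (`L0C0` and so on) and the index-to-molecule mapping are hypothetical, not Catalog's actual design.

```python
LAYERS = 12                # groups of components (footnote 2)
COMPONENTS_PER_LAYER = 10  # interchangeable components per group


def component_name(layer: int, choice: int) -> str:
    """Hypothetical label for one pre-synthesized component."""
    return f"L{layer}C{choice}"


def index_to_molecule(index: int) -> list[str]:
    """Map an integer to one assembled molecule: one component per layer."""
    if not 0 <= index < COMPONENTS_PER_LAYER ** LAYERS:
        raise ValueError("index out of range")
    parts = []
    for layer in range(LAYERS):
        # Peel off one base-10 digit per layer, least significant first.
        index, choice = divmod(index, COMPONENTS_PER_LAYER)
        parts.append(component_name(layer, choice))
    return parts


def molecule_to_index(parts: list[str]) -> int:
    """Invert index_to_molecule() by reading the digits back."""
    index = 0
    for part in reversed(parts):
        choice = int(part.split("C")[1])
        index = index * COMPONENTS_PER_LAYER + choice
    return index


print(LAYERS * COMPONENTS_PER_LAYER)   # 120 components to synthesize
print(COMPONENTS_PER_LAYER ** LAYERS)  # 1000000000000 distinct molecules
print(index_to_molecule(42)[:3])       # ['L0C2', 'L1C4', 'L2C0']
```

The design point is that capacity grows multiplicatively while synthesis grows additively: each extra group of ten components multiplies the number of distinct molecules by ten.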
The cost-effectiveness of DNA data storage also depends on the cost of reading the data. For true long-term archival storage, reads are rare, and the longer the data sits without being accessed, the more cost-effective it becomes. But when frequent access is required, there are strategies to reduce retrieval costs that don't depend on improvements in sequencing itself.
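To see why infrequent reads favor DNA, here is a back-of-the-envelope Python sketch that spreads the 2013 write and read estimates over the storage period. The read counts and durations are hypothetical, and the sketch ignores both the cost of physically keeping the DNA and the price drops since 2013.

```python
# Per-megabyte estimates from 2013, quoted above (US dollars).
WRITE_COST_PER_MB = 12_400.0
READ_COST_PER_MB = 220.0


def cost_per_mb_year(years: float, full_reads: int) -> float:
    """Average cost per megabyte per year of storage.

    Assumes one write, `full_reads` complete reads of the data, and
    no cost for physically keeping the DNA in between.
    """
    total = WRITE_COST_PER_MB + full_reads * READ_COST_PER_MB
    return total / years


for years, reads in [(10, 1), (100, 1), (10, 50)]:
    print(years, reads, cost_per_mb_year(years, reads))
```

Writing once and reading once over 10 years averages $1,262 per megabyte per year; stretching the same data over 100 years drops that to about $126, while 50 full reads in 10 years pushes it to $2,340.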
Let’s say you wanted to encode a database of all Pokémon, along with some basic attributes, in DNA. If every Pokémon were stored on one continuous strand, from Bulbasaur to Melmetal³, retrieving information about Pikachu would require sequencing the entire dataset and manually extracting the relevant portion. A more efficient approach would be to break the data into smaller pieces and store them in physically separate bins, perhaps one bin for every ten Pokémon. A separate index bin would contain metadata mapping Pokémon names to their respective bins. You could then query the index for "Pikachu," determine that its information is in Bin #3, and pull out the relevant strand with an affinity probe. This combined strategy of combinatorial assembly and indexing has made storage cost-effective enough that Catalog encoded a book that is being sold for only $60⁴.
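A toy Python version of that bin-and-index lookup follows. The five-entry dataset, the bin size of ten, and numbering bins from 1 are assumptions chosen to match the example, and the loop inside a bin stands in for the affinity probe, which in the lab physically pulls out the matching strand.

```python
BIN_SIZE = 10  # Pokémon per bin, as in the example above


def bin_for(dex_number: int) -> int:
    """Bin 1 holds #1-#10, bin 2 holds #11-#20, and so on."""
    return (dex_number - 1) // BIN_SIZE + 1


# A tiny stand-in dataset: (Pokédex number, name, primary type).
POKEMON = [
    (1, "Bulbasaur", "Grass"),
    (4, "Charmander", "Fire"),
    (7, "Squirtle", "Water"),
    (25, "Pikachu", "Electric"),
    (133, "Eevee", "Normal"),
]


def build_bins(records):
    """Split records into bins and build a name -> bin index."""
    bins: dict[int, list[tuple[int, str, str]]] = {}
    index: dict[str, int] = {}
    for number, name, ptype in records:
        b = bin_for(number)
        bins.setdefault(b, []).append((number, name, ptype))
        index[name] = b
    return index, bins


def lookup(name, index, bins):
    """Read the small index first, then only the one bin it points to."""
    b = index[name]
    for record in bins[b]:  # stands in for the affinity-probe pull-down
        if record[1] == name:
            return b, record
    raise KeyError(name)


index, bins = build_bins(POKEMON)
print(lookup("Pikachu", index, bins))
```

Pikachu is #25 in the Pokédex, so it lands in Bin #3 alongside #21 through #30, and the lookup returns `(3, (25, 'Pikachu', 'Electric'))` after reading only the index and that one bin.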
The outlook for DNA as a digital storage medium is incredibly promising, and many hurdles have already been cleared. With synthesis and sequencing costs continuing to decline, along with creative strategies like combinatorial assembly, the question is no longer whether DNA can store our data, but rather what else we can do with it once it’s there.
Catalog, for their part, is now putting a heavy focus on DNA computing. Their index-based search was a starting point, but they have their eyes turned toward more sci-fi applications. DNA computing has massive parallelization potential. Since DNA molecules can interact in solution simultaneously, operations can occur in parallel rather than sequentially, as they do in conventional computers. In theory, this could lead to breakthroughs in machine learning at scales beyond what traditional hardware can achieve.
DNA-stored digital information may also have uses beyond computation. Iridia, a California biotech, is developing what they call the Molecular Avatar, a DNA-based “ink” that can be embedded into physical items to create unique identifiers—essentially a molecular version of the blockchain. Some applications are obvious, like use in paint, but they also claim potential for sports memorabilia, luxury goods, glass spirits bottles, and even defense. While it’s clear how this provides a molecular identifier, some key details remain uncertain. It’s not obvious how well the ink holds up over time or how identification, which presumably involves sequencing the ink’s molecular signature, can be done without damaging the product itself.
Unless you’re purchasing DNA-encoded luxury goods⁵, DNA data storage and computing likely won’t be part of your daily life any time soon. But it’s real, it’s happening, and it’s just getting started.
Footnotes:
1 – This is why George Church, a leading proponent of DNA data storage, has also proposed resurrecting the woolly mammoth by retrieving ancient genetic sequences.
2 – Arranged into 12 groups of 10, with one component drawn from each group, the 120 components yield 10¹² unique sequences. Assuming each unique sequence represents one byte of data, that is a terabyte.
3 – I’ll be deep in the cold, cold ground before I recognize Gens VIII and IX.
4 – The $60 price includes a physical copy of the book, which is sold on its own for $25. While this implies a price of $35 for the DNA-encoded copy, it is very likely not priced to make a profit, with the cost being subsidized by the marketing budget. Nevertheless, according to Wired, the cost of encoding and producing 1,000 copies of the book was in the “low thousands of dollars”, making this an impressive proof of concept.
5 – If you are, please reach out; I would honestly take a nice DNA-encoded sweater despite my confusion about the process.
Illustration by Mary Cundiff