Why are proteins so hard to work with?

Brendan Gallagher, Ph.D.
June 2025

This question may be surprising to some, especially those who regularly work with the notoriously finicky RNA. In many ways, proteins are dream molecules for biologists: they’re the physical effectors of most biological function, and they’re impressively stable. In post-mortem tissue, proteins remain detectable for days, compared to RNA’s mere hours, and in formalin-fixed, paraffin-embedded (FFPE) tissue they can persist for decades. And yet, despite their central importance, so much of our biological toolkit remains focused on DNA and RNA.

Proteins are the predominant structural and functional operators in cells, so one of the most basic and critical questions you can ask of a biological system is: What proteins are here? There are a lot of good ways to start answering this question, but they all have some pretty serious drawbacks. The current state of the art is liquid chromatography with tandem mass spectrometry (LC-MS/MS), which can identify thousands of proteins from a tissue sample. The big catch, though, is that you need to know what you’re looking for first: proteins are identified by matching their signatures to a reference database. The method also often identifies a protein from just one or two short peptide sequences, so differentiating between subtle isoforms isn’t always possible.
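To make the database-matching catch concrete, here is a minimal Python sketch. It is a deliberate simplification: real search engines match measured spectra rather than peptide strings, and the protein names and sequences below are invented for illustration. The digestion rule (cut after K or R unless the next residue is P) is the commonly used simplified trypsin rule.

```python
# Toy reference database. Names and sequences are hypothetical.
REFERENCE = {
    "isoform_A": "MASKLLDERGTYK",
    "isoform_B": "MASKLLDERGSYK",  # differs from isoform_A by one residue
    "unrelated": "MPPQRTTVK",
}

def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide with no K/R
    return peptides

def build_index(reference):
    """Map each predicted peptide to the set of proteins that contain it."""
    index = {}
    for name, sequence in reference.items():
        for peptide in tryptic_digest(sequence):
            index.setdefault(peptide, set()).add(name)
    return index

def candidates(peptide, index):
    """Return the database proteins consistent with one observed peptide."""
    return sorted(index.get(peptide, set()))

INDEX = build_index(REFERENCE)

if __name__ == "__main__":
    for observed in ["LLDER", "GTYK", "WWNK"]:
        print(observed, "->", candidates(observed, INDEX))
```

In this toy setup, the peptide LLDER is consistent with both isoforms, GTYK singles out isoform_A, and WWNK matches nothing because it is not in the database. Those are the two limitations described above: ambiguous isoforms and invisible novel proteins.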

Compare that to Next-Generation Sequencing technologies, which can take a solution of entirely unknown DNA or RNA composition and read out the exact sequence down to the last nucleotide, with accuracy between 90% and 99.8% depending on the method used.

Given how much easier it can be to work with nucleic acids, it can be tempting to use RNA expression, for instance, as a proxy for how much of its corresponding protein is present. For many questions, this is completely suitable, especially when all you need is a ballpark estimate, or when you need single-nucleotide sensitivity and high throughput. It is only a ballpark, though: RNA transcript levels actually have a shockingly low correlation with their respective protein concentrations. And there are many other times, such as investigating whether a new drug was able to restore the function of a pathological protein, when nothing short of direct detection will do.

Given all this, any technology that makes studying proteins as easy as studying nucleic acids would have enormous impact, and many academic and industry scientists are vying for this “holy grail” of protein detection.

To understand why this has been such a challenge, it helps to first appreciate why asking the analogous question of nucleic acids is so straightforward. DNA and RNA operate in a language designed for reproduction: a short, four-letter alphabet, where every letter has a direct complement. This makes them easy to copy, read, and write.

Illumina sequencing, a widely used method today, relies on this base-pair complementarity. DNA strands are immobilized on a flow cell, and the complementary strand is extended one base at a time using fluorescently labeled nucleotides. The order in which the fluorescent signals appear reveals the sequence of the original strand.
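The logic of reading a strand through its complement can be sketched in a few lines of Python. This is a toy model, not Illumina’s actual chemistry or software: the dye colors are arbitrary assumptions, strand directionality is ignored, and every cycle is assumed to be error-free.

```python
# Watson-Crick pairing: each base has exactly one partner.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye assignment; the actual colors are an assumption here.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE.items()}

def sequence_by_synthesis(template):
    """Simulate one labeled base being added per cycle and record its color."""
    signals = []
    for base in template:
        incorporated = COMPLEMENT[base]    # only the partner base can pair
        signals.append(DYE[incorporated])  # the camera sees that base's dye
    return signals

def decode(signals):
    """Turn the color sequence back into the synthesized and template strands."""
    synthesized = "".join(COLOR_TO_BASE[color] for color in signals)
    template = "".join(COMPLEMENT[base] for base in synthesized)
    return synthesized, template

if __name__ == "__main__":
    colors = sequence_by_synthesis("GATTACA")
    print(colors)
    print(decode(colors))
```

The whole scheme rests on the `COMPLEMENT` table: because each base has exactly one partner, the colors observed during synthesis determine the original strand unambiguously.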

Nanopore sequencing, another major approach, works in a completely different way. It relies on the uniform physical properties of nucleic acids: there are only four bases, and they all have roughly similar shapes and charges. A voltage is applied across a membrane containing tiny pores, drawing negatively charged strands through one nucleotide at a time. As each base moves through the nanopore, it causes a unique disruption in the current, which can be decoded to reveal the sequence in real time.

These techniques only work because nucleic acids are chemically consistent and structurally predictable.

Proteins, by contrast, are written in a far more complex language, optimized for function, not replication. Their alphabet has 20 letters, each with different shapes, sidechains, and chemical properties. Some are positively charged, others negatively, and many are neutral. This variability makes it much harder to coax a denatured protein through a nanopore in a consistent way. That variability, which is precisely what makes proteins so versatile and powerful, also makes them much harder to read.

Illumina-like sequencing doesn’t work for proteins, either. As the “end of the line” in the central dogma, proteins are made to do their job and then be degraded. There are no “complementary” amino acids of the kind the technique would require. If a cell wants more of a protein, it translates more from RNA rather than copying the protein directly. This translation is a one-way street as well: there isn’t a way to take an amino acid sequence and turn it back into RNA or DNA that’s easier to work with. It’s less like translating Spanish to English and more like trying to turn smoke signals back into fire.
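One facet of that one-way street can be quantified. Even setting aside the lack of any cellular machinery for the reverse step, the standard genetic code is redundant, so a protein sequence does not pin down the RNA it came from. The Python sketch below counts how many distinct coding sequences could have produced a given peptide. The function name is illustrative; the codon counts are those of the standard genetic code.

```python
# Number of codons encoding each amino acid in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}

def count_back_translations(peptide):
    """Count the distinct RNA coding sequences that translate to this peptide."""
    total = 1
    for amino_acid in peptide:
        total *= CODONS_PER_AMINO_ACID[amino_acid]
    return total

if __name__ == "__main__":
    for peptide in ["MW", "MLK", "LSR"]:
        print(peptide, count_back_translations(peptide))
```

A three-residue peptide such as LSR already has 6 × 6 × 6 = 216 possible coding sequences, and the count grows multiplicatively with every additional residue.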

That doesn’t mean all hope is lost. Many believe we’re on the cusp of a true Next-Generation Protein Sequencing breakthrough. Connecticut-headquartered Quantum-Si released the first commercial protein sequencing platform in 2023, relying on sequential fluorescent protein binding and electronic readout with a proprietary semiconductor chip for sequence decoding. This allows proteins to be sequenced one amino acid at a time, much like nucleic acid-based technologies, but the platform is currently limited to about one-third coverage of the proteome.

Another company, Nautilus Biotechnology, is developing its own technology aimed at scalable, high-resolution protein sequencing—though their combined microarray and machine learning approach has yet to be released. Like mass spectrometry proteomics, the Nautilus system would be limited to detecting proteins in a reference database, making it unable to tell if there was a truly novel protein present. Other researchers are exploring clever workarounds for nanopore-based protein sequencing, like attaching proteins to DNA shuttles to facilitate more uniform movement through the pore.

Whether one of these methods or another finally brings us into the age of proteome sequencing, it seems likely to happen relatively soon. When it does, we’ll enter a new age of biological research.
