Biology exists at a scale far beyond what we’ll ever be able to experimentally measure…
Suppose we want to understand all possible proteins: there are 20¹⁰⁰ possible 100-amino-acid proteins (and 100 amino acids is a pretty small protein!). If we synthesized just a single molecule of each protein, that would require a total protein mass of about 3.9 × 10⁸² times the mass of the Earth. Impossible! And we also want to understand all RNA molecules and their interactions with proteins…
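The figures above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the average residue mass (~110 Da) and the Earth's mass (5.972 × 10²⁴ kg) are assumed standard values, not numbers stated in the text.

```python
# Back-of-the-envelope check of the protein sequence-space arithmetic.
# Assumptions (not from the text): average amino-acid residue mass ~110 Da,
# Earth mass 5.972e24 kg; the ~18 Da contributed by the chain termini is ignored.
DALTON_KG = 1.66053906660e-27  # 1 Da in kilograms
AVG_RESIDUE_DA = 110           # assumed average residue mass
EARTH_MASS_KG = 5.972e24       # approximate mass of the Earth
LENGTH = 100                   # protein length in amino acids
ALPHABET = 20                  # canonical amino acids

n_sequences = ALPHABET ** LENGTH                       # exact integer, ~1.27e130
protein_mass_kg = LENGTH * AVG_RESIDUE_DA * DALTON_KG  # ~1.8e-23 kg per molecule
total_mass_kg = n_sequences * protein_mass_kg          # one molecule of each sequence
earth_ratio = total_mass_kg / EARTH_MASS_KG            # in units of Earth masses

print(f"{n_sequences:.2e} sequences")     # 1.27e+130 sequences
print(f"{earth_ratio:.1e} Earth masses")  # 3.9e+82 Earth masses
```

The integer power keeps the sequence count exact; it is converted to a float only when multiplied by the per-molecule mass.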
If we want to understand biology at this scale, we’ll need to develop predictive sequence-structure-function models. We're especially interested in understanding how structure within human cells at multiple length scales (from molecular conformations, to the formation of higher-order assemblies, to subcellular localization) impacts molecular and cellular function, and how this goes awry in disease.
The Kappel Lab aims to develop these models by combining high-throughput experiments, which make structural and functional measurements for as many protein and RNA sequences as possible, with machine learning and computational biophysical methods. Our goal is to harness these models to understand how RNA and proteins contribute to cellular function and disease-related dysfunction, and to design novel functional RNA and protein sequences for therapeutic and biotechnology applications.
Deciphering the condensate code: How do protein and RNA sequences dictate intracellular condensate formation?
Condensates, membrane-less subcellular assemblies including nucleoli, nuclear speckles, and Cajal bodies, provide a fundamental mechanism of subcellular organization; they are involved in RNP complex assembly, gene regulation, stress response, and viral replication. Aberrant condensate formation is linked to neurodegenerative diseases and cancer. Understanding the rules of condensate formation will enable novel ways to modulate cell behavior and disease progression. A primary protein or RNA sequence must somehow encode its ability to form condensates. What specific sequence features confer this property? We recently developed CondenSeq, a high-throughput, pooled, imaging-based approach to efficiently characterize the propensities of thousands of protein sequences to form condensates in a single experiment. We are leveraging and building on this approach to systematically decipher the condensate code. By combining the resulting large-scale datasets with computational approaches, we are developing predictive models of condensate formation.
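One simple way to summarize a pooled imaging screen of this kind is to score each sequence by the fraction of imaged cells expressing it that show condensate puncta. The sketch below assumes each cell has already been assigned a sequence identity and a binary puncta call; the input format, the function name `condensate_propensity`, and the `min_cells` cutoff are hypothetical, and this is not a description of CondenSeq's actual analysis pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def condensate_propensity(
    cells: Iterable[Tuple[str, bool]], min_cells: int = 20
) -> Dict[str, float]:
    """Hypothetical per-sequence scoring for a pooled imaging screen.

    `cells` yields one (sequence_id, has_puncta) pair per imaged cell.
    Returns the fraction of cells with puncta for each sequence seen in
    at least `min_cells` cells; sparsely sampled sequences are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # sequence_id -> [cells with puncta, total cells]
    for seq_id, has_puncta in cells:
        counts[seq_id][0] += int(has_puncta)
        counts[seq_id][1] += 1
    return {
        seq_id: positive / total
        for seq_id, (positive, total) in counts.items()
        if total >= min_cells
    }


# Toy example: seqC is seen in too few cells to be scored.
calls = [("seqA", True)] * 30 + [("seqA", False)] * 10 + [("seqB", False)] * 25 + [("seqC", True)] * 5
print(condensate_propensity(calls))  # {'seqA': 0.75, 'seqB': 0.0}
```

Requiring a minimum number of cells per sequence keeps poorly sampled sequences from receiving extreme scores; a fuller analysis would likely also report uncertainty, for example a binomial confidence interval per sequence.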
Characterizing and predicting intracellular RNA and RNA-protein complex structure and function
The structures of RNA molecules and their interactions with proteins are critical to processes such as gene regulation, splicing, and translation, and aberrant changes in these properties underpin many diseases. For most biological questions, we need to understand structure, across multiple length scales, in the cellular environment, where it can deviate substantially from the structure observed in vitro. Even within a cell, structures often vary between subcellular compartments. We are developing new methods to predict and characterize RNA and RNP structure and function in cells, across subcellular compartments.