My research interests focus on making biology as predictable as physics or chemistry. Ever since my early school days, I was drawn to biology and physics: I was fascinated by the power that mathematics and physics give us to understand and engineer the world around us, and I was keen to apply similar approaches to biology. For example, during the Permian period there were giant dragonflies (~75 cm) because oxygen levels were much higher than today (~30% vs. ~21%) and insects obtain their oxygen through diffusion rather than lungs. How could one predict, in general, the maximum size of living beings? What cellular or structural details determine this size? Similar fundamental biological questions roamed my mind: Why do we spend about one-third of our lives sleeping? What determines the lifespan of a living being?
To address the challenge of making biology predictable, my career has followed a multidisciplinary path, combining research efforts in diverse fields: macroecology, microbial ecology, Bose-Einstein condensates, Path Integral Monte Carlo sampling, metagenomics, functional genomics, machine learning, microfluidics, database design, metabolic flux analysis, synthetic biology, retrobiosynthesis, and artificial intelligence for the advancement of science. My research trajectory began with explaining universal scaling laws in ecology before moving to microbial ecology, which offered systems more amenable to manipulation. I used a variety of state-of-the-art techniques to study microbial systems, and eventually transitioned into synthetic biology, since the systems under study (biological cells) could be more efficiently controlled and manipulated at the genetic level. Within synthetic biology, I have found that the integration of Artificial Intelligence with robotics is the key to finally achieving the predictive capabilities needed to effectively guide biological engineering (see below for details). These advances are paving the way for an era where building with biology (biomanufacturing) will reach the same level of predictability and reliability as manufacturing based on physics or chemistry, the foundational technologies of modern society.
In this undertaking, I have repeatedly found that emergent properties and complexity are critical to understanding biology. Emergent properties arise from the sheer number of components in a system and their tight interconnections, rather than from the particular characteristics of each component; complexity science studies how large groups of interacting, simple components spontaneously self-organize to create complex global patterns and behaviors. Both concepts challenge a purely reductionist view of biology, a perspective that I believe must be overcome to truly achieve predictability in the life sciences.
The most appealing examples of complexity are, in my opinion, found in biology. Hence, although I started my Ph.D. in theoretical physics under the direction of Nigel Goldenfeld, my initial research delved into an ecological puzzle: the Species-Area Relationship (SAR). This striking regularity, known for over a hundred years, states that as larger and larger areas of an ecosystem are sampled, the number of species contained in those areas grows as a power law with an exponent close to one quarter (S ~ A^(1/4)). This pattern appeared to be universal, applying across all forms of life, from bacteria and plants to mammals and birds.
We demonstrated that this universal trend is a robust consequence of species' spatial and abundance distributions. Specifically, the S ~ A^(1/4) relationship is expected whenever individuals of a species are highly clustered and their abundance distribution approximates a lognormal (a frequently observed pattern). The trend is therefore largely independent of the particular dynamics of the individuals in the ecosystem (competition, dispersal, etc.), which explains its apparent universality.
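The mechanism can be illustrated with a toy simulation (a minimal sketch with illustrative parameters, not the analysis from the paper): place species with lognormal abundances as spatial clusters on a landscape, count species in nested areas, and fit the power-law exponent on a log-log scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy landscape with the two ingredients named in the text:
# (1) a lognormal species-abundance distribution, and
# (2) spatially clustered individuals. All parameters are illustrative.
S_tot = 200
abund = rng.lognormal(mean=3.0, sigma=1.5, size=S_tot).astype(int) + 1

x_all, y_all, sp_all = [], [], []
for s, n in enumerate(abund):
    cx, cy = rng.random(2)                    # cluster center
    spread = 0.02 * np.sqrt(n)                # common species range wider
    x_all.append(np.clip(cx + spread * rng.standard_normal(n), 0, 1))
    y_all.append(np.clip(cy + spread * rng.standard_normal(n), 0, 1))
    sp_all.append(np.full(n, s))
x = np.concatenate(x_all)
y = np.concatenate(y_all)
sp = np.concatenate(sp_all)

# Count species in nested square samples of growing side length.
sides = np.array([0.05, 0.1, 0.2, 0.4, 0.8])
S = np.array([len(np.unique(sp[(x < L) & (y < L)])) for L in sides])
areas = sides ** 2

# Log-log regression estimates the SAR exponent z in S ~ A^z.
# This crude toy yields a sublinear power law, though not exactly 1/4.
mask = S > 0
z = np.polyfit(np.log(areas[mask]), np.log(S[mask]), 1)[0]
print(f"estimated SAR exponent z = {z:.2f}")
```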
Species-area accumulation curve from Garcia Martin et al. As long as individuals cluster and the abundance distribution is lognormal, a power-law Species-Area Relationship emerges.
Travertine terraces harboring the microbial communities studied in Garcia Martin et al and Veysey et al.
I was initially captivated by the examples of biocomplexity found in macroecology, but discouraged by the difficulty of conducting real-time experiments in that setting. This led me to pivot my focus to the rapidly growing field of microbial ecology. My initial work in this area involved a close collaboration with geomicrobiologists, where we investigated the influence of microbial ecosystems on terrace formation. A central question was whether microbes were responsible for these formations and whether they could serve as indicators for potential landing sites on Mars. In this project, I developed and applied new and existing methods for estimating microbial abundance and diversity using 16S rRNA gene surveys.
Vortex profile for a rotating Bose Einstein Condensate from my thesis.
However, despite the utility of molecular biology techniques, I recognized that 16S rRNA data could only provide a descriptive catalog of a microbial ecosystem. To develop predictive quantitative models of ecosystem complexity, a deeper understanding of microbial metabolism was necessary. This awareness prompted me to concentrate on the emerging field of metagenomics. For my postdoc, I chose to join the group of Philip Hugenholtz, a pivotal figure and co-author of one of the first two seminal metagenomics papers.
Convinced that many of the tools commonly used in physics would find future applications in biology, I devoted the rest of my Ph.D. to studying Bose-Einstein condensates using Path Integral Monte Carlo simulations, a technique that involves no uncontrolled approximations in dealing with many-body interactions.
Enhanced Biological Phosphorus Removal (EBPR) bioreactor (taken at Trina McMahon’s lab at UW Madison) and metabolic map obtained from studying its metagenome.
Costa Rican termites (photo courtesy of David Gilbert), and model of nutritional symbiosis in their hindgut.
During my postdoc with Philip Hugenholtz's Microbial Ecology group at the Joint Genome Institute (JGI), I targeted a system that I thought fulfilled the required characteristics for successful modeling: Enhanced Biological Phosphorus Removal (EBPR) sludge. This microbial community, widely used in wastewater engineering, was an ideal system for modeling because one bacterium was highly dominant, accounting for approximately 80% of the community. This dominance made the system similar enough to a single culture that established single-culture techniques could be adapted easily, while still representing a complex microbial ecosystem with ~15 distinct relevant species.
My time at JGI was highly productive. We studied and published the first metagenome of the EBPR sludge, and I was part of the first termite gut metagenomic study. These studies gave me the opportunity not only to learn how to analyze metagenomic data sets, but also to become acquainted with machine learning and microfluidics, help develop metagenomic databases, and study the impact of phages on microbial communities. My experience at JGI instilled in me a profound appreciation for the collaborative, team-based science characteristic of national laboratories.
While the metagenomic study of this microbial community yielded a detailed blueprint of the metabolic pathways present, this knowledge was descriptive in nature, rather than predictive. Questions such as: “which species will become dominant if a given condition (e.g. pH, or acetate availability) is altered?”, or “what will be the biochemical impact of a community on its environment?” are not answerable from just the knowledge of the genomes (or even transcripts, proteins or metabolites) present in a microbial community.
We soon realized that the next step toward quantitative predictive models involved studying metabolic fluxes. Predicting the metabolic fluxes (i.e., the rates at which molecules are converted through each reaction) for all metabolic reactions in a given organism yields, among other quantities, growth rates (from the carbon fluxes to biomass) and metabolite excretion (from the outgoing metabolite fluxes).
To pursue this, we established a collaboration with Jay Keasling, whose group operated an EBPR bioreactor and had previously tried (with limited success, due to the lack of metagenomic, metatranscriptomic, or metaproteomic information) to produce a Flux Balance Analysis (FBA) model of this system. During this collaboration, I became acquainted with an accurate technique (13C Metabolic Flux Analysis) for measuring metabolic fluxes in pure cultures. This technique seemed like the perfect experimental validation tool for EBPR flux models, requiring only proper adaptation to account for the system's community nature.
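For readers unfamiliar with FBA, its core is a small linear program: choose the flux vector v that maximizes a biomass flux, subject to steady-state mass balance (S·v = 0, where S is the stoichiometric matrix) and bounds on each flux. The sketch below uses a hypothetical five-reaction toy network, not the EBPR model:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not the EBPR model):
#   R1: uptake of A, R2: A -> B, R3: A -> C,
#   R4: biomass from B, R5: excretion of C.
# Rows of S are metabolites (A, B, C); columns are reactions R1..R5.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
])

# Maximize the biomass flux v4 (linprog minimizes, hence the -1).
c = np.array([0, 0, 0, -1, 0])
bounds = [(0, 10), (0, None), (0, None), (0, None), (0, None)]

# Steady state: S @ v = 0 (no metabolite accumulates).
res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds)
print(res.x[3])  # optimal biomass flux: all uptake (10) routed to B
```

Real FBA models work the same way but with genome-scale stoichiometric matrices containing thousands of reactions.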
Metabolic fluxes for Shewanella spp. from Tang et al.
JBEI logo and JBEI building.
I joined the Joint BioEnergy Institute (JBEI) as an LBNL staff scientist and group lead shortly after its initial funding, following my postdoc. My role focused on developing predictive quantitative models for the microbial hosts used at the institute, including E. coli, S. cerevisiae, and S. acidocaldarius. Years later, I would also join the Agile Biofoundry (ABF), fulfilling a similar role (quantitative modeling) but with a stronger emphasis on close interaction with industry and on accelerating the Design-Build-Test-Learn cycle. Developing quantitative predictive models for pure cultures seemed a good stepping stone toward predicting more complex biological systems, and synthetic biology allowed us to modify the DNA that determines cell behavior.
From early on, it became clear that metabolic engineering had remained a collection of elegant demonstrations rather than a systematic discipline with generalizable methods. This limitation significantly hampered the progress of what was then considered one of the top ten emerging technologies.
Consequently, my group focused on developing general tools and methods that could be systematically applied to any host, pathway or final product. For example, we created methods to measure fluxes for comprehensive genome-scale models by using 13C labeling experiments, demonstrating that these precisely determined fluxes can enable quantitative predictions. We also showcased the practical utility of these flux measurements for better understanding production metabolic profiles and successfully making recommendations to increase biofuel production. Extending beyond single organisms, we created methods for measuring fluxes in microbial communities. For the common scenario lacking accurate mechanistic models to guide metabolic engineering, we developed techniques using -omics data combined with data mining and machine learning to increase biofuel yields. To supply the necessary large datasets for machine learning, we also contributed to the development of microfluidic chips to automate synthetic biology protocols. These techniques have been successfully applied to diverse synthetic biology challenges, ranging from the synthesis of jet fuels to the production of molecules imparting "hoppy" flavor to beer.
Quantitative predictions for 48 measurements enabled by combining 13C labeling data with genome-scale models.
Furthermore, we invested significant effort in developing foundational software tools to enable various predictive modeling techniques. One key tool is the Experiment Data Depot, a repository of standardized experimental data that housed JBEI's data, making it usable for testing modeling approaches. Another tool is ClusterCAD, a computational retrobiosynthesis platform designed to streamline the design of Polyketide Synthase (PKS) variants to achieve a desired molecule. We believe this collection of tools and methods provides a solid foundation for the progressive development of a more predictive synthetic biology.
Microfluidic chips allowed us to miniaturize and automate synthetic biology protocols.
The Automated Recommendation Tool (ART) is a versatile machine learning tool adapted to the needs of synthetic biology, providing predictions accurate enough to effectively guide biological engineering.
It soon became evident to us that the future of predictable synthetic biology would need to pass through Artificial Intelligence (AI) and automation/robotics. We made significant efforts to explain and review the (then largely unfamiliar) field of AI for metabolic engineers, and developed new machine learning (ML) tools specifically targeted at the needs of synthetic biology. We also realized the importance of automating bench work to provide the large amounts of high-quality data that AI needs to be effective, and invested time and effort in microfluidics tools and in the concept of self-driving labs for synthetic biology.
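The predict-then-recommend pattern behind such ML tools can be sketched as follows (a loose illustration on synthetic data, not ART's actual algorithm): fit an ensemble of models on measured design-to-production pairs, then recommend the candidate design with the highest predicted production.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data: 12 strain designs, each described by three
# tunable "dials" (e.g., expression levels), with measured production.
# The linear ground truth and noise level are made up for illustration.
X = rng.random((12, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.standard_normal(12)

# Bootstrap ensemble of linear models: resample the data, refit, and
# average predictions over candidate designs (gives a crude mean
# prediction; the spread across members would estimate uncertainty).
candidates = rng.random((100, 3))
preds = []
for _ in range(30):
    idx = rng.integers(0, len(X), len(X))
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    preds.append(candidates @ w)
mean = np.mean(preds, axis=0)

# Recommend the candidate design with the best predicted production.
best = candidates[np.argmax(mean)]
print("recommended design:", np.round(best, 2))
```

Production tools like ART use richer Bayesian ensembles and propagate uncertainty into the recommendations, but the loop is the same: predict, recommend, measure, retrain.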
The combination of AI and automation led to significant successes: e.g., in designing pathways, in using CRISPRi for host engineering to produce aviation fuels with improved energy density, in finding surprising growth-media components for improved production, and in predicting microbial metabolism with an accuracy approaching the predictive capabilities found in physics and chemistry. We are now leveraging all these tools for protein engineering and general biomanufacturing needs. Simultaneously, we keep working on combining mechanistic models with AI, engineering microbial communities, and retrobiosynthesis for suggesting synthetic biology targets.
ART leveraged machine learning to recommend which genes to downregulate through CRISPR interference (CRISPRi), increasing production by ~500%.
My current research concentrates on leveraging AI for scientific purposes, combining it with automation and robotics, and demonstrating these capabilities in synthetic biology. We work on adapting and developing AI methods to meet scientific needs: e.g., sparse data sets, feedback loops with experimental work, and combining mechanistic insight with the predictive capabilities of machine learning. We also intend to merge AI with robotics and automation to automate the creation of scientific knowledge, miniaturizing molecular biology processes through microfluidics and further developing and instantiating the concept of self-driving labs for synthetic biology. Furthermore, we expect to demonstrate how AI and robotics can disruptively improve the scientific process by helping synthesize new biomaterials, fuels, and therapeutic drugs. Finally, we hope to continue to show that biology can be as predictable as physics and chemistry, given the right approaches.
The Quantitative Metabolic Modeling group combines research in mathematical modeling with machine learning, synthetic biology and automation in order to enable predictive biology for the benefit of society.