My research interests center on the mathematical modeling of biological systems, spanning synthetic biology, machine learning, systems biology, metabolic flux analysis, data visualization, scientific software development, ecology, automation, and complexity.
I am particularly interested in the appearance of emergent properties, i.e. properties that arise from the tight interconnection and sheer number of components in a system rather than from the particular characteristics of each component.
Modeling universal scaling laws in ecology
In my opinion, the most compelling examples of complexity are found in biology. Hence, despite having enrolled in a theoretical physics program for my Ph.D. (under the direction of Nigel Goldenfeld), I started my research life studying an ecological problem: the Species Area Relationship (SAR).
This striking regularity, known for over a hundred years, states that as larger and larger areas of an ecosystem are sampled, the number of species contained in those areas grows as a power law of the area, with an exponent close to one quarter. This pattern seemed almost universal, holding across all ranks of life, from bacteria to plants, mammals, and birds.
We were able to prove that this universal trend is a robust consequence of the spatial and abundance distributions of species, and is therefore largely independent of the particular dynamics of the individuals in the ecosystem (competition, dispersal, etc.).
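As a minimal illustration of the pattern itself, the sketch below generates synthetic species counts following S = c·Aᶻ (the constants, areas, and noise level are hypothetical values chosen for the example; z ≈ 0.25 is the near-universal exponent described above) and recovers the exponent as the slope of a log-log fit:

```python
import numpy as np

# Synthetic Species-Area data: S = c * A**z with multiplicative noise.
# c, z, the sampled areas, and the noise level are illustrative choices.
rng = np.random.default_rng(0)
areas = np.logspace(0, 6, 20)          # sampled areas (arbitrary units)
c, z = 3.0, 0.25
species = c * areas**z * rng.lognormal(0.0, 0.05, size=areas.size)

# In log-log space the power law becomes a straight line whose slope is z.
slope, intercept = np.polyfit(np.log(areas), np.log(species), 1)
print(f"estimated exponent z = {slope:.3f}")
```

Plotted on log-log axes, such data fall on a straight line, which is how the quarter-power exponent was historically detected.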
Microbial ecology and Path Integral Monte Carlo
I was excited by the rich array of biocomplexity examples in ecology, but discouraged by the difficulty of performing real-time experiments in macroecological settings, so I decided to focus on the burgeoning field of microbial ecology. I joined one of my group’s efforts studying the impact of microbial ecosystems on terrace formation, in collaboration with geomicrobiologists. A fundamental question was whether these terraces were produced by microbes and whether they could therefore be used to pinpoint landing spots on Mars. In this project, I developed new methods, and applied existing ones, for the estimation of abundance and diversity based on 16S rRNA gene surveys.
While attracted by the new molecular techniques being applied to microbial ecology, I realized that 16S data can only produce a descriptive picture of a microbial ecosystem. I believe that ecosystem complexity can only be properly studied when predictive quantitative models are available, and 16S data looked insufficient for this goal: we needed to understand community metabolism. I therefore concentrated my attention on the emerging field of metagenomics and, for my postdoc, decided to join the group of Philip Hugenholtz, a key player in one of the first two pioneering metagenomics papers.
Convinced that many of the tools commonly used in physics would have future applications outside the field, I devoted the rest of my Ph.D. to studying Bose-Einstein condensates using Path Integral Monte Carlo simulations, a technique which involves no uncontrolled approximations in dealing with many-body interactions.
Enhanced Biological Phosphorus Removal (EBPR) bioreactor (taken at Trina McMahon’s lab at UW Madison) and metabolic map obtained from studying its metagenome.
Costa Rican termites (photo courtesy of David Gilbert), and model of nutritional symbiosis in their hindgut.
Metagenomics at the Joint Genome Institute
At the Microbial Ecology group at the Joint Genome Institute (JGI), led by Philip Hugenholtz, I targeted a system that I thought fulfilled the required characteristics for successful modeling: Enhanced Biological Phosphorus Removal (EBPR) sludge, a microbial community widely used in wastewater engineering for phosphate removal. This community was an ideal target because a single bacterium was dominant, comprising ~80% of the abundance. Hence, the system is close enough to a single culture that techniques developed for single cultures can be adapted with limited modifications, while still constituting a full microbial ecosystem with ~15 distinct relevant species.
During my postdoctoral stay at JGI, my collaborators and I studied and published the first metagenome for EBPR sludge, and I was part of the first termite gut metagenomic study. These studies gave me the opportunity not only to learn how to analyze metagenomic data sets, but also to become acquainted with machine learning and microfluidics, help develop metagenomic databases, and study the impact of phages on microbial communities. My stay at JGI also infused me with a deep appreciation for the type of team-based science done at the national labs.
Metabolic Flux Analysis
While the metagenomic study of this system yielded a detailed blueprint of the metabolic pathways present, this knowledge was descriptive rather than predictive. Questions such as “which species will become dominant if a given condition (e.g. pH, or acetate availability) is altered?” or “what will be the biochemical impact of a community on its environment?” cannot be answered from knowledge of the genomes (or even the transcripts, proteins, or metabolites) present in a microbial community alone.
We soon realized that the next step toward quantitative predictive models involved studying metabolic fluxes. Quantifying metabolic fluxes (i.e. the rates at which molecules proceed through each reaction) for all metabolic reactions in a given organism entails knowledge of growth rates (from the carbon fluxes to biomass) and of metabolite excretion (from the outgoing metabolite fluxes).
We therefore established a collaboration with Jay Keasling, whose group operated an EBPR bioreactor and had previously tried (with limited success, due to the lack of metagenomic, metatranscriptomic, or metaproteomic information) to produce a Flux Balance Analysis (FBA) model of this system. During this collaboration, I became acquainted with an accurate technique, 13C Metabolic Flux Analysis, for measuring metabolic fluxes in pure cultures. This seemed the perfect addition to provide an experimental check for flux models of EBPR (after proper modification to account for the community nature of this system).
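To give a flavor of what an FBA calculation involves, here is a toy sketch on a hypothetical two-metabolite network (not the actual EBPR model): biomass flux is maximized subject to steady-state mass balance S·v = 0 and flux bounds, which is a linear program.

```python
from scipy.optimize import linprog

# Hypothetical toy network for illustration only:
#   v1: substrate uptake -> A   (capped at 10)
#   v2: A -> B
#   v3: A -> byproduct (excretion)
#   v4: B -> biomass            (objective to maximize)
S = [[1, -1, -1,  0],   # steady-state mass balance for metabolite A
     [0,  1,  0, -1]]   # steady-state mass balance for metabolite B
bounds = [(0, 10), (0, None), (0, None), (0, None)]
c = [0, 0, 0, -1]       # linprog minimizes, so negate the biomass flux

res = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds, method="highs")
v = res.x
print(f"optimal biomass flux = {v[3]:.1f}")
```

With the uptake cap at 10 and no benefit from excretion, the optimum routes all carbon to biomass. Real genome-scale FBA models work the same way, only with thousands of reactions, and 13C labeling data can then be used to check which of the many flux distributions consistent with the constraints the cell actually uses.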
Quantitative modeling at the Joint BioEnergy Institute (JBEI)
It was at that time that JBEI was first funded, and I joined as an LBNL staff scientist to help develop predictive quantitative models of microbial metabolism for the hosts to be used at the institute: E. coli, S. cerevisiae and S. acidocaldarius.
From early on, it became clear that metabolic engineering had remained a collection of elegant demonstrations rather than a systematic discipline with generalizable methods, in spite of being on track to be considered one of the top ten emerging technologies.
Hence, we focused on developing tools and methods that could tackle general problems and be used systematically. For example, we have created methods to measure fluxes for comprehensive genome-scale models using 13C labeling experiments, and showed that these precisely determined fluxes enable quantitative predictions. Furthermore, we have used the understanding of metabolic fluxes provided by this technique to increase biofuel production. We have also developed methods for measuring fluxes in microbial communities. For the (common) case in which no accurate mechanistic models are available to guide metabolic engineering, we have developed ways to use -omics data to increase biofuel yields through data mining and machine learning techniques. To feed machine learning techniques the large amounts of data they require, we have also helped develop microfluidic chips that automate synthetic biology protocols.
Furthermore, we have invested a significant amount of time developing the basic software tools that enable a variety of predictive modeling techniques.
One of them is the Experiment Data Depot, a repository of standardized experimental data that harbors JBEI’s data in a form that can be used for testing modeling approaches.
A second tool is Arrowland, an interactive, intuitive tool for visualizing -omics data (transcriptomics, proteomics, metabolomics, and fluxomics). We think that this set of tools and methods provides a solid base from which to create and develop progressively more predictive models.
Moreover, we have also developed ClusterCAD, a computational retrobiosynthesis platform that streamlines the design of polyketide synthase (PKS) variants to obtain a desired molecule.
Current research: synthetic biology, machine learning and automation
In my current positions at JBEI and the Agile BioFoundry (ABF), I am interested in developing models of microbial metabolism that are as predictive and quantitative as possible, and then showcasing their validity by directing metabolic engineering efforts. I strongly believe that, in order for biology to mature further, it must become both quantitative and predictive.
Biofuel (and other bioproduct) synthesis provides a perfect catalyst for this evolution, since it requires a predictive understanding of the biology involved and provides a valuable goal with a potentially very significant impact on the world.
I am also very interested in further developing quantitative predictive models for microbial communities and expanding our techniques to deal with human metabolism and medical drugs.
The Quantitative Metabolic Modeling group combines research in mathematical modeling with machine learning, synthetic biology and automation in order to enable predictive biology for the benefit of society.