Summary

My research lies at the intersection of applied mathematics (particularly ordinary differential equations and control theory), machine learning (particularly physics-informed neural networks and deep reinforcement learning), and complex biological systems (particularly large metabolic networks and circadian rhythms). My projects develop and apply mathematical modeling and machine learning approaches to gain insight into complex biological systems. In the era of big data, machine learning models increasingly complement traditional mechanistic modeling of complex biological systems and have proven able to keep pace with the data explosion. I am drawn to the opportunities created by growing computational power and mature deep learning algorithms, and I am strongly motivated to work at the juncture of systems biology and artificial intelligence.

Major Research Projects

Embedded physics-informed neural network (ePINN) for parameter and hidden dynamics inference: Identifying resource competition phenotypes in synthetic biochemical circuits

Biological systems have been widely studied as complex dynamical systems that evolve over time in response to internal resource abundance and external perturbations. The integration of systems and synthetic biology provides a consolidated framework that draws system-level connections among biology, mathematics, engineering, and computer science. One major problem in current synthetic biology research is designing and controlling synthetic circuits so that they behave reliably and robustly, because the circuits and their host cells draw on common transcriptional and translational resources. Because cellular resources are often limited, different genes and circuits compete for them, which affects the behavior of synthetic genetic circuits. How competition affects behavior depends on the “bottleneck” resource. Given knowledge of the physical laws and underlying mechanisms, the dynamical behavior of synthetic circuits can be described by first-principles models, usually represented as a system of ordinary differential equations (ODEs).

In this work, we develop a novel embedded PINN (ePINN), composed of two nested loss-sharing neural networks, to improve the prediction of unknown dynamics from quantitative time series data. We apply the ePINN approach to identify the mathematical structures of competition phenotypes. First, we use the PINN approach to infer model parameters and hidden dynamics from partially known data (for example, when the reaction mechanisms are incompletely understood or experimental data are missing). Second, we test how well the algorithms can distinguish and extract unknown dynamics from noisy data. Third, we study how the synthetic and competing circuits behave in various cases when different particles become the limiting resource.
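For reference, a generic PINN objective for parameter inference in an ODE model can be written as below. This is a sketch of the standard formulation only: it does not reproduce the nested, loss-sharing structure of the ePINN, and all symbols are notation assumed here for illustration.

```latex
\[
\mathcal{L}(\theta, p)
  = \frac{1}{N_d} \sum_{i=1}^{N_d} \bigl\| H\,\hat{x}_\theta(t_i) - y_i \bigr\|^2
  + \lambda \, \frac{1}{N_c} \sum_{j=1}^{N_c}
    \Bigl\| \frac{d\hat{x}_\theta}{dt}(t_j) - f\bigl(\hat{x}_\theta(t_j);\, p\bigr) \Bigr\|^2
\]
```

Here $\hat{x}_\theta(t)$ is the network approximation of the full state with weights $\theta$, $p$ collects the unknown ODE parameters, $f$ is the right-hand side of the ODE model, $y_i$ are measurements of the observed state components selected by $H$, $t_j$ are collocation points, and $\lambda$ weights the physics residual against the data misfit. Minimizing jointly over $\theta$ and $p$ yields both parameter estimates and a prediction of the unobserved (hidden) state components.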

Quantitative Systems Pharmacology Model of Large Metabolic Networks

Quantitative systems pharmacology (QSP) is an integrated approach that combines mathematical, computational, and experimental methods to determine the mechanisms of new and existing drug treatments in humans. I have collaborated with Dr. Karim Azer and his group (at Sanofi, now at Axcella) to develop a QSP methodology named Linear-In-Flux-Expressions (LIFE). The LIFE methodology provides a new approach to modeling metabolic networks that exploits correlations among the network fluxes and reduces the number of model parameters. The work comprises a theoretical study of the stability and control of metabolic networks and applications to industry clinical data; see An et al. (2019); McQuade et al. (2018); Merrill et al. (2018). The major contributions of my work are:

  1. Provide general results relating stability of metabolic systems to the structure of the associated graph.

  2. Apply stability analysis from the fields of network flows, compartmental systems, control theory, and Markov chains to LIFE systems.

  3. Solve two control problems on metabolic networks: (a) the optimization of intakes from the outside environment to drive the system to a desired state, and (b) the inclusion of inhibitors and enhancers and their optimization.
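As an illustration of the linear-in-flux structure and of control problem (a), the sketch below simulates a toy two-metabolite chain. It assumes dynamics of the form dx/dt = S(x) f, where f is the flux vector and S(x) is a metabolite-dependent stoichiometric matrix. The network, the flux values, and every function name are hypothetical and are not taken from the cited papers.

```python
# Toy chain: intake -> x1 -> x2 -> excretion (hypothetical network).
# Assumed dynamics: dx/dt = S(x) f, linear in the flux vector f = (f0, f1, f2).

def stoichiometric_matrix(x):
    """Return S(x) for the two-metabolite chain."""
    x1, x2 = x
    return [
        [1.0, -x1, 0.0],  # x1 gains the intake f0 and loses f1 * x1
        [0.0, x1, -x2],   # x2 gains f1 * x1 and loses f2 * x2
    ]

def rhs(x, f):
    """Evaluate dx/dt = S(x) f."""
    return [sum(s * fj for s, fj in zip(row, f))
            for row in stoichiometric_matrix(x)]

def simulate(x0, f, dt=0.01, steps=10_000):
    """Integrate the system with forward Euler and return the final state."""
    x = list(x0)
    for _ in range(steps):
        dx = rhs(x, f)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

def steady_state(f):
    """Analytic equilibrium of the chain: x1* = f0 / f1, x2* = f0 / f2."""
    f0, f1, f2 = f
    return [f0 / f1, f0 / f2]

def intake_for_target(x1_target, f1):
    """Constant intake f0 that places the x1 equilibrium at x1_target."""
    return f1 * x1_target

fluxes = [2.0, 0.5, 1.0]
x_final = simulate([0.0, 0.0], fluxes)
```

In this toy case the intake that drives x1 to a desired level follows directly from the equilibrium condition; a realistic network would require solving an optimization problem instead.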

Mathematical Modeling of Circadian Systems

I began my graduate study with a project on modeling circadian systems. In a series of publications (An et al., 2021; Lee et al., 2017), I developed a two-step ordinary differential equation-based model to analyze two essential characteristics of an authentic circadian clock: period and phase. The major contributions of my work are:

  1. Generate and explain a broader range of period-phase relationships that cannot be explained by existing models.

  2. Provide theoretical analysis of the existence and stability of circadian systems’ periodic orbits.

  3. Provide an analytical solution for an important concept, the range of entrainment.

The findings can be applied to diagnostics and treatments for patients with sleep disorders caused by shift work or jet lag.
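To make the notion of a range of entrainment concrete, the sketch below uses a textbook phase-only oscillator (the Adler equation), not the two-step model described above. It assumes the phase difference psi between clock and zeitgeber obeys dpsi/dt = d_omega - K sin(psi), so a stable entrained phase exists only when |d_omega| <= K. The coupling strength K, the default values, and the function names are illustrative assumptions.

```python
import math

def entrained_phase(tau, T=24.0, K=0.03):
    """Stable phase difference (radians) of the clock relative to the
    zeitgeber, or None if the clock cannot entrain.

    tau: free-running period (h); T: zeitgeber period (h);
    K: assumed coupling strength (rad/h).
    """
    d_omega = 2 * math.pi / tau - 2 * math.pi / T
    if abs(d_omega) > K:
        return None  # frequency mismatch exceeds the coupling strength
    # asin returns the root with cos(psi) >= 0, which is the stable one.
    return math.asin(d_omega / K)

def entrainment_range(T=24.0, K=0.03):
    """Interval of free-running periods (h) satisfying |d_omega| <= K."""
    omega_T = 2 * math.pi / T
    tau_min = 2 * math.pi / (omega_T + K)
    tau_max = 2 * math.pi / (omega_T - K) if omega_T > K else math.inf
    return tau_min, tau_max
```

In this simplified model, a clock with a free-running period shorter than T entrains with a positive (advanced) phase and a longer period gives a negative (delayed) phase; periods outside the interval returned by `entrainment_range` do not entrain.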

Further, I collaborated with Dr. Till Roenneberg, Dr. Martha Merrow, and Dr. Kwangwon Lee to develop a unified approach named dynamic circadian integrated characteristics (dCIRC) to investigate a fundamental problem in chronobiology: modeling and predicting entrainment (An et al., 2022). Our model reconciles the traditional non-parametric and parametric approaches. It is the first model in the field that can describe and analyze the clock’s velocity changes and phase shifts in natural cycling conditions. In this project, I found species-specific dCIRC parameters by fitting Neurospora developmental/molecular rhythm data and solving a convex optimization problem. We found that individuals use non-uniform strategies for integrating light effects to achieve the optimal phase of entrainment. The unified model provides new insights into how the circadian clocks of diverse organisms function in natural habitats.

Future work

While reading related work in the field, I gained inspiration for the following topic: twenty-four hours is not necessarily the appropriate demarcation line between fast and slow free-running periods. This may reflect genetic differences underlying organisms' non-uniform responses to light. It further suggests a period-phase relationship more complicated than the conventional rule that a short period results in an advanced phase and a long period results in a delayed phase. A broader goal is to discover a phase-period-genotype relationship by fitting extensive circadian rhythm data across species (with ML/AI approaches). This relationship will act as a “circadian fingerprint” to identify chronotypes and guide sleep disorder treatments.

Modeling Forensic DNA Interpretation Process

In the past several decades, DNA profiling has developed into an indispensable tool for identifying individuals in criminal investigations and judicial proceedings. DNA interpretation is a complex stochastic process that consists of five steps: desorption and extraction, quantification (pre-PCR), PCR amplification, capillary electrophoresis (CE) analysis, and short tandem repeat electropherogram (EPG) interpretation. Increasing computing power allows us to perform DNA profiling in silico at much lower cost and in much less time. I worked with Dr. Catherine Grgicak to create a mathematical framework that combines forensic science, statistics, and computational methods to improve and standardize DNA evidence interpretation. We implemented the model to investigate the following problems: (1) calculating allele dropout rates and finding the laboratory settings that minimize them; (2) comparing the DNA interpretation results of routine bulk analysis and single-cell analysis.
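As a minimal illustration of how an allele dropout rate can be estimated in silico, the sketch below considers only a single sampling step and is far simpler than the five-step framework described above. It assumes each of n template copies of an allele independently survives into the amplification reaction with probability p, so the allele drops out when no copy survives. The function names and parameter values are hypothetical.

```python
import random

def dropout_probability(n_copies, p_survive):
    """Analytic dropout probability: no copy survives, (1 - p)^n."""
    return (1.0 - p_survive) ** n_copies

def simulate_dropout(n_copies, p_survive, trials=20_000, seed=0):
    """Monte Carlo estimate of the dropout rate under the same assumption."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    dropouts = 0
    for _ in range(trials):
        # Count the copies that survive sampling in this simulated run.
        survivors = sum(rng.random() < p_survive for _ in range(n_copies))
        if survivors == 0:
            dropouts += 1
    return dropouts / trials
```

Comparing `simulate_dropout(5, 0.3)` with `dropout_probability(5, 0.3)` checks the simulation against the closed form; a fuller model would add stochastic amplification and a detection threshold, for which no closed form is generally available.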

Future work

Discrepancies among DNA interpretation processes across laboratories can lead to significantly different decisions in the judicial process, which causes severe problems. Future work will apply machine learning algorithms to find optimized experimental protocols and to standardize DNA interpretation protocols across laboratories.