Abstract:
Ecosystems are central to the global carbon and nitrogen cycles, yet their complexity and heterogeneity pose major challenges to modeling efforts. While process-based models offer theoretical rigor grounded in biophysical and biochemical principles, they often face limitations in computational scalability, data assimilation, and structural uncertainty. Conversely, purely data-driven models can leverage rich sensing data streams but tend to lack generalizability and scientific interpretability. Knowledge-Guided Machine Learning (KGML) bridges these paradigms by embedding domain knowledge into machine learning frameworks to produce models that are both accurate and mechanistically informed. In this talk, I will introduce a series of applications that demonstrate how KGML improves simulations of carbon and nitrogen fluxes across agricultural and natural ecosystems. Use cases include modeling carbon budgets from croplands, advancing digital twin frameworks for sustainable agriculture, and predicting methane dynamics in natural ecosystems. By blending scientific priors with data-driven flexibility, KGML facilitates ecosystem modeling that is explainable, scalable, and climate-action-relevant. Ultimately, this approach supports more robust decision-making in environmental management and provides a pathway for developing AI systems that are physically consistent, data-efficient, and applicable to complex Earth system processes.
Bio:
Dr. Licheng Liu is a research scientist at the University of Minnesota. He leads the KGML division of the NSF AI-LEAF Institute and the AI for Natural Methane working group. His research integrates process-based modeling, machine learning, in-situ sensing, and remote sensing to understand biogeochemical dynamics in agricultural and natural ecosystems, with a focus on greenhouse gas emissions and climate feedbacks. His work spans carbon-nitrogen-water cycle modeling, AI-enhanced crop and soil simulations, and the development of open-source KGML frameworks for ecosystem prediction. Starting January 2026, Dr. Liu will join the University of Wisconsin–Madison as an Assistant Professor in the Department of Biological Systems Engineering.
Summary:
Focus: understanding the behavior of the world's ecosystems.
Increasing global populations have put a strain on ecosystems and natural resources: land use, water use, energy use
To sustain ecosystems we need to understand them: carbon, water, nutrients
Strongly coupled/interdependent with
Global climate: rainfall, temperature, variability
Human management: fertilization, irrigation, land use
Measuring ecosystems is challenging
Limited measurement data
Challenges:
Ecosystem heterogeneity: soil, vegetation, management
Complex biogeochemical, physical processes
Opportunity: AI for ecosystem understanding
Data: in-situ sensor networks, remote sensing, meteorological data, biogeochemical, geospatial & survey, synthetic
Applications: greenhouse gases, carbon sequestration, production, water, air quality, soils, biodiversity, natural hazards
Modeling can infer unknown information from measurements
Process-based models incorporate known dynamics but are limited by process representation
Machine learning models (black box) adapt to data but have poor explainability and generalizability
Hybrid AI models can leverage the best of both techniques (e.g. differentiable simulations, SciML)
Knowledge-Guided ML: training guided by domain-specific prior knowledge, e.g. invariants and useful examples
Example: advancing agroeconomics in the US Corn Belt
Challenge: simulating N2O emissions from use of fertilizer in corn farming
High spatial/temporal variability
KGML:
Train ML model using runs of the ecosys model: https://github.com/jinyun1tang/ECOSYS
KGML model is able to accurately predict N2O emissions
Supports data inversion across the US Midwest Corn Belt: infer emissions hotspots from sparse regional measurements
Supports interpretability via causal diagrams built with PCMCI (https://jakobrunge.github.io/tigramite), which can guide further improvements to the process-based model
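The idea of training an ML model on process-model runs and then adapting it to sparse measurements can be sketched as a two-step fit. This is only an illustration under stated assumptions: toy_simulator is a hypothetical stand-in (it is not ecosys), true_flux is an invented "real world" in which the simulator misses a constant offset, and the linear model and all numbers are made up for the example.

```python
import random

random.seed(0)

def toy_simulator(fert, temp):
    """Hypothetical stand-in for a process-based model (not ecosys itself)."""
    return 2.0 * fert + 1.0 * temp

def true_flux(fert, temp):
    """Hypothetical 'real world': the simulator misses a constant offset."""
    return toy_simulator(fert, temp) + 0.5

def sample_inputs(n):
    # Inputs are assumed to be scaled to [0, 1].
    return [(random.random(), random.random()) for _ in range(n)]

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1] + w[2]

def fit(w, data, lr=0.1, epochs=200):
    """Stochastic gradient descent on squared error, starting from weights w."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            w[2] -= lr * err
    return w

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

# Step 1: pretrain on plentiful simulator output.
synthetic = [(x, toy_simulator(*x)) for x in sample_inputs(200)]
w_pre = fit([0.0, 0.0, 0.0], synthetic, epochs=50)

# Step 2: fine-tune on a handful of noisy field observations.
observed = [(x, true_flux(*x) + random.gauss(0.0, 0.05)) for x in sample_inputs(8)]
w_fine = fit(w_pre, observed)

# Compare both models on held-out points from the "real" system.
held_out = [(x, true_flux(*x)) for x in sample_inputs(500)]
print("pretrained only:  ", mse(w_pre, held_out))
print("after fine-tuning:", mse(w_fine, held_out))
```

The pretrained weights carry the simulator's structure, so the few observations only need to correct its bias rather than teach the model everything from scratch.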
Example: understanding carbon budget in agriculture
Agriculture is both a source and a sink for carbon
Modeling emissions from agricultural activities
Train ML model based on traces of the ecosys model
Used a knowledge-guided loss function: mass balance, threshold control, response control
Knowledge-guided extrapolation by assimilating remote-sensed data
Hybrid model
Outperforms both pure-ML and ecosys alone in accuracy
More efficient than ecosys
Can use emissions model to estimate carbon credit risk
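A knowledge-guided loss of the kind mentioned above can be sketched as a data-fit term plus penalties for violating known constraints. The notes name the three terms but not their form, so everything below is an assumed interpretation: mass balance is taken as NEE = Reco - GPP, threshold control as non-negative GPP and Reco, and response control as GPP not decreasing when light increases. The function name, the weights, and the example values are hypothetical.

```python
def relu(v):
    return v if v > 0.0 else 0.0

def knowledge_guided_loss(pred, obs_nee, pred_more_light,
                          w_balance=1.0, w_threshold=1.0, w_response=1.0):
    """pred and pred_more_light are lists of dicts with 'gpp', 'reco', 'nee'.

    pred_more_light holds predictions for the same samples with the light
    input increased; it is only used for the (assumed) response penalty.
    """
    n = len(obs_nee)
    # Data term: fit the observed net ecosystem exchange.
    data = sum((p["nee"] - o) ** 2 for p, o in zip(pred, obs_nee)) / n
    # Mass balance (assumed form): NEE must equal Reco - GPP.
    balance = sum((p["nee"] - (p["reco"] - p["gpp"])) ** 2 for p in pred) / n
    # Threshold control (assumed form): gross fluxes cannot be negative.
    threshold = sum(relu(-p["gpp"]) ** 2 + relu(-p["reco"]) ** 2 for p in pred) / n
    # Response control (assumed form): more light must not lower GPP.
    response = sum(relu(p["gpp"] - q["gpp"]) ** 2
                   for p, q in zip(pred, pred_more_light)) / n
    return data + w_balance * balance + w_threshold * threshold + w_response * response

# A physically consistent prediction incurs no penalty.
consistent = [{"gpp": 4.0, "reco": 1.5, "nee": -2.5}]
consistent_more_light = [{"gpp": 5.0, "reco": 1.5, "nee": -3.5}]
loss_ok = knowledge_guided_loss(consistent, [-2.5], consistent_more_light)

# Same NEE, but negative GPP, broken balance, and GPP falling with light.
violating = [{"gpp": -1.0, "reco": 1.0, "nee": -2.5}]
violating_more_light = [{"gpp": -2.0, "reco": 1.0, "nee": -2.5}]
loss_bad = knowledge_guided_loss(violating, [-2.5], violating_more_light)

print(loss_ok, loss_bad)
```

Both example predictions match the observed NEE exactly, so the difference in loss comes entirely from the knowledge terms. That is the point of the approach: two models with equal data fit are ranked by physical consistency.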
Example: Advancing natural ecosystem understanding with hybrid AI
Methane is a major contributor to climate change
Data on emissions is sparse and diverse
Pure AI models are good in-sample but don’t generalize
Most data are limited to high latitudes in the Northern Hemisphere
Poor accuracy in the tropics and the Southern Hemisphere
Integrating scientific knowledge can improve accuracy
Knowledge-guided initialization using TEM-MDM: https://www.eaps.purdue.edu/ebdl/resources/ecosystems-biochemical.html
Training ML model on its traces improves model accuracy
AI model can help plan new locations for placing measurement sites across the world to maximize predictive accuracy
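The notes do not say how new measurement sites are chosen. One simple way to illustrate the idea is a greedy space-filling heuristic: repeatedly pick the candidate whose environmental conditions are farthest from any site already covered. This is a swapped-in proxy, not the speaker's method, and it does not optimize predictive accuracy directly. The feature choice (scaled latitude, scaled mean temperature) and all coordinates below are hypothetical.

```python
import math

def nearest_distance(point, sites):
    """Distance from point to the closest already-covered site."""
    return min(math.dist(point, s) for s in sites)

def pick_new_sites(existing, candidates, k):
    """Greedy farthest-point selection in environmental feature space."""
    chosen = []
    pool = list(candidates)
    for _ in range(k):
        covered = existing + chosen
        # Choose the candidate least similar to anything measured so far.
        best = max(pool, key=lambda c: nearest_distance(c, covered))
        chosen.append(best)
        pool.remove(best)
    return chosen

# Hypothetical features: (latitude / 90, scaled mean annual temperature).
# Existing sites cluster at high northern latitudes, as in the notes above.
existing = [(0.7, 0.1), (0.75, 0.15), (0.8, 0.05)]
candidates = [(0.72, 0.12), (0.0, 0.9), (-0.4, 0.6), (0.6, 0.2)]

new_sites = pick_new_sites(existing, candidates, k=2)
print(new_sites)  # [(-0.4, 0.6), (0.0, 0.9)]
```

With these made-up coordinates the heuristic skips the two candidates near the existing northern cluster and selects the southern and tropical ones. In practice the selection score could instead come from model uncertainty, for example disagreement across an ensemble.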
AI for Natural Methane Working Group:
Ongoing work:
Water quantity & natural concentration
Precision agriculture
Global carbon cycle