Harnessing the Data Revolution: Knowledge Guided Machine Learning

A Framework for Accelerating Scientific Discovery

Please click here for information on our 2nd Annual KGML workshop, coming in August 2021!

Introduction

Knowledge Guided Machine Learning

The success of machine learning (ML) in applications where large-scale data is available has raised expectations of similar accomplishments in scientific disciplines. Data science is particularly promising for scientific problems involving processes that are not completely understood. However, a purely data-driven approach to modeling a physical process can be problematic. For example, it can produce a complex model that is neither generalizable beyond the data on which it was trained nor physically interpretable. This problem becomes worse when there is not enough training data, which is quite common in science and engineering domains. A machine learning model that is grounded in explainable theories stands a better chance of avoiding spurious patterns in the data that lead to non-generalizable performance. This is especially important for problems that are critical and carry high risks (e.g., extreme weather or collapse of an ecosystem).

Hence, neither an ML-only nor a scientific knowledge-only approach can be considered sufficient for knowledge discovery in complex scientific and engineering applications. This project is developing novel techniques to explore the continuum between knowledge-based and ML models, where both scientific knowledge and data are integrated synergistically. Such integrated methods have the potential to accelerate discovery in a range of scientific and engineering disciplines. The project will also train interdisciplinary scientists who are well versed in such methods and will disseminate its results via peer-reviewed publications, open-source software, and a series of workshops to engage the broader scientific community.

This project aims to develop a framework that uses the unique capability of data science models to automatically learn patterns and models from data, without ignoring the wealth of accumulated scientific knowledge. Specifically, the project builds the foundations of knowledge-guided machine learning (KGML) by exploring several ways of bringing scientific knowledge and machine learning models together, using pilot applications from four domains: aquatic ecodynamics, climate and weather, hydrology, and translational biology. These pilot applications were selected because they are at tipping points where knowledge-guided machine learning can have a transformative effect. KGML has the potential to provide scientists and engineers with new insights into their domains of interest, and it will require the development of new machine learning approaches and architectures that can incorporate scientific principles. Scientific knowledge, KGML methods, and software developed in this project could potentially be extended to a wide range of scientific applications where mechanistic (also known as process-based) models are used.
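As a rough illustration of one way scientific knowledge and ML can be combined, the sketch below adds a knowledge-based penalty to an ordinary data-fit loss. It is a hypothetical example, not a method or software from this project: the function names, the weight parameter, and the chosen rule (predictions ordered by some input, such as depth, should not decrease) are all assumptions made for illustration.

```python
from typing import Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Data-fit term: mean squared error against observed targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def monotonicity_penalty(predictions: Sequence[float]) -> float:
    """Knowledge term (hypothetical rule): ordered predictions must not decrease.

    Returns the mean squared size of each drop between consecutive values,
    or 0.0 when there are fewer than two predictions.
    """
    drops = [max(0.0, a - b) for a, b in zip(predictions, predictions[1:])]
    if not drops:
        return 0.0
    return sum(d ** 2 for d in drops) / len(drops)


def knowledge_guided_loss(
    predictions: Sequence[float],
    targets: Sequence[float],
    weight: float = 1.0,
) -> float:
    """Combined loss: data-fit error plus a weighted knowledge penalty."""
    return mse(predictions, targets) + weight * monotonicity_penalty(predictions)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    # Predictions that respect the assumed rule incur no knowledge penalty.
    print(knowledge_guided_loss([1.0, 2.0, 3.0], observed))
    # Predictions that violate the rule are penalized beyond their data error.
    print(knowledge_guided_loss([1.0, 3.0, 2.0], observed))
```

Note that the penalty term in this sketch does not use the observed targets, so it could in principle also be evaluated on inputs for which no measurements exist, which is relevant when training data is scarce.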

Funding

Harnessing the Data Revolution

NSF Awards: 1934668 (CSU), 1934548 (Penn State), 1934600 (UVA), 1934633 (UW), 1934721 (UMN)

This 2-year, $1.8 million project is funded by an award from the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea. The program is a national-scale activity to enable new modes of data-driven discovery that will allow new fundamental questions to be asked and answered at the frontiers of science and engineering. The HDR Big Idea will establish theoretical, technical, and ethical frameworks that will be applied to tackle data-intensive problems in science and engineering, contributing to data-driven decision-making that impacts society.

Participating Institutions

The project team, led by the University of Minnesota, includes faculty and researchers from the College of Science and Engineering and the College of Food, Agricultural and Natural Resource Sciences, as well as researchers from the University of Virginia, the University of Wisconsin-Madison, Colorado State University, Pennsylvania State University, and the United States Geological Survey.