With promising advances in scientific modeling, complex phenomena (e.g., particle collisions, rocket combustion, microbial communities) can now be reliably reproduced by virtual simulators. These simulators, or computer experiments, pave the way for novel approaches to scientific inference in challenging problems. A key bottleneck is that these simulations can be very expensive: they can require millions of CPU hours on supercomputers, which can be prohibitive in modern scientific settings.
A promising solution is probabilistic surrogate modeling, which uses Bayesian machine learning (ML) and artificial intelligence (AI) models to emulate the costly simulator. When constructed well, such surrogates can accurately predict the virtual simulator outputs with precise uncertainties at greatly reduced computational costs. We develop novel methods, theory and algorithms for constructing effective Bayesian surrogates and for using them to accelerate scientific discoveries. This involves:
The integration of scientific knowledge (e.g., mechanistic equations, scientific principles) to guide interpretable surrogate learning,
The construction of Bayesian learning models on complex data objects often encountered in scientific applications (e.g., tensors, manifolds, networks),
The experimental design of training data for reliable model training via active learning and reinforcement learning,
The effective use of surrogates to guide downstream scientific decision-making involving optimization, control, calibration and inverse problems,
The development of probabilistic anomaly detection methods to ensure data quality in scientific experiments.
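The surrogate idea described above can be illustrated with a minimal sketch. It assumes a toy one-dimensional simulator and a zero-mean Gaussian process with a squared-exponential kernel; the function names, kernel settings and test function are hypothetical choices for illustration, not the models developed in our group. The last lines show one simple active-learning rule, which picks the candidate input where the surrogate is least certain.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def expensive_simulator(x):
    """Hypothetical stand-in for a costly simulation run."""
    return np.sin(2 * np.pi * x) + 0.5 * x

def fit_gp(x_train, y_train, jitter=1e-6):
    """Factorize the training covariance and compute the prediction weights."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return L, alpha

def predict_gp(x_new, x_train, L, alpha):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    K_s = rbf_kernel(x_train, x_new)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Train on a handful of "expensive" runs (assumed design: 8 evenly spaced inputs).
x_train = np.linspace(0.0, 1.0, 8)
y_train = expensive_simulator(x_train)
L, alpha = fit_gp(x_train, y_train)

# Cheap predictions with uncertainty on a dense candidate grid.
x_grid = np.linspace(0.0, 1.0, 101)
mean, std = predict_gp(x_grid, x_train, L, alpha)

# Simple active-learning rule: run the simulator next where uncertainty is largest.
x_next = x_grid[np.argmax(std)]
print("next simulation input:", x_next)
```

In this sketch the predictive standard deviation is close to zero at the training inputs and grows away from them, which is what lets the surrogate flag where additional simulator runs would be most informative.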
Simulation of the exotic quark-gluon plasma, which sheds light on the very early universe. Our group (as part of the JETSCAPE collaboration) studies properties of this plasma via Bayesian ML / AI methods.
The design of next-generation rocket engines can be greatly accelerated via virtual simulators. This video shows the simulated rocket engine flows (top; O(1000) CPU hours) and the predicted flows from our Bayesian ML surrogate model (bottom; ~10 CPU hours).
Recent years have seen fundamental advancements in ML/AI methods. However, much of this work is on black-box learning models, which are purely data-driven and do not permit the integration or learning of guiding principles. Particularly for scientific applications, where experimental data are oftentimes noisy, limited and/or of a complex form, models trained via state-of-the-art black-box approaches may be wholly uninterpretable for the end user. This lack of interpretability greatly obscures scientific findings and hinders the use of ML/AI models for confident scientific decisions.
Our group tackles this urgent need via the development of interpretable ML/AI models that combine the data with the science, by directly integrating and learning underlying scientific and/or domain principles. This integration serves three key purposes: it enables effective and controlled learning from limited and/or noisy experimental data, facilitates interpretable and confident scientific decision-making, and provides a framework for uncovering scientific principles from data. Our recent work investigates:
The incorporation of known boundary information for probabilistic learning,
The identification and incorporation of multi-physics for probabilistic learning,
The probabilistic learning of underlying mechanistic equations from data,
ML/AI music generation methods that incorporate music composition principles.
Visualizing our work on the integration of known boundary information for predictive modeling. Red points mark the data points and pink curves mark known boundaries. Boundary-integrated models offer much better predictions and greater interpretability for scientific end-users.
Our promotional video at NeurIPS 2023 on our sentiment-driven ML music composition model called SentHYMMent.
Our group works closely within multi-disciplinary collaborations to tackle a broad range of modern scientific applications. Our current or past collaborations include:
The JETSCAPE collaboration (Duke, LBNL, UC Berkeley, Texas A&M, MIT and other institutions) on high energy physics
The Bayesian UQ collaboration (Duke, LBNL, UC Berkeley and Wayne State) on nuclear physics
Prof. Suo Yang's lab on sustainable aviation fuels
Prof. Ophelia Venturelli's lab on microbial communities
JMP (a subsidiary of SAS Institute) on AI-automated software testing
Prof. Cynthia Rudin's lab on algorithmic music generation
Prof. Yao Xie's lab on change-point and anomaly detection
The School of Aerospace Engineering at Georgia Tech on rocket engine design
The Georgia Tech Manufacturing Institute on 3D-printed aortic valves
We are always looking for new collaborators! If interested, feel free to email sm769[at]duke.edu.
Our group works closely within the JETSCAPE collaboration, a multi-disciplinary team studying the quark-gluon plasma via particle collisions.
We have also worked with the Georgia Tech Manufacturing Institute on designing personalized 3D-printed aortic valves for urgent surgical planning.