With promising advances in scientific modeling, complex phenomena (e.g., particle collisions, rocket combustion, microbial communities) can now be reliably reproduced by virtual simulators. These simulators, or computer experiments, pave the way for novel approaches to inference in challenging scientific problems. A key bottleneck, however, is that these simulations can be very expensive to run, requiring millions of CPU hours on supercomputers; such costs can be prohibitive for modern scientific discovery.
A promising solution is probabilistic surrogate modeling, which uses Bayesian machine learning (ML) and artificial intelligence (AI) models to "emulate" the costly simulator. When constructed well, such surrogates can accurately predict the virtual simulator outputs with precise uncertainties at greatly reduced computational costs. Our research develops novel methods, theory and algorithms for constructing effective Bayesian surrogates and for using them to accelerate scientific discoveries. This involves:
The integration of scientific knowledge (e.g., partial differential equations) for guiding interpretable and cost-efficient surrogate learning,
The construction of Bayesian learning models on complex data objects encountered in scientific applications (e.g., tensors, manifolds and networks),
The experimental design of training data for reliable model training via active learning and reinforcement learning,
The effective use of trained surrogates for guiding downstream scientific decisions involving optimization, control, calibration and inverse problems,
The development of probabilistic anomaly detection methods to ensure data quality of massive streaming datasets in scientific experiments.
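As a minimal illustration of the surrogate idea above, the following sketch fits a Gaussian process (one common choice of Bayesian surrogate) to a handful of runs of a toy function and then picks the next run where the surrogate is most uncertain, a simple form of active learning. It is a generic textbook construction, not one of the group's specific methods. The function `expensive_simulator`, the kernel hyperparameters and the helper names are all hypothetical and chosen only for illustration.

```python
import numpy as np

def expensive_simulator(x):
    # Hypothetical stand-in for a costly simulation (assumption for this sketch).
    return np.sin(2.0 * np.pi * x) + 0.5 * x

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays a and b.
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, jitter=1e-8):
    # Zero-mean GP regression: posterior mean and variance at x_test.
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off

# A small training design: 8 simulator runs on [0, 1].
x_train = np.linspace(0.0, 1.0, 8)
y_train = expensive_simulator(x_train)

# Cheap surrogate predictions with uncertainty on a dense grid.
x_test = np.linspace(0.0, 1.0, 101)
mean, var = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(var)

# Simple active-learning rule: run the simulator next where the
# surrogate's predictive variance is largest.
x_next = x_test[np.argmax(var)]
print("next design point:", x_next)
```

Each surrogate prediction here costs a few small matrix operations rather than a full simulator run, and the predictive standard deviation `std` indicates where the surrogate should not yet be trusted. In practice the kernel hyperparameters would be estimated from data rather than fixed.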
Simulation of the exotic quark-gluon plasma, which sheds light on the very early universe. Our group (as part of the JETSCAPE collaboration) studies properties of this plasma via Bayesian ML / AI methods.
The design of next-generation rocket engines can be greatly accelerated via virtual simulators. This video shows the simulated rocket engine flows (top; O(1000) CPU hours) and the predicted flows from our Bayesian ML surrogate model (bottom; ~10 CPU hours).
Recent years have seen fundamental progress in ML / AI methods. However, despite such progress, much of this work is on black-box learning models, which are purely data-driven and do not permit the integration of guiding domain knowledge. Particularly for scientific applications, where experimental data are oftentimes noisy, limited and/or of a complex form, models trained via state-of-the-art black-box approaches may still be wholly uninterpretable for the end user. This lack of interpretability can greatly obscure scientific findings and prevent the use of trained models for confident downstream scientific decisions.
Our group tackles this urgent need via the development of interpretable ML / AI models that combine the data with the science, by directly integrating various forms of guiding scientific knowledge within probabilistic ML models. This integration serves three key purposes: it facilitates effective learning from limited and/or noisy experimental data, enables confident downstream decision making and analysis, and provides a framework for discovering governing scientific principles from data. Our recent work investigates:
The incorporation of known boundary information for probabilistic learning,
The identification and incorporation of multi-physics for probabilistic learning,
Bayesian methods for partial differential equation system identification with uncertainty,
Music composition methods that incorporate ML / AI within guiding composition principles.
Visualizing our work on the integration of known boundary information for predictive modeling. Red points mark the data points and pink curves mark known boundaries. Boundary-integrated models offer much better predictions and greater interpretability for scientific end-users.
Our promotional video at NeurIPS 2023 on our sentiment-driven ML music composition model called SentHYMMent.
Our group works closely within multi-disciplinary collaborations to tackle a broad range of modern scientific applications. Our current or past collaborations include:
The JETSCAPE collaboration (Duke, LBNL, UC Berkeley, Texas A&M, MIT and other institutions) on high energy physics
The Bayesian UQ collaboration (Duke, LBNL, UC Berkeley and Wayne State) on nuclear physics
Prof. Suo Yang's lab on sustainable aviation fuels
Prof. Ophelia Venturelli's lab on microbial communities
JMP (a subsidiary of SAS Institute) on AI-automated software testing
Prof. Cynthia Rudin's lab on algorithmic music generation
Prof. Yao Xie's lab on change-point and anomaly detection
The School of Aerospace Engineering at Georgia Tech on rocket engine design
The Georgia Tech Manufacturing Institute on 3D-printed aortic valves
We are always looking for new collaborators! If interested, feel free to email sm769[at]duke.edu, and we're happy to discuss more.
Our group works closely within the JETSCAPE collaboration, a multi-disciplinary team studying the quark-gluon plasma via particle collisions.
We have also worked with the Georgia Tech Manufacturing Institute on designing personalized 3D-printed aortic valves for urgent surgical planning.