Research
Cost-Efficient Scientific Discovery
With advances in scientific modeling, complex phenomena (e.g., the expansion of the universe, rocket combustion, human organs) can now be simulated with high accuracy using computer codes. These virtual simulations (or computer experiments) provide the means for Bayesian inference in many scientific problems. Such experiments can be very expensive, however, requiring millions of CPU hours on supercomputers. This cost is a key barrier to modern scientific discovery.
A promising solution is surrogate modeling, which uses simulated data to build predictive models that efficiently emulate the expensive computer code. Much of our research develops cost-efficient emulation methods (supported by theory & algorithms) for prediction, uncertainty quantification, and Bayesian inference of expensive systems. This involves:
Careful experimental design of the training data used to fit the model,
Building probabilistic predictive models on complex objects from scientific applications (e.g., functions, manifolds, networks),
Leveraging big data analytics and modern computing architectures for scalable prediction.
Our group also develops methodology that uses emulation for downstream decision-making objectives, including optimization, control, reinforcement learning, and calibration & inverse problems.
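The emulation workflow described above (design points, simulator runs, probabilistic surrogate) can be illustrated with a minimal sketch. This is not the group's method: it assumes a toy one-dimensional stand-in for the expensive code (`expensive_simulator`, hypothetical), an evenly spaced design in place of a carefully optimized one, and a zero-mean Gaussian process with a squared-exponential kernel whose length scale and nugget are fixed by hand rather than estimated. All function names are illustrative.

```python
import math

def expensive_simulator(x):
    """Hypothetical toy stand-in for a costly computer code."""
    return math.sin(2.0 * math.pi * x) + 0.5 * x

def sq_exp_kernel(a, b, length_scale=0.1):
    """Squared-exponential covariance (length scale fixed by assumption)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transpose(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, nugget=1e-8):
    """Precompute the quantities needed for Gaussian process prediction."""
    n = len(xs)
    K = [[sq_exp_kernel(xs[i], xs[j]) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, ys))  # (K + nugget*I)^{-1} y
    return {"xs": xs, "L": L, "alpha": alpha}

def predict_gp(model, x_new):
    """Posterior mean and standard deviation of the emulator at x_new."""
    k = [sq_exp_kernel(x_new, xi) for xi in model["xs"]]
    mean = sum(ki * ai for ki, ai in zip(k, model["alpha"]))
    v = solve_lower(model["L"], k)
    var = max(sq_exp_kernel(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

# 1. Experimental design: ten evenly spaced inputs on [0, 1] (a simple placeholder).
design = [i / 9 for i in range(10)]
# 2. Run the (here, cheap) simulator once per design point.
runs = [expensive_simulator(x) for x in design]
# 3. Fit the surrogate, then predict at an untried input with uncertainty.
emulator = fit_gp(design, runs)
mean, sd = predict_gp(emulator, 0.37)
print(f"emulated value at 0.37: {mean:.3f} (posterior sd {sd:.3f})")
```

Once fitted, each call to `predict_gp` costs a handful of arithmetic operations rather than a fresh simulator run, and the posterior standard deviation indicates where the emulator is least certain. In practice the kernel hyperparameters would be estimated from the data and the design chosen more carefully.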
Simulation of the quark-gluon plasma, which filled the universe shortly after the Big Bang. Our group (within the JETSCAPE collaboration) studies properties of this plasma via Bayesian inference.
Simulated rocket engine flows (top; O(1000) CPU hours) and our emulated flows (bottom; ~10 CPU hours) from Chang et al. (2021).
Interpretable Statistical Learning
Data science is at a defining crossroads. With advances in statistical and machine learning, practitioners have access to a suite of state-of-the-art data-driven models. However, such models are nearly all black-box: they allow for minimal user interaction and domain knowledge integration, and as such yield outputs which may be wholly uninterpretable for the end user. This lack of interpretability not only results in poor predictions with limited & noisy training data, but also greatly inhibits scientific discovery.
To address this, our group develops interpretable statistical learning methods that combine the data with the science, via the embedding of prior domain knowledge within probabilistic machine learning models. This integration not only provides improved predictive performance given limited data, but also enables better interpretability and thus decision making. We develop novel methods for interpretable learning (supported by theory & algorithms), with applications in the physical sciences, engineering, finance and the arts.
Our recent work (Ding et al., 2022+) on integrating known boundary information for predictive modeling. Such models not only provide improved predictions but also greater interpretability for end-users.
KDD presentation video for our new algorithmic composition model SchenkComposer (Hahn et al., 2023), which embeds music composition principles within a machine learning framework.
NeurIPS promotion video for our new sentiment-driven algorithmic composition model SentHYMMent (Hahn et al., 2023+).
All Sorts of Applications!
Our work is directly motivated by important applications from multi-disciplinary collaborations, primarily in the physical sciences and engineering but also, more recently, in the social sciences & arts. Our group is working (or has worked) closely with:
The JETSCAPE collaboration on nuclear & high-energy physics
Lawrence Berkeley National Laboratory on high-energy physics
Prof. Suo Yang's lab on design & control of unmanned aerial vehicles
JMP/SAS on automated software testing
Prof. Cynthia Rudin's lab on algorithmic music generation
Prof. Yao Xie's lab on change-point & anomaly detection
The School of Aerospace Engineering at Georgia Tech on rocket engine design
The Georgia Tech Manufacturing Institute on 3D-printed aortic valves
We are always looking for new collaborators! Feel free to send an email to sm769[at]duke.edu, and we can discuss further.
Our group works closely within the JETSCAPE collaboration, a multi-disciplinary team studying the quark-gluon plasma via heavy-ion collisions.
We have also worked with GTMI on designing personalized 3D-printed aortic valves for urgent surgical planning.