Research

Cost Efficient Scientific Discovery

With advances in scientific modeling, complex phenomena (e.g., the expansion of the universe, rocket combustion, human organs) can now be simulated with high accuracy using computer codes. These virtual simulations (or computer experiments) provide the means for Bayesian inference in many scientific problems. Such experiments can be very expensive, however, often requiring millions of CPU hours on supercomputers. This cost is a key barrier to modern scientific discovery.

A promising solution is surrogate modeling, which uses simulated data to build predictive models that efficiently emulate the expensive computer code. Much of our research develops cost-efficient emulation methods (supported by theory & algorithms) for prediction, uncertainty quantification, and Bayesian inference of expensive systems; a minimal sketch of the core idea is given below.
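
To make this concrete, here is a minimal sketch of Gaussian process (GP) emulation, a standard surrogate modeling approach. Everything below is a hypothetical toy setup: the `simulator` function is a cheap stand-in for an expensive computer code, and the design and kernel choices are illustrative only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    """Hypothetical stand-in for an expensive computer code."""
    return np.sin(3.0 * x) + 0.5 * x

# Small design of n = 8 simulation runs (in practice, a space-filling design).
X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
y_train = simulator(X_train).ravel()

# Fit the GP emulator to the simulated data.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

# Predictions are near-instant, with built-in uncertainty quantification:
# `mean` emulates the code output; `sd` quantifies predictive uncertainty.
X_new = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
mean, sd = gp.predict(X_new, return_std=True)
```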

Our group also develops methodology that leverages emulation for downstream decision-making objectives, including optimization, control, reinforcement learning, and calibration & inverse problems; one simple example is sketched below.
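
As one concrete example of emulation-driven decision making, below is a minimal sketch of the classical expected improvement (EI) criterion for choosing the next expensive simulation run. It continues the toy setup above (so `gp` and `y_train` come from the previous snippet) and illustrates the general idea, not any specific method from our papers.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(gp, X_cand, y_best):
    """EI (for minimization) at candidate inputs, under the fitted emulator `gp`."""
    mean, sd = gp.predict(X_cand, return_std=True)
    sd = np.maximum(sd, 1e-12)  # guard against numerically zero variance
    z = (y_best - mean) / sd
    return sd * (z * norm.cdf(z) + norm.pdf(z))

# Next simulation run = candidate input maximizing EI, e.g.:
#   X_cand = np.linspace(0.0, 2.0, 500).reshape(-1, 1)
#   x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, y_train.min()))]
```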

Simulation of the quark-gluon plasma, which filled the universe shortly after the Big Bang. Our group (within the JETSCAPE collaboration) studies properties of this plasma via Bayesian inference.


Simulated rocket engine flows (top; O(1000) CPU hours) and our emulated flows (bottom; ~10 CPU hours) from Chang et al. (2021).

Interpretable Statistical Learning

Data science is at a defining crossroads. With advances in statistical and machine learning, practitioners have access to a suite of state-of-the-art data-driven models. However, such models are nearly all black-box: they allow for minimal user interaction and domain knowledge integration, and thus yield outputs that may be wholly uninterpretable to the end user. This lack of interpretability not only results in poor predictions with limited & noisy training data, but also greatly inhibits scientific discovery.

To address this, our group develops interpretable statistical learning methods that combine the data with the science, by embedding prior domain knowledge within probabilistic machine learning models. This integration not only improves predictive performance given limited data, but also enables better interpretability and thus better decision making. We develop novel methods for interpretable learning (supported by theory & algorithms), with applications in the physical sciences, engineering, finance, and the arts; a simple sketch of this idea is given below.
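
As a simple illustration, one common way to embed domain knowledge is to use a known physics-based trend as the prior mean of a Gaussian process, fitting the GP only to the residuals. This is a generic sketch, not the specific models from our papers; the `physics_trend` function and the toy data below are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def physics_trend(x):
    """Hypothetical trend from domain knowledge (e.g., a simplified theory)."""
    return 2.0 * x.ravel()

# Toy data: a physics-driven trend plus a smooth discrepancy.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(10, 1))
y = 2.0 * X.ravel() + 0.3 * np.sin(8.0 * X.ravel())

# Fit a GP only to the residuals about the known trend...
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
gp.fit(X, y - physics_trend(X))

# ...then add the trend back at prediction. With limited data, predictions
# revert to the interpretable physics trend instead of an arbitrary zero mean.
X_new = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
pred = physics_trend(X_new) + gp.predict(X_new)
```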

Our recent work (Ding et al., 2022+) on integrating known boundary information for predictive modeling. Such models not only provide improved predictions but also greater interpretability for end-users.

KDD presentation video for our new algorithmic composition model SchenkComposer (Hahn et al., 2023), which embeds music composition principles within a machine learning framework.

NeurIPS promotion video for our new sentiment-driven algorithmic composition model SentHYMMent (Hahn et al., 2023+).

All Sorts of Applications!

Our work is directly motivated by important applications from multi-disciplinary collaborations, primarily in the physical sciences and engineering, but recently also in the social sciences & arts. Our group works (or has worked) closely with a wide range of collaborators across these fields.

We are always looking for new collaborators! Feel free to send an email to sm769[at]duke.edu, and we can discuss further.

Our group works closely within the JETSCAPE collaboration, a multi-disciplinary team studying the quark-gluon plasma via heavy-ion collisions.


We have also worked with GTMI on designing personalized 3D-printed aortic valves for urgent surgical planning.