Research Highlight

PI3NN: Out-of-distribution-aware prediction intervals from three neural networks

Achievement


We have developed a prediction interval method, PI3NN, to quantify the uncertainty of machine learning (ML) model predictions for regression problems. We theoretically prove that PI3NN precisely captures the uncertainty at a given confidence level and completely avoids the interval-crossing issue suffered by state-of-the-art methods. PI3NN trains three neural networks: one to approximate the prediction value, and two to estimate the lower and upper bounds of the prediction interval.

PI3NN is particularly well suited to scientific ML (SciML). The resulting uncertainty bound assesses the credibility and trustworthiness of a model prediction; it identifies data/domain shift, revealing when the learned model could fail and why; and it can guide data collection to automate experimental design and inform decision making. Because the method communicates uncertainty directly as intervals, its output is readily interpretable for decision makers. PI3NN introduces no unusual hyperparameters, giving it stable performance, and it requires no distributional assumptions on the data, making it widely applicable for credible prediction in the natural sciences and for safe operation and control of complex engineered systems.
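The final calibration step can be illustrated in isolation. The sketch below is a simplified, hypothetical stand-in (not the authors' implementation): given residuals of a trained mean model and per-point bound estimates (here replaced by constants), it root-finds, via bisection, a scaling factor alpha so that the upper interval edge covers the target fraction of the data.

```python
import numpy as np

def calibrate(residuals, bound, target):
    """Bisection root-finding for the smallest alpha such that the
    fraction of residuals below alpha * bound reaches `target`.
    Coverage is monotone nondecreasing in alpha, so bisection applies."""
    def coverage(a):
        return np.mean(residuals <= a * bound)
    lo, hi = 0.0, 1.0
    while coverage(hi) < target:   # grow the bracket until it covers the root
        hi *= 2.0
    for _ in range(60):            # shrink [lo, hi] around the root
        mid = 0.5 * (lo + hi)
        if coverage(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

rng = np.random.default_rng(0)
r = np.abs(rng.normal(size=1000))  # stand-in for positive residuals y - f(x)
u = np.ones_like(r)                # stand-in for the upper-bound NN output
alpha = calibrate(r, u, 0.95)
# upper interval edge would then be f(x) + alpha * u(x)
```

In PI3NN the same calibration is performed for both the upper and the lower bound networks, which is what lets the interval hit the requested confidence level exactly on the training data.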



Figure 1: Top panels illustrate the four steps of the PI3NN algorithm; bottom panels illustrate the effectiveness of the out-of-distribution (OOD) identification feature. Step 1 trains one NN to obtain the mean prediction; Step 2 adds a shift v to approximate the median and generates samples for training the other two NNs; Step 3 trains those two NNs to learn the upper and lower bounds of the interval; and Step 4 precisely determines the prediction interval via a root-finding method. With the OOD identification feature turned on, PI3NN accurately identifies the OOD regions [-7, -4] and [4, 7], assigning them increasingly large prediction intervals as their distance from the training data grows. With OOD identification turned off (using the default initialization), PI3NN does not flag the OOD regions and instead gives them a narrow uncertainty bound.
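Step 2 of the figure can be sketched numerically. The shift v moves the mean prediction to the conditional median, so the residuals split evenly into positive and negative samples for training the two bound NNs. The snippet below is a minimal illustration with made-up data; the constant predictor `f` is a hypothetical stand-in for the Step 1 mean network.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(size=2001)        # skewed targets, so mean != median
f = np.full_like(y, y.mean())         # stand-in for the Step 1 mean NN output

v = np.median(y - f)                  # shift that centers residuals on zero
r = y - (f + v)                       # shifted residuals: half above, half below
upper = r[r > 0]                      # training samples for the upper-bound NN
lower = -r[r < 0]                     # training samples for the lower-bound NN
```

After the shift, the two bound networks each see an equal share of the data, which is what allows Steps 3 and 4 to calibrate both sides of the interval symmetrically in sample count.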

Publications

  1. S. Liu, P. Zhang, D. Lu and G. Zhang, PI3NN: Out-of-distribution-aware prediction intervals from three neural networks, Proceedings of the 10th International Conference on Learning Representations (ICLR), 2022.

  2. P. Zhang, S. Liu, D. Lu, R. Sankaran and G. Zhang, An out-of-distribution-aware autoencoder model for reduced chemical kinetics, Discrete & Continuous Dynamical Systems - Series S, 15(4), pp. 913-930, 2022.