NCP: Neighborhood Conformal Prediction

Subhankar Ghosh*, Taha Belkhouja*, Yan Yan, Jana Doppa

Abstract

Safe deployment of deep neural networks in high-stakes real-world applications requires theoretically sound uncertainty quantification. Conformal prediction (CP) is a principled framework for uncertainty quantification of deep models that produces prediction sets for classification tasks with a user-specified coverage guarantee (i.e., the true class label is contained with high probability). This paper proposes a novel algorithm, referred to as Neighborhood Conformal Prediction (NCP), to improve the efficiency of uncertainty quantification from CP for deep classifiers (i.e., reduce prediction set size). The key idea behind NCP is to use the learned representation of the neural network to identify the k nearest calibration examples for a given test input and assign them importance weights based on their distance, in order to create adaptive prediction sets. We theoretically show that if the learned data representation of the neural network satisfies some mild conditions, NCP will produce smaller prediction sets than traditional CP algorithms. Our comprehensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets using diverse deep neural networks strongly demonstrate that NCP leads to a significant reduction in prediction set size over prior CP methods.

The vanilla CP method treats all calibration examples uniformly. In reality, the effect of a calibration example on the test point may not be uniform: it depends on the distance from the test point to the calibration point. The NCP (Neighborhood Conformal Prediction) method accounts for this heterogeneous behavior of calibration examples with respect to the test point.

The center point in both circles is the test input. The figure shows that vanilla CP and NCP treat the calibration examples differently. We can visualize the effect of the calibration points on the test input: nearer calibration points have more effect than distant ones. We select the top k nearest-neighbor (KNN) calibration points to compute the quantile threshold used to construct the prediction set for a new input.
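As an illustrative sketch, the nearest-neighbor selection and distance-based weighting can be computed as below. The `knn_weights` helper, the inverse-distance weighting form, and the random features are all assumptions for illustration, not necessarily the paper's exact scheme:

```python
import numpy as np

def knn_weights(test_feat, calib_feats, k=5):
    """Find the k nearest calibration points to a test feature vector
    (Euclidean distance in the learned representation space) and assign
    each a weight that decays with distance, so nearer points count more."""
    dists = np.linalg.norm(calib_feats - test_feat, axis=1)
    nn_idx = np.argsort(dists)[:k]       # indices of the k nearest neighbors
    w = 1.0 / (1.0 + dists[nn_idx])      # closer neighbors get larger weight
    return nn_idx, w / w.sum()           # normalized importance weights

# Hypothetical learned features for 100 calibration points and one test input.
rng = np.random.default_rng(0)
calib_feats = rng.normal(size=(100, 16))
test_feat = rng.normal(size=16)
idx, w = knn_weights(test_feat, calib_feats, k=5)
```

The returned weights are sorted from nearest to farthest neighbor and sum to one, which is convenient for the weighted quantile computation that follows.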


Algorithms

We present four algorithms to describe our experimental setup. We use the adaptive prediction set (APS) conformity score for classification and the absolute residual (AR) score for regression as our baselines. We do not show NCP(RAPS) and NCP(RAR), as they can be formed by combining two of the given algorithms.

Algorithm 1 is vanilla split conformal prediction (CP) using the APS conformity score for classification. Line 2 trains a model, line 3 computes APS conformity scores for all calibration points, line 4 estimates the quantile threshold, and line 5 constructs the prediction set for the new input.
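A minimal sketch of lines 3-5, assuming a pretrained model that outputs class probabilities; the helper names and toy numbers are ours, not from the algorithm listing:

```python
import numpy as np

def aps_scores(probs, labels):
    """Line 3: APS conformity score = total probability mass of the classes
    ranked at or above the true class (in descending-probability order)."""
    order = np.argsort(-probs, axis=1)
    cum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    ranks = np.argmax(order == labels[:, None], axis=1)  # rank of true class
    return cum[np.arange(len(labels)), ranks]

def quantile_threshold(scores, alpha):
    """Line 4: finite-sample-corrected (1 - alpha) quantile of the scores."""
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q, method="higher")

def aps_prediction_set(probs, qhat):
    """Line 5: add classes in descending-probability order until the
    cumulative mass reaches the threshold qhat."""
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    return set(order[: np.searchsorted(cum, qhat) + 1].tolist())

# Toy calibration data: softmax outputs for 4 examples over 3 classes.
calib_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
                        [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]])
calib_labels = np.array([0, 1, 0, 2])
qhat = quantile_threshold(aps_scores(calib_probs, calib_labels), alpha=0.1)
pred_set = aps_prediction_set(np.array([0.6, 0.3, 0.1]), qhat)
```

With these toy numbers the calibration scores are the cumulative masses of the true classes, and the resulting set contains the classes whose cumulative probability first reaches the threshold.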

Algorithm 2 is split conformal prediction (CP) using the NCP(APS) conformity score for classification. Line 2 trains a model, line 3 computes APS conformity scores for all calibration points, line 4 modifies the APS scores into NCP(APS) scores, line 5 estimates the quantile threshold, and line 6 constructs the prediction set for the new input.

Because we reweight the conformity scores, the weighted score distribution is no longer exchangeable. Since exchangeability no longer holds, we need to either tune $\lambda_L$ or select a different $\alpha$ to guarantee finite-sample coverage.
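The reweighting step can be sketched as a distance-weighted quantile over the test point's neighborhood. This is a minimal sketch under our own assumptions: the inverse-distance weight form and helper names are illustrative, and the $\lambda_L$ tuning needed for the coverage guarantee is omitted:

```python
import numpy as np

def weighted_quantile(scores, weights, level):
    """Quantile of a weighted empirical distribution: sort the scores,
    accumulate normalized weights, and return the smallest score whose
    cumulative weight reaches `level`."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cum, level)]

def ncp_threshold(test_feat, calib_feats, calib_scores, k, alpha):
    """Weight the k nearest calibration scores by inverse distance in the
    learned feature space, then take the (1 - alpha) weighted quantile."""
    d = np.linalg.norm(calib_feats - test_feat, axis=1)
    nn = np.argsort(d)[:k]
    return weighted_quantile(calib_scores[nn], 1.0 / (1.0 + d[nn]), 1 - alpha)

# Toy example: calibration points at known distances from the test point.
feats = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
scores = np.array([0.5, 0.6, 0.7, 0.9])
qhat = ncp_threshold(np.zeros(2), feats, scores, k=3, alpha=0.1)
```

Note that the farthest calibration point is excluded by the k-nearest-neighbor selection, so it cannot inflate the threshold the way it would under uniform weighting.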



Algorithm 3 is vanilla split conformal prediction (CP) using the AR conformity score for regression. Line 2 trains a model, line 3 computes absolute residual (AR) conformity scores for all calibration points, line 4 estimates the quantile threshold, and line 5 constructs the prediction interval for the new input.
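Lines 3-5 for regression can be sketched as follows; the function name and the toy residuals are ours, while the quantile correction is the standard split-CP one:

```python
import numpy as np

def ar_interval(calib_preds, calib_y, test_pred, alpha):
    """Absolute-residual split CP: score each calibration point by
    |y - f(x)|, take the finite-sample (1 - alpha) quantile, and return
    the symmetric interval around the test prediction."""
    scores = np.abs(calib_y - calib_preds)          # line 3: AR scores
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q, method="higher")  # line 4: threshold
    return test_pred - qhat, test_pred + qhat       # line 5: interval

# Toy calibration set with residuals 0.1, 0.2, ..., 0.9.
lo, hi = ar_interval(np.zeros(9), np.arange(1, 10) * 0.1, 5.0, alpha=0.1)
```

The interval width is twice the residual quantile, so a model with smaller calibration residuals directly yields tighter intervals.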


Algorithm 4 is split conformal prediction (CP) using the regularized absolute residual (RAR) conformity score for regression. Line 4 modifies the AR score into the RAR score, and line 6 constructs the prediction interval for the new input.


Advantages compared to existing methods:

Disadvantages compared to existing methods:

RESULTS

The two figures above present results for marginal coverage and prediction set size, respectively, on the CIFAR-10 dataset. Since the top-1 accuracy on CIFAR-10 is above 90%, we set the target coverage to 96% in order to show the variation in prediction set sizes constructed by the different methods.

The two figures above present results for marginal coverage and prediction set size, respectively, on the CIFAR-100 dataset. As can be seen, NCP(RAPS) achieves the highest efficiency (smallest prediction sets) among all methods while maintaining the desired coverage. The target coverage is 90%.

The two figures above present results for marginal coverage and prediction set size, respectively, on the ImageNet dataset. As can be seen, NCP(RAPS) achieves the highest efficiency (smallest prediction sets) among all methods while maintaining the desired coverage. The target coverage is 90%.