Reliable Predictive Inference

Overview

Responsible use of data-driven recommendation systems requires communicating their uncertainty to decision makers. Prediction intervals accomplish this by providing an intuitive measure of the limits of predictive performance.

This website contains a Python implementation of the conformalized quantile regression (CQR) methodology for constructing marginal distribution-free prediction intervals. It also implements the equalized coverage framework, which builds valid group-conditional prediction intervals.

CQR is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. It combines the statistical efficiency of quantile regression with the distribution-free coverage guarantee of conformal prediction. On one hand, CQR is flexible in that it can wrap around any algorithm for quantile regression, including random forests and deep neural networks. On the other hand, a key strength of CQR is its rigorous control of the miscoverage rate, independent of the underlying regression algorithm.

For more information, please refer to the synthetic experiment and real data experiment pages.

Y. Romano, E. Patterson, and E. J. Candès, “Conformalized quantile regression.” arXiv:1905.03222, 2019.

To support equitable treatment, the equalized coverage methodology forces the construction of the prediction intervals to be unbiased in the sense that their coverage must be equal across all protected groups of interest. Similar to CQR and conformal inference, equalized coverage offers rigorous distribution-free guarantees that hold in finite samples. This methodology can also be viewed as a wrapper around any predictive algorithm.
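
One way to picture group-conditional calibration is to run a conformal correction separately within each protected group. The sketch below is only an illustration under stated assumptions: it uses a generic point predictor with absolute-residual conformity scores (a plain split-conformal variant chosen for brevity, not necessarily what the repository implements), and the function names, the two groups "A" and "B", and the toy data are all hypothetical.

```python
import math
import random
from collections import defaultdict


def groupwise_corrections(preds, ys, groups, alpha):
    """Calibrate one residual quantile per protected group (illustrative)."""
    by_group = defaultdict(list)
    for p, y, g in zip(preds, ys, groups):
        by_group[g].append(abs(y - p))  # absolute-residual conformity score
    corrections = {}
    for g, scores in by_group.items():
        scores.sort()
        n = len(scores)
        # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
        k = math.ceil((n + 1) * (1 - alpha))
        corrections[g] = scores[k - 1] if k <= n else math.inf
    return corrections


def predict_interval(pred, group, corrections):
    """Symmetric interval around the prediction, sized for the sample's group."""
    q = corrections[group]
    return pred - q, pred + q


# --- Toy data (hypothetical): group "B" is much noisier than group "A" ---
rng = random.Random(1)
NOISE = {"A": 0.5, "B": 2.0}


def sample(n):
    groups = [rng.choice("AB") for _ in range(n)]
    preds = [rng.uniform(0, 10) for _ in range(n)]  # stand-in model predictions
    ys = [p + rng.gauss(0, NOISE[g]) for p, g in zip(preds, groups)]
    return preds, ys, groups


alpha = 0.1  # target 90% coverage within every group
corrections = groupwise_corrections(*sample(2000), alpha)

# Measure coverage separately for each group on fresh test data.
hits, totals = defaultdict(int), defaultdict(int)
for p, y, g in zip(*sample(4000)):
    lo, hi = predict_interval(p, g, corrections)
    hits[g] += lo <= y <= hi
    totals[g] += 1
coverage = {g: hits[g] / totals[g] for g in totals}
print(corrections, coverage)
```

In this toy setup the noisier group receives a wider interval, which is what lets both groups reach roughly the same coverage level. A single pooled correction would only control coverage on average across the groups.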

For more information, please refer to the real data experiment and detect prediction bias pages.

Y. Romano, R. F. Barber, C. Sabatti, and E. J. Candès, "With malice towards none: Assessing uncertainty via equalized coverage." 2019.


The methods described in the papers above are implemented in Python and are available under the MIT license.

The code is self-contained and publicly available from GitHub: https://github.com/yromano/cqr.

Some of the code is taken from the nonconformist package available at https://github.com/donlnz/nonconformist. One may refer to the nonconformist repository to view other applications of conformal prediction.

Please contact Yaniv Romano for bug reports. Email: yaniv dot romano at gmail dot com

Illustration

The figure below illustrates the marginal prediction intervals constructed by CQR on a synthetic example. The heteroskedasticity of the data is evident, as the dispersion of 𝑌 (the response) varies considerably with 𝑋 (the features). The data also contains outliers.

We start by fitting two quantile regressors (random forests in this example) on the training set to obtain initial estimates of the lower and upper bounds of the prediction interval (the two black curves in the figure). Then, we use a held-out validation set to conformalize and, if necessary, correct this prediction interval. Unlike the original interval, the conformalized prediction interval (the highlighted blue area) is guaranteed to satisfy the coverage requirement (90% in this example) on unseen test data, regardless of the choice or accuracy of the quantile regression estimator.
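
The conformalization step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the repository: the function names `cqr_correction` and `conformalize`, the toy data generator, and the deliberately too-narrow `initial_band` (a hypothetical stand-in for the two fitted quantile regressors) are all assumptions made for this example.

```python
import math
import random


def cqr_correction(lower_cal, upper_cal, y_cal, alpha):
    """Return the correction computed from held-out data (illustrative)."""
    # Conformity score: how far y falls outside [lower, upper];
    # it is negative when y lies strictly inside the initial interval.
    scores = sorted(max(lo - y, y - hi)
                    for lo, hi, y in zip(lower_cal, upper_cal, y_cal))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[k - 1] if k <= n else math.inf


def conformalize(lower, upper, q):
    """Widen (q > 0) or shrink (q < 0) every initial interval by q."""
    return [lo - q for lo in lower], [hi + q for hi in upper]


# --- Toy illustration (all names below are hypothetical) ---
rng = random.Random(0)


def noise_scale(x):
    return 0.1 + 0.3 * x  # heteroskedastic: the noise grows with x


def sample(n):
    xs = [rng.uniform(0, 5) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise_scale(x)) for x in xs]
    return xs, ys


def initial_band(xs):
    # Stand-in for two fitted quantile regressors; deliberately too narrow.
    lower = [math.sin(x) - 0.5 * noise_scale(x) for x in xs]
    upper = [math.sin(x) + 0.5 * noise_scale(x) for x in xs]
    return lower, upper


def coverage(lower, upper, ys):
    return sum(lo <= y <= hi for lo, hi, y in zip(lower, upper, ys)) / len(ys)


alpha = 0.1  # target 90% coverage
x_cal, y_cal = sample(1000)    # held-out set used only for the correction
x_test, y_test = sample(2000)  # unseen test data

q = cqr_correction(*initial_band(x_cal), y_cal, alpha)
lo_test, hi_test = initial_band(x_test)
cov_before = coverage(lo_test, hi_test, y_test)
cov_after = coverage(*conformalize(lo_test, hi_test, q), y_test)
print(f"correction={q:.3f}  before={cov_before:.3f}  after={cov_after:.3f}")
```

Because the correction is a single constant applied to both endpoints, the conformalized interval keeps the shape of the initial estimate. A more accurate quantile regressor should therefore give a tighter interval, while the coverage guarantee does not depend on that accuracy.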

Notice how the length of the constructed interval varies with 𝑋, reflecting the uncertainty in the prediction of 𝑌.

For more details, please refer to the synthetic experiment page and to our paper.