Conformalized Quantile Regression

Overview

Conformalized quantile regression (CQR) is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. It combines the statistical efficiency of quantile regression with the distribution-free coverage guarantee of conformal prediction.

Yaniv Romano, Evan Patterson, and Emmanuel J. Candès, “Conformalized quantile regression”, arXiv:1905.03222, 2019.

Code

The methods described in the paper are implemented in Python and available under the MIT license.

The code is self-contained and publicly available from GitHub: https://github.com/yromano/cqr.

Some of the code is taken from the nonconformist package available at https://github.com/donlnz/nonconformist. One may refer to the nonconformist repository to view other applications of conformal prediction.

Key features

CQR inherits both the finite sample, distribution-free validity of conformal prediction and the statistical efficiency of quantile regression. On one hand, CQR is flexible in that it can wrap around any algorithm for quantile regression, including random forests and deep neural networks. On the other hand, a key strength of CQR is its rigorous control of the miscoverage rate, independent of the underlying regression algorithm.

Illustration

The figure below illustrates the prediction intervals constructed by CQR on a synthetic example. The heteroskedasticity of the data is evident, as the dispersion of 𝑌 (the response) varies considerably with 𝑋 (the features). The data also contains outliers.

We start by fitting two quantile regressors (random forests in this example) on the training set to obtain initial estimates of the lower and upper bounds of the prediction interval (the two black curves in the figure). Then, we use a held-out validation set to conformalize and, if necessary, correct this prediction interval. Unlike the original interval, the conformalized prediction interval (the highlighted blue area) is guaranteed to satisfy the coverage requirement (90% in this example) on unseen test data, regardless of the choice or accuracy of the quantile regression estimator.

Notice how the length of the constructed interval varies with 𝑋, reflecting the uncertainty in the prediction of 𝑌.
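The two-step procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: it uses scikit-learn's gradient boosting as a stand-in quantile regressor (the paper's experiments use quantile random forests and neural networks), and the function name and argument layout are our own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cqr_intervals(X_train, y_train, X_calib, y_calib, X_test, alpha=0.1):
    """Sketch of CQR: fit quantile regressors, then conformalize
    on a held-out calibration set to guarantee 1 - alpha coverage."""
    # Step 1: fit lower/upper quantile regressors on the training set
    # (these play the role of the two black curves in the figure).
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2)
    lo.fit(X_train, y_train)
    hi.fit(X_train, y_train)

    # Step 2: conformity scores on the calibration set, measuring how
    # far each response falls outside (positive) or inside (negative)
    # the initial interval.
    scores = np.maximum(lo.predict(X_calib) - y_calib,
                        y_calib - hi.predict(X_calib))

    # Finite-sample-corrected empirical quantile of the scores.
    n = len(y_calib)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")

    # Conformalized interval: widen (or shrink) the initial one by q.
    return lo.predict(X_test) - q, hi.predict(X_test) + q
```

The correction q is a single scalar learned from the calibration scores, so the interval keeps the X-dependent shape produced by the quantile regressors while gaining the distribution-free coverage guarantee.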

For more details, please refer to the synthetic experiment page and to our paper. 