Motivation
The Geometric Sensitivity Decomposition (GSD) provides a way to extract the variance terms, which are the components of feature norms and angles sensitive to distribution shift. If a model directly outputs these sensitive norms and angles instead of the original ones, it becomes more sensitive to covariate and concept shift. In this section, we introduce a parametrized training scheme that incorporates the GSD and improves the sensitivity of norms and angles during training. The resulting model achieves improved OOD detection performance and better calibration on OOD data.
Parametrized Training in A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition
We replace the scalar components in the GSD with two free trainable parameters, alpha and beta, which are learned jointly with the rest of the network, as shown in Eq. 5.
Equation 5: Scalar parametrization
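As a minimal PyTorch sketch, the scalar parametrization can be pictured as a cosine-similarity classifier head whose outputs are rescaled and shifted by two learned scalars. The specific combination rule below, logit_i = alpha * cos(theta_i) + beta, is an assumption for illustration; the exact form is given in Eq. 5 of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScalarParametrizedHead(nn.Module):
    """Cosine classifier head with free trainable scalars alpha and beta.

    Illustrative sketch only: the combination rule alpha * cos(theta) + beta
    is assumed; see Eq. 5 in the paper for the exact parametrization.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        # Free scalars, learned jointly with the rest of the network.
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # cos(theta_i): cosine similarity between features and class weights.
        cos_theta = F.linear(F.normalize(feats, dim=1),
                             F.normalize(self.weight, dim=1))
        return self.alpha * cos_theta + self.beta
```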
Parametrized Training in Exploring Covariate and Concept Shift for Out-of-Distribution Detection and Calibration
To further improve performance, we make alpha and beta input-dependent, parametrizing each as a single-layer feedforward neural network, alpha(x) and beta(x). Note that alpha(x) and beta(x) remain independent of the norm and angle of the features produced by the model. Because this formulation generalizes a prior method, Generalized ODIN, we call the resulting model Geometric ODIN.
Equation 6: Neural network parametrization
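A sketch of the input-dependent version, under the same assumed combination rule as above: alpha(x) and beta(x) are each produced by a single-layer feedforward network applied to the features. The module and variable names here are hypothetical; see Eq. 6 for the exact form.

```python
class GeometricODINHead(nn.Module):
    """Cosine classifier head with input-dependent alpha(x) and beta(x).

    alpha(x) and beta(x) are single-layer feedforward networks over the
    features; they do not take the feature norm or angle as inputs.
    The combination rule alpha(x) * cos(theta) + beta(x) is assumed for
    illustration; see Eq. 6 in the paper for the exact parametrization.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        # Single-layer feedforward parametrizations of alpha and beta.
        self.alpha_net = nn.Linear(feat_dim, 1)
        self.beta_net = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        cos_theta = F.linear(F.normalize(feats, dim=1),
                             F.normalize(self.weight, dim=1))
        alpha = self.alpha_net(feats)  # shape (batch, 1), broadcasts below
        beta = self.beta_net(feats)    # shape (batch, 1)
        return alpha * cos_theta + beta
```

For example, with 512-dimensional features and 10 classes:

```python
head = GeometricODINHead(feat_dim=512, num_classes=10)
logits = head(torch.randn(4, 512))  # -> shape (4, 10)
```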
Performance
The new model can be trained in the same way as the vanilla network, requiring no additional hyperparameter tuning and no extra training time. Combined with the score functions, Geometric ODIN achieves state-of-the-art OOD detection and calibration performance. Please refer to the papers for a full description of the OOD detection and calibration experiments; partial results are shown in Table 1 and Table 2.
Table 1: Calibration results on CIFAR-10 and CIFAR-10-C, averaged over 5 seeds.
Table 2: AUROC for out-of-distribution detection, averaged over 5 seeds.
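As a purely illustrative sketch of how the learned components could be read out at test time (the actual score functions are defined in the papers), one might use the maximum cosine similarity as an angle-based score and alpha(x) as a scale-based score:

```python
@torch.no_grad()
def ood_scores(head: GeometricODINHead, feats: torch.Tensor):
    """Illustrative, assumed score read-outs; see the papers for the
    actual score functions used by Geometric ODIN."""
    cos_theta = F.linear(F.normalize(feats, dim=1),
                         F.normalize(head.weight, dim=1))
    angle_score = cos_theta.max(dim=1).values       # angle-based read-out
    scale_score = head.alpha_net(feats).squeeze(1)  # scale-based read-out
    return angle_score, scale_score
```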
Conclusion
In this section, we incorporated the geometric sensitivity decomposition into training by parametrizing the variance terms from the decomposition as standalone scalars or as small networks. Features from the parametrized network are more sensitive to both covariate and concept shift because their norms and angles now encode the sensitive variance terms. The intuition traces back to the derivation of the score functions, where we hypothesized that improving the sensitivity of the individual components of the score functions would improve overall sensitivity. This completes the last section of the project. More resources are provided on the introduction page.