A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition
Motivation
The covariate score, g(x), is a function of feature norms, and the concept score, h(y,x), is a function of feature angles. Making feature norms and feature angles more sensitive to distribution shifts is therefore a natural next step toward better OOD detection. This section derives the proposed geometric sensitivity decomposition (GSD), which extracts the sensitive components from feature norms and angles.
Norm Decomposition
We decompose the norm of a feature into two components: a scalar offset (denoted by a script C) and a variance norm (denoted by a delta symbol), both defined in Fig. 3. The scalar offset serves to minimize the loss on the entire training set, while the variance norm accounts for differences between samples. Therefore, if we can disentangle the scalar offset from the variance norm, we obtain a norm component that is highly sensitive to the hardness of the data.
Figure 3: Norm decomposition
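As a rough numerical illustration of the norm decomposition, the sketch below splits per-sample feature norms into one shared scalar offset plus a per-sample residual. The function name `decompose_norms` and the choice of the smallest norm in the batch as the offset are assumptions made only for this example; in the text above, the offset is whatever scalar minimizes the training loss, which this sketch does not attempt to learn.

```python
import numpy as np

def decompose_norms(features, offset=None):
    """Split feature norms into a shared scalar offset and per-sample variance norms.

    `offset` is hypothetical here: if not given, the smallest norm in the batch
    is used so that every variance norm stays non-negative.
    """
    norms = np.linalg.norm(features, axis=1)   # one L2 norm per sample
    if offset is None:
        offset = norms.min()                   # illustrative assumption, not the learned offset
    variance_norms = norms - offset            # the part that differs between samples
    return norms, offset, variance_norms

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8)) + 3.0       # toy features with a large common component
norms, offset, variance_norms = decompose_norms(features)

# Relative spread (std / mean) before and after removing the shared offset.
spread_full = norms.std() / norms.mean()
spread_variance = variance_norms.std() / variance_norms.mean()
print("full norms:     ", np.round(norms, 3), "relative spread", round(float(spread_full), 3))
print("variance norms: ", np.round(variance_norms, 3), "relative spread", round(float(spread_variance), 3))
```

Removing a constant leaves the absolute differences between samples unchanged but shrinks the mean, so the same differences become a larger fraction of the remaining quantity. This is one informal way to read the claim that the variance norm is the more sensitive component.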
Angle Decomposition
Similarly, we relax the angles so that the predicted angular similarity does not need to be close to one on the training data, i.e., the angles are allowed to be larger. To achieve this, we introduce a scalar angular offset and a variance angle, as shown in Fig. 4. Analogous to the norm decomposition, the scalar offset serves solely to minimize the training loss, while the variance angle accounts for differences between samples. Because an angle can be positive or negative, the decomposition is applied to its absolute value.
Figure 4: Angle decomposition
Geometric Sensitivity Decomposition
With the decomposed components, we can rewrite the inner product in the linear layer by substituting the decomposed norm and angle, as shown in Eq. 4.
Equation 4: Geometric Sensitivity Decomposition
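The rewriting of the inner product can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes the sign conventions norm = variance norm + norm offset and |angle| = |variance angle| - angular offset (chosen so that the variance angle is larger than the original angle, as described above), and the offset values passed in are arbitrary placeholders. The function names are hypothetical.

```python
import numpy as np

def logit_direct(w, f):
    """Ordinary linear-layer logit: the inner product of weight and feature."""
    return float(w @ f)

def logit_decomposed(w, f, norm_offset, angle_offset):
    """Rebuild the same logit from decomposed norm and angle components.

    Assumed conventions (for illustration only):
        ||f||  = variance_norm + norm_offset
        |phi|  = |variance_angle| - angle_offset
    """
    w_norm = np.linalg.norm(w)
    f_norm = np.linalg.norm(f)
    cos_phi = np.clip((w @ f) / (w_norm * f_norm), -1.0, 1.0)
    phi = np.arccos(cos_phi)                 # arccos returns a value in [0, pi], i.e. |phi|

    variance_norm = f_norm - norm_offset     # per-sample part of the norm
    variance_angle = phi + angle_offset      # relaxed, larger angle

    logit = w_norm * (variance_norm + norm_offset) * np.cos(variance_angle - angle_offset)
    return float(logit), float(variance_norm), float(variance_angle), float(phi)

rng = np.random.default_rng(1)
w = rng.normal(size=16)                      # toy class weight vector
f = rng.normal(size=16) + 1.0                # toy feature vector

direct = logit_direct(w, f)
logit, variance_norm, variance_angle, phi = logit_decomposed(
    w, f, norm_offset=0.5, angle_offset=0.2  # placeholder offsets
)
print("direct logit:    ", direct)
print("decomposed logit:", logit)
print("angle:", phi, "variance angle:", variance_angle)
```

With fixed offsets the two logits agree up to floating-point error, which shows that the decomposition by itself only re-expresses the inner product. Any practical benefit would come from how the offsets and variance components are treated during training.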
Conclusion
In this section, we approached the problem of distribution shifts from a modeling perspective, as motivated on the introduction page. Specifically, we model distribution shifts by extracting the sensitive components from feature norms and angles, because the score functions for covariate shift and concept shift are functions of norms and angles, respectively. This implies that making norms and angles more sensitive to distribution shift could improve overall sensitivity. However, it remains unclear how to apply the decomposition in practice. Next, we introduce a simple parametrized training scheme that uses the proposed decomposition to improve the sensitivity of norms and angles.