Geometric Sensitivity Decomposition

A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition

Motivation

The covariate score g(x) is a function of feature norms, and the concept score h(y,x) is a function of feature angles. Making feature norms and feature angles more sensitive to distribution shift is therefore a natural next step toward better out-of-distribution (OOD) detection. This section derives the proposed geometric sensitivity decomposition (GSD), which extracts the sensitive components from feature norms and angles.
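The split into norms and angles comes from the inner product in a linear layer: the logit for class y equals the weight norm times the feature norm times the cosine of the angle between them. A minimal NumPy sketch of that factorization follows. The function name `norm_angle_logits` and the random data are illustrative assumptions, and the sketch does not reproduce the exact definitions of g(x) and h(y,x).

```python
import numpy as np

def norm_angle_logits(W, x):
    """Factor the logits of a bias-free linear layer W @ x into norms and cosines."""
    w_norms = np.linalg.norm(W, axis=1)      # one norm per class weight vector
    x_norm = np.linalg.norm(x)               # feature norm (the covariate side)
    cosines = (W @ x) / (w_norms * x_norm)   # cos of angle to each class (the concept side)
    logits = w_norms * x_norm * cosines      # recombined logits
    return logits, w_norms, x_norm, cosines

# Hypothetical data: 3 classes, 5-dimensional features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)

logits, w_norms, x_norm, cosines = norm_angle_logits(W, x)

# Rescaling the feature changes its norm but leaves every angle unchanged.
_, _, x_norm_scaled, cosines_scaled = norm_angle_logits(W, 2.0 * x)
```

In this sketch, scaling a feature only moves the norm term, while rotating it only moves the cosine term, which is why the two quantities can be treated separately.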

Norm Decomposition

We decompose the norm of a feature into two components: a scalar offset (denoted by a script C) and a variance norm (denoted by a delta symbol), both defined in Fig. 3. The scalar offset serves to minimize the loss over the entire training set, while the variance norm accounts for differences between samples. If we can disentangle the scalar offset from the variance norm, we obtain a norm that is sensitive to the hardness of each sample.

Figure 3: Norm decomposition

Angle Decomposition

Similarly, we relax the angles so that the predicted angular similarity need not be close to one on the training data, i.e., the angles are allowed to be larger. To achieve this, we introduce a scalar angular offset and a variance angle, as shown in Fig. 4. Analogous to the norm decomposition, the scalar offset serves solely to minimize the training loss, while the variance angle accounts for differences between samples. Because the angle can be signed, we take its absolute value in the decomposition.

Figure 4: Angle decomposition

Geometric Sensitivity Decomposition

With the decomposed components, we can rewrite the inner product in the linear layer by substituting the decomposed norm and angle, as shown in Eq. 4.

Equation 4: Geometric Sensitivity Decomposition
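The substitution can be checked numerically. The sketch below assumes an additive form for both decompositions (norm = scalar offset + variance norm, |angle| = scalar angular offset + variance angle); that form is an assumption made here for illustration, since the figures and equation are not reproduced in the text. The function names and the offset values `c_norm` and `c_angle` are hypothetical.

```python
import numpy as np

def decompose(w, x, c_norm, c_angle):
    """Split ||x|| and |angle| into a fixed offset plus a per-sample residual.

    Assumes an additive decomposition; the offsets are supplied by the caller.
    """
    w_norm = np.linalg.norm(w)
    x_norm = np.linalg.norm(x)
    cos_phi = np.clip(w @ x / (w_norm * x_norm), -1.0, 1.0)
    phi = np.arccos(cos_phi)           # lies in [0, pi], so it is already |angle|
    delta_norm = x_norm - c_norm       # per-sample residual of the norm
    delta_angle = phi - c_angle        # per-sample residual of the angle
    return w_norm, delta_norm, delta_angle

def recompose(w_norm, delta_norm, delta_angle, c_norm, c_angle):
    """Plug the decomposed norm and angle back into the inner product."""
    return w_norm * (c_norm + delta_norm) * np.cos(c_angle + delta_angle)

# Hypothetical example: one class weight, one feature, arbitrary offsets.
w = np.array([1.0, 0.0])
x = np.array([3.0, 4.0])
c_norm, c_angle = 2.0, 0.5

w_norm, delta_norm, delta_angle = decompose(w, x, c_norm, c_angle)
logit = recompose(w_norm, delta_norm, delta_angle, c_norm, c_angle)
```

Under this assumed form the recomposed value equals the original inner product for any choice of offsets, so the decomposition changes how the logit is parametrized rather than what it computes. The residual angle stays nonnegative only when the angular offset does not exceed the sample's angle.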

Conclusion

In this section, we approached the problem of distribution shift from a modeling perspective, as motivated on the introduction page. Specifically, we model distribution shift by extracting the sensitive components from feature norms and angles, because the score functions for covariate shift and concept shift are functions of norms and angles, respectively. Improving the sensitivity of norms and angles to distribution shift could therefore improve overall sensitivity. However, it remains unclear how to apply the decomposition in practice. Next, we introduce a simple parametrized training scheme that uses the proposed decomposition to improve the sensitivity of norms and angles.
