Does an untrained network start out neutral or biased?
In other words: when faced with a binary dataset, what fraction of examples does it initially assign to "class 0"?
One might imagine two possible outcomes:
The network splits examples evenly across the two classes.
The network systematically favors one class over the other.
Which of these scenarios actually happens?
If we pass a binary dataset through an untrained network, does it split 50/50, or does it lean toward one class?
The answer depends on the architecture.
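To make this concrete, here is a minimal probe, a sketch in PyTorch rather than the code used in the papers: build an untrained binary classifier, feed it zero-mean Gaussian inputs, and record the fraction of examples assigned to class 0 over a few random initializations, once with a tanh MLP and once with a ReLU MLP. The dimensions, width, depth, and helper name are illustrative choices, not taken from the papers.

```python
# Sketch: estimate the class-0 fraction of an *untrained* binary classifier.
# All hyperparameters (dim, width, depth) are illustrative, not from the papers.
import torch
import torch.nn as nn

def class0_fraction(activation, n_inputs=10_000, dim=100, width=512, depth=4, seed=0):
    torch.manual_seed(seed)
    layers, in_dim = [], dim
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), activation()]
        in_dim = width
    net = nn.Sequential(*layers, nn.Linear(in_dim, 2))  # two logits, left untrained
    x = torch.randn(n_inputs, dim)                       # zero-mean Gaussian "blob"
    with torch.no_grad():
        preds = net(x).argmax(dim=1)
    return (preds == 0).float().mean().item()

for act in (nn.Tanh, nn.ReLU):
    fracs = [round(class0_fraction(act, seed=s), 3) for s in range(5)]
    print(act.__name__, fracs)
# Per the papers, antisymmetric activations (e.g. tanh) tend to keep the split
# near 50/50, while elements such as ReLU, depth, and max-pooling can break the
# node-permutation symmetry and skew the initial guesses toward one class.
```

Repeating over seeds matters because which class is favored may flip from one initialization to the next; the bias shows up in the per-initialization split rather than in a fixed preferred class.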
Why should we care about predictions at initialization?
Because the initial predictive state shapes how the learning dynamics unfold later.
Want to know more?
Initial Guessing Bias: How Untrained Networks Favor Some Classes, Emanuele Francazi, Aurelien Lucchi, Marco Baity-Jesi, ICML 2024 [Conference paper] [arXiv link] [talk] [GitHub project page]
We prove that untrained neural networks can unevenly distribute their guesses among different classes. This is due to a node-permutation symmetry breaking, caused by architectural elements such as activations, depth and max-pooling.
Where You Place the Norm Matters: From Prejudiced to Neutral Initializations, Emanuele Francazi, Francesco Pinto, Aurelien Lucchi, Marco Baity-Jesi [arXiv link] [GitHub project page]
Normalization type and position shape prediction behavior at initialization; our theory shows that BatchNorm and LayerNorm differ fundamentally, and highlights the critical role of normalization placement within layers.
When the Left Foot Leads to the Right Path: Bridging Initial Prejudice and Trainability, Alberto Bassi, Carlo Albert, Aurelien Lucchi, Marco Baity-Jesi, Emanuele Francazi [arXiv link]
We prove a theoretical correspondence between the order/chaos phase transition and initial guessing bias (IGB) in the mean-field regime.
Some of the questions we want to address:
Interplay between data-induced and architecture-induced bias effects:
Investigate how IGB interacts with other sources of bias during training. A deeper understanding could reveal ways to harness IGB to counterbalance effects like class imbalance, potentially inspiring new architecture-level solutions to improve learning under imbalance.
The role of the dataset beyond class imbalance:
Our work shows how the design of a neural network can give rise to IGB. At the same time, our analysis shows that the input distribution also plays a key role in the phenomenon. A more in-depth study of the interplay between these two elements, besides deepening our understanding of IGB, would be an important first step toward incorporating models of real data into the analysis. Understanding the role of the dataset distribution could also inform pre-processing choices (e.g., how to standardize the dataset); a toy probe of the effect of standardization at initialization is sketched after this list.
Extension of the analysis to real data and broader architecture families:
Our analysis quantitatively describes IGB for multilayer perceptrons (MLPs) with a Gaussian blob as input. Since the phenomenon is empirically observed in broader settings, the natural next step is to extend the analysis to them.
IGB and regularization:
The condition driving the emergence of IGB suggests that some forms of network regularization might be effective in eliminating it. Understanding how regularization affects the phenomenon can then inform whether it should be included in the network design, depending on whether or not we want to eliminate IGB.
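As a complement to the dataset question above, one toy probe (again a PyTorch sketch with illustrative settings, not the papers' protocol) is to feed the same untrained ReLU MLP raw inputs with a non-zero mean and then their per-feature standardized version, and compare the class-0 fractions at initialization.

```python
# Sketch: how does input centering/scaling affect the initial class split?
# The shift of +3.0 and the network sizes are illustrative assumptions.
import torch
import torch.nn as nn

def class0_fraction(x, width=512, depth=4, seed=0):
    """Fraction of inputs an untrained ReLU MLP assigns to class 0."""
    torch.manual_seed(seed)
    layers, in_dim = [], x.shape[1]
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    net = nn.Sequential(*layers, nn.Linear(in_dim, 2))  # left untrained
    with torch.no_grad():
        return (net(x).argmax(dim=1) == 0).float().mean().item()

x_raw = torch.randn(10_000, 100) + 3.0           # hypothetical inputs with non-zero mean
x_std = (x_raw - x_raw.mean(0)) / x_raw.std(0)   # per-feature standardization

print("raw inputs:         ", class0_fraction(x_raw))
print("standardized inputs:", class0_fraction(x_std))
```

How sensitive the initial split is to this kind of pre-processing is exactly the data/architecture interplay flagged in the dataset question above; the sketch only shows how one would measure it, not what the outcome should be.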