Zero Curvature Initialization of Neural Networks
Zero Curvature Initialization
Neural networks typically have to be randomly initialized for gradient descent to work well.
However, no amount of training can completely wash out that initial noise, which results in very chaotic decision boundaries.
The boundaries far from the training examples are especially erratic and do not provide the generalization they could.
One way to reduce this effect is to randomly initialize the network so that, before training, it acts as a simple linear random projection (effectively a single square matrix) for any input.
That is possible with ReLU or other switching-type activation functions.
Since ReLU is one-sided, you have to coordinate pairs of ReLUs and weights to get zero initial curvature. For conventional neural networks you would likely need both positive-type (y = x for x >= 0, zero otherwise) and negative-type (y = x for x <= 0, zero otherwise) ReLU functions.
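As an illustration, here is a minimal NumPy sketch of that pairing idea; the class and function names (ZeroCurvatureNet, pos_relu, neg_relu) are hypothetical and not from the original text. The hidden layer holds two copies of the same random weight matrix, one fed through a positive-type ReLU and one through a negative-type ReLU, and the output weights are shared across both copies. Because pos_relu(z) + neg_relu(z) = z for every z, the untrained network computes exactly V @ W @ x, a plain linear random projection.

```python
import numpy as np

def pos_relu(x):
    # Positive-type ReLU: y = x for x >= 0, zero otherwise.
    return np.maximum(x, 0.0)

def neg_relu(x):
    # Negative-type ReLU: y = x for x <= 0, zero otherwise.
    return np.minimum(x, 0.0)

class ZeroCurvatureNet:
    """Sketch of a two-layer net with zero curvature at initialization.

    The hidden layer duplicates one random projection W: one copy goes
    through a positive-type ReLU, the other through a negative-type ReLU.
    Sharing the output weights V across both copies makes the whole
    network equal V @ W @ x before training (a linear random projection).
    """

    def __init__(self, dim, hidden, rng=None):
        rng = np.random.default_rng(rng)
        self.W = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        self.V = rng.standard_normal((dim, hidden)) / np.sqrt(hidden)
        # Duplicated copies that gradient descent may later pull apart.
        self.W_pos, self.W_neg = self.W.copy(), self.W.copy()
        self.V_pos, self.V_neg = self.V.copy(), self.V.copy()

    def forward(self, x):
        h_pos = pos_relu(self.W_pos @ x)
        h_neg = neg_relu(self.W_neg @ x)
        return self.V_pos @ h_pos + self.V_neg @ h_neg

if __name__ == "__main__":
    net = ZeroCurvatureNet(dim=8, hidden=16, rng=0)
    x1 = np.random.default_rng(1).standard_normal(8)
    x2 = np.random.default_rng(2).standard_normal(8)

    # At initialization the network equals the square linear map V @ W.
    assert np.allclose(net.forward(x1), net.V @ net.W @ x1)

    # Linearity check: f(a*x1 + b*x2) == a*f(x1) + b*f(x2) before training.
    lhs = net.forward(2.0 * x1 + 3.0 * x2)
    rhs = 2.0 * net.forward(x1) + 3.0 * net.forward(x2)
    assert np.allclose(lhs, rhs)
    print("Network is exactly linear at initialization.")
```

In this sketch the two copies of W and V are stored separately so that, once training starts, gradient descent is free to pull them apart; that divergence is where the network's nonlinearity would re-emerge as the data requires it.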
The writer has only used zero curvature initialization with unconventional neural networks.