ReLU as a Switch

The ReLU (rectified linear unit) is an activation function used in artificial neural networks.

The ReLU function is defined as follows:

f(x) = max(0, x)

For any input value x, the function outputs the value itself if it is positive (max(0, x) = x) and zero if it is negative. Graphically, this gives a flat line at zero for negative inputs and a diagonal line with slope 1 for positive inputs.
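
As a minimal sketch of the definition above (using NumPy, which is an assumption about tooling rather than anything the formula requires):

    import numpy as np

    def relu(x):
        # max(0, x), applied elementwise
        return np.maximum(0.0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
    # [0.  0.  0.  1.5 3. ]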

However, the ReLU function is such a simple operation that an alternative viewpoint exists.

An electrical switch conducts 1 to 1 when on (e.g. 1 volt in gives 1 volt out, 2 volts in gives 2 volts out) and gives zero volts out when off. You can then say that ReLU is a switch with (x >= 0) as the switching decision.
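
Written in that switching style, the same function becomes a boolean decision that either conducts the input 1 to 1 or outputs zero; this is a sketch of the viewpoint, not a different function:

    import numpy as np

    def relu_as_switch(x):
        switch = (x >= 0.0)              # the switching decision
        return np.where(switch, x, 0.0)  # on: conduct 1 to 1, off: zero out

    x = np.array([-2.0, 1.0, 2.0])
    print(relu_as_switch(x))  # [0. 1. 2.]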

The switching viewpoint can remove certain obfuscations in understanding ReLU-based neural networks. It makes clear that in a ReLU neural network weighted sums are being connected to and disconnected from each other. And once the switching states in part of the network become known, the composite weighted sum for each neuron that depends on those states can be simplified by linear algebra to a simple weighted sum of the input variables.
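
A hedged sketch of that simplification for a small two-layer case (the layer sizes and random weights below are arbitrary illustrations): once the on/off states of the first layer are known for a given input, the two weight matrices and a diagonal switch matrix collapse by ordinary matrix multiplication into one effective weight matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3))   # first layer weights (arbitrary)
    W2 = rng.normal(size=(2, 4))   # second layer weights (arbitrary)

    x = rng.normal(size=3)         # a particular input

    pre1 = W1 @ x                  # first-layer weighted sums
    switches = (pre1 >= 0.0)       # switching decisions for this input
    D = np.diag(switches.astype(float))  # diagonal on/off matrix

    # Composite weighted sum, simplified by linear algebra:
    W_eff = W2 @ D @ W1            # effective weights for this input
    assert np.allclose(W_eff @ x, W2 @ np.maximum(0.0, pre1))
    print(W_eff)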

Each switching decision in the network only ever depends on some simple weighted sum of the input. However, that weighted sum is itself built up from prior switching decisions.

Nevertheless, the output of a particular neuron for a particular input is some simple weighted sum of the input values. You can reverse engineer that weighted sum to see which parts of the input support that neuron's output.

For a particular input, the ReLU network collapses to a single matrix. You can say something similar of any type of network for a particular input, but the idea is especially useful for the ReLU type.
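
Continuing the sketch with a deeper stack of arbitrary layers: for one particular input, the whole ReLU network collapses to a single effective matrix, and each row of that matrix is the simple weighted sum of the input producing the corresponding output neuron, which is what the reverse-engineering remark above refers to.

    import numpy as np

    rng = np.random.default_rng(1)
    layers = [rng.normal(size=(5, 3)),
              rng.normal(size=(4, 5)),
              rng.normal(size=(2, 4))]   # arbitrary ReLU network (no biases)

    x = rng.normal(size=3)

    # Forward pass, recording the switch states of each hidden layer
    M_eff = np.eye(3)                    # running effective matrix
    a = x
    for i, W in enumerate(layers):
        a = W @ a
        M_eff = W @ M_eff
        if i < len(layers) - 1:          # ReLU on hidden layers only
            switch = (a >= 0.0).astype(float)
            a = switch * a
            M_eff = np.diag(switch) @ M_eff

    assert np.allclose(a, M_eff @ x)     # the network is linear for this input
    print(M_eff[0])                      # effective input weights of output neuron 0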

In electrical engineering, everything from 2-way switching up to N-way switching is common.

Using the switching viewpoint, it becomes obvious that if you used a 2-way switch, one that connects to an additional contact when the first contact disconnects, you could, so to say, two-side ReLU: the off state connects to an alternative set of forward weights that you place additionally in the next layer.
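
A hedged sketch of that idea (the layer sizes, weights, and the name two_way_layer are illustrative assumptions): when a unit's switch is on, its value goes through one set of forward weights; when it is off, the same value is routed through an alternative set instead, so nothing is simply discarded.

    import numpy as np

    def two_way_layer(x, W_on, W_off):
        # 2-way switching: each unit routes its value through W_on when its
        # switch is on (x >= 0) and through W_off when it is off.
        switch = (x >= 0.0).astype(float)
        return W_on @ (switch * x) + W_off @ ((1.0 - switch) * x)

    rng = np.random.default_rng(2)
    x = rng.normal(size=4)
    W_on = rng.normal(size=(3, 4))   # forward weights for the "on" contact
    W_off = rng.normal(size=(3, 4))  # alternative weights for the "off" contact
    print(two_way_layer(x, W_on, W_off))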

You can also observe that max pooling is 1-of-N switching.
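
In switching terms, the argmax picks which of the N inputs in the pooling window is connected through and the rest are disconnected (a sketch over a flat window, ignoring strides and padding):

    import numpy as np

    def max_pool_as_switch(window):
        # 1-of-N switching: exactly one input in the window is connected through.
        selected = np.argmax(window)   # the switching decision
        return window[selected]

    w = np.array([0.3, -1.2, 2.7, 0.9])
    print(max_pool_as_switch(w))  # 2.7, the single connected input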