Activation Functions

Takeaways

  1. Always use a nonlinear activation function. Without one, stacked linear layers collapse into a single linear transformation, so the network is not effectively deep.
  2. ReLU causes bias shift: its outputs are non-negative, so the mean activation is > 0, and its gradient is 0 for all negative inputs (half of the input range).
  3. The best options right now are PReLU and ELU.
  4. Good hyperparameters are:
    1. Leaky ReLU: 0.01 negative slope
    2. PReLU: 0.25 initial slope
    3. RReLU: slope drawn uniformly at random from [1/8, 1/3] at train time; fixed to (1/8 + 1/3)/2 ≈ 0.23 at test time.
Code
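
A minimal NumPy sketch of the activations discussed above (the function names and the bias-shift check are illustrative, not taken from any paper's reference code). It uses the hyperparameters from the takeaways: Leaky ReLU with a 0.01 slope, PReLU with a 0.25 initial slope, ELU, and RReLU sampling its slope from [1/8, 1/3] at train time and using the mean at test time. The demo at the bottom prints mean activations on zero-mean inputs to show ReLU's positive bias shift.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU with the commonly used 0.01 negative slope.
    return np.where(x > 0, x, slope * x)

def prelu(x, slope):
    # PReLU: same form as Leaky ReLU, but `slope` is a learned
    # parameter, typically initialised to 0.25.
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # ELU saturates to -alpha for large negative inputs, which pulls
    # the mean activation towards zero (less bias shift than ReLU).
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def rrelu(x, lower=1/8, upper=1/3, training=True, rng=None):
    # RReLU: negative slope sampled uniformly from [lower, upper] per
    # element at train time, fixed to the interval mean at test time.
    if training:
        rng = np.random.default_rng() if rng is None else rng
        slope = rng.uniform(lower, upper, size=x.shape)
    else:
        slope = (lower + upper) / 2.0
    return np.where(x > 0, x, slope * x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000)  # zero-mean inputs

    # ReLU's mean output is clearly > 0 (bias shift); ELU sits closer to zero.
    print("mean ReLU :", relu(x).mean())            # ~0.40
    print("mean ELU  :", elu(x).mean())             # ~0.16, closer to zero
    print("mean PReLU(0.25):", prelu(x, 0.25).mean())
    print("RReLU train sample:", rrelu(x[:5], training=True, rng=rng))
    print("RReLU test-time slope:", (1/8 + 1/3) / 2)  # ≈ 0.229
```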