GELU


  ReLU is commonly used in CNN models

  GELU (Gaussian Error Linear Unit) is commonly used in NLP models, e.g. GPT

  

  ReLU alleviates the vanishing gradient problem; however, it can cause the dying ReLU problem, in which some neurons output zero for all inputs and, because the ReLU gradient is zero for negative inputs, are unlikely to recover.
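  To make this concrete, here is a minimal NumPy sketch of a hypothetical toy neuron (not from any particular model) whose pre-activations are all negative: its output and its weight gradient are both zero, so gradient descent cannot revive it.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def relu_grad(z):
        # derivative of ReLU w.r.t. its input: 1 where z > 0, else 0
        return (z > 0).astype(float)

    rng = np.random.default_rng(0)
    x = np.abs(rng.normal(size=(8, 4)))  # nonnegative inputs, e.g. outputs of a previous ReLU layer
    w = -np.abs(rng.normal(size=4))      # weights pushed negative, e.g. by one bad update
    b = -1.0

    z = x @ w + b                        # pre-activations are all negative
    a = relu(z)                          # outputs are all zero -> the neuron is "dead"

    # Backprop through the neuron: dL/dw = x^T (upstream * relu'(z)).
    # relu'(z) is zero everywhere here, so the weight gradient vanishes
    # and gradient descent never moves the neuron out of the dead region.
    upstream = np.ones_like(z)
    dw = x.T @ (upstream * relu_grad(z))
    print("activations:", a)             # all zeros
    print("weight gradient:", dw)        # all zeros -> no learning signal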

  GELU is a smoother version of ReLU, but it is more expensive to compute.

  Unlike ReLU, which gates inputs by their sign, GELU weights inputs by their percentile under a standard Gaussian: GELU(x) = x * Φ(x), where Φ is the standard normal CDF. This means GELU lets small negative values through instead of zeroing them out, providing a richer gradient for backpropagation.
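  A minimal NumPy/SciPy sketch (with arbitrary illustrative inputs) comparing ReLU, the exact GELU, and the cheaper tanh approximation used by many GPT-style implementations; note how GELU keeps small negative outputs just below zero:

    import numpy as np
    from scipy.special import erf  # Φ(x) can be written with the error function

    def relu(x):
        return np.maximum(0.0, x)

    def gelu_exact(x):
        # GELU(x) = x * Φ(x), with Φ(x) = 0.5 * (1 + erf(x / sqrt(2)))
        return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

    def gelu_tanh(x):
        # cheaper tanh approximation of GELU
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

    x = np.array([-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0])
    print("relu      :", relu(x))        # negative inputs are zeroed out
    print("gelu exact:", gelu_exact(x))  # small negative outputs survive near zero
    print("gelu tanh :", gelu_tanh(x))   # close to the exact values, cheaper to compute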