GELU
ReLU is commonly used in CNN models.
GELU (Gaussian Error Linear Unit) is commonly used in NLP models, e.g. GPT.
ReLU alleviates the vanishing gradient problem; however, it can cause the dying ReLU problem, in which some neurons get stuck outputting zero and are unlikely to recover.
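A minimal sketch of why neurons can die: ReLU's derivative is exactly zero for negative pre-activations, so a neuron whose input stays negative receives no gradient and its weights stop updating. (The function names here are illustrative, not from any particular library.)

```python
def relu(x):
    # ReLU gates by sign: negative inputs map to 0
    return max(x, 0.0)

def relu_grad(x):
    # Derivative of ReLU: 1 for x > 0, 0 for x <= 0
    return 1.0 if x > 0 else 0.0

# A neuron with a negative pre-activation outputs 0 AND gets zero
# gradient, so its weights receive no update -- the "dying ReLU" problem.
print(relu(-2.0), relu_grad(-2.0))  # 0.0 0.0
print(relu(3.0), relu_grad(3.0))    # 3.0 1.0
```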
GELU is a smoother version of ReLU, but more expensive to compute.
Unlike ReLU, which gates inputs by their sign, GELU weights inputs by their percentile under the standard normal distribution. This means that GELU outputs small negative values when the input is slightly below zero, providing a richer gradient for backpropagation.
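The weighting above can be sketched with the exact definition GELU(x) = x * Φ(x), where Φ is the standard normal CDF, alongside the tanh approximation that is often used in practice (the constants below are from the GELU paper; the function names are illustrative):

```python
import math

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Cheaper tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# Unlike ReLU, GELU maps slightly negative inputs to small non-zero
# negative outputs, so gradient information is not fully cut off.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f} -> exact {gelu(x):+.4f}, tanh approx {gelu_tanh(x):+.4f}")
```

Note that for large positive x, GELU approaches x (like ReLU), and for large negative x it approaches 0, while staying smooth everywhere in between.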