A model converges when additional training will no longer improve it. There are cases, however, where we may perceive that convergence has happened when in reality it has not.
If you would like to know what these cases are and how to handle them, then this document is for you.
To “converge” in machine learning is to reach an error that is very close to a local or global minimum. Refer here to know the criteria used to declare convergence.
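As a rough illustration (not from the linked reference), a common criterion is to stop once the loss stops changing by more than a small tolerance. The train_step callback, the tolerance, and the iteration cap below are all assumed values:

    # Minimal sketch of a convergence check, assuming a generic train_step()
    # that runs one training iteration and returns the current loss.
    # The tolerance and iteration cap are assumptions.
    def train_until_converged(train_step, tol=1e-6, max_iters=10000):
        prev_loss = float("inf")
        for i in range(max_iters):
            loss = train_step()
            if abs(prev_loss - loss) < tol:   # loss has stopped improving
                return i, loss                # declare convergence
            prev_loss = loss
        return max_iters, prev_loss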
Unfortunately, gradients often get smaller and smaller as the algorithm progresses down to the lower layers. As a result, the Gradient Descent update leaves the lower layer connection weights virtually unchanged, and training never converges to a good solution. This is called the vanishing gradients problem.
In some cases, the opposite can happen: the gradients can grow bigger and bigger, so many layers get insanely large weight updates and the algorithm diverges. This is the exploding gradients problem, which is mostly encountered in recurrent neural networks.
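Here is a small NumPy sketch (an illustration, not taken from the book) that pushes a gradient signal through many layers and tracks its norm. The width, depth, and weight scales are assumed values chosen to make the two effects visible:

    import numpy as np

    # Sketch: apply the chain rule through a deep stack of layers and watch
    # the gradient norm shrink (saturating sigmoid, small weights) or blow
    # up (linear layers, weights scaled above 1 in operator norm).
    rng = np.random.default_rng(0)
    width, n_layers = 64, 50

    def backprop_norm(weight_scale, use_sigmoid):
        x = rng.normal(size=width)
        grad = np.ones(width)
        for _ in range(n_layers):
            W = rng.normal(scale=weight_scale, size=(width, width))
            z = W @ x
            if use_sigmoid:
                s = 1.0 / (1.0 + np.exp(-z))
                local = s * (1.0 - s)          # sigmoid derivative <= 0.25
                x = s
            else:
                local = np.ones(width)         # linear layer: derivative is 1
                x = z
            grad = (W.T @ grad) * local        # one chain-rule step
        return np.linalg.norm(grad)

    print(backprop_norm(0.05, use_sigmoid=True))   # tiny norm: vanishing
    print(backprop_norm(0.25, use_sigmoid=False))  # huge norm: exploding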
Gradient problem
It happens when the learning rate value is set improperly: too small a value makes convergence slow, while too large a value makes the updates overshoot so that convergence never happens.
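A quick toy sketch of these regimes, using gradient descent on f(w) = w^2 (the learning rate values are assumptions chosen to show each behaviour):

    # Toy illustration: gradient descent on f(w) = w**2, whose gradient is 2*w.
    def descend(lr, steps=20, w=1.0):
        for _ in range(steps):
            w -= lr * 2 * w    # gradient descent update
        return w

    print(descend(0.01))   # too small: still far from 0, convergence is slow
    print(descend(0.4))    # well chosen: w is essentially at the minimum
    print(descend(1.1))    # too large: w overshoots and diverges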
If the loss function is convex, the algorithm will converge to the global minimum. For a non-convex loss function, however, it may converge to a local minimum instead.
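A small sketch of this behaviour on an assumed one-dimensional non-convex loss: depending on the starting point, plain gradient descent settles into either the global or the local minimum.

    # Toy non-convex loss f(w) = w**4 - 2*w**2 + 0.5*w with two minima:
    # a global one near w = -1.06 and a shallower local one near w = 0.94.
    # The start points and learning rate are assumptions for illustration.
    def f(w):
        return w**4 - 2*w**2 + 0.5*w

    def grad_f(w):
        return 4*w**3 - 4*w + 0.5

    def descend(w, lr=0.01, steps=2000):
        for _ in range(steps):
            w -= lr * grad_f(w)
        return w

    for start in (-2.0, 2.0):
        w = descend(start)
        print(f"start {start:+.1f} -> w = {w:+.3f}, f(w) = {f(w):+.3f}")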
Now let us see how to handle these problems. Modern ML architectures are designed to avoid the vanishing gradients problem. The ReLu activation function also mitigates it, although note that ReLu alone will not fully solve the issue. Careful use of feature normalisation (refer here) can help as well.
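For intuition, here is a hedged sketch contrasting the two activation derivatives; the depth and the all-positive pre-activations are assumptions. The sigmoid derivative is at most 0.25, so its product across layers collapses, whereas the ReLu derivative is exactly 1 for positive inputs:

    import numpy as np

    # Sketch: the gradient surviving a deep chain is roughly the product of
    # each layer's activation derivative. Depth and pre-activation values
    # are assumptions for illustration.
    depth = 30
    z = np.linspace(0.5, 2.0, depth)   # assumed positive pre-activations

    sig = 1.0 / (1.0 + np.exp(-z))
    print(np.prod(sig * (1.0 - sig)))  # sigmoid' <= 0.25 everywhere: ~1e-25
    print(np.prod(np.ones(depth)))     # ReLU' = 1 for positive inputs: 1.0

This also hints at why ReLu is not a complete fix: units with negative pre-activation have zero derivative and block the gradient entirely (the "dying ReLu" effect).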
Gradient problem
It is solved by using a suitable optimiser (refer here); adaptive optimisers in particular tune the effective step size automatically instead of relying on one hand-tuned learning rate.
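As a sketch of why this helps, below is the standard Adam update written out in a few lines (the hyperparameter values are the usual defaults, treated here as assumptions), followed by a toy usage:

    import numpy as np

    # Sketch of the standard Adam update (not a library implementation).
    # It adapts the effective step size per parameter, reducing the need
    # to hand-tune the learning rate.
    def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad        # running mean of gradients
        v = b2 * v + (1 - b2) * grad**2     # running mean of squared gradients
        m_hat = m / (1 - b1**t)             # bias correction
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Usage: minimise f(w) = w**2, whose gradient is 2*w.
    w, m, v = 5.0, 0.0, 0.0
    for t in range(1, 3001):
        w, m, v = adam_step(w, 2.0 * w, m, v, t)
    print(w)   # close to the minimum at 0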
Refer here for detail about solving the local minima problem.
Also, converting a non-convex function into a convex one is called convexifying. This discussion talks about solving the issue by convexifying with a Lagrangian. This article is another treatment of the same idea.
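A toy illustration of convexifying (an assumed example, not the Lagrangian construction from the linked discussion): f(w) = w^4 - w^2 is non-convex because f''(w) = 12w^2 - 2 is negative near w = 0, but adding the quadratic penalty w^2 gives g(w) = w^4, which is convex:

    import numpy as np

    # Check second derivatives on a grid: f''(w) = 12*w**2 - 2 dips below
    # zero near w = 0, while g''(w) = 12*w**2 is never negative.
    w = np.linspace(-2, 2, 401)
    f_second = 12 * w**2 - 2
    g_second = 12 * w**2

    print((f_second < 0).any())   # True: f is non-convex
    print((g_second < 0).any())   # False: g is convex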
https://images.app.goo.gl/BjjNL1PmdNCJptaNA
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://www.linkedin.com/posts/dpkumar_optimization-machinelearning-datascience-activity-6751430811356147712-oYo6
https://docs.paperspace.com/machine-learning/wiki/convergence
https://www.linkedin.com/posts/dpkumar_convergence-machinelearningmodels-datasciences-activity-6769803533836521472-rjnu
https://images.app.goo.gl/CB3v4TAHe4TdNjXZ8
https://math.stackexchange.com/questions/2872861/how-to-convexify-a-non-convex-function
https://www.prateekjain.org/publications/all_papers/JainK17_FTML.pdf
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-normalisation-in-machine-learning#TOC-Relevance-in-neural-network