A model converges when additional training will no longer improve it. There are cases, however, where we may perceive that convergence has happened when in reality it has not.
If you would like to know what these cases are and how to handle them, then this document is for you.
To “converge” in machine learning is to reach an error that is very close to a local or global minimum. Refer here to know the criteria used to declare convergence.
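As a rough illustration (not from the linked reference), a common criterion is to stop once the loss stops changing by more than a small tolerance. The train_step callback, the tolerance, and the iteration cap below are all assumed values:

    # Minimal sketch of a convergence check, assuming a generic train_step()
    # that runs one training iteration and returns the current loss.
    # The tolerance and iteration cap are assumptions.
    def train_until_converged(train_step, tol=1e-6, max_iters=10000):
        prev_loss = float("inf")
        for i in range(max_iters):
            loss = train_step()
            if abs(prev_loss - loss) < tol:   # loss has stopped improving
                return i, loss                # declare convergence
            prev_loss = loss
        return max_iters, prev_loss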
Unfortunately, gradients often get smaller and smaller as the algorithm progresses down to the lower layers. As a result, the Gradient Descent update leaves the lower layer connection weights virtually unchanged, and training never converges to a good solution. This is called the vanishing gradients problem.
In some cases, the opposite can happen: the gradients can grow bigger and bigger, so many layers get insanely large weight updates and the algorithm diverges. This is the exploding gradients problem, which is mostly encountered in recurrent neural networks.
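Here is a small NumPy sketch (an illustration, not taken from the book) that pushes a gradient signal through many layers and tracks its norm. The width, depth, and weight scales are assumed values chosen to make the two effects visible:

    import numpy as np

    # Sketch: apply the chain rule through a deep stack of layers and watch
    # the gradient norm shrink (saturating sigmoid, small weights) or blow
    # up (linear layers, weights scaled above 1 in operator norm).
    rng = np.random.default_rng(0)
    width, n_layers = 64, 50

    def backprop_norm(weight_scale, use_sigmoid):
        x = rng.normal(size=width)
        grad = np.ones(width)
        for _ in range(n_layers):
            W = rng.normal(scale=weight_scale, size=(width, width))
            z = W @ x
            if use_sigmoid:
                s = 1.0 / (1.0 + np.exp(-z))
                local = s * (1.0 - s)          # sigmoid derivative <= 0.25
                x = s
            else:
                local = np.ones(width)         # linear layer: derivative is 1
                x = z
            grad = (W.T @ grad) * local        # one chain-rule step
        return np.linalg.norm(grad)

    print(backprop_norm(0.05, use_sigmoid=True))   # tiny norm: vanishing
    print(backprop_norm(0.25, use_sigmoid=False))  # huge norm: exploding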
Gradient problem
It happens when the learning rate value is set improperly: too small a value makes convergence slow, while too large a value makes the updates overshoot so that convergence never happens.
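A quick toy sketch of these regimes, using gradient descent on f(w) = w^2 (the learning rate values are assumptions chosen to show each behaviour):

    # Toy illustration: gradient descent on f(w) = w**2, whose gradient is 2*w.
    def descend(lr, steps=20, w=1.0):
        for _ in range(steps):
            w -= lr * 2 * w    # gradient descent update
        return w

    print(descend(0.01))   # too small: still far from 0, convergence is slow
    print(descend(0.4))    # well chosen: w is essentially at the minimum
    print(descend(1.1))    # too large: w overshoots and diverges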
If the loss function is convex, the algorithm will converge to the global minimum. For a non-convex loss function, however, it may converge to a local minimum instead.
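A small sketch of this behaviour on an assumed one-dimensional non-convex loss: depending on the starting point, plain gradient descent settles into either the global or the local minimum.

    # Toy non-convex loss f(w) = w**4 - 2*w**2 + 0.5*w with two minima:
    # a global one near w = -1.06 and a shallower local one near w = 0.94.
    # The start points and learning rate are assumptions for illustration.
    def f(w):
        return w**4 - 2*w**2 + 0.5*w

    def grad_f(w):
        return 4*w**3 - 4*w + 0.5

    def descend(w, lr=0.01, steps=2000):
        for _ in range(steps):
            w -= lr * grad_f(w)
        return w

    for start in (-2.0, 2.0):
        w = descend(start)
        print(f"start {start:+.1f} -> w = {w:+.3f}, f(w) = {f(w):+.3f}")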
Now let us see how to handle these problems. Modern ML architectures are designed to avoid the vanishing gradients problem. The ReLu activation function also mitigates it, although note that ReLu alone will not fully solve the issue. Careful use of feature normalisation (refer here) can help as well.
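For intuition, here is a hedged sketch contrasting the two activation derivatives; the depth and the all-positive pre-activations are assumptions. The sigmoid derivative is at most 0.25, so its product across layers collapses, whereas the ReLu derivative is exactly 1 for positive inputs:

    import numpy as np

    # Sketch: the gradient surviving a deep chain is roughly the product of
    # each layer's activation derivative. Depth and pre-activation values
    # are assumptions for illustration.
    depth = 30
    z = np.linspace(0.5, 2.0, depth)   # assumed positive pre-activations

    sig = 1.0 / (1.0 + np.exp(-z))
    print(np.prod(sig * (1.0 - sig)))  # sigmoid' <= 0.25 everywhere: ~1e-25
    print(np.prod(np.ones(depth)))     # ReLU' = 1 for positive inputs: 1.0

This also hints at why ReLu is not a complete fix: units with negative pre-activation have zero derivative and block the gradient entirely (the "dying ReLu" effect).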
Gradient problem
It is solved by using a suitable optimiser (refer here); adaptive optimisers in particular tune the effective step size automatically instead of relying on one hand-tuned learning rate.
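As a sketch of why this helps, below is the standard Adam update written out in a few lines (the hyperparameter values are the usual defaults, treated here as assumptions), followed by a toy usage:

    import numpy as np

    # Sketch of the standard Adam update (not a library implementation).
    # It adapts the effective step size per parameter, reducing the need
    # to hand-tune the learning rate.
    def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad        # running mean of gradients
        v = b2 * v + (1 - b2) * grad**2     # running mean of squared gradients
        m_hat = m / (1 - b1**t)             # bias correction
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Usage: minimise f(w) = w**2, whose gradient is 2*w.
    w, m, v = 5.0, 0.0, 0.0
    for t in range(1, 3001):
        w, m, v = adam_step(w, 2.0 * w, m, v, t)
    print(w)   # close to the minimum at 0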
Refer here for detail about solving the local minima problem.
Also, converting a non-convex function into a convex one is called convexifying. This discussion talks about solving the issue by convexifying with a Lagrangian. This article is another treatment of the same idea.
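A toy illustration of convexifying (an assumed example, not the Lagrangian construction from the linked discussion): f(w) = w^4 - w^2 is non-convex because f''(w) = 12w^2 - 2 is negative near w = 0, but adding the quadratic penalty w^2 gives g(w) = w^4, which is convex:

    import numpy as np

    # Check second derivatives on a grid: f''(w) = 12*w**2 - 2 dips below
    # zero near w = 0, while g''(w) = 12*w**2 is never negative.
    w = np.linspace(-2, 2, 401)
    f_second = 12 * w**2 - 2
    g_second = 12 * w**2

    print((f_second < 0).any())   # True: f is non-convex
    print((g_second < 0).any())   # False: g is convex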
https://images.app.goo.gl/BjjNL1PmdNCJptaNA
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://www.linkedin.com/posts/dpkumar_optimization-machinelearning-datascience-activity-6751430811356147712-oYo6
https://docs.paperspace.com/machine-learning/wiki/convergence
https://www.linkedin.com/posts/dpkumar_convergence-machinelearningmodels-datasciences-activity-6769803533836521472-rjnu
https://images.app.goo.gl/CB3v4TAHe4TdNjXZ8
https://math.stackexchange.com/questions/2872861/how-to-convexify-a-non-convex-function
https://www.prateekjain.org/publications/all_papers/JainK17_FTML.pdf
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-normalisation-in-machine-learning#TOC-Relevance-in-neural-network