With the rapid development of the Internet and hardware technology, the volume of available data is growing quickly, and machine learning and deep learning problems are being posed at an ever larger scale. As a result, traditional optimization algorithms such as gradient descent and Newton's method can no longer be applied efficiently to large-scale problems.
Stochastic optimization algorithms, such as stochastic gradient descent (SGD), estimate the gradient from a randomly selected sample or mini-batch, which greatly reduces the computational cost of each iteration.
However, because the gradient is computed stochastically, the resulting estimate carries variance, and this variance is usually large. To further reduce the variance of the stochastic gradient and thereby accelerate convergence, variance-reduction techniques are commonly incorporated into SGD, yielding improved variance-reduced variants.
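As a concrete illustration, the sketch below implements one widely used variance-reduction scheme, SVRG, which recenters each stochastic gradient around a periodically recomputed full gradient. The interface, names, and default values are illustrative assumptions, not a specification of any particular method discussed here.

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.1, epochs=10, inner_steps=None, rng=None):
    """Minimal SVRG sketch (interface and defaults are illustrative assumptions).

    grad_i(w, i) should return the gradient of the i-th sample's loss at w.
    Each outer epoch computes one full gradient at a snapshot point; the inner
    loop then uses the variance-reduced estimate
        v = grad_i(w, i) - grad_i(w_snap, i) + full_grad.
    """
    rng = np.random.default_rng() if rng is None else rng
    inner_steps = n if inner_steps is None else inner_steps
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad  # variance-reduced gradient
            w -= lr * v
    return w
```

Because grad_i(w, i) - grad_i(w_snap, i) shrinks as the iterate approaches the snapshot, the variance of v decreases over the run, which is what permits a constant step size and faster convergence than plain SGD.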
Stochastic optimization algorithms have been widely applied to large-scale machine learning and deep learning problems. In practice, however, data are often missing or corrupted by noise, which degrades both the convergence speed and the accuracy of the optimization algorithm. It is therefore worthwhile to study how to design effective stochastic optimization algorithms for problems with missing or noise-polluted data.
Differentiable programming combines traditional machine learning optimization algorithms with deep networks. Its main idea is to unroll a traditional optimization algorithm into the form of a neural network and to train some of its parameters by backpropagation.
This markedly accelerates the convergence of the traditional algorithm: a method that originally required thousands of iterations to converge can achieve comparable results with a network of only a dozen or so layers.
In addition, because the differentiable programming network is obtained by unrolling a traditional machine learning optimization algorithm, it also improves the interpretability of the deep network to a certain extent.
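A classic instance of this idea is unrolling the ISTA iterations for sparse coding into a fixed number of layers whose step sizes and thresholds are learned (the LISTA construction). The sketch below is a minimal, illustrative version in PyTorch; the class name, initialisation, and tensor shapes are our own assumptions rather than the construction used in any specific work.

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """LISTA-style unrolled network (illustrative sketch): each layer performs
    one ISTA iteration for the sparse coding problem
        min_z 0.5 * ||x - D z||^2 + lam * ||z||_1,
    with per-layer step sizes and soft-thresholds trained by backpropagation."""

    def __init__(self, D: torch.Tensor, n_layers: int = 12, lam: float = 0.1):
        super().__init__()
        self.register_buffer("D", D)                         # fixed dictionary, shape (m, n)
        L = float(torch.linalg.matrix_norm(D, ord=2)) ** 2   # Lipschitz constant of the smooth part
        self.steps = nn.Parameter(torch.full((n_layers,), 1.0 / L))   # learnable step sizes
        self.thetas = nn.Parameter(torch.full((n_layers,), lam / L))  # learnable thresholds

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, m) observations, z: (batch, n) sparse codes
        z = x.new_zeros(x.shape[0], self.D.shape[1])
        for step, theta in zip(self.steps, self.thetas):
            grad = (z @ self.D.T - x) @ self.D               # gradient of the quadratic data-fit term
            u = z - step * grad
            z = torch.sign(u) * torch.clamp(u.abs() - theta, min=0.0)  # soft-thresholding
        return z
```

Each forward pass performs only a dozen ISTA iterations, but because the per-layer step sizes and thresholds are trained end-to-end, the unrolled network can approach the accuracy that the hand-tuned iteration would reach only after far more steps.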
Traditional optimization algorithms such as gradient descent are adequate for conventional, single-level problems. However, many modern tasks, such as Generative Adversarial Networks and Neural Architecture Search, are naturally formulated as bilevel problems, and ordinary optimization algorithms face many limitations as the problem scale grows; for large-scale machine learning problems of this kind, bilevel optimization occupies an irreplaceable position.
In recent years, driven by big data and the rapid development of artificial intelligence fields such as machine learning, deep learning, data mining, computer vision, and intelligent signal processing, large-scale bilevel optimization has been successfully applied to a variety of practical problems.
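In its generic form (the notation below is ours and not tied to any particular application), a bilevel problem couples an upper-level objective with the solution set of a lower-level problem:

```latex
\min_{x}\; F\bigl(x,\, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y}\; f(x,\, y)
```

In a Generative Adversarial Network, for example, the lower level can be read as the discriminator's best response and the upper level as the generator's objective; in Neural Architecture Search, the lower level trains the network weights while the upper level selects the architecture.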