Beyond First-Order Methods in ML Systems
Friday, July 17th - Timezone: PDT
Description: In the last few decades, much effort has been devoted to the development of first-order methods. These methods enjoy a low per-iteration cost, have optimal complexity, are easy to implement, and have proven to be effective for most machine learning applications.
In contrast, higher-order methods, such as Newton, quasi-Newton, and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. At least in theory, these methods possess several nice features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments. However, higher-order methods are often “undervalued.”
This workshop will attempt to shed light on this statement. Topics of interest include, but are not limited to, second-order methods, adaptive gradient descent methods, regularization techniques, and techniques based on higher-order derivatives. The workshop will bring machine learning and optimization researchers closer together to facilitate a discussion of underlying questions such as the following:
Why are higher-order methods not omnipresent?
Why are higher-order methods important in machine learning?
What advantages can they offer? What are their limitations and disadvantages?
How should (or could) they be implemented in practice?