Accepted papers

1. Adaptive diagonal curvature: a quasi-Newton method for stochastic optimization. David Saxton (DeepMind); Eshaan Nichani (MIT)

Paper Video

2. Scalable Derivative-Free Optimization for Nonlinear Least-Squares Problems. Lindon Roberts (Australian National University); Coralia Cartis (University of Oxford); Tyler Ferguson (University of Oxford)

Paper Video

3. A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees. Michael O'Neill (University of Wisconsin-Madison); Stephen J. Wright (University of Wisconsin-Madison)

Paper Supplementary Video

4. A stochastic cubic regularisation method with inexact function evaluations and random derivatives for finite sum minimisation. Gianmarco Gurioli (Università degli Studi di Firenze); Stefania Bellavia (Università degli Studi di Firenze); Benedetta Morini (Università degli Studi di Firenze); Philippe L. Toint (University of Namur)

Paper

5. Deep Learning with a Stochastic Quasi-Gauss-Newton Method. Christopher Thiele (Rice University); Mauricio Araya-Polo (Total E&P RT); Detlef Hohl (Shell International Exploration and Production, Inc.)

Paper Video

6. A Distributed Cubic-Regularized Newton Method for Smooth Convex Optimization over Networks. Cesar A. Uribe (Massachusetts Institute of Technology); Ali Jadbabaie (Massachusetts Institute of Technology)

Paper Video

7. Analysis of SGD with Biased Gradient Estimators. Ahmad Ajalloeian (University of Lausanne); Sebastian Stich (EPFL)

Paper Video

8. * A Second-Order Optimization Algorithm for Solving Problems Involving Group Sparse Regularization. Daniel P. Robinson (Lehigh University); Frank Curtis (Lehigh University); Yutong Dai (Lehigh University)

Paper Video (at 2:57:20)

9. Distributed Newton Method Over Graphs: Can Sharing of Second-order Information Eliminate the Condition Number Dependence? Erik J. Berglund (KTH Royal Institute of Technology); Sindri Magnusson (KTH Royal Institute of Technology)

Paper Supplementary Video

10. Sketched Newton-Raphson. Rui Yuan (Facebook AI Research); Alessandro Lazaric (Facebook AI Research); Robert M. Gower (Telecom ParisTech)

Paper Video

11. Automatic Differentiation Friendly Complexity Guarantees. Vincent Roulet (University of Washington); Zaid Harchaoui (University of Washington)

Paper Supplementary Video

12. Hyperfast Second-Order Method for Distributed Convex Optimization. Pavel Dvurechensky (Weierstrass Institute); Dmitry Kamzolov (Moscow Institute of Physics and Technology); Soomin Lee (Yahoo! Research); Erik Ordentlich (Verizon Media); Cesar A. Uribe (Massachusetts Institute of Technology); Alexander Gasnikov (Moscow Institute of Physics and Technology)

Paper Video

13. Sparse Communication for Training Deep Networks. Negar Foroutan (EPFL); Martin Jaggi (EPFL)

Paper

14. TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning? Joshua Romoff (McGill University); Peter Henderson (Stanford University); David Kanaa (Mila, University of Montreal); Emmanuel Bengio (McGill University); Ahmed Touati (Mila); Pierre-Luc Bacon (Stanford University); Joelle Pineau (McGill University / Facebook AI Research)

Paper Video

15. Complexity of Projected Newton Methods for Bound-Constrained Optimization. Yue Xie (Wisconsin Institute for Discovery); Stephen J. Wright (University of Wisconsin-Madison)

Paper Video

16. Adaptive Braking for Mitigating Gradient Delay. Abhinav Venigalla (Cerebras Systems); Atli Kosson (Cerebras Systems); Vitaliy Chiley (Cerebras Systems); Urs Koster (Cerebras Systems)

Paper

17. Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training. Junxiang Wang (George Mason University); Zheng Chai (George Mason University); Yue Cheng (George Mason University); Liang Zhao (George Mason University)

Paper Video

18. * Ridge Riding: Finding diverse solutions by following eigenvectors of the Hessian. Jack Parker-Holder (University of Oxford); Cinjon Resnick (NYU); Luke Metz (Google Brain); Hengyuan Hu (Facebook); Adam Lerer (Facebook AI Research); Alistair HP Letcher; Alex Peysakhovich (Facebook AI); Aldo Pacchiano (UC Berkeley); Jakob Foerster (Facebook AI Research)

Paper Supplementary Video (at 3:06:53)

19. A High Probability Analysis of Adaptive SGD with Momentum. Xiaoyu Li (Boston University); Francesco Orabona (Boston University)

Paper Video

20. A Randomised Subspace Gauss-Newton Method for Nonlinear Least-Squares. Coralia Cartis (University of Oxford); Jaroslav Fowkes (University of Oxford); Zhen Shao (University of Oxford)

Paper

21. Dimensionality reduction techniques for global optimization of functions with low effective dimensionality. Adilet Otemissov (University of Oxford and the Alan Turing Institute); Coralia Cartis (University of Oxford); Estelle Massart (University of Oxford & National Physical Laboratory)

Paper Video

22. * PyHessian: Neural Networks Through the Lens of the Hessian. Zhewei Yao (University of California, Berkeley); Amir Gholami (UC Berkeley); Kurt Keutzer (UC Berkeley); Michael Mahoney (University of California, Berkeley)

Paper Video (at 3:18:10)

23. Smoothing of point clouds using Riemannian optimization. Florentin Goyens (University of Oxford); Stephane Chretien (National Physical Laboratory); Coralia Cartis (University of Oxford)

Paper

24. A Multilevel Approach to Training. Vanessa Braglia (Università della Svizzera italiana); Alena Kopanicakova (Università della Svizzera italiana); Rolf Krause (Università della Svizzera italiana)

Paper Supplementary Video

25. Input Hessian Regularization of Neural Networks. Waleed Mustafa (TU Kaiserslautern); Robert A. Vandermeulen (Technische Universität Berlin); Marius Kloft (University of Southern California)

Paper

26. Sparse sketching for sparse linear least squares. Zhen Shao (University of Oxford); Coralia Cartis (University of Oxford); Jan Fiala (Numerical Algorithms Group Ltd.)

Paper

27. To interact or not? The convergence properties of interacting stochastic mirror descent. Anastasia Borovykh (Imperial College London); Nikolas Kantas (Imperial College London); Panos Parpas (Imperial College London); Grigorios Pavliotis (Imperial College London)

Paper Video

28. Stochastic Recursive Variance-Reduced Cubic Regularization Methods. Dongruo Zhou (UCLA); Quanquan Gu (University of California, Los Angeles)

Paper

29. Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent. Bao Wang (University of Utah); Tan Minh Nguyen (Rice University); Tao Sun (National University of Defense Technology); Andrea L. Bertozzi (UCLA); Richard Baraniuk (Rice University); Stanley Osher (UCLA)

Paper

30. Near-Optimal Hyperfast Second-Order Method for Convex Optimization. Dmitry Kamzolov (Moscow Institute of Physics and Technology); Alexander Gasnikov (Moscow Institute of Physics and Technology)

Paper

31. Near-optimal tensor methods for minimizing the gradient norm of convex functions. Pavel Dvurechensky (Weierstrass Institute); Alexander Gasnikov (Moscow Institute of Physics and Technology); Peter Ostroukhov (Moscow Institute of Physics and Technology); Cesar A. Uribe (Massachusetts Institute of Technology); Anastasiya Ivanova (Higher School of Economics)

Paper Supplementary Video

32. * MomentumRNN: Integrating Momentum into Recurrent Neural Networks. Tan Minh Nguyen (Rice University); Richard Baraniuk (Rice University); Andrea L. Bertozzi (UCLA); Stanley Osher (UCLA); Bao Wang (University of Utah)

Paper Video (at 5:59:25)

33. Distributed Newton Can Communicate Less and Resist Byzantine Workers. Raj Kumar Maity (University of Massachusetts Amherst); Avishek Ghosh (University of California, Berkeley); Arya Mazumdar (University of Massachusetts Amherst)

Paper

34. Inverse classification with logistic and softmax classifiers: efficient optimization. Miguel A. Carreira-Perpinan (UC Merced); Suryabhan Singh Hada (UC Merced)

Paper

35. * Step-size Adaptation Using Exponentiated Gradient Updates. Ehsan Amid (UC Santa Cruz & Google); Rohan Anil (Google Brain); Christopher Fifty (Google Brain); Manfred K. Warmuth (UC Santa Cruz & Google)

Paper Video (at 6:09:53)

36. Globally Optimal Training of Two-layer Neural Networks. Mert Pilanci (Stanford University); Tolga Ergen (Stanford University)

Paper Poster

37. Escaping Saddle Points in Ill-Conditioned Matrix Completion with a Scalable Second Order Method. Christian Kümmerle (Johns Hopkins University); Claudio M. Verdun (Technical University of Munich)

Paper Video

38. Newton Dual Extrapolation for Non-monotone Variational Inequality. Chaobing Song (Tsinghua University); Yong Jiang (Tsinghua University); Yi Ma (UC Berkeley)

Paper

39. * Competitive Mirror Descent. Florian T. Schaefer (Caltech); Animashree Anandkumar (Caltech); Houman Owhadi (Caltech)

Paper Video (at 6:16:56)

40. Phase Retrieval via Second-Order Nonsmooth Optimization. Zhong Zhuang (University of Minnesota); Gang Wang (University of Minnesota); Yash Travadi (University of Minnesota); Ju Sun (University of Minnesota)

Paper Video

41. An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions. Jannis Kurtz (RWTH Aachen University); Bubacarr Bah (AIMS South Africa Research Centre)

Paper