Accepted Papers
Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks
Manuela Girotti (Mila, Université de Montréal, Concordia University)*; Ioannis Mitliagkas (Mila, Université de Montréal); Gauthier Gidel (Mila, Université de Montréal)
FedNL: Making Newton-Type Methods Applicable to Federated Learning
Mher Safaryan (KAUST)*; Rustem Islamov (Moscow Institute of Physics and Technology); Xun Qian (KAUST); Peter Richtárik (KAUST)
Training of residual networks with stochastic MG/OPT
Cyrill von Planta (Università della Svizzera italiana)*; Rolf Krause (Università della Svizzera italiana); Alena Kopanicakova (Università della Svizzera italiana)
EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization
Ondrej Bohdal (University of Edinburgh)*; Yongxin Yang (University of Edinburgh); Timothy Hospedales (University of Edinburgh)
A FISTA-type average curvature accelerated composite gradient method for nonconvex optimization problems
Jiaming Liang (Georgia Institute of Technology)*
Newton-type Methods for Minimax Optimization
Guojun Zhang (University of Waterloo)*; Kaiwen Wu (University of Waterloo); Pascal Poupart (University of Waterloo); Yaoliang Yu (University of Waterloo)
Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation
Bingbin Liu (Carnegie Mellon University)*; Elan Rosenfeld (Carnegie Mellon University); Pradeep Ravikumar (Carnegie Mellon University); Andrej Risteski (Carnegie Mellon University)
Information pruning: a regularization method based on a simple interpretability property of neural networks
Raphaël M.J.I. Larsen (IMT Atlantique)*; Marc-Oliver Pahl (IMT Atlantique)
Bilevel Optimization: Convergence Analysis and Enhanced Design
Kaiyi Ji (The Ohio State University)*; Junjie Yang (The Ohio State University); Yingbin Liang (The Ohio State University)
Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning
Kaiyi Ji (The Ohio State University)*; Junjie Yang (The Ohio State University); Yingbin Liang (The Ohio State University)
Zeroth-Order Algorithms for Stochastic Nonconvex Minimax Problems with Improved Complexities
Zhongruo Wang (University of California, Davis); Krishna Balasubramanian (University of California, Davis); Shiqian Ma (University of California, Davis)*; Meisam Razaviyayn (University of Southern California)
Optimizing Combinatorial and Non-decomposable Metrics with ExactBoost
Daniel Csillag (IMPA); Carolina Piazza (Princeton); Thiago Ramos (IMPA); João Vitor Romano (IMPA); Roberto Oliveira (IMPA); Paulo Orenstein (IMPA)*
Principled Curriculum Learning using Parameter Continuation Methods
Harsh Nilesh Pathak (Worcester Polytechnic Institute)*; Randy Paffenroth (Worcester Polytechnic Institute)
Computing the Newton-step faster than Hessian accumulation
Akshay Srinivasan (Sony AI)*; Emanuel Todorov (University of Washington)
Using Bifurcations for Diversity in Differentiable Games
Jonathan P Lorraine (University of Toronto)*; Jack Parker-Holder (University of Oxford); Paul Vicol (University of Toronto); Aldo Pacchiano (UC Berkeley); Luke Metz (Google Brain); Tal Kachman (Radboud University); Jakob Foerster (Facebook AI Research)
On the Hardness of Computing Near-Approximate-Stationary Points of Clarke Regular Nonsmooth Nonconvex Problems and Certain DC Programs
Lai Tian (The Chinese University of Hong Kong)*; Anthony Man-Cho So (The Chinese University of Hong Kong)
Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size
Adityanarayanan Radhakrishnan (MIT)*; Mikhail Belkin (UC San Diego); Caroline Uhler (MIT)
Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization
Junxiang Wang (Emory University)*; Hongyi Li (Xidian University); Yongchao Wang (Xidian University); Liang Zhao (Emory University)
SLIM-QN: A Stochastic, Light, Momentumized Quasi-Newton Optimizer for Deep Neural Networks
Yue Niu (University of Southern California)*; Zalan Fabian (University of Southern California); Sunwoo Lee (University of Southern California); Mahdi Soltanolkotabi (University of Southern California); Salman Avestimehr (University of Southern California)
Optimizing interacting Langevin dynamics using spectral gaps
Anastasia Borovykh (Imperial College London)*; Nikolas Kantas (Imperial College London); Panos Parpas (Imperial College London); Grigorios Pavliotis (Imperial College London)
When are Iterative Gaussian Processes Reliably Accurate?
Wesley Maddox (New York University)*; Sanyam Kapoor (New York University); Andrew Gordon Wilson (New York University)
On the Oracle Complexity of Higher-Order Smooth Non-Convex Finite-Sum Optimization
Nicolas Emmenegger (ETH Zurich)*; Rasmus Kyng (ETH Zurich); Ahad N. Zehmakan (ETH Zurich)
On Second-order Optimization Methods for Federated Learning
Sebastian Bischoff (Technical University of Munich)*; Stephan Günnemann (Technical University of Munich); Martin Jaggi (EPFL); Sebastian Stich (EPFL)
A Variable Sample-size Stochastic Quasi-Newton Method for Smooth and Nonsmooth Stochastic Convex Optimization
Afrooz Jalilzadeh (University of Arizona)*; Uday V Shanbhag (Pennsylvania State University); Farzad Yousefian (Oklahoma State University); Angelia Nedich (Arizona State University)
On Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
Ryan D'Orazio (Université de Montréal)*; Nicolas Loizou (Mila, Université de Montréal); Issam Hadj Laradji (McGill University); Ioannis Mitliagkas (Mila, Université de Montréal)
Faster Convergence of AdaGrad with Shuffling in Non-convex Over-parametrized Models
Jiaqi Zhang (Massachusetts Institute of Technology)*; Xunpeng Huang (ByteDance Inc.); Hao Zhou (ByteDance); Lei Li (ByteDance AI Lab)
Efficient Optimal Transport Algorithm by Accelerated Gradient Descent
Dongsheng An (Stony Brook University)*; Na Lei (Dalian University of Technology); Xianfeng Gu (Stony Brook University)
Regularized Newton Method with Global O(1/k^2) Convergence
Konstantin Mishchenko (KAUST)*
COLA: Consistent Learning with Opponent-Learning Awareness
Timon Willi (University of Toronto)*; Johannes Treutlein (University of Toronto); Alistair HP Letcher; Jakob Foerster (Facebook AI Research)
On Generalization and Stability of Natural Gradient Langevin Dynamics
Hanif Amal Robbani (University of Indonesia)*; Junaidillah Fadlil (Samsung R&D Institute Indonesia); Risman Adnan (Samsung R&D Institute Indonesia)
Nonconvex Min-Max Bilevel Optimization for Task Robust Meta Learning
Alex Gu (MIT)*; Songtao Lu (IBM Research); Parikshit Ram (IBM Research AI); Lily Weng (MIT)
Structured second-order methods via natural-gradient descent
Wu Lin (University of British Columbia)*; Frank Nielsen (Sony CS Labs Inc.); Mohammad Emtiyaz Khan (RIKEN); Mark Schmidt (University of British Columbia)
Spacetime Neural Network for High Dimensional Quantum Dynamics
Jiangran Wang (University of Illinois at Urbana-Champaign)*; Zhuo Chen (University of Illinois at Urbana-Champaign); Di Luo (University of Illinois at Urbana-Champaign); Zhizhen Zhao (University of Illinois at Urbana-Champaign); Vera Hur (University of Illinois at Urbana-Champaign); Bryan Clark (University of Illinois at Urbana-Champaign)
Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates
Johannes J Brust (University of California, San Diego; formerly Argonne National Laboratory)*
An Adaptive Heavy-Ball Method
Samer S Saab Jr (The Pennsylvania State University)*; Shashi Phoha (The Pennsylvania State University); Minghui Zhu (The Pennsylvania State University); Asok Ray (The Pennsylvania State University)
Implicit Regularization in Overparameterized Bilevel Optimization
Paul Vicol (University of Toronto)*; Jonathan P Lorraine (University of Toronto); David Duvenaud (University of Toronto); Roger B Grosse (University of Toronto)