Optimization in Deep Networks: Beyond Backprop

Alternating minimization, EM, etc.

Miguel A. Carreira-Perpinan, Weiran Wang. Distributed Optimization of Deeply Nested Systems. AISTATS 2014. [suppl material] [slides] [video] [poster] [Matlab code (coming soon)]

Gavin Taylor, Ryan Burmeister, Zheng Xu, Bharat Singh, Ankit Patel, Tom Goldstein. Training Neural Networks Without Gradients: A Scalable ADMM Approach. Proc. ICML 2016.

Ankit B. Patel, Minh Tan Nguyen, Richard Baraniuk. A Probabilistic Framework for Deep Learning. Proc. NIPS 2016.

I. Rish, G. Grabarnik, G. Cecchi, F. Pereira, GJ Gordon. Closed-form supervised dimensionality reduction with generalized linear models. ICML 2008.

G. Zhang, W. B. Kleijn. Training Deep Neural Networks via Optimization Over Graphs. arXiv.


Sachin Ravi, Hugo Larochelle. Optimization as a Model for Few-Shot Learning.

Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio. Mollifying Networks

Alternating minimization and dictionary learning (see below, and also Sparse Coding and Dictionary Learning)

Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux: Dictionary Learning for Massive Matrix Factorization. ICML 2016 [code]