For Class
Adam: A Method for Stochastic Optimization. D. Kingma and J. Ba. ICLR 2015. Ronghang Hu
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. S. Ioffe and C. Szegedy. ICML 2015. Samaneh Azadi
Generative Adversarial Nets. I. Goodfellow et al. NIPS 2014. Jeff Donahue
For background on stochastic gradient descent, feel free to check out Stochastic Gradient Descent Tricks by L. Bottou; a minimal SGD/Adam update sketch follows below for reference.
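A minimal sketch of the vanilla SGD update alongside the Adam update of Kingma and Ba (hyperparameter defaults follow the paper; the function names and NumPy usage are illustrative only, not taken from any reading's official code):

import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # Vanilla SGD: theta <- theta - lr * grad, with grad a stochastic gradient
    # computed on a minibatch.
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: exponential moving averages of the gradient and its elementwise square,
    # bias-corrected, then a per-parameter scaled update (t is the step count, >= 1).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # correct the bias from zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v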
Organizer: Evan Shelhamer
To read in your copious free time, or to hope we cover cursorily if time permits:
Hyperparameter Optimization
Gradient-based Hyperparameter Optimization through Reversible Learning. D. Maclaurin, D. Duvenaud, and R. Adams. ICML 2015.
Scalable Bayesian Optimization Using Deep Neural Networks. J. Snoek et al. ICML 2015.
2nd Order
Deep Learning via Hessian-free Optimization. J. Martens. ICML 2010.
Natural Neural Networks. G. Desjardins et al. NIPS 2015.
Optimizing Neural Networks with Kronecker-factored Approximate Curvature. J. Martens and R. Grosse. ICML 2015.