video lectures
7:30 – 7:40

Opening Remarks video

Organizers

7:40 – 8:20

Keynote: Convex Relaxations for Probabilistic Models with Latent Variables video slides

Francis Bach

8:20 – 8:40

Contributed talk: Fast Deterministic Dropout Training video slides paper

Sida Wang

8:40 – 8:50

Contributed short talk: Sparse Gaussian Conditional Random Fields video slides paper

Matt Wytock

8:50 – 9:00

Contributed short talk: Improving Training Time of Deep Belief Networks Through Hybrid Pre-Training And Parallel Stochastic Gradient Descent video slides paper

Tara N. Sainath

9:00 – 9:30

Coffee Break and Poster Session

see contributed talks + Avishi Carmi

9:30 – 10:10

Keynote: A Deep Architecture Incorporating Kernel Learning slides

Li Deng

10:10 – 10:40

Keynote: Online Algorithms for Exponential Family Models, with Application to Speech Processing

Fei Sha

10:40 – 15:30

Ski Break


15:30 – 16:10

Keynote: Newton Methods for Large Scale Optimization of Matrix Functions video slides

Peder Olsen

16:10 – 16:20

Contributed short talk: Second Order Methods for Sparse Inverse Covariance Clustering video slides poster

Steven Rennie

16:20 – 17:00

Keynote: Stochastic Approximation and Fast Message-Passing in Graphical Models video slides

Martin Wainwright

17:00 – 17:30

Coffee Break and Poster Session

see contributed talks

17:30 – 18:10

Keynote: Log-Linear Modelling in Human Language Technology slides

Hermann Ney

18:10 – 18:20

Contributed short talk: Smoothing Dynamic Systems With State-Dependent Covariance Matrices video slides paper

James Burke

18:20 – 18:40

Contributed talk: Exploiting Convexity for Large-scale Log-linear Model Estimation video slides paper

Theodoros Tsiligkaridis

Overview
Exponential functions are core mathematical constructs that are the key to many important applications, including speech recognition, pattern-search and logistic regression problems in statistics, machine translation, and natural language processing. Exponential functions are found in exponential families, log-linear models, conditional random fields (CRF), entropy functions, neural networks involving sigmoid and softmax functions, and Kalman filter or MMIE training of hidden Markov models. Many techniques have been developed in pattern recognition to construct formulations from exponential expressions and to optimize such functions, including growth transforms, EM, EBW, Rprop, bounds for log-linear models, large-margin formulations, and regularization. Optimization of log-linear models also provides important algorithmic tools for machine learning applications (including deep learning), leading to new research in such topics as stochastic gradient methods, sparse / regularized optimization methods, enhanced first-order methods, coordinate descent, and approximate second-order methods. Specific recent advances relevant to log-linear modeling include the following.
 Effective optimization approaches, including stochastic gradient and Hessian-free methods.
 Efficient algorithms for regularized optimization problems.
 Bounds for log-linear models and recent convergence results.
 Recognition of modeling equivalences across different areas, such as the equivalence between Gaussian and log-linear models / HMM and HCRF, and the equivalence between transfer entropy and Granger causality for Gaussian parameters.
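As an illustration of the equivalence between Gaussian and log-linear models mentioned in the last item, here is a minimal worked sketch for the scalar case. The notation (class prior $\pi_c$, mean $\mu_c$, variance $\sigma_c^2$, weights $\lambda_{c,k}$) is assumed for this example and is not taken from the workshop materials.

```latex
% Class posterior of a Gaussian generative model with scalar observation x
p(c \mid x)
  = \frac{\pi_c \,\mathcal{N}(x;\, \mu_c, \sigma_c^2)}
         {\sum_{c'} \pi_{c'} \,\mathcal{N}(x;\, \mu_{c'}, \sigma_{c'}^2)}
  = \frac{\exp\!\left(\lambda_{c,0} + \lambda_{c,1}\, x + \lambda_{c,2}\, x^2\right)}
         {\sum_{c'} \exp\!\left(\lambda_{c',0} + \lambda_{c',1}\, x + \lambda_{c',2}\, x^2\right)},

% obtained by expanding \log\big(\pi_c \mathcal{N}(x; \mu_c, \sigma_c^2)\big) in powers of x:
\lambda_{c,2} = -\frac{1}{2\sigma_c^2}, \qquad
\lambda_{c,1} = \frac{\mu_c}{\sigma_c^2}, \qquad
\lambda_{c,0} = \log \pi_c - \frac{\mu_c^2}{2\sigma_c^2} - \frac{1}{2}\log\!\left(2\pi\sigma_c^2\right).
```

Under these assumptions, the posterior of the Gaussian model is a log-linear model in the features $(1, x, x^2)$.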
Though exponential functions and log-linear models are well established, research activity remains intense, due to the central importance of the area in frontline applications and the rapidly expanding size of the data sets to be processed. Fundamental work is needed to transfer algorithmic ideas across different contexts and explore synergies between them, to assimilate the influx of ideas from optimization, to assemble better combinations of algorithmic elements for tackling such key tasks as deep learning, and to explore such key issues as parameter tuning. The workshop will bring together researchers from the many fields that formulate, use, analyze, and optimize log-linear models, with a view to exposing and studying the issues discussed above. Topics of possible interest for talks at the workshop include, but are not limited to, the following:
 Log-linear models.
 Using equivalences to transfer optimization and modeling methods across different applications and different classes of models.
 Comparison of optimization / accuracy performance of equivalent model pairs.
 Convex formulations.
 Bounds and their applications.
 Stochastic gradient, first-order, and approximate second-order methods.
 Efficient non-Gaussian filtering approaches (exploiting the equivalence of Gaussian generative and log-linear models, and projection onto the exponential manifold of densities).
 Graphical and network inference models.
 Missing data and hidden variables in log-linear modeling.
 Semi-supervised estimation in log-linear modeling.
 Sparsity in log-linear models.
 Block and novel regularization methods for log-linear models.
 Parallel, distributed and large-scale methods for log-linear models.
 Information geometry of Gaussian densities and exponential families.
 Hybrid algorithms that combine different optimization strategies.
 Connections between log-linear models and deep belief networks.
 Connections with kernel methods.
 Applications to speech / natural-language processing and other areas.
 Empirical contributions that compare and contrast different approaches.
 Theoretical contributions that relate to any of the above topics.
