Theory of Learnability

Theoretical Understanding of (Deep) Learnability:

I believe that finding a small model is closely connected with learning the most accurate model. This is because a model can be viewed as a compressor of the training data (in the unsupervised setting, the ideal model is the shortest program that produces the data, i.e., its Kolmogorov complexity). Thus there is a tight connection between compressing the input data, finding small models, and the learnability of functions.
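The compression view can be made concrete with a two-part minimum-description-length (MDL) criterion: score a model by the bits needed to encode its parameters plus the bits needed to encode the residual errors, and pick the model minimizing the total. Below is an illustrative sketch (my own toy example, not from any of the papers listed here) that recovers the degree of a polynomial this way; the specific coding scheme, roughly (k/2) log n bits for k parameters plus (n/2) log(RSS/n) bits for the residuals, is a standard textbook approximation, not a unique choice.

```python
import numpy as np

def description_length(x, y, degree):
    """Two-part MDL score for a polynomial fit: bits to encode the
    coefficients plus bits to encode the residuals (smaller is better)."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = np.sum(residuals ** 2) + 1e-12   # guard against log(0) on a perfect fit
    k = degree + 1                          # number of parameters
    model_bits = (k / 2) * np.log2(n)       # cost of describing the model
    data_bits = (n / 2) * np.log2(rss / n)  # cost of describing the errors
    return model_bits + data_bits

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 2 * x**3 - x + rng.normal(scale=0.05, size=x.size)  # true degree is 3

best = min(range(1, 10), key=lambda d: description_length(x, y, d))
print("degree minimizing description length:", best)
```

Low-degree models pay heavily in residual bits, while high-degree models pay a parameter penalty that noise-fitting cannot recoup, so the smallest adequate model wins: exactly the compression-learnability tradeoff described above.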

An important question is which classes of functions are learnable by deep learning. Certainly not all functions can be learned -- for example, cryptographic hash functions can't be learned by any efficient algorithm, let alone deep learning. Thus it is important to understand what it is about natural real-world functions that seems to make them learnable via deep learning. Here are some of my papers that investigate classes of functions that can be learned by deep learning.


Learning Polynomials with Neural Networks

Convergence Results for Neural Networks via Electrodynamics

Recovering the Lowest Layer of a Network with High Threshold Activations

Sparse Matrix Factorization

The Mind Grows Circuits

Other papers on Learnability (warning: personal favorites):

Provable Bounds for Learning Some Deep Representations

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Matrix Completion has No Spurious Local Minimum

Convergence Analysis of Two-layer Neural Networks with ReLU activation