Modern deep neural networks involve billions of parameters and appear to benefit greatly from scaling up. Many techniques that make it possible to train large models faster and better have been discovered empirically. I am interested in developing a phenomenological approach to deep learning that would explain quantitatively why some popular practices are effective, and guide the discovery and design of new architectures that can be trained better, faster, cheaper.