The Grand Vision:
I might call my thesis something like "In Search of a Unifying Theory (and Practice) of Non-Linear Approximation." Many methods fall under the umbrella of "non-linear approximation." Some, from statistics and signal processing -- compressive sensing, low-rank matrix recovery, and "super-resolution" -- are well understood. Others, from machine learning -- kernel approximation, Gaussian mixture models, and shallow neural networks -- are less well understood and even more interesting.
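To make the family resemblance concrete, here is the shared template as I would sketch it (my own notation, not necessarily how the thesis will set things up): each of these methods approximates a target f by a K-term combination of parameterized "atoms,"

\[
  f(x) \;\approx\; \sum_{k=1}^{K} c_k \, \varphi(x;\, \theta_k).
\]

Compressive sensing draws the atoms from a finite dictionary; low-rank recovery uses rank-one atoms; super-resolution uses point sources or sinusoids with continuous locations/frequencies θ_k; kernel approximation uses kernel bumps centered at points θ_k; a Gaussian mixture uses Gaussian densities with means and covariances θ_k; and a shallow network uses neurons φ(x; θ) = σ(⟨w, x⟩ + b) with θ = (w, b). The shared difficulty is that the θ_k enter non-linearly, so the model class is not a linear subspace; the well-understood cases get around this with convex surrogates (the ℓ1 norm, the nuclear norm, total variation over measures), and analogous tools for neurons and Gaussians are far less developed.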
Researchers in these areas (aside from applied machine learning engineers, who don't care) are mostly aware that these methods are intuitively analogous. However, theoretical results linking them in a concrete way are mysteriously absent. My work aims to build a theoretical framework general enough that results -- and, more importantly, tools -- from the well-understood cases can be translated into new results and tools for the less-understood cases, where such advances are sorely needed. Essentially, we're trying to redesign neural networks from square one, borrowing ideas from signal processing to reformulate them so that they are better behaved.
If successful, this would result in the first-ever theoretical approach to training a neural network, in contrast to the contemporary, exclusively empirical practice. This matters not only because models are more trustworthy when they are supported by theory, but also because empirical methods are a variable cost (every model you want to fit needs its hyper-parameters tuned), whereas theory-building is a fixed cost, which is more efficient in the long run. For example, if you could derive a formula for the "optimal" learning rate, no one would ever need to tune it again, saving a great deal of time and money as the years go on. In fact, we have already solved this problem for the very simplest ReLU networks (unpublished).