I might call my thesis something like "In Search of a Unifying Theory (and Practice) of Non-Linear Approximation." Many methods fall under the umbrella of "non-linear approximation." Some from statistics and signal processing are quite well understood, like compressive sensing, low-rank matrix recovery, and the more recent "super-resolution." Meanwhile, others from machine learning are less understood and thus even more interesting, like kernel approximation, Gaussian mixture models, and shallow neural networks.

Researchers in these areas (aside from applied machine learning engineers, who don't care) are mostly aware that these methods are intuitively analogous. However, theoretical results linking them in a concrete way are mysteriously absent. My work aims to build a theoretical framework general enough that results -- and, more importantly, tools -- from the well-understood cases can be translated into new results and tools for the less understood cases, where such advances are sorely needed.

Let's talk about this in the context of neural networks. Basically, my advisor and I are trying to redesign neural networks from the ground up, reformulating them to be better behaved by borrowing ideas from signal processing. If successful, this would result in the first-ever theoretical approach to training a neural network, in contrast to today's exclusive reliance on empirical methods. This is important partly because models are more trustworthy when supported by theory. What's even more important, though, is that empirical techniques have a variable cost (each model you want to fit needs its hyper-parameters tuned), whereas theory-building has a fixed cost. This means that designing mathematically (i.e., theoretically) tractable methods can be more efficient in the long run. To give an example from machine learning, imagine you could come up with a mathematical formula for the "optimal" learning rate. Then no one would ever need to tune that hyper-parameter again, saving a lot of time and money over the years.
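To make that learning-rate example concrete: in the special case of gradient descent on a least-squares problem, classical optimization theory already hands you a safe step size, namely 1/L, where L is the Lipschitz constant of the gradient. The NumPy sketch below illustrates that textbook fact (it is standard material, not something from our work, and the data and dimensions are arbitrary illustrative choices):

```python
# Minimal sketch: for least squares, theory gives the learning rate for free.
# No tuning loop; the step size 1/L is guaranteed to make gradient descent converge.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                     # illustrative data
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

n = X.shape[0]
L = np.linalg.norm(X, ord=2) ** 2 / n              # Lipschitz constant of the gradient
lr = 1.0 / L                                       # "optimal" (safe) step size from theory

w = np.zeros(X.shape[1])
for _ in range(500):
    grad = X.T @ (X @ w - y) / n                   # gradient of (1/2n)||Xw - y||^2
    w -= lr * grad

print("training loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

Of course, least squares is about the friendliest setting imaginable; the whole difficulty is extending this kind of guarantee to the non-linear methods above.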

Instead of deriving a formula for the optimal learning rate, my advisor and I have done something that I feel is even more promising: we've been able to redesign the very simplest ReLU networks such that not only is their error on unseen data provably tied to their training error, but they can also be trained using an algorithm that has no hyper-parameters (unpublished). To be clear, this result is not yet developed enough to be useful in practice. However, it is a good example of the type of result my research aims to prove. If you want to help me "scale up" such results, please reach out to me, as I need someone to carry the torch after I sell out and get an industry job.
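For readers unfamiliar with the terminology, by "the very simplest ReLU networks" I mean one-hidden-layer (shallow) models of the form f(x) = Σ_k a_k ReLU(⟨w_k, x⟩ + b_k). Here is a minimal sketch of that generic object, just so the object of study is concrete; it is emphatically not our reformulation or the hyper-parameter-free training algorithm, which remains unpublished:

```python
# Generic one-hidden-layer ReLU network: f(x) = sum_k a_k * relu(<w_k, x> + b_k).
# This is only the standard object of study, not the redesigned version discussed above.
import numpy as np

def shallow_relu(x, W, b, a):
    """Evaluate f(x) = a . relu(W x + b) for a single input vector x."""
    return a @ np.maximum(W @ x + b, 0.0)

rng = np.random.default_rng(0)
d, k = 5, 16                      # input dimension, number of hidden units (illustrative)
W = rng.normal(size=(k, d))       # inner weights w_k
b = rng.normal(size=k)            # biases b_k
a = rng.normal(size=k)            # outer weights a_k

x = rng.normal(size=d)
print(shallow_relu(x, W, b, a))
```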

While neural networks may sound trendy, my work on them is actually quite unfashionable by the social norms of the machine learning community. My research is not concerned with chatbots or transformers. Rather, to the extent that it studies neural networks at all, it studies the "ancient ancestors" (from >30 years ago, which is ancient by comp sci standards) of the models people use nowadays. Even those older, simpler models are still not mathematically understood.

Finally, just to reiterate, my research is not solely focused on neural networks. For instance, some of my scratch work suggests that "making kernel regression more powerful" could be easier than "making neural networks more stable." The bigger picture is "non-linear approximation."
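As an aside on why kernel regression is considered "well behaved" in the first place: unlike neural-network training, kernel ridge regression is a convex problem with a closed-form solution, α = (K + λI)⁻¹ y, so stability is not the bottleneck there; expressive power is. The sketch below shows that closed form in NumPy (the Gaussian kernel and the regularization value are illustrative choices on my part, not anything from my scratch work):

```python
# Kernel ridge regression: fit in closed form, alpha = (K + lambda*I)^{-1} y.
# Illustrative Gaussian kernel and regularization; no iterative training, no tuning loop.
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)

lam = 1e-3
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # closed-form fit

X_test = rng.uniform(-1, 1, size=(5, 2))
y_pred = gaussian_kernel(X_test, X) @ alpha            # predictions on new points
print(y_pred)
```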