About Me:
In August 2021, I entered the mathematics PhD program at Texas A&M, where I'm supported by a university merit fellowship. A few points of order:
My advisor is Professor Simon Foucart.
I'll graduate no earlier than May 2026, although May 2027 is more likely.
You can reach me by email at winckelman@tamu.edu
Basically, my goal is to study machine learning from a theoretical and mathematical perspective. The holy grail would be to "do for machine learning what Kolmogorov did for statistics," though, of course, that's not a realistic goal as stated. For more, see "About My Research" below.
I'm increasingly prioritizing software development, since I anticipate transitioning into industry after my PhD. I'll always have much to learn, but my foundations in real analysis and probability theory are very solid, and I'm comfortable with the standard tools from statistics and computer science, both of which my math background helps me pick up quickly. I also have a degree in economics and some work experience in business and agriculture. At networking events, I like to ask people, "Do you want to predict or estimate anything? [and they obviously respond 'yes'] Well, then I can help you."
On a personal level, all I'll say here is that I'm a super conversational person. Say hi!
About Data Science:
I'm not a source of authority on the matter, nor am I an expert (although I'm studying to be one), but these are my unofficial impressions, which officially reflect no one's views but my own. Please pardon the inevitable generalizations. Most people who say they "do data science" are either developing algorithms or implementing algorithms. A third group, found mostly in academia, is trying to understand why existing algorithms work (as a mathematician would say, to "establish error bounds" or "prove convergence").
Our approach to data science has been much like our ancestors' approach to fire: we discovered its usefulness before discovering why it works. So far, what can be said within the bounds of theoretical credibility is very limited. Drawing a conclusion based on the prediction of a neural network is, arguably, like inferring causation from correlation. A few problems with this are the following:
For agents who can tolerate the risk of an incorrect conclusion, quantifying that risk is typically extremely important, and so far we're not good at doing it. Common practice is to measure the rate of error on validation data (see the sketch after this list), but this is somewhat ad hoc and, subjectively, not very satisfying from a theoretical perspective (there are good reasons why statisticians historically relied on p-values rather than "validation").
For agents who virtually cannot tolerate risk, the available data science algorithms are far too limited. One classic example of such an agent is a civil engineer who's designing a bridge.
Because theory is absent, actually engineering machine learning algorithms that work well in practice requires some reliance on trial and error, which is inefficient. Unlike in classical statistics, where your model is certified by theory, in modern machine learning development time is drawn out by the need to run a portfolio of simulations to verify whether the model is credible and reliable, making product development slow and expensive whenever AI is involved.
This reliance on trial and error also makes it very difficult to fine-tune the behavior of machine learning models. Blind computational force is just too blunt an instrument to achieve very high precision.
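To make the "validation" point above concrete, here is a minimal sketch of what "measuring the rate of error on validation data" usually amounts to. Everything in it (the toy data, the 70/30 split, the threshold classifier) is an assumption made purely for illustration, not anyone's actual methodology:

    import numpy as np

    # Toy data: 200 points labeled by a noisy threshold rule (made up for illustration).
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=200)
    y = (x + 0.2 * rng.normal(size=200) > 0).astype(int)

    # Hold out 30% of the data as a validation set.
    split = int(0.7 * len(x))
    x_train, y_train = x[:split], y[:split]
    x_val, y_val = x[split:], y[split:]

    # "Train" an extremely simple classifier: pick the threshold t that
    # minimizes the error rate on the training set over a coarse grid.
    grid = np.linspace(-1, 1, 201)
    train_errors = [np.mean((x_train > t).astype(int) != y_train) for t in grid]
    t_best = grid[int(np.argmin(train_errors))]

    # The quantity practitioners actually report: the error rate on held-out data.
    val_error = np.mean((x_val > t_best).astype(int) != y_val)
    print(f"validation error rate: {val_error:.3f}")

The number that comes out is an estimate of future performance, but nothing in the procedure certifies how good an estimate it is; that's the sense in which it is ad hoc.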
The good news is that, at a bare minimum, data science is already a powerful brainstorming tool: it can detect subtle patterns, and then a specialist can investigate those patterns. This is already safe and useful, provided we rely only on the expert to draw conclusions, and not on the data science algorithm itself.
The bad news is that data science (usually) cannot "safely" be used for much beyond brainstorming. My area of research is the aforementioned third, neglected area: trying to improve our understanding of what data science algorithms really "tell us." Historically, the short answer is "not much." No wonder this pessimistic area of research has been neglected! However, many of us are still interested in trying to discover inherent meaning in existing data science algorithms and, failing that, in inventing "better" data science tools: ones which do have inherent meaning.
About My Research:
In order to train a neural network (i.e., fit a neural network to some data), you have to "solve" a math problem, which can be simply explained as minimizing the loss function. It's bad enough that this is a difficult problem to solve (because the loss function is non-convex) but, worse yet, it's the wrong problem entirely! Indeed, most architectures are capable of achieving a "training loss" of zero. However, that would almost certainly entail overfitting. Therefore, one secretly hopes not to exactly minimize the loss function, relying on a validation dataset to assess whether the loss has gotten "too small" (there's a toy sketch of this below).
Suffice it to say, the whole thing is highly ill-posed from a mathematical viewpoint. In practice, this ill-posedness means there isn't a reliable recipe for training an AI: it's not as concrete as solving a math problem. Instead, experts with years of experience in hyperparameter tuning and the like have to coax the training process into not-quite-solving the problem in precisely the right way for the neural network to actually work. Because of this, machine learning sometimes feels more like an art than a science. The cost of development is high, due to the necessity of expert guidance, and there can always be unexpected failure cases, due to the mathematical ill-posedness.
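Here is the promised toy sketch of the training procedure described above: a tiny one-hidden-layer (shallow) ReLU network fit to made-up 1-D data by plain gradient descent. The width, learning rate, and step count are arbitrary choices for illustration only. The training loss keeps shrinking, and the held-out loss is the only signal telling you whether it has shrunk "too much":

    import numpy as np

    rng = np.random.default_rng(1)

    # Made-up 1-D regression data: noisy samples of sin(x).
    x = rng.uniform(-3, 3, size=(40, 1))
    y = np.sin(x) + 0.1 * rng.normal(size=x.shape)
    x_tr, y_tr, x_va, y_va = x[:30], y[:30], x[30:], y[30:]

    # Shallow ReLU network: f(x) = sum_i a_i * relu(w_i * x + b_i).
    width = 50
    w = rng.normal(size=(1, width))
    b = rng.normal(size=width)
    a = rng.normal(size=(width, 1)) / np.sqrt(width)

    def forward(x):
        h = np.maximum(x @ w + b, 0.0)   # hidden ReLU layer
        return h @ a, h

    def mse(pred, target):
        return float(np.mean((pred - target) ** 2))

    lr = 0.01
    for step in range(5001):
        pred, h = forward(x_tr)
        # Gradients of the mean-squared training loss with respect to a, w, b.
        g_out = 2.0 * (pred - y_tr) / len(x_tr)
        g_a = h.T @ g_out
        g_h = (g_out @ a.T) * (h > 0)
        g_w = x_tr.T @ g_h
        g_b = g_h.sum(axis=0)
        a -= lr * g_a
        w -= lr * g_w
        b -= lr * g_b
        if step % 1000 == 0:
            # The training loss is what we "solve" for; the validation loss is
            # the ad hoc check that it hasn't gotten "too small."
            print(step, mse(forward(x_tr)[0], y_tr), mse(forward(x_va)[0], y_va))

Nothing above tells you the right width, learning rate, or stopping time; those get chosen by the trial and error discussed earlier.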
My advisor has proposed a more difficult math problem than the one currently in use. Loosely speaking, the problem is more difficult because it is well-posed: cutting corners isn't allowed. Solving it instead of the conventional problem would result in more trustworthy, reliable AI. The issue is... it's a really hard math problem! If I could solve it for shallow ReLU networks, that would be my PhD thesis. For context, shallow ReLU networks are, arguably, the simplest neural networks that are powerful enough for basic science/enterprise purposes (recall: they're "powerful enough" in the sense of being universal approximators).
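For readers who want the formula, a shallow (one-hidden-layer) ReLU network of width n is typically written as follows; this is the standard textbook notation, not my advisor's specific formulation:

    f(x) = \sum_{i=1}^{n} a_i \, \mathrm{ReLU}(\langle w_i, x \rangle + b_i),
    \qquad \mathrm{ReLU}(t) = \max\{t, 0\},

where the a_i and b_i are scalars and the w_i are vectors, all of them trainable. "Universal approximation" means that, by taking n large enough, functions of this form can approximate any continuous function on a compact set to arbitrary accuracy.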