My research mainly focuses on understanding the inductive bias of overparametrized deep neural networks, especially in the context of multi-task learning, feature learning, and transfer learning. For L2-regularized parameters, we proved a theorem that helps to understand the inductive bias of infinitely wide deep ReLU NNs towards multi-task learning (https://arxiv.org/abs/2112.15577). The proof of this theorem led us to discover a fast, almost lossless compression method for ReLU NNs, which we have not yet tested thoroughly in practice (https://openreview.net/pdf?id=9GUTgHZgKCH).
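To give a flavor of why ReLU NNs admit almost lossless compression, here is a minimal, generic sketch (a hypothetical illustration, not the specific method of the paper above): two hidden ReLU neurons whose incoming weights and biases are positively proportional compute proportional outputs, so they can be merged exactly by folding the scaling factor into the outgoing weights, thanks to the positive homogeneity of ReLU.

```python
import numpy as np

def merge_parallel_relu_neurons(W1, b1, W2, b2):
    """Merge hidden neurons of f(x) = W2 @ relu(W1 @ x + b1) + b2 whose incoming
    parameters are positively proportional. Exact (lossless) by positive
    homogeneity of ReLU: relu(a * z) = a * relu(z) for a > 0."""
    kept_rows, kept_biases, kept_out_cols = [], [], []
    for i in range(W1.shape[0]):
        v_i = np.append(W1[i], b1[i])
        for j in range(len(kept_rows)):
            v_j = np.append(kept_rows[j], kept_biases[j])
            a = np.linalg.norm(v_i) / np.linalg.norm(v_j)
            if np.allclose(v_i, a * v_j):            # neuron i == a * neuron j, a > 0
                kept_out_cols[j] += a * W2[:, i]     # fold the scaling into outgoing weights
                break
        else:
            kept_rows.append(W1[i]); kept_biases.append(b1[i])
            kept_out_cols.append(W2[:, i].copy())
    return np.array(kept_rows), np.array(kept_biases), np.column_stack(kept_out_cols), b2

# hypothetical toy network with one duplicated (rescaled) hidden neuron
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W1[2], b1[2] = 2.0 * W1[0], 2.0 * b1[0]
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
W1c, b1c, W2c, b2c = merge_parallel_relu_neurons(W1, b1, W2, b2)
x = rng.normal(size=3)
f_full = W2 @ np.maximum(W1 @ x + b1, 0) + b2
f_comp = W2c @ np.maximum(W1c @ x + b1c, 0) + b2c
assert np.allclose(f_full, f_comp)   # identical function with 3 instead of 4 neurons
```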
Furthermore, we developed a method to estimate the model uncertainty of a NN's predictions (https://proceedings.mlr.press/v162/heiss22a.html), and we are combining epistemic and aleatoric uncertainty in a novel way to obtain better prediction intervals (https://arxiv.org/abs/2507.08150).
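As a minimal illustration of how epistemic and aleatoric uncertainty estimates can be combined into a prediction interval, here is a generic Gaussian-style sketch under an independence assumption; it is not the specific construction of the papers above, and the inputs (mean, epistemic_std, aleatoric_std) are hypothetical placeholders for a model's outputs.

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(mean, epistemic_std, aleatoric_std, coverage=0.95):
    """Gaussian-style interval; assumes the two uncertainty sources are
    independent, so their variances add (the linked papers combine them
    in a more refined way)."""
    z = norm.ppf(0.5 + coverage / 2.0)                      # ~1.96 for 95% coverage
    total_std = np.sqrt(epistemic_std ** 2 + aleatoric_std ** 2)
    return mean - z * total_std, mean + z * total_std

# hypothetical point prediction with its two uncertainty estimates
lo, hi = prediction_interval(mean=1.3, epistemic_std=0.2, aleatoric_std=0.5)
```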
We repeatedly improved the SOTA on multiple market design (combinatorial auction) benchmarks: first by developing a NN architecture that enforces monotonicity constraints and implementing it within an auction mechanism (https://www.ijcai.org/proceedings/2022/0077.pdf), and second by adding exploration to our auction mechanism in a Bayesian-optimization fashion, using our estimate of epistemic uncertainty (https://doi.org/10.1609/aaai.v37i5.25726). For the latter, our simulations suggest that revenue could be increased by more than 200 million USD in an auction comparable to the Canadian 4G spectrum auction, although there is still a very long way to go before this mechanism could be deployed in such large auctions. We then modified our mechanism to use more practical demand queries instead of value queries (https://doi.org/10.1609/aaai.v38i9.28850). Recently, we substantially improved the performance further by combining demand queries and value queries in an ML-based combinatorial auction (https://icml.cc/virtual/2025/oral/47260).
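The following is a minimal sketch of the general idea behind monotonicity-constrained NNs: non-negative weights composed with monotone activations yield a function that is non-decreasing in every input coordinate. This is a generic illustration only, not the exact architecture of the IJCAI'22 paper; all dimensions and names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneMLP(nn.Module):
    """MLP that is non-decreasing in every input coordinate: non-negative
    weights (via a softplus reparametrization) composed with monotone
    activations preserve monotonicity."""
    def __init__(self, dims=(10, 32, 32, 1)):
        super().__init__()
        self.raw_w = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(o, i)) for i, o in zip(dims[:-1], dims[1:])])
        self.bias = nn.ParameterList([nn.Parameter(torch.zeros(o)) for o in dims[1:]])

    def forward(self, x):
        for k, (rw, b) in enumerate(zip(self.raw_w, self.bias)):
            x = F.linear(x, F.softplus(rw), b)   # softplus(rw) >= 0
            if k < len(self.raw_w) - 1:
                x = torch.relu(x)                # monotone activation
        return x

model = MonotoneMLP()
bundles = torch.rand(5, 10)   # e.g. rows could encode bundles of 10 items
values = model(bundles)       # predicted values, non-decreasing in each coordinate
```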
In another project, we extended the theory and methodology of Path-Dependent Neural Jump ODEs to handle noisy, irregularly observed time series (https://openreview.net/forum?id=0T2OTVCCC1, https://doi.org/10.1515/strm-2025-0001).
Unpublished projects I am currently working on: outlier-robust NNs; deep probabilistic calibration of financial models; a better theoretical understanding of the cold posterior phenomenon of Bayesian neural networks; further improving the theoretical understanding of the inductive bias of various ML methods (e.g., extending https://www.research-collection.ethz.ch/entities/publication/1cdcda72-3b0d-4390-94be-a4026438c505); and extending our techniques for combinatorial auctions.
I am very excited about the empirical fact that deep learning methods can generalize surprisingly well to unseen data points. I am extremely curious to understand the inductive bias that allows them to do so, and in particular how this inductive bias is influenced by design choices of the architecture and hyperparameters, especially in the context of multi-task learning, transfer learning, representation learning, and feature learning. For some specific design choices, I have derived a theory to understand the inductive bias of NNs. In general, I see the following four main strengths in the inductive bias of neural networks:
Deep Learning can strongly benefit from multi-task learning, transfer learning, representation learning, and feature learning.
NNs with standard activation functions (such as ReLU) have an inductive bias towards flat/simple/smooth/non-oscillating functions because of implicit (and explicit) regularization (see the sketch after this list).
Some architectures (such as transformers, CNNs, RNNs, and GNNs) have (soft) invariances/symmetries that are helpful for certain domains.
The flexibility of architectures and training algorithms allows for many ways to manipulate the inductive bias via hand-crafted tricks such as specific forms of data augmentation.
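Regarding point 2 above, here is a small, self-contained experiment (purely illustrative, not taken from any of the cited papers): fitting noisy 1D data with a wide ReLU network once with and once without explicit L2 regularization (weight decay); the regularized fit is typically visibly smoother and less oscillatory.

```python
import torch
import torch.nn as nn

def fit(weight_decay, steps=3000, seed=0):
    torch.manual_seed(seed)
    x = torch.linspace(-1, 1, 30).unsqueeze(1)
    y = torch.sin(3 * x) + 0.3 * torch.randn_like(x)   # noisy 1D training data
    net = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(steps):
        opt.zero_grad()
        ((net(x) - y) ** 2).mean().backward()
        opt.step()
    return net

regularized = fit(weight_decay=1e-3)    # explicit L2 regularization on the parameters
unregularized = fit(weight_decay=0.0)   # only the optimizer's implicit regularization
# Evaluating both on a fine grid typically shows the regularized fit oscillating
# far less between the noisy training points.
```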
ICML 2025 in Vancouver, 2025, Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All (12' presentation) (5' video)
Xin Guo's research seminar in Berkeley, 2025, Loss of Plasticity in (Reinforcement) Learning: Theory and Prevention
Workshop on Uncertainty Quantification in Neural Network Models at BIRS in Banff (Canada), 2025, Inductive Bias of Neural Networks
PhD defence at ETH Zürich, 2024, Inductive bias of neural networks and selected applications
12th Bachelier World Congress of the Bachelier Finance Society in Rio de Janeiro, 2024, Path-dependent Neural Jump ODEs and their Application to Stochastic Filtering
ETH – Hong Kong – Imperial Mathematical Finance Workshop at Imperial College London, 2024, Deep Learning Theory on Multi-task Learning
AMLD EPFL 2024 in Lausanne, 2024, NOMU: Neural Optimization-based Model Uncertainty (video)
ETH AI Center Fellows x Associated Researchers Meetup in Zürich, 2024, Deep Learning Theory on Multi-task Learning (photo)
SfS-PhD Talk in Zürich, 2023, How to forecast consistently based on noisy incomplete observations at irregular observation times (video)
Oxford ETH Workshop in Oxford, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
Post/Doctoral Seminar in Mathematical Finance in Zürich, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
SfS-PhD Talk in Zürich, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
Yu Group meeting in Berkeley, 2023, Theory on understanding the inductive bias of deep neural networks towards multi-task learning and methods to reduce the number of neurons
AAAI’23 in Washington DC, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
Uncertainty reading group via zoom, 2022, NOMU: Neural Optimization-based Model Uncertainty (video)
Oxford ETH Workshop in Zürich, 2022, How Infinitely Wide Neural Networks Benefit from Multi-task Learning - an Exact Macroscopic Characterization
SfS-PhD Talk in Zürich, 2022, How Infinitely Wide Neural Networks Benefit from Multi-task Learning - an Exact Macroscopic Characterization
Post/Doctoral Seminar in Mathematical Finance in Zürich, 2020, How Implicit Regularization of Artificial Neural Networks Characterizes the Learned Function, or a Mathematical Point of View on the Psychology of Artificial Neural Networks
ViZuS in Vienna, 2019, How implicit regularization of neural networks affects the learned function
FWZ Seminar in Padova, 2019, Randomized shallow neural networks and their use in understanding gradient descent
Theory of the inductive bias of infinitely wide deep neural networks (including multi-task learning and transfer learning). Epistemic and aleatoric uncertainty and generalization of neural networks. Bayesian optimization with the help of neural networks. Deep learning in market design. Compression of neural networks. Monotonic neural networks. Bayesian neural networks. Outlier-robust neural networks. Irregularly observed time series. Neural Jump ODEs.