Time: Friday 12:30 pm
February 11, 2022, Qin Li (University of Wisconsin, Madison)
Title: Mean field theory in Inverse Problems: from Bayesian inference to overparameterization of networks
Abstract: Bayesian sampling and neural networks are seemingly two different machine learning areas, but they both deal with many-particle systems. In sampling, one evolves a large number of samples (particles) to match a target distribution function, and in optimizing over-parameterized neural networks, one can view neurons as particles that feed each other information in the DNN flow. These perspectives allow us to employ mean-field theory, a powerful tool that translates the dynamics of a many-particle system into a partial differential equation (PDE), so that rich PDE analysis techniques can be used to understand both the convergence of sampling methods and the zero-loss property of over-parameterized ResNets. We showcase the use of mean-field theory in these two machine learning areas, and we also invite the audience to brainstorm other possible applications.
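As a generic illustration of the mean-field passage the abstract alludes to (a textbook form under assumptions, not the specific systems of the talk): for N particles X^i driven by a confining potential V and an interaction kernel K, the particle dynamics and their N → ∞ mean-field PDE read

    dX_t^i = -\nabla V(X_t^i)\,dt - \frac{1}{N}\sum_{j=1}^{N}\nabla K(X_t^i - X_t^j)\,dt + \sqrt{2}\,dW_t^i,
    \qquad
    \partial_t \rho = \nabla\cdot\big(\rho\,\nabla V + \rho\,(\nabla K * \rho)\big) + \Delta\rho,

so statements about the particle system (e.g. convergence to a target distribution, or zero training loss) can be studied through the PDE satisfied by the density \rho.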
February 4, 2022, Shuhao Cao (Washington University in St. Louis)
Title: Galerkin Transformer (video recording)
Abstract: The Transformer in "Attention Is All You Need" is now the ubiquitous architecture in every state-of-the-art model in Natural Language Processing (NLP) and Computer Vision (CV). At its heart and soul is the "attention mechanism". We study how to apply the attention mechanism for the first time to a data-driven operator learning problem related to partial differential equations. Inspired by the Fourier Neural Operator, we put together an effort to explain the heuristics of the attention mechanism and to improve its efficacy. It is demonstrated that the widely-accepted "indispensable" softmax normalization in the scaled dot-product attention is sufficient but not necessary. Without the softmax normalization, the representation capability of a linearized Transformer variant can be proved to be on par with a Petrov-Galerkin projection layer-wise. Some simple changes mimicking projections in Hilbert spaces are applied to the attention mechanism, and they help the final model achieve remarkable accuracy in end-to-end operator learning tasks with unnormalized data, surpassing the evaluation accuracy of the classical Transformer applied directly by 100 times. Meanwhile, in many other experiments, the newly proposed simple attention-based operator learner, the Galerkin Transformer, shows significant improvements in both speed and evaluation accuracy over its softmax-normalized counterparts, as well as other concurrently proposed linearizing variants.
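As a rough sketch of the softmax-free attention discussed above (an illustrative reconstruction under assumptions, not the authors' released code; the exact normalization placement in the Galerkin Transformer may differ), a single head of Galerkin-type attention computes Q (K^T V) / n, which is linear in the sequence length n:

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # per-position normalization over the feature dimension
        mu = x.mean(axis=-1, keepdims=True)
        sigma = x.std(axis=-1, keepdims=True)
        return (x - mu) / (sigma + eps)

    def galerkin_type_attention(Q, K, V):
        # Softmax-free attention for one head: Q (K^T V) / n,
        # with normalization applied to keys and values (placement assumed here).
        # Q, K, V: (n, d) arrays; cost is O(n d^2) instead of O(n^2 d).
        n = K.shape[0]
        return Q @ (layer_norm(K).T @ layer_norm(V)) / n

    # toy usage
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
    print(galerkin_type_attention(Q, K, V).shape)  # (128, 32)

Grouping the product as Q (K^T V) rather than (Q K^T) V removes the quadratic cost in sequence length and, layer-wise, resembles a projection onto a learned set of basis functions.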
December 10, 2021, Riccardo Barbano (UCL) and Johannes Leuschner (University of Bremen)
Title: Is deep image prior in need of a good education? (video recording)
Abstract: Deep image prior was recently introduced as an effective prior for image reconstruction. It represents the image to be recovered as the output of a deep convolutional neural network, and learns the network's parameters such that the output fits the corrupted observation. Despite its impressive reconstructive properties, the approach is slow when compared to learned or traditional reconstruction techniques. Our work develops a two-stage learning paradigm to address the computational challenge: (i) we perform a supervised pretraining of the network on a synthetic dataset; (ii) we fine-tune the network's parameters to adapt to the target reconstruction. We showcase that pretraining considerably speeds up the subsequent reconstruction from real-measured micro computed tomography data of biological specimens.
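A minimal sketch of the second stage (deep-image-prior-style adaptation to a single observation), assuming a pretrained network net, a differentiable forward operator A, a measurement y, and a fixed network input z are already available; the names, optimizer, and iteration count are illustrative only, and any regularization used in practice is omitted:

    import torch

    def dip_finetune(net, A, y, z, num_iters=2000, lr=1e-4):
        # Adapt the network so that A(net(z)) fits the corrupted observation y.
        # net: torch.nn.Module, A: differentiable forward operator (e.g. a CT projector),
        # y: measured data, z: fixed input (noise or an initial reconstruction).
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(num_iters):
            opt.zero_grad()
            x_hat = net(z)                          # current image estimate
            loss = torch.mean((A(x_hat) - y) ** 2)  # data-fidelity term
            loss.backward()
            opt.step()
        return net(z).detach()

Starting this loop from pretrained rather than random parameters is what yields the speed-up in the reconstruction stage reported in the abstract.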
October 29, 2021, Jong Chul Ye (KAIST)
Title: Noise2Score, Optimal transport-driven CycleGAN: two unsupervised learning frameworks for inverse problems (video recording)
Abstract: Recently, deep learning approaches have become the main research frontier for image reconstruction and enhancement problems thanks to their high performance, along with their ultra-fast inference times. However, due to the difficulty of obtaining matched reference data for supervised learning, there has been increasing interest in unsupervised learning approaches that do not need paired reference data. In particular, self-supervised learning and generative models have been successfully used for various inverse problem applications. In this talk, we review these approaches from a coherent perspective in the context of classical inverse problems and discuss their various applications. In particular, the CycleGAN approach and the recent Noise2Score approach for unsupervised learning will be explained in detail using optimal transport theory and Tweedie’s formula with score matching.
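For reference, Tweedie's formula mentioned above takes the following form in the Gaussian case: if y = x + n with n ~ N(0, σ²I), the MMSE denoiser is expressed purely through the score of the noisy marginal, which is the quantity Noise2Score-style methods estimate:

    \mathbb{E}[x \mid y] \;=\; y + \sigma^{2}\,\nabla_{y}\log p(y).

This is stated here only for Gaussian noise as an illustration; the talk develops the approach in detail.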
October 22, 2021, Yufei Zhang (London School of Economics)
Title: Understanding Deep Architectures with Reasoning Layer
Abstract: Recently, there has been a surge of interest in combining deep learning models with reasoning in order to handle more sophisticated learning tasks. In many cases, a reasoning task can be solved by an iterative algorithm. This algorithm is often unrolled, and used as a specialized layer in the deep architecture, which can be trained end-to-end with other neural components. Although such hybrid deep architectures have led to many empirical successes, the theoretical foundation of such architectures, especially the interplay between algorithm layers and other neural layers, remains largely unexplored. In this paper, we take an initial step towards an understanding of such hybrid deep architectures by showing that properties of the algorithm layers, such as convergence, stability and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model. Furthermore, our analysis matches closely our experimental observations under various conditions, suggesting that our theory can provide useful guidelines for designing deep architectures with reasoning layers.
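As an illustrative (not paper-specific) toy example of what it means to unroll an iterative algorithm into a trainable layer, the following module runs a fixed number of gradient-descent steps on an inner least-squares problem, with the step size exposed as a learnable parameter so the layer can be trained end-to-end with surrounding neural layers:

    import torch

    class UnrolledGradientDescent(torch.nn.Module):
        # Reasoning layer: k unrolled gradient steps on 0.5 * ||A x - b||^2.
        def __init__(self, num_steps=10):
            super().__init__()
            self.num_steps = num_steps
            self.step_size = torch.nn.Parameter(torch.tensor(0.1))  # learnable

        def forward(self, A, b):
            x = torch.zeros(A.shape[-1], device=A.device)
            for _ in range(self.num_steps):
                grad = A.T @ (A @ x - b)         # gradient of the inner objective
                x = x - self.step_size * grad    # one unrolled iteration
            return x

Properties of the underlying algorithm (how fast it converges, how sensitive it is to its inputs) then propagate to the approximation and generalization behaviour of the end-to-end model, which is the interplay the talk analyzes.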
October 15, 2021, Reinhard Heckel (Technical University of Munich and Rice University)
Abstract: Traditional algorithms for reconstructing images from few and noisy measurements are handcrafted. Today, algorithms in the form of deep networks learned on training data outperform traditional, handcrafted algorithms in computational cost and image quality. However, recent works have raised concerns that deep-learning-based image reconstruction methods are sensitive to perturbations and are less robust than traditional, handcrafted methods: neural networks may be sensitive to small, yet adversarially-selected perturbations, may perform poorly under distribution shifts, and may fail to recover small but important features in an image. To understand the sensitivity to such perturbations, we measured the robustness of a variety of deep-network-based and traditional methods. Perhaps surprisingly, in the context of accelerated magnetic resonance imaging, we find no evidence that deep-learning-based algorithms are less robust than classical, un-trained methods. Even for natural distribution shifts, we find that classical algorithms with a single hyper-parameter tuned on a training set compromise as much in performance as a neural network with 50 million parameters. Our results indicate that state-of-the-art deep-learning-based image reconstruction methods provide improved performance over traditional methods without compromising robustness.
November 5, 2021, Hui Ji (National University of Singapore)
Title: Self-supervised Deep Learning for Solving Inverse Problems in Imaging (video recording)
Abstract: Deep learning has become a prominent tool for solving many inverse problems in imaging sciences. Most existing SOTA solutions are built on supervised learning, with the prerequisite that a large-scale dataset of many degraded/truth image pairs is available. In recent years, driven by practical need, there has been increasing interest in studying deep learning methods under limited data resources, which has particular significance for imaging in science and medicine. This talk will focus on the discussion of self-supervised deep learning for solving inverse imaging problems, which assumes no training sample is available. By examining deep learning from the perspective of Bayesian inference, we will present several results and techniques on self-supervised learning for the MMSE (minimum mean squared error) estimator. Built on these techniques, we will show that, in several applications, the resulting dataset-free deep learning methods provide very competitive performance in comparison to their SOTA supervised counterparts. While the demonstrations only cover image denoising, compressed sensing, and phase retrieval, the presented techniques and methods are quite general and can be used for solving many other inverse imaging problems.
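For reference, the MMSE estimator the abstract refers to is the posterior mean of the unknown image given the degraded observation; in the dataset-free setting it has to be approximated from the single observation y alone (a generic statement of the estimator, not the specific construction of the talk):

    \hat{x}_{\mathrm{MMSE}}(y) \;=\; \mathbb{E}[x \mid y] \;=\; \int x\, p(x \mid y)\, dx.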