Yuval Bahat

I am a postdoctoral researcher passionate about using machine learning to solve computer vision and audio problems.

Currently, I collaborate with Prof. Felix Heide at Princeton, having previously worked with Prof. Tomer Michaeli at the Technion. I completed my PhD at the Weizmann Institute of Science, where I was supervised by Prof. Michal Irani. During my PhD I worked on both low-level vision (image dehazing and image deblurring) and high-level vision (image classification) problems. My academic journey began at the EE department of the Technion, where I completed my B.Sc. as well as my M.Sc., advised by Prof. Yoav Y. Schechner.


e-mail: yuval dot bahat at gmail dot com

Selected Publications

Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions

Gene Chou, Yuval Bahat & Felix Heide, ICCV 2023

Probabilistic diffusion models have achieved state-of-the-art results for image synthesis, inpainting and text-to-image tasks. However, they are still in the early stages of generating complex 3D shapes. This work proposes Diffusion-SDF, a generative model for shape completion, single-view reconstruction and reconstruction of real-scanned point clouds. We use neural signed distance functions (SDFs) as our 3D representation to parameterize the geometry of various signals (e.g., point clouds, 2D images) through neural networks. Neural SDFs are implicit functions and diffusing them amounts to learning the reversal of their neural network weights, which we solve using a custom modulation module. Extensive experiments show that our method is capable of both realistic unconditional generation and conditional generation from partial inputs. This work expands the domain of diffusion models from learning 2D, explicit representations, to 3D, implicit representations.
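
As a rough illustration of the latent-diffusion idea described above, the following sketch diffuses latent modulation vectors that condition a neural SDF decoder. It is not the paper's architecture; the latent dimension, noise schedule and denoiser network are placeholder assumptions.

# Minimal sketch (not the paper's code): denoising diffusion over latent
# "modulation" vectors that condition a neural SDF decoder. Dimensions,
# schedule and network are illustrative placeholders.
import torch
import torch.nn as nn

T = 1000                                        # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)           # simple linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a latent modulation vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, dim),
        )

    def forward(self, z_t, t):
        # Condition on the (normalized) timestep by simple concatenation.
        t_feat = t.float().unsqueeze(-1) / T
        return self.net(torch.cat([z_t, t_feat], dim=-1))

def diffusion_training_step(denoiser, z0, optimizer):
    """One epsilon-prediction step on a batch of clean latents z0 of shape [B, dim]."""
    t = torch.randint(0, T, (z0.shape[0],))
    eps = torch.randn_like(z0)
    a_bar = alpha_bars[t].unsqueeze(-1)
    z_t = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * eps    # forward noising
    loss = ((denoiser(z_t, t) - eps) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()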

Seeing Through Obstructions with Diffractive Cloaking

Zheng Shi, Yuval Bahat, Seung-Hwan Baek, Qiang Fu, Hadi Amata, Xiao Li, Praneeth Chakravarthula, Wolfgang Heidrich & Felix Heide, SIGGRAPH 2022

Unwanted camera obstruction can severely degrade captured images, including both scene occluders near the camera and partial occlusions of the camera cover glass. Such occlusions can cause catastrophic failures for various scene understanding tasks such as semantic segmentation, object detection and depth estimation. Existing camera arrays capture multiple redundant views of a scene to see around thin occlusions. Such multi-camera systems effectively form a large synthetic aperture, which can suppress nearby occluders with a large defocus blur, but significantly increase the overall form factor of the imaging setup. In this work, we propose a monocular single-shot imaging approach that optically cloaks obstructions by emulating a large array. Instead of relying on different camera views, we learn a diffractive optical element (DOE) that performs depth-dependent optical encoding, scattering nearby occlusions while allowing paraxial wavefronts to be focused. We computationally reconstruct unobstructed images from these superposed measurements with a neural network that is trained jointly with the optical layer of the proposed imaging system. We assess the proposed method in simulation and with an experimental prototype, validating that the proposed computational camera is capable of recovering occluded scene information in the presence of severe camera obstruction.
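
The sketch below is a heavily simplified, toy version of this kind of end-to-end design: a learnable phase profile shapes a depth-dependent point spread function that scatters a nearby occluder, and a small reconstruction network is trained jointly with it. The Fourier-optics PSF model, the additive layer compositing and all sizes are illustrative assumptions, not the paper's wave-optics simulation or network.

# Heavily simplified toy (not the paper's model): jointly optimizing a phase
# element and a reconstruction network so that a near-camera occluder is
# scattered while the in-focus scene stays recoverable.
import torch
import torch.nn as nn
import torch.nn.functional as F

N = 63                                     # aperture / PSF grid size (assumed, odd for padding)
phase = nn.Parameter(torch.zeros(N, N))    # learnable DOE phase profile

def psf_at_depth(defocus):
    """|FFT|^2 of the pupil function, with a quadratic defocus term standing in for depth."""
    y, x = torch.meshgrid(torch.linspace(-1, 1, N), torch.linspace(-1, 1, N), indexing="ij")
    r2 = x ** 2 + y ** 2
    pupil = (r2 <= 1.0).float() * torch.exp(1j * (phase + defocus * r2))
    psf = torch.fft.fftshift(torch.abs(torch.fft.fft2(pupil)) ** 2)
    return psf / psf.sum()

def render(scene, occluder):
    """Occluder near the lens sees a strongly defocused PSF; the scene stays in focus."""
    k = psf_at_depth(defocus=30.0)[None, None]
    blurred_occluder = F.conv2d(occluder[None, None], k, padding="same")[0, 0]
    return scene + blurred_occluder        # crude additive superposition of the two layers

recon_net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 1, 3, padding=1))
optimizer = torch.optim.Adam([phase] + list(recon_net.parameters()), lr=1e-3)

def training_step(scene, occluder):
    """scene, occluder: HxW tensors in [0, 1]."""
    measurement = render(scene, occluder)
    recovered = recon_net(measurement[None, None])[0, 0]
    loss = (recovered - scene).abs().mean()   # recover the unobstructed scene
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()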

Neural Point Light Fields

Julian Ost, Issam Laradji, Alejandro Newell, Yuval Bahat & Felix Heide, CVPR 2022

We introduce Neural Point Light Fields that represent scenes implicitly with a light field living on a sparse point cloud. Combining differentiable volume rendering with learned implicit density representations has made it possible to synthesize photo-realistic images for novel views of small scenes. As neural volumetric rendering methods require dense sampling of the underlying functional scene representation, at hundreds of samples along a ray cast through the volume, they are fundamentally limited to small scenes with the same objects projected to hundreds of training views. Promoting sparse point clouds to neural implicit light fields allows us to represent large scenes effectively with only a single implicit sampling operation per ray. These point light fields are a function of the ray direction and the local point feature neighborhood, allowing us to interpolate the light field conditioned on the training images without dense object coverage and parallax. We assess the proposed method for novel view synthesis on large driving scenarios, where we synthesize realistic unseen views that existing implicit approaches fail to represent. We validate that Neural Point Light Fields make it possible to predict videos along unseen trajectories previously only feasible to generate by explicitly modeling the scene.
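
A minimal sketch of the single-evaluation-per-ray idea follows; the k-nearest-point feature aggregation, feature dimension and MLP below are placeholder assumptions rather than the paper's architecture.

# Minimal sketch (not the paper's architecture): a light field conditioned on a
# sparse point cloud, evaluated once per ray instead of with dense volumetric sampling.
import torch
import torch.nn as nn

class PointLightField(nn.Module):
    def __init__(self, num_points, feat_dim=32, k=8):
        super().__init__()
        self.point_feats = nn.Parameter(torch.randn(num_points, feat_dim))  # learned per-point features
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 3),             # RGB radiance for the whole ray
        )

    def forward(self, ray_o, ray_d, points):
        """ray_o, ray_d: [R, 3] (ray_d assumed normalized); points: [P, 3]."""
        # Perpendicular distance from each point to each ray.
        to_pts = points[None] - ray_o[:, None]                      # [R, P, 3]
        t = (to_pts * ray_d[:, None]).sum(-1, keepdim=True)         # projection onto the ray
        dist = (to_pts - t * ray_d[:, None]).norm(dim=-1)           # [R, P]
        # Aggregate features of the k closest points, weighted by inverse distance.
        d_k, idx = dist.topk(self.k, largest=False)
        w = 1.0 / (d_k + 1e-6)
        w = w / w.sum(-1, keepdim=True)
        feats = (self.point_feats[idx] * w[..., None]).sum(1)       # [R, feat_dim]
        return self.mlp(torch.cat([feats, ray_d], dim=-1))          # one evaluation per ray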

What's in the Image?
Explorable Decoding of Compressed Images

Yuval Bahat & Tomer Michaeli, CVPR 2021 (oral presentation)

The ever-growing amount of visual content captured on a daily basis necessitates the use of lossy compression methods in order to save storage space and transmission bandwidth. While extensive research efforts are devoted to improving compression techniques, every method inevitably discards information. Especially at low bit rates, this information often corresponds to semantically meaningful visual cues, so that decompression involves significant ambiguity. In spite of this fact, existing decompression algorithms typically produce only a single output, and do not allow the viewer to explore the set of images that map to the given compressed code. In this work we propose the first image decompression method to facilitate user-exploration of the diverse set of natural images that could have given rise to the compressed input code, thus granting users the ability to determine what could and what could not have been there in the original scene. Specifically, we develop a novel deep-network based decoder architecture for the ubiquitous JPEG standard, which allows traversing the set of decompressed images that are consistent with the compressed JPEG file. To allow for simple user interaction, we develop a graphical user interface comprising several intuitive exploration tools, including an automatic tool for examining specific solutions of interest. We exemplify our framework on graphical, medical and forensic use cases, demonstrating its wide range of potential applications.
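
To make the notion of consistency with the compressed file concrete, here is a minimal sketch (not the paper's decoder) of projecting a candidate image onto the set of JPEG-consistent images: each 8x8 block's DCT coefficients are clamped into the quantization bins implied by the values stored in the file. It handles only the luma channel and ignores chroma subsampling; function and variable names are illustrative.

# Minimal sketch: projection onto the set of images consistent with a JPEG file.
import numpy as np
from scipy.fft import dctn, idctn

def project_to_jpeg_consistent(candidate, quantized_coeffs, q_table):
    """candidate: HxW luma image in [0, 255]; quantized_coeffs: [H/8, W/8, 8, 8] stored
    quantized DCT values; q_table: 8x8 quantization table."""
    out = candidate.astype(np.float64).copy()
    H, W = out.shape
    for by in range(0, H, 8):
        for bx in range(0, W, 8):
            block = out[by:by + 8, bx:bx + 8]
            coeffs = dctn(block - 128.0, norm="ortho")         # JPEG-style level shift + block DCT
            v = quantized_coeffs[by // 8, bx // 8]
            lo, hi = (v - 0.5) * q_table, (v + 0.5) * q_table  # quantization bin of each coefficient
            coeffs = np.clip(coeffs, lo, hi)                   # clamp into the consistent set
            out[by:by + 8, bx:bx + 8] = idctn(coeffs, norm="ortho") + 128.0
    return out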

Explorable Super Resolution

Yuval Bahat & Tomer Michaeli, CVPR 2020 (oral presentation)

Single image super resolution (SR) has seen major performance leaps in recent years. However, existing methods do not allow exploring the infinitely many plausible reconstructions that might have given rise to the observed low-resolution (LR) image. These different explanations of the LR image may dramatically vary in their textures and fine details, and may often encode completely different semantic information. In this paper, we introduce the task of explorable super resolution. We propose a framework comprising a graphical user interface with a neural network backend, allowing the user to edit the SR output so as to explore the abundance of plausible HR explanations of the LR input. At the heart of our method is a novel module that can wrap any existing SR network, analytically guaranteeing that its SR outputs would precisely match the LR input when downsampled. Besides its importance in our setting, this module is guaranteed to decrease the reconstruction error of any SR network it wraps, and can be used to cope with blur kernels that are different from the one the network was trained for. We illustrate our approach in a variety of use cases, ranging from medical imaging and forensics, to graphics.
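
A simplified sketch of a consistency-enforcing correction follows. It is not the paper's analytic module: exact consistency requires the upsampler to be a right inverse of the downsampling operator, and the plain bicubic resizers used here only approximate that.

# Simplified sketch: correcting an SR network's output so it better matches
# the LR input when downsampled (back-projection-style correction).
import torch
import torch.nn.functional as F

def consistency_correction(sr_image, lr_image, scale):
    """sr_image: [B, C, s*H, s*W]; lr_image: [B, C, H, W]; returns the corrected SR image."""
    downsample = lambda x: F.interpolate(x, scale_factor=1.0 / scale, mode="bicubic",
                                         align_corners=False, antialias=True)
    upsample = lambda x: F.interpolate(x, scale_factor=scale, mode="bicubic",
                                       align_corners=False)
    residual = lr_image - downsample(sr_image)   # deviation from consistency with the LR input
    return sr_image + upsample(residual)         # push the output toward the consistent set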

Classification Confidence Estimation with Test-Time Data-Augmentation

Yuval Bahat & Gregory Shakhnarovich, arXiv 2020

Machine learning plays an increasingly significant role in many aspects of our lives (including medicine, transportation, security, justice and other domains), making the potential consequences of false predictions increasingly devastating. These consequences may be mitigated if we can automatically flag such false predictions and potentially assign them to alternative, more reliable mechanisms that are possibly more costly and involve human attention. This suggests the task of detecting errors, which we tackle in this paper for the case of visual classification. To this end, we propose a novel approach for classification confidence estimation. We apply a set of semantics-preserving image transformations to the input image, and show how the resulting image sets can be used to estimate confidence in the classifier's prediction. We demonstrate the potential of our approach by extensively evaluating it on a wide variety of classifier architectures and datasets, including ResNext/ImageNet, achieving state-of-the-art performance. This paper constitutes a significant revision of our earlier work in this direction (Bahat & Shakhnarovich, 2018).
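
A minimal sketch of this test-time-augmentation idea is given below; the specific transformations and the agreement-based score are illustrative choices, not the paper's exact estimator.

# Minimal sketch: confidence from agreement of predictions over semantics-preserving transforms.
import torch

def tta_confidence(classifier, image, transforms):
    """image: [C, H, W] tensor; transforms: list of functions mapping [C, H, W] -> [C, H, W]."""
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(classifier(t(image)[None]), dim=-1)[0]
            for t in transforms
        ])                                              # [num_transforms, num_classes]
    predicted = probs.mean(0).argmax().item()
    # Confidence = fraction of transformed copies whose prediction agrees with the consensus.
    agreement = (probs.argmax(-1) == predicted).float().mean().item()
    return predicted, agreement

# Example semantics-preserving transforms: identity, horizontal flip, small shifts.
example_transforms = [
    lambda x: x,
    lambda x: torch.flip(x, dims=[-1]),
    lambda x: torch.roll(x, shifts=2, dims=-1),
    lambda x: torch.roll(x, shifts=-2, dims=-2),
]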

Natural and Adversarial Error Detection using Invariance to Image Transformations

Yuval Bahat, Michal Irani & Gregory Shakhnarovich, arXiv 2019

We propose an approach to distinguish between correct and incorrect image classifications. Our approach can detect misclassifications which either occur unintentionally ("natural errors"), or due to intentional adversarial attacks ("adversarial errors"), both in a single unified framework. Our approach is based on the observation that correctly classified images tend to exhibit robust and consistent classifications under certain image transformations (e.g., horizontal flip, small image translation, etc.). In contrast, incorrectly classified images (whether due to adversarial errors or natural errors) tend to exhibit large variations in classification results under such transformations. Our approach does not require any modifications or retraining of the classifier, and hence can be applied to any pre-trained classifier. We further use state-of-the-art targeted adversarial attacks to demonstrate that even when the adversary has full knowledge of our method, the adversarial distortion needed for bypassing our detector is no longer imperceptible to the human eye. Our approach obtains state-of-the-art results compared to previous adversarial detection methods, surpassing them by a large margin.

Confidence from Invariance to Image Transformations

Yuval Bahat & Gregory Shakhnarovich, arXiv 2018

We develop a technique for automatically detecting the classification errors of a pre-trained visual classifier. Our method is agnostic to the form of the classifier, requiring access only to classifier responses to a set of inputs. We train a parametric binary classifier (error/correct) on a representation derived from a set of classifier responses generated from multiple copies of the same input, each subject to a different natural image transformation. Thus, we establish a measure of confidence in the classifier's decision by analyzing the invariance of its decision under various transformations. In experiments with multiple datasets (STL-10, CIFAR-100, ImageNet) and classifiers, we demonstrate new state-of-the-art results for the error detection task. In addition, we apply our technique to novelty detection scenarios, where we also demonstrate state-of-the-art results.
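
The sketch below illustrates the general recipe: build a small feature vector from the classifier's responses under the transformations, then train a binary error/correct detector on it. The particular features and the logistic-regression detector are stand-in assumptions, not the paper's exact construction.

# Minimal sketch: a learned error detector on top of invariance-derived features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def invariance_features(probs_per_transform):
    """probs_per_transform: [num_transforms, num_classes] softmax outputs for one input."""
    p = np.asarray(probs_per_transform)
    top_class = p.mean(0).argmax()
    return np.array([
        p[:, top_class].mean(),               # average probability of the consensus class
        p[:, top_class].std(),                # how stable that probability is across transforms
        (p.argmax(1) == top_class).mean(),    # fraction of transforms agreeing with the consensus
    ])

def train_error_detector(features, labels):
    """features: [N, 3] rows from invariance_features; labels: 1 = correct, 0 = error."""
    detector = LogisticRegression(max_iter=1000)
    detector.fit(features, labels)
    return detector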

Non-Uniform Blind Deblurring by Reblurring

*Yuval Bahat, *Netalee Efrat & Michal Irani, ICCV 2017 (*Equal contributors)

We present an approach for blind image deblurring, which handles non-uniform blurs. Our algorithm has two main components: (i) a new method for recovering the unknown blur-field directly from the blurry image, and (ii) a method for deblurring the image given the recovered non-uniform blur-field. Our blur-field estimation is based on simple spectral analysis of blurry image patches. Being unrestricted by any training data, it can handle a large variety of blur sizes, yielding superior blur-field estimation results compared to training-based deep-learning methods. Our non-uniform deblurring algorithm is based on the internal image-specific patch-recurrence prior. It attempts to recover a sharp image which, on the one hand, results in the blurry image under our estimated blur-field, and, on the other hand, maximizes the internal recurrence of patches within and across scales of the recovered sharp image. The combination of these two components gives rise to a blind-deblurring algorithm, which exceeds the performance of state-of-the-art CNN-based blind-deblurring methods by a significant margin, without the need for any training data.
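
As a toy illustration of why spectral analysis is informative about local blur: blur attenuates high spatial frequencies, so the power spectrum of a blurry patch concentrates at low frequencies. The crude per-patch score below captures only this blurriness cue; the paper's blur-field recovery (including blur size and direction) is considerably more involved.

# Toy illustration only: a per-patch blurriness proxy from spectral energy falloff.
import numpy as np

def patch_blurriness(patch):
    """Fraction of spectral energy concentrated in the low frequencies of a patch."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    r = min(h, w) // 4
    low = spectrum[cy - r:cy + r, cx - r:cx + r].sum()
    return low / (spectrum.sum() + 1e-12)   # closer to 1 => less high-frequency energy => blurrier

def blurriness_map(image, patch_size=32, stride=16):
    """Crude map of local blurriness scores over a grayscale image (2-D array)."""
    H, W = image.shape
    rows = range(0, H - patch_size + 1, stride)
    cols = range(0, W - patch_size + 1, stride)
    return np.array([[patch_blurriness(image[y:y + patch_size, x:x + patch_size])
                      for x in cols] for y in rows])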

Blind Dehazing Using Internal Patch Recurrence

Yuval Bahat & Michal Irani, ICCP 2016

Images of outdoor scenes are often degraded by haze, fog and other scattering phenomena. In this paper we show how such images can be dehazed using internal patch recurrence. Small image patches tend to repeat abundantly inside a natural image, both within the same scale and across different scales. This behavior has been used as a strong prior for image denoising, super-resolution, image completion and more. Nevertheless, this strong recurrence property significantly diminishes when the imaging conditions are not ideal, as is the case in images taken under bad weather conditions (haze, fog, underwater scattering, etc.). In this paper we show how we can exploit the deviations from the ideal patch recurrence for "Blind Dehazing", namely recovering the unknown haze parameters and reconstructing a haze-free image. We seek the haze parameters that, when used for dehazing the input image, will maximize the patch recurrence in the dehazed output image. More specifically, pairs of co-occurring patches at different depths (hence undergoing different degrees of haze) allow recovery of the airlight color, as well as the relative transmission of each such pair of patches. This in turn leads to dense recovery of the scene structure, and to full image dehazing.
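
For reference, the standard haze imaging model underlying this work is I(x) = J(x) t(x) + A (1 - t(x)), where J is the haze-free scene, t the transmission and A the global airlight. The sketch below inverts this model given estimates of A and t; the paper's contribution, recovering A and t blindly from patch recurrence, is not shown.

# Minimal sketch: inverting the standard haze model given airlight and transmission estimates.
import numpy as np

def dehaze(hazy, airlight, transmission, t_min=0.1):
    """hazy: HxWx3 image in [0, 1]; airlight: length-3 RGB; transmission: HxW map in (0, 1]."""
    t = np.clip(transmission, t_min, 1.0)[..., None]   # avoid amplifying noise where t -> 0
    return np.clip((hazy - airlight) / t + airlight, 0.0, 1.0)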

Self-Content-Based Audio Inpainting

Yuval Bahat, Yoav Y. Schechner & Michael Elad, Signal Processing 2015

The popularity of voice over internet protocol (VoIP) systems is continuously growing. Such systems depend on unreliable internet communication, in which chunks of data often get lost during transmission. Various solutions to this problem have been proposed, most of which are better suited to small rates of lost data. This work addresses the problem by filling in missing data using examples taken from prior recorded audio of the same user. Our approach also harnesses statistical priors and data inpainting smoothing techniques. The effectiveness of the proposed solution is demonstrated experimentally, even for large data gaps, which cannot be handled by standard packet-loss concealment techniques.
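
A toy sketch of the example-based gap filling follows: it searches a prior recording of the same user for a segment whose surroundings best match the context around the gap, then copies it in. The statistical priors and smoothing mentioned above are omitted; all names and the fixed context length are illustrative assumptions.

# Toy sketch: fill a missing audio chunk with the best-matching example from a prior recording.
import numpy as np

def fill_gap(signal, gap_start, gap_len, prior_audio, context=512):
    """signal: 1-D array with a missing chunk; prior_audio: 1-D recording of the same user."""
    left = signal[gap_start - context:gap_start]
    right = signal[gap_start + gap_len:gap_start + gap_len + context]
    best_err, best_pos = np.inf, None
    for pos in range(len(prior_audio) - (2 * context + gap_len)):
        cand_left = prior_audio[pos:pos + context]
        cand_right = prior_audio[pos + context + gap_len:pos + 2 * context + gap_len]
        err = np.sum((cand_left - left) ** 2) + np.sum((cand_right - right) ** 2)
        if err < best_err:                 # keep the candidate whose context matches best
            best_err, best_pos = err, pos
    filled = signal.copy()
    filled[gap_start:gap_start + gap_len] = prior_audio[best_pos + context:
                                                        best_pos + context + gap_len]
    return filled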