Authors: Derya Akkaynak, Tali Treibitz
Abstract:
Robust recovery of lost colors in underwater images remains a challenging problem. We recently showed that this was partly due to the prevalent use of an atmospheric image formation model for underwater images and proposed a physically accurate model. The revised model showed: 1) the attenuation coefficient of the signal is not uniform across the scene but depends on object range and reflectance, 2) the coefficient governing the increase in backscatter with distance differs from the signal attenuation coefficient. Here, we present the first method that recovers color with our revised model, using RGBD images. The Sea-thru method estimates backscatter using the dark pixels and their known range information. Then, it uses an estimate of the spatially varying illuminant to obtain the range-dependent attenuation coefficient. Using more than 1,100 images from two optically different water bodies, which we make available, we show that our method with the revised model outperforms those using the atmospheric model. Consistent removal of water will open up large underwater datasets to powerful computer vision and machine learning algorithms, creating exciting opportunities for the future of underwater exploration and conservation.
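As an illustration of the two points above, the following minimal sketch simulates and inverts a single pixel in a single color channel. It assumes the commonly cited form of the revised model, I = J*exp(-beta_D*z) + B_inf*(1 - exp(-beta_B*z)), with separate coefficients for the direct signal and for backscatter. The function names and all numeric values are hypothetical, and the coefficients are taken as known here, whereas the method described above estimates them from the image and its range map.

```python
import math

def form_pixel(J, z, beta_D, beta_B, B_inf):
    """Simulate one channel of one pixel (assumed form of the revised model).

    J      : unattenuated scene color (what we want to recover)
    z      : range from camera to object
    beta_D : attenuation coefficient of the direct signal
    beta_B : backscatter coefficient (differs from beta_D in the revised model)
    B_inf  : backscatter value at infinite range (veiling light)
    """
    direct = J * math.exp(-beta_D * z)
    backscatter = B_inf * (1.0 - math.exp(-beta_B * z))
    return direct + backscatter

def recover_pixel(I, z, beta_D, beta_B, B_inf):
    """Invert the model: subtract backscatter, then undo attenuation."""
    backscatter = B_inf * (1.0 - math.exp(-beta_B * z))
    return (I - backscatter) * math.exp(beta_D * z)

# Hypothetical values, chosen only for illustration.
J_true = 0.6
I_obs = form_pixel(J_true, z=4.0, beta_D=0.25, beta_B=0.4, B_inf=0.3)
J_rec = recover_pixel(I_obs, z=4.0, beta_D=0.25, beta_B=0.4, B_inf=0.3)
print(I_obs, J_rec)
```

Setting beta_B equal to beta_D and making both constant across the scene reduces this sketch to the atmospheric model that the abstract argues against.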
Authors: Or Isaacs, Oran Shayer and Micha Lindenbaum
Abstract:
Current successful approaches for generic (non-semantic) segmentation rely mostly on edge detection and have leveraged the strengths of deep learning mainly by improving the edge detection stage in the algorithmic pipeline. This is in contrast to semantic and instance segmentation, where deep learning has had a dramatic effect and DNNs are applied directly to generate pixel-wise segment representations. We propose a new method for learning a pixel-wise representation that reflects segment relatedness. This representation is combined with an edge map to yield a new segmentation algorithm. We show that the representations themselves achieve state-of-the-art segment similarity scores. Moreover, the proposed combined segmentation algorithm provides results that either match or improve on the state of the art for most quality measures.
Authors: Chen Bar, Marina Alterman, Ioannis Gkioulekas and Anat Levin
Abstract:
We present a Monte Carlo rendering framework for the physically-accurate simulation of speckle patterns arising from volumetric scattering of coherent waves. These noise-like patterns are characterized by strong statistical properties, such as the so-called memory effect. These properties are at the core of imaging techniques for applications as diverse as tissue imaging, motion tracking, and non-line-of-sight imaging. Our rendering framework can replicate these properties computationally, in a way that is orders of magnitude more efficient than alternatives based on directly solving the wave equations. At the core of our framework is a path-space formulation for the covariance of speckle patterns arising from a scattering volume, which we derive from first principles. We use this formulation to develop two Monte Carlo rendering algorithms: one for computing speckle covariance and one for directly computing speckle fields. While approaches based on wave equation solvers require knowing the microscopic position of wavelength-sized scatterers, our approach takes as input only bulk parameters describing the statistical distribution of these scatterers inside a volume. We validate the accuracy of our framework by comparing against speckle patterns simulated using wave equation solvers, use it to simulate memory effect observations that were previously only possible through lab measurements, and demonstrate its applicability for computational imaging tasks.
Authors: Dvir Ginzburg and Dan Raviv
Abstract:
We present the first utterly self-supervised network for dense correspondence mapping between non-isometric shapes. The task of alignment in non-Euclidean domains is one of the most fundamental and crucial problems in computer vision. As 3D scanners can generate highly complex and dense models, the mission of finding dense mappings between those models is vital. The novelty of our solution is based on a cyclic mapping between metric spaces, where the distance between a pair of points should remain invariant after the full cycle. As the same learnable rules that generate the point-wise descriptors apply in both directions, the network learns invariant structures without any labels while coping with non-isometric deformations. We show state-of-the-art results by a large margin for a variety of tasks compared to known self-supervised and supervised methods.
Authors: Roman Beliy, Guy Gaziv, Assaf Hoogi, Francesca Strappini, Tal Golan, Michal Irani
Abstract:
Reconstructing observed images from fMRI brain recordings is challenging. Unfortunately, acquiring sufficient "labeled" pairs of {Image, fMRI} (i.e., images with their corresponding fMRI responses) to span the huge space of natural images is prohibitive for many reasons. We present a novel approach which, in addition to the scarce labeled data (training pairs), allows fMRI-to-image reconstruction networks to be trained also on "unlabeled" data (i.e., images without fMRI recordings, and fMRI recordings without images). The proposed model utilizes both an Encoder network (image-to-fMRI) and a Decoder network (fMRI-to-image). Concatenating these two networks back-to-back (Encoder-Decoder & Decoder-Encoder) allows augmenting the training with both types of unlabeled data. Importantly, it allows training on the unlabeled test-fMRI data. This self-supervision adapts the reconstruction network to the new input test-data, despite its deviation from the statistics of the scarce training data.
Authors: Tom Tirer, Shady Abu-Hussein, Raja Giryes
Abstract:
Ill-posed inverse problems appear in many image processing applications, such as deblurring, super-resolution and compressed sensing. Traditional reconstruction strategies, which involve minimizing a composition of fidelity and prior terms, exhibit limited performance due to the difficulty of mathematically modeling natural images. Recently, many works have mitigated this difficulty by (exhaustively) training deep neural networks to learn the inverse mappings of given observation models. However, these methods suffer from a huge performance drop when the observation model used in training is inexact. In this talk, I focus on a promising line of work that uses deep learning models, such as CNN denoisers and GANs, for handling only the prior in inverse problems, and is therefore not restricted by assumptions made in the training phase. Our contributions include a back-projection (BP) fidelity term, which is an alternative to the traditional least squares (LS) objective. Using the simple proximal gradient method with the BP term and off-the-shelf denoisers (a scheme that we term IDBP) gives excellent results, requires less parameter tuning than LS-based methods, and is accompanied by theoretical motivations. Another contribution is an image-adaptive approach, where we tune CNN denoisers or GANs at test time to specialize them on the image at hand. This approach leads to a significant performance boost, especially for GANs, which often suffer from limited representation capabilities (known in the literature also as mode collapse).
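The alternation between a back-projection step and a denoiser can be illustrated on a toy 1D inpainting problem. The sketch below is an assumption-laden illustration, not the authors' implementation: the observation operator is a binary mask (so its pseudo-inverse simply copies the observed samples back), the step size is 1, and a 3-tap moving average stands in for an off-the-shelf CNN denoiser. The names `moving_average` and `idbp_inpaint` are hypothetical.

```python
def moving_average(x):
    """Toy stand-in for a learned denoiser: 3-tap mean with replicated edges."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def idbp_inpaint(y, observed, iters=30, denoise=moving_average):
    """Alternate a back-projection step with a denoising step.

    y        : measurements (entries at unobserved positions are ignored)
    observed : booleans, True where a sample was measured
    For a masking operator, the back-projection step keeps the measured
    samples and takes the current estimate everywhere else.
    """
    n = len(y)
    x = [0.0] * n  # initial estimate
    for _ in range(iters):
        # Back-projection: consistent with the data on observed samples.
        x_tilde = [y[i] if observed[i] else x[i] for i in range(n)]
        # Prior step: any denoiser can be plugged in here.
        x = denoise(x_tilde)
    # Return a data-consistent estimate.
    return [y[i] if observed[i] else x[i] for i in range(n)]

# Linear ramp with the middle sample missing (placeholder 0.0 in y).
y = [1.0, 2.0, 0.0, 4.0, 5.0]
observed = [True, True, False, True, True]
restored = idbp_inpaint(y, observed)
print(restored)
```

Because the denoiser is only called as a black box, replacing `moving_average` with a stronger denoiser changes the prior without touching the fidelity step, which is the separation the abstract emphasizes.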
Authors: Gil Shamai, Ron Slossberg and Ron Kimmel
Abstract:
Question: Can molecular markers of cancer be extracted from tissue morphology as seen in hematoxylin-eosin–stained images?
Findings: In this diagnostic study of tissue microarray hematoxylin-eosin–stained images from 5356 patients with breast cancer, molecular biomarker expression was found to be significantly associated with tissue histomorphology. A deep learning model was able to predict estrogen receptor expression solely from hematoxylin-eosin–stained images with non-inferior accuracy to standard immunohistochemistry.
Meaning: These results suggest that deep learning models may assist pathologists in molecular profiling of cancer with practically no added cost and time.
Based on a recent JAMA paper; see https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2739045 for more details.
Authors: Itai Lang, Oren Dovrat, Asaf Manor, Shai Avidan
Abstract:
There is a growing number of tasks that work directly on point clouds. As the size of the point cloud grows, so do the computational demands of these tasks. A possible solution is to sample the point cloud first. A popular sampling technique is Farthest Point Sampling (FPS). However, FPS is agnostic to the downstream application (classification, retrieval, etc.). The underlying assumption seems to be that minimizing the farthest point distance, as done by FPS, is a good proxy for other objective functions.
We show that it is better to learn how to sample. SampleNet is a neural network that takes a point cloud and produces a subset of the points that is optimized for a downstream task. The key challenge is that sampling is a non-differentiable operation. We overcome this by introducing a differentiable approximation.
This approximation scheme leads to consistently good results on various applications such as classification, retrieval, and geometric reconstruction. We also show that the proposed sampling network can be used as a front end to a point cloud registration network. This is a challenging task since sampling must be consistent across two different point clouds.
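For reference, the task-agnostic baseline mentioned above, Farthest Point Sampling, can be sketched in a few lines. This is the standard greedy formulation, not part of SampleNet itself; the function name, the choice of starting index, and the toy point cloud are illustrative assumptions.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly add the point farthest from the chosen set.

    points : list of coordinate tuples
    k      : number of points to sample
    start  : index of the first sampled point (an arbitrary choice)
    Returns the indices of the sampled points, in selection order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

# Toy cloud: two near-duplicates close to the origin and three spread-out points.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
         (0.0, 2.0, 0.0), (0.9, 0.1, 0.0)]
sample = farthest_point_sampling(cloud, 3)
print(sample)
```

The selection depends only on geometry: nothing in the loop knows whether the points will later be used for classification, retrieval, or registration, which is the limitation a learned sampler is meant to address.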
Authors: Dror Simon and Michael Elad
Abstract:
Sparse representation with respect to an overcomplete dictionary is often used when regularizing inverse problems in signal and image processing. In recent years, the Convolutional Sparse Coding (CSC) model, in which the dictionary consists of shift-invariant filters, has gained renewed interest. While this model has been successfully used in some image processing problems, it still falls behind traditional patch-based methods on simple tasks such as denoising.
In this work we provide new insights regarding the CSC model and its capability to represent natural images, and suggest a Bayesian connection between this model and its patch-based ancestor. Armed with these observations, we suggest a novel feed-forward network that follows an MMSE approximation process to the CSC model, using strided convolutions. The performance of this supervised architecture is shown to be on par with state-of-the-art methods while using far fewer parameters.
Authors: Oshri Halimi, Or Litany, Emanuel Rodola, Alex Bronstein, Ron Kimmel
Abstract:
We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it by a purely geometric criterion. The resulting learning model is class-agnostic, and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize on the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
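The purely geometric criterion described above can be made concrete with a small example. The sketch below scores a candidate correspondence by how much it distorts pairwise distances; it is a simplified illustration that assumes precomputed distance matrices and a hard point-to-point map, whereas a learning pipeline would typically use geodesic distances and soft correspondences. The function name and the toy shapes are hypothetical.

```python
def metric_distortion(dist_x, dist_y, mapping):
    """Mean squared change in pairwise distance under a correspondence.

    dist_x  : n x n matrix of distances on shape X
    dist_y  : m x m matrix of distances on shape Y
    mapping : mapping[i] is the index on Y matched to point i on X
    A value of 0 means the map preserves all pairwise distances (an isometry).
    """
    n = len(mapping)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = dist_x[i][j] - dist_y[mapping[i]][mapping[j]]
            total += diff * diff
    return total / (n * n)

# Two copies of a 3-point chain 0 - 1 - 2 with unit edge lengths.
dist_x = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
dist_y = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

good = metric_distortion(dist_x, dist_y, [2, 1, 0])  # reversing the chain
bad = metric_distortion(dist_x, dist_y, [1, 0, 2])   # swapping an end with the middle
print(good, bad)
```

No ground-truth labels appear anywhere in the score, which is what allows such a quantity to serve as an unsupervised training signal.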
Authors: Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman and Daniel Cohen-Or
Abstract:
Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this talk, I discuss how we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of MeshCNN on various learning tasks applied to 3D meshes.
Authors: Tamar Rott Shaham, Tali Dekel, Tomer Michaeli
Abstract:
We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly mistaken for real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.
Authors: Mor Avi Aharon, Tammy Riklin-Raviv
Abstract:
We present HueNet, a novel Deep Learning framework for Intensity-based Image-to-Image Translation. The key idea is a new technique, which we term 'Network Augmentation', that allows a differentiable construction of intensity histograms from images. We further introduce differentiable losses for 1D, 2D and cyclic histograms and show their applicability to several image-to-image translation tasks, including contrast adjustment by histogram equalization, color transfer (see figure) and unsupervised image colorization. The incorporation of histogram losses in addition to an adversarial loss enables the construction of semantically meaningful and realistic images.
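To show why a histogram needs special treatment before it can be used in a loss, the sketch below builds a "soft" histogram in which each intensity shares its weight between the two nearest bin centers. This is a generic soft-binning scheme given purely for illustration; it is not the Network Augmentation construction described above, and the function name and bin layout are assumptions.

```python
def soft_histogram(values, num_bins):
    """Normalized histogram with linear (triangular) soft assignment.

    values   : intensities, expected in [0, 1] (out-of-range values are clamped)
    num_bins : number of bins (at least 2); centers are evenly spaced on [0, 1]
    Each value contributes weights that vary linearly with the value, so the
    result changes smoothly as intensities move, unlike hard bin counting,
    where the count jumps when a value crosses a bin boundary.
    """
    if num_bins < 2 or not values:
        raise ValueError("need at least 2 bins and 1 value")
    hist = [0.0] * num_bins
    for v in values:
        pos = min(max(v, 0.0), 1.0) * (num_bins - 1)  # position in bin units
        lo = min(int(pos), num_bins - 2)              # index of the left center
        frac = pos - lo                               # share for the right center
        hist[lo] += 1.0 - frac
        hist[lo + 1] += frac
    return [h / len(values) for h in hist]

# Bin centers at 0, 0.5 and 1; the value 0.25 is split between the first two.
h = soft_histogram([0.0, 0.5, 1.0, 0.25], 3)
print(h)
```

A loss can then compare two such histograms, for example by summing absolute differences per bin, and small changes in pixel intensities produce small changes in that loss.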
Authors: Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander Schwing
Abstract:
Visual question answering, learning and generating audio-visual scene-aware dialog and visual dialog tasks pave the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems. While dialogs present an effective way to reason and exchange information, very little is known to date about how to effectively extract details and nuances from the multiple sensors that bombard the computational engine of those devices. Therefore, in this talk we propose a novel form of attention mechanism which operates on any number of data utilities and differentiates useful signals from distracting ones. To this end, we design a factor graph based attention mechanism which combines any number of utility representations, such as video-frame, audio-piece, dialog-interaction, image, question and candidate answer. We illustrate the applicability of the proposed approach on the challenging and recently introduced VisDial and AVSD datasets, outperforming the recent state of the art in visual dialog by more than 6% on MRR and the state of the art in audio-visual scene-aware dialog by more than 20% on CIDEr.
Authors: Guy Hacohen, Leshem Choshen, Daphna Weinshall
Abstract:
One of the unresolved questions in deep learning is the nature of the solutions that are being discovered. We investigate the collection of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. These solutions are shown to be rather similar - more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Surprisingly, all the network instances seem to share the same learning dynamics, whereby initially the same train and test examples are correctly recognized by the learned model, followed by other examples which are learned in roughly the same order. When extending the investigation to heterogeneous collections of neural network architectures, once again examples are seen to be learned in the same order irrespective of architecture, although the more powerful architecture may continue to learn and thus achieve higher accuracy. This pattern of results remains true even when the composition of classes in the test set is unrelated to the train set, for example, when using out of sample natural images or even artificial images. To show the robustness of these phenomena we provide an extensive summary of our empirical study, which includes hundreds of graphs describing tens of thousands of networks with varying NN architectures, hyper-parameters and domains. We also discuss cases where this pattern of similarity breaks down, which show that the reported similarity is not an artifact of optimization by gradient descent. Rather, the observed pattern of similarity is characteristic of learning complex problems with big networks. Finally, we show that this pattern of similarity seems to be strongly correlated with effective generalization.
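The "all or none" agreement described above is easy to measure once each trained network's per-example correctness has been recorded. The sketch below is a minimal illustration with made-up data; the function name and the 0/1 matrix layout are assumptions, not the authors' evaluation code.

```python
def agreement_stats(correct):
    """Fraction of examples on which a set of networks fully agrees.

    correct : list with one row per network; correct[n][i] is truthy if
              network n classifies example i correctly.
    Returns (all_right, all_wrong, consistent), each as a fraction of examples,
    where consistent = all_right + all_wrong.
    """
    n_examples = len(correct[0])
    all_right = sum(all(net[i] for net in correct) for i in range(n_examples))
    all_wrong = sum(not any(net[i] for net in correct) for i in range(n_examples))
    return (all_right / n_examples,
            all_wrong / n_examples,
            (all_right + all_wrong) / n_examples)

# Hypothetical results of 3 networks on 5 examples.
correct = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
]
all_right, all_wrong, consistent = agreement_stats(correct)
print(all_right, all_wrong, consistent)
```

Tracking these fractions at successive training checkpoints, rather than only at the end, would be one way to probe whether examples are learned in a similar order across networks.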
Authors: Roy Uziel, Meitar Ronen, and Oren Freifeld
Abstract:
Superpixels provide a useful intermediate image representation. Existing superpixel methods, however, suffer from at least some of the following drawbacks: 1) topology is handled heuristically; 2) the number of superpixels is either predefined or estimated at a prohibitive cost; 3) lack of adaptiveness. As a remedy, we propose a novel probabilistic model, self-coined Bayesian Adaptive Superpixel Segmentation (BASS), together with efficient inference. BASS is a Bayesian nonparametric mixture model that also respects topology and favors spatial coherence. The optimization-based and topology-aware inference is parallelizable and implemented on the GPU. Quantitatively, BASS achieves results that are either better than the state-of-the-art or close to it, depending on the performance index and/or dataset. Qualitatively, we argue it achieves the best results; we demonstrate this by not only subjective visual inspection but also objective quantitative performance evaluation of the downstream application of face detection.
Authors: Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri
Abstract:
We address the problem of recovering camera matrices from a collection of fundamental (or essential) matrices in a multiview setting. We make two main contributions. First, given a collection of n choose 2 fundamental (or essential) matrices, associated with n images, we provide a complete algebraic characterization in the form of conditions that are both necessary and sufficient to enable the recovery of camera matrices. These constraints are based on stacking the fundamental (or essential) matrices as blocks in a single matrix, called the n-view fundamental (essential) matrix, and characterizing this matrix in terms of its rank and the signs of its eigenvalues. Second, based on these algebraic constraints, we formulate an optimization problem and propose an efficient optimization algorithm that, given a complete or partial collection of measured fundamental (essential) matrices, globally recovers the camera matrices. Our experiments indicate that our method achieves state-of-the-art performance in both accuracy and run-time.
Authors: Noam Maleli and Yosi Keller
Abstract:
We present a Deep Learning approach for learning the joint semantic embeddings of images and captions in a Euclidean space, such that the semantic similarity is approximated by the L₂ distances in the embedding space. For that, we introduce a metric learning scheme that utilizes multitask learning to learn the embedding of identical semantic concepts using a center loss. A differentiable quantization scheme is then applied to the semantic centers to derive the semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favourably with contemporary state-of-the-art approaches.