• July 2021 -- We've launched our latest denoising and blur reduction models in the Google Photos editor. Read the Blog post. To learn more about the technology described in that article, check out our papers below.

  • July 2021 -- We'll have 5 papers in ICCV 2021

    • "Learning to Resize Images for Computer Vision Tasks": Front-end resizers in deep networks are simple, fixed linear filters. They are typically an afterthought, but they shouldn't be: deep computer vision models can benefit greatly from replacing these fixed linear resizers with well-designed, learned, nonlinear resizers. The resizer is trained jointly with the baseline model's loss. Because no pixel or perceptual loss is imposed on the resizer, its output is optimized for the task rather than for visual quality. The structure of the learned resizer matters; it is not simply a matter of adding more generic convolutional layers to the baseline model. Our work shows that a generically deeper model can be improved upon with a well-designed, task-optimized, front-end processor.
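The architecture idea can be sketched in NumPy: a fixed bilinear skip path plus a small learned residual branch whose weights would be trained with the downstream task loss. This is a minimal illustrative sketch, not the paper's model; `learned_resizer` and its kernel list are hypothetical names.

```python
import numpy as np

def conv2d(img, kernel):
    """'Same' 2-D cross-correlation with zero padding (loop version, for clarity)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def bilinear_resize(img, out_h, out_w):
    """Fixed linear resizer: plain bilinear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def learned_resizer(img, out_h, out_w, conv_weights):
    """Nonlinear resizer sketch: bilinear skip path + learned residual branch.
    conv_weights would be trained jointly with the baseline model's task loss."""
    base = bilinear_resize(img, out_h, out_w)
    feat = base
    for k in conv_weights:                       # tiny residual CNN on the resized grid
        feat = np.maximum(conv2d(feat, k), 0.0)  # ReLU nonlinearity
    return base + feat                           # skip keeps the bilinear output reachable
```

With zero residual weights the resizer reduces exactly to bilinear resizing, which is why joint training can only improve on (or match) the fixed linear baseline.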

    • "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data": We propose a procedure to generate realistic Dual-Pixel (DP) data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Leveraging these realistic synthetic DP images, we introduce a new recurrent convolutional network (RCN) architecture that can improve deblurring results and is suitable for use with single-frame and multi-frame data captured by DP sensors. Finally, we show that our synthetic DP data is useful for training DNN models targeting video deblurring applications where access to DP data remains challenging.
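A toy version of the key optical idea: each dual-pixel view sees defocus through one half of the lens aperture, so synthetic left/right views can be produced by blurring a sharp image with the left and right halves of the full defocus kernel. This is a hypothetical sketch, not our synthesis pipeline; real DP formation also depends on depth, the sign of defocus, and sensor optics.

```python
import numpy as np

def conv2d(img, kernel):
    """'Same' 2-D cross-correlation with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def disk_kernel(radius):
    """Circular-aperture defocus PSF (normalized disk)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = ((x ** 2 + y ** 2) <= radius ** 2).astype(float)
    return k / k.sum()

def synth_dual_pixel(sharp, radius):
    """Split the defocus PSF into left/right aperture halves (toy model).
    The shared center column is divided equally so the halves sum to the full PSF."""
    k = disk_kernel(radius)
    c = k.shape[1] // 2
    k_left, k_right = k.copy(), k.copy()
    k_left[:, c + 1:] = 0.0
    k_right[:, :c] = 0.0
    k_left[:, c] *= 0.5
    k_right[:, c] *= 0.5
    return conv2d(sharp, k_left), conv2d(sharp, k_right)
```

By linearity of convolution, the two half-aperture views sum exactly to the full defocus blur, mirroring how the two photodiodes under each microlens split the incoming light.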

    • "Multi-scale Transformer for Image Quality Assessment": We develop a multi-scale Transformer for IQA to process native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support positional embedding in the multi-scale representation. Experimental results verify that our method achieves state-of-the-art performance on multiple large-scale IQA datasets.
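The indexing behind a hash-based 2D spatial embedding can be sketched as follows: patch positions from an image of any size are hashed onto one fixed G×G table of learnable embeddings, so a single table serves all resolutions and aspect ratios. A minimal sketch of the hashing idea, with hypothetical names, not the paper's code.

```python
import numpy as np

def spatial_embedding_ids(n_rows, n_cols, grid_size):
    """Hash each position (i, j) of an n_rows x n_cols patch grid onto a fixed
    grid_size x grid_size table, returning flat indices into that table."""
    ti = np.arange(n_rows) * grid_size // n_rows   # row bucket in [0, grid_size)
    tj = np.arange(n_cols) * grid_size // n_cols   # column bucket in [0, grid_size)
    return ti[:, None] * grid_size + tj[None, :]   # flat index into (grid_size**2, d) table

# Usage: one learnable table serves any input size or aspect ratio.
rng = np.random.default_rng(0)
emb_table = rng.normal(size=(10 * 10, 16))   # G=10 grid of 16-dim embeddings
ids = spatial_embedding_ids(7, 13, 10)       # 7x13 patch grid, odd aspect ratio
pos_emb = emb_table[ids]                     # (7, 13, 16) positional embeddings
```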

    • "COMISR: Compression-Informed Video Super-Resolution": We propose a new compression-informed video super-resolution model. The proposed model consists of three modules for video super-resolution: bi-directional recurrent warping, detail-preserving flow estimation, and Laplacian enhancement. All three modules are used to deal with compression properties such as the location of intra-frames in the input and smoothness in the output frames. We show that our method not only recovers high-resolution content on uncompressed frames from widely used benchmark datasets, but also achieves state-of-the-art performance.
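The intent of a Laplacian enhancement step can be illustrated with a classic unsharp-mask-style operation: subtract a scaled Laplacian from the frame to boost the high frequencies that compression smooths away. This is a toy sketch of the idea, not the paper's module; the 4-neighbor kernel and `alpha` are illustrative choices.

```python
import numpy as np

LAPLACIAN = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])

def conv2d(img, kernel):
    """'Same' 2-D cross-correlation with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def laplacian_enhance(frame, alpha=0.2):
    """Sharpen by subtracting a scaled Laplacian: flat regions pass through
    unchanged (their Laplacian is ~0), while edges are amplified."""
    return frame - alpha * conv2d(frame, LAPLACIAN)
```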

    • "Patch Craft: Video Denoising by Deep Modeling and Patch Matching": This work proposes a novel approach for leveraging self-similarity in the context of video denoising, while still relying on a regular convolutional architecture. We introduce the concept of patch-craft frames -- artificial frames that are similar to the real ones, built by tiling matched patches. Our algorithm augments video sequences with patch-craft frames and feeds them to a CNN. We demonstrate the substantial boost in denoising performance obtained with the proposed approach.
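A toy construction of a patch-craft frame: tile the target frame with the best-matching patch drawn from a reference frame. The paper searches a spatio-temporal neighborhood; this sketch uses exhaustive L2 search over a single reference frame and non-overlapping tiles, and the function name is hypothetical.

```python
import numpy as np

def patch_craft_frame(target, reference, p=4):
    """Build an artificial frame by replacing each non-overlapping p x p patch of
    `target` with its nearest-neighbor patch from `reference` (exhaustive search).
    Assumes target dimensions are divisible by p for a full tiling."""
    h, w = target.shape
    out = np.zeros_like(target)
    # Gather all candidate patches from the reference frame.
    cands = np.stack([reference[i:i + p, j:j + p]
                      for i in range(h - p + 1) for j in range(w - p + 1)])
    for i in range(0, h - p + 1, p):
        for j in range(0, w - p + 1, p):
            q = target[i:i + p, j:j + p]
            d = ((cands - q) ** 2).sum(axis=(1, 2))   # L2 distance to every candidate
            out[i:i + p, j:j + p] = cands[int(np.argmin(d))]
    return out
```

If the reference is the target itself, every patch matches itself exactly and the crafted frame reproduces the original; with noisy neighboring frames as references, the tiled matches provide the self-similarity signal the CNN can exploit.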

  • June 2020 -- We have 3 papers in CVPR 2020 -- two regular conference papers and one workshop paper. Summaries and links to relevant material are below.

    • "GIFnets: Differentiable GIF Encoding Framework": We introduce, to our knowledge, the first differentiable GIF encoding pipeline. It includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images.
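One way to see what "differentiable" buys here: hard nearest-palette-color quantization has zero gradient almost everywhere, but a temperature-controlled softmax over color distances gives a soft palette projection with usable gradients. This is a generic sketch of that relaxation, assuming nothing about the networks' internals; `soft_quantize` and `tau` are illustrative names.

```python
import numpy as np

def soft_quantize(pixels, palette, tau=0.1):
    """Soft palette projection: each pixel becomes a softmax-weighted mix of
    palette colors. As tau -> 0 this approaches hard nearest-color quantization,
    while remaining differentiable in both pixels and palette for tau > 0."""
    # pixels: (N, 3) RGB values, palette: (K, 3) palette colors
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(-1)  # (N, K) distances
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)             # (N, K) soft assignments
    return w @ palette                            # (N, 3) quantized colors
```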

    • "Distortion Agnostic Deep Watermarking": We develop a framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance overall on unknown distortions.
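The channel-coding half of the robustness story can be illustrated with the simplest possible code: repeat each message bit before embedding, then majority-vote on decode, so the watermark survives a bounded number of bit flips from unknown distortions. The paper's coding is stronger; this is only a toy stand-in.

```python
import numpy as np

def encode(bits, r=5):
    """Repetition-code each watermark bit r times before embedding."""
    return np.repeat(np.asarray(bits), r)

def decode(coded, r=5):
    """Majority vote over each group of r received (possibly corrupted) bits."""
    groups = np.asarray(coded).reshape(-1, r)
    return (groups.sum(axis=1) > r // 2).astype(int)
```

With r = 5, up to 2 flips per message bit are corrected, at a 5x cost in payload capacity -- the usual redundancy/robustness trade-off.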

    • "LIDIA: Lightweight Learned Image Denoising with Instance Adaptation": We use a combination of supervised and unsupervised training: the first stage yields a general denoiser, and the second performs instance adaptation. LIDIA produces near state-of-the-art quality while using far fewer parameters than the leading methods.
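The instance-adaptation idea, in its smallest possible form: fine-tune the deployed denoiser on pairs synthesized from the test image itself, by re-noising the noisy input and treating the input as the pseudo-clean target. Below this is done for a one-parameter linear "denoiser" whose gain has a closed form; entirely illustrative and not LIDIA's actual adaptation procedure.

```python
import numpy as np

def instance_adapt_gain(noisy, sigma, rng):
    """Second-stage adaptation sketch: re-noise the noisy test image y to get a
    pair (y + n, y), then fit a scalar gain g minimizing ||g*(y + n) - y||^2.
    Stand-in for fine-tuning a network on image-specific pairs (hypothetical)."""
    renoised = noisy + rng.normal(0.0, sigma, noisy.shape)
    # Closed-form least-squares solution: g = <y+n, y> / <y+n, y+n>
    g = float((renoised * noisy).sum() / (renoised * renoised).sum())
    return g   # a new noisy input x would be denoised as g * x
```

The fitted gain shrinks toward 0 as the injected noise level grows, which is the qualitative behavior an image-adapted denoiser should show.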

  • July 2019 -- Our paper on Handheld Multi-frame Super-resolution was presented at SIGGRAPH 2019. You can find our paper, supplementary material and a short video describing the work at the project website. This technology powers the Super-Res Zoom and Night Sight (merge) features on Pixel phones.