July 2021: We've launched our latest denoising and blur-reduction models in the Google Photos editor. Read the blog post. To learn more about the technology described in that article, see our papers below.
July 2021 -- We'll have 5 papers in ICCV 2021:
"Learning to Resize Images for Computer Vision Tasks": Front-end resizers in deep networks are simple filters. They’re an afterthought — but they shouldn’t be. Deep computer vision models can benefit greatly from replacing these fixed linear resizers with well-designed, learned, nonlinear resizers. The resizer is jointly trained w/ baseline model loss. No pixel/perceptual loss on the resizer means images are task-optimized, not optimized for visual quality. Structure of the learned resizer is specific; not just adding more generic convolutional layers to the baseline model. Our work shows that a generically deeper model can be improved upon w/ a well-designed front-end, task-optimized, processor.
"Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data": We propose a procedure to generate realistic Dual-Pixel (DP) data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Leveraging these realistic synthetic DP images, we introduce a new recurrent convolutional network (RCN) architecture that can improve deblurring results and is suitable for use with single-frame and multi-frame data captured by DP sensors. Finally, we show that our synthetic DP data is useful for training DNN models targeting video deblurring applications where access to DP data remains challenging.
"Multi-scale Transformer for Image Quality Assessment": we develop a multi-scale Transformer for IQA to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding is proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large scale IQA datasets
"COMISR: Compression-Informed Video Super-Resolution": We propose a new compression-informed video super-resolution model. The proposed model consists of three modules for video super-resolution: bi-directional recurrent warping, detail-preserving flow estimation, and Laplacian enhancement. All these three modules are used to deal with compression properties such as the location of the intra-frames in the input and smoothness in the output frames. We show that our method not only recovers high-resolution content on uncompressed frames from the widely-used benchmark datasets, but also achieves state-of-the-art performance.
"Patch Craft: Video Denoising by Deep Modeling and Patch Matching": This work proposes a novel approach for leveraging self-similarity in the context of video denoising, while still relying on a regular convolutional architecture. We introduce a concept of patch-craft frames -- artificial frames that are similar to the real ones, built by tiling matched patches. Our algorithm augments video sequences with patch-craft frames and feeds them to a CNN. We demonstrate the substantial boost in denoising performance obtained with the proposed approach.
June 2020 -- We have 3 papers in CVPR 2020: two in the main conference and one in a workshop. Summaries and links to relevant material are below.
"GIFnets: Differentiable GIF Encoding Framework": We introduce (to our knowledge), the first differentiable GIF encoding pipeline. It includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images.
"Distortion Agnostic Deep Watermarking": We develop a framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance overall on unknown distortions.
"LIDIA: Lightweight Learned Image Denoising with Instance Adaptation": We use a combination of supervised and unsupervised training, where the first stage gets a general denoiser and the second does instance adaptation. LIDIA produces near state-of-the-art quality, while having relatively very small number of parameters as compared to the leading methods
July 2019 -- Our paper on Handheld Multi-frame Super-resolution was presented at SIGGRAPH 2019. You can find our paper, supplementary material and a short video describing the work at the project website. This technology powers the Super-Res Zoom and Night Sight (merge) features on Pixel phones.
December 2018: Our team's work on the Google Pixel 3 wins the Innovation of the Year Award from DPReview! "Pixel 3 is the first smartphone camera to truly challenge traditional cameras." We improve on HDR+ multi-frame fusion by using super-resolution fusion instead, recovering more detail without the need for demosaicing. Our super-resolution technique allows the Pixel 3's image quality in Night Sight mode (at full FOV) to rival cameras with Four Thirds sensors in all light conditions, while also enabling digital zoom that rivals the modest optical zoom modules on other phones.
November 2018: Night Sight mode on Pixel 3 uses our Super Res Zoom technology to merge images (whether you zoom or not) for detailed, clear, and vivid shots in low light; in bright light, it produces super-resolved images of higher quality than the main camera alone.
October 2018: Super Res Zoom is launched in Pixel 3 phones, bringing multi-frame super-resolution to mobile photography.
Charged: Pixel 3, improving on incredible
Android Central: Google's latest phones shine a light at just how good Android can be
September 2018: The patent for learning-based JPEG artifact removal has been issued by the USPTO.
June 2018: The patent for RAISR image upscaling has been issued by the USPTO.
January 2018: CNET: Google Pixel 2 photos get AI boost for digital zoom
Android Authority: Google AI can now tell which photos you’ll think are beautiful
October 2017: RAISR is launched for digital zoom on Pixel 2/XL phones
The Verge: Google Pixel 2: Plainly Great
October 2017: RAISR is launched in Google Clips for high quality image export
September 2017: I gave a master class (slides and video) at the Summer School on Signal Processing Meets Deep Learning in Capri, Italy.
August 2017: Graduate student internships in Google Research: Apply here.
February 2017: A nice summary video describing our work on RAISR
January 2017: RAISR is launched in Motion Stills.
January 2017: RAISR is launched in Google+, bringing high quality and bandwidth savings to images on the web.
Android Headlines: Google’s New RAISR Image Algorithm Coming To Google+
July 2015: I gave a plenary talk at the International Conference on Multimedia and Expo in Torino, Italy. More details here, including slides of my talk.
April 2015: devCam: Open-source Android Camera Control for Algorithm Development and Testing
October 2014: I gave a plenary talk at the SPIE Optics and Photonics Conference in San Diego. Here is the video of my talk.