High-resolution image reconstruction with latent diffusion models from human brain activity

Accepted at CVPR 2023

1. Graduate School of Frontier Biosciences, Osaka University, Japan

2. CiNet, NICT, Japan


Reconstructing visual experiences from human brain activity offers a unique way to understand how the brain represents the world, and to interpret the connection between computer vision models and our visual system. While deep generative models have recently been employed for this task, reconstructing realistic images with high semantic fidelity is still a challenging problem. Here, we propose a new method based on a diffusion model (DM) to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI). More specifically, we rely on a latent diffusion model (LDM) termed Stable Diffusion. This model reduces the computational cost of DMs, while preserving their high generative performance. We also characterize the inner mechanisms of the LDM by studying how its different components (such as the latent vector Z, conditioning inputs C, and different elements of the denoising U-Net) relate to distinct brain functions. We show that our proposed method can reconstruct high-resolution images with high fidelity in a straightforward fashion, without the need for any additional training or fine-tuning of complex deep-learning models. We also provide a quantitative interpretation of different LDM components from a neuroscientific perspective. Overall, our study proposes a promising method for reconstructing images from human brain activity, and provides a new framework for understanding DMs.

Reconstructing visual experiences from human brain activity with Stable Diffusion

We demonstrate that our simple framework can reconstruct high-resolution images from brain activity with high semantic fidelity, without the need for training or fine-tuning of complex deep generative models. 

Left: Overview of our framework. Right: Presented images (red box, top row) and images reconstructed from human brain activity (grey box, bottom row).

How does it work?

We reconstructed visual images from functional Magnetic Resonance Imaging (fMRI) signals using a latent diffusion model named Stable Diffusion.
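As a rough illustration of the decoding step, the sketch below fits a simple linear map from fMRI voxel responses to a latent vector, which could then be handed to a pretrained LDM. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of L2-regularized (ridge) regression, the function name `fit_ridge`, the array sizes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

# Synthetic stand-ins (assumption): rows are trials, columns are voxels
# or latent dimensions. Real data would come from an fMRI dataset and
# from the image encoder of a pretrained LDM.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_latent = 200, 20, 50, 16

W_true = rng.normal(size=(n_voxels, n_latent))
X_train = rng.normal(size=(n_train, n_voxels))  # fMRI responses (training)
X_test = rng.normal(size=(n_test, n_voxels))    # fMRI responses (held out)
Z_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_latent))
Z_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, n_latent))

# Fit the brain-to-latent map on training trials, then decode held-out trials.
W_hat = fit_ridge(X_train, Z_train, alpha=1.0)
Z_pred = X_test @ W_hat

# Correlation between decoded and true latents on held-out trials.
decoding_corr = np.corrcoef(Z_pred.ravel(), Z_test.ravel())[0, 1]
print(f"held-out decoding correlation: {decoding_corr:.3f}")
```

Only the small linear map is fitted here; the generative model itself would stay frozen, which is consistent with the claim that no training or fine-tuning of the deep generative model is needed.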

Visualization of the denoising process conditioned on human brain activity

Understanding the internal processes of Stable Diffusion with encoding models of brain activity

We quantitatively interpret each component of an LDM from a neuroscience perspective, by mapping specific components to brain regions.
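An encoding model runs in the opposite direction to decoding: it predicts each voxel's response from features of a model component, and the held-out prediction accuracy per voxel indicates which parts of the brain that component explains. The following is a minimal sketch on synthetic data, assuming a ridge-regression encoding model scored by per-voxel Pearson correlation; the helper `pearson_per_column`, the feature matrix, and all sizes are hypothetical.

```python
import numpy as np

def pearson_per_column(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = (A * B).sum(axis=0)
    den = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return num / den

rng = np.random.default_rng(1)
n_train, n_test, n_feat, n_vox = 300, 60, 20, 40

# Hypothetical features of one LDM component (e.g. a latent or a U-Net layer).
F_train = rng.normal(size=(n_train, n_feat))
F_test = rng.normal(size=(n_test, n_feat))

# Synthetic voxels: the first half depend on the features, the second half do not.
B_true = rng.normal(size=(n_feat, n_vox))
B_true[:, n_vox // 2:] = 0.0
Y_train = F_train @ B_true + rng.normal(size=(n_train, n_vox))
Y_test = F_test @ B_true + rng.normal(size=(n_test, n_vox))

# Ridge-regression encoding model, fitted for all voxels at once.
alpha = 10.0
B_hat = np.linalg.solve(
    F_train.T @ F_train + alpha * np.eye(n_feat), F_train.T @ Y_train
)

# Per-voxel prediction accuracy on held-out trials.
voxel_acc = pearson_per_column(F_test @ B_hat, Y_test)
print("mean accuracy, feature-driven voxels:", voxel_acc[: n_vox // 2].mean())
print("mean accuracy, unrelated voxels:", voxel_acc[n_vox // 2:].mean())
```

Repeating this for features taken from different components and comparing the resulting accuracy maps is one way such a component-to-region mapping could be drawn.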

We also present an objective interpretation of how the text-to-image conversion process implemented by an LDM incorporates the semantic information expressed by the conditional text, while at the same time maintaining the appearance of the original image.

We can further improve reconstruction via multiple decoded inputs

Based on the above work, we further examined, in a follow-up technical paper, the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. We confirmed that adding several of these techniques improves accuracy over Takagi and Nishimoto (CVPR 2023). Please see the technical paper for details.

The figure below shows examples of the presented images (red box) and the images reconstructed using the additional techniques. The decoded text, the image produced by a GAN, and the decoded depth are shown below each reconstructed image.

In the figure below, three randomly chosen images generated from different stochastic noise are shown for each method.

@article {Takagi2022.11.18.517004,

   author = {Takagi, Yu and Nishimoto, Shinji},

   title = {High-resolution image reconstruction with latent diffusion models from human brain activity},

   elocation-id = {2022.11.18.517004},

   year = {2022},

   doi = {10.1101/2022.11.18.517004},

   publisher = {Cold Spring Harbor Laboratory},

   URL = {https://www.biorxiv.org/content/early/2022/11/21/2022.11.18.517004},

   eprint = {https://www.biorxiv.org/content/early/2022/11/21/2022.11.18.517004.full.pdf},

   journal = {bioRxiv}