High-resolution image reconstruction with latent diffusion models from human brain activity
Accepted at CVPR 2023
1. Graduate School of Frontier Biosciences, Osaka University, Japan
2. CiNet, NICT, Japan
[ Paper | Code (Coming soon!) | FAQ (English) | FAQ (Japanese) ]
Abstract
Reconstructing visual experiences from human brain activity offers a unique way to understand how the brain represents the world, and to interpret the connection between computer vision models and our visual system. While deep generative models have recently been employed for this task, reconstructing realistic images with high semantic fidelity is still a challenging problem. Here, we propose a new method based on a diffusion model (DM) to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI). More specifically, we rely on a latent diffusion model (LDM) termed Stable Diffusion. This model reduces the computational cost of DMs, while preserving their high generative performance. We also characterize the inner mechanisms of the LDM by studying how its different components (such as the latent vector Z, conditioning inputs C, and different elements of the denoising U-Net) relate to distinct brain functions. We show that our proposed method can reconstruct high-resolution images with high fidelity in a straightforward fashion, without the need for any additional training and fine-tuning of complex deep-learning models. We also provide a quantitative interpretation of different LDM components from a neuroscientific perspective. Overall, our study proposes a promising method for reconstructing images from human brain activity, and provides a new framework for understanding DMs.
Reconstructing visual experiences from human brain activity with Stable Diffusion
We demonstrate that our simple framework can reconstruct high-resolution images from brain activity with high semantic fidelity, without the need for training or fine-tuning of complex deep generative models.
Left: Overview of our framework. Right: Presented images (red box, top row) and images reconstructed from human brain activity (grey box, bottom row).
How does it work?
We reconstructed visual images from functional magnetic resonance imaging (fMRI) signals using a latent diffusion model named Stable Diffusion.
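As a rough illustration of how fMRI signals could be connected to a latent diffusion model without training the generative model itself, the sketch below fits a simple ridge regression that maps voxel responses to a latent vector. This is a minimal sketch under stated assumptions, not the authors' released code: the linear-ridge choice, the function names (`fit_ridge`, `make_synthetic_data`), the dimensions, and the synthetic data are all hypothetical and stand in for real fMRI recordings and real Stable Diffusion latents.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def make_synthetic_data(n_trials=200, n_voxels=50, latent_dim=16, seed=0):
    """Hypothetical stand-in data: random 'voxels' linearly related to 'latents'."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=(n_voxels, latent_dim))
    voxels = rng.normal(size=(n_trials, n_voxels))
    noise = 0.1 * rng.normal(size=(n_trials, latent_dim))
    latents = voxels @ true_w + noise
    return voxels, latents


# Assumed setup: one row per presented image (trial).
voxels, latents = make_synthetic_data()

# Fit the decoder on training trials, then predict latents for held-out trials.
train, test = slice(0, 150), slice(150, 200)
weights = fit_ridge(voxels[train], latents[train], alpha=1.0)
predicted_latents = voxels[test] @ weights

# In a real pipeline, the predicted latents would be handed to the pretrained
# diffusion model for denoising and decoding; that step is omitted here.
```

Only the small linear map is fitted in this sketch; the generative model would be used as-is, which is the sense in which no training or fine-tuning of the deep model is required.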
Visualization of the denoising process conditioned on human brain activity
Understanding the internal processes of Stable Diffusion with encoding models of brain activity
We quantitatively interpret each component of an LDM from a neuroscience perspective, by mapping specific components to brain regions.
We also present an objective interpretation of how the text-to-image conversion process implemented by an LDM incorporates the semantic information expressed by the conditional text, while at the same time maintaining the appearance of the original image.
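The idea of mapping model components to brain regions can be sketched with a voxel-wise encoding model: features from each component are used to predict every voxel's response, and each voxel is then labelled with the component that predicts it best on held-out data. The example below is a hedged sketch on synthetic data, not the analysis from the paper. The helper names (`pearson_per_column`, `encoding_accuracy`), the ridge penalty, and the two hypothetical components "A" and "B" are assumptions made for illustration.

```python
import numpy as np


def pearson_per_column(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    denom = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return (a * b).sum(axis=0) / denom


def encoding_accuracy(features, voxels, n_train, alpha=1.0):
    """Fit ridge from features to voxels; return held-out accuracy per voxel."""
    x_tr, x_te = features[:n_train], features[n_train:]
    y_tr, y_te = voxels[:n_train], voxels[n_train:]
    gram = x_tr.T @ x_tr + alpha * np.eye(x_tr.shape[1])
    w = np.linalg.solve(gram, x_tr.T @ y_tr)
    return pearson_per_column(x_te @ w, y_te)


rng = np.random.default_rng(0)
n_trials, n_voxels = 300, 40

# Hypothetical features from two model components, one row per trial.
feat_a = rng.normal(size=(n_trials, 8))
feat_b = rng.normal(size=(n_trials, 8))

# Synthetic "brain": the first 20 voxels follow component A, the rest follow B.
voxel_data = np.empty((n_trials, n_voxels))
voxel_data[:, :20] = feat_a @ rng.normal(size=(8, 20))
voxel_data[:, 20:] = feat_b @ rng.normal(size=(8, 20))
voxel_data += 0.5 * rng.normal(size=voxel_data.shape)

# Held-out prediction accuracy of each component for every voxel.
acc_a = encoding_accuracy(feat_a, voxel_data, n_train=200)
acc_b = encoding_accuracy(feat_b, voxel_data, n_train=200)

# Label each voxel with the component that predicts it best.
best_component = np.where(acc_a > acc_b, "A", "B")
```

With real data, the per-voxel accuracies would be projected onto the cortical surface, so that differences between components appear as differences between brain regions.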
BibTeX
@article {Takagi2022.11.18.517004,
author = {Takagi, Yu and Nishimoto, Shinji},
title = {High-resolution image reconstruction with latent diffusion models from human brain activity},
elocation-id = {2022.11.18.517004},
year = {2022},
doi = {10.1101/2022.11.18.517004},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/11/21/2022.11.18.517004},
eprint = {https://www.biorxiv.org/content/early/2022/11/21/2022.11.18.517004.full.pdf},
journal = {bioRxiv}
}