Reconstructing Visual Images from Human Brain Activity

In a mind-boggling (or is it mind-reading?) paper posted on bioRχiv, “High-resolution image reconstruction with latent diffusion models from human brain activity”, two researchers from the Osaka University Graduate School of Frontier Biosciences in Japan report that they have been able to reconstruct images from human brain activity detected by functional magnetic resonance imaging (fMRI), using the brain scan data to drive the Stable Diffusion latent-diffusion image synthesis system.


The results are stunning.


Here is the abstract from the paper.

Reconstructing visual experiences from human brain activity offers a unique way to understand how the brain represents the world, and to interpret the connection between computer vision models and our visual system. While deep generative models have recently been employed for this task, reconstructing realistic images with high semantic fidelity is still a challenging problem. Here, we propose a new method based on a diffusion model (DM) to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI). More specifically, we rely on a latent diffusion model (LDM) termed Stable Diffusion. This model reduces the computational cost of DMs, while preserving their high generative performance. We also characterize the inner mechanisms of the LDM by studying how its different components (such as the latent vector of image Z, conditioning inputs C, and different elements of the denoising U-Net) relate to distinct brain functions. We show that our proposed method can reconstruct high-resolution images with high fidelity in straight-forward fashion, without the need for any additional training and fine-tuning of complex deep-learning models. We also provide a quantitative interpretation of different LDM components from a neuroscientific perspective. Overall, our study proposes a promising method for reconstructing images from human brain activity, and provides a new framework for understanding DMs.

The project’s Web site provides additional material and illustrations describing the process. The technique requires no special training of the neural networks for a specific subject. Here is an example of presenting an image to four different subjects and then reading their minds and reconstructing the image.
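As I understand the paper, the heavy lifting is done by simple linear models: fit regressions from fMRI voxel responses to Stable Diffusion's image latent z and its text-conditioning embedding c, then hand the predicted latents to the unmodified, pretrained diffusion model to generate the image. Here is a minimal sketch of that idea with made-up dimensions and synthetic data; the ridge regression, variable names, and the commented-out `stable_diffusion.decode` call are my illustrative stand-ins, not the authors' actual code.

```python
import numpy as np
from numpy.linalg import solve

# Toy stand-ins: 200 training images, 500 fMRI voxels, a 64-dim image
# latent z and a 32-dim text-conditioning embedding c.  (Dimensions are
# illustrative; the paper uses Stable Diffusion's real latent spaces.)
rng = np.random.default_rng(0)
n_train, n_voxels, dim_z, dim_c = 200, 500, 64, 32

X = rng.standard_normal((n_train, n_voxels))   # fMRI responses
Z = rng.standard_normal((n_train, dim_z))      # image latents for those stimuli
C = rng.standard_normal((n_train, dim_c))      # text embeddings for those stimuli

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# One linear map per latent type -- no fine-tuning of the diffusion model.
W_z = ridge_fit(X, Z)
W_c = ridge_fit(X, C)

# At test time, predict both latents from a new brain scan...
x_new = rng.standard_normal((1, n_voxels))
z_hat, c_hat = x_new @ W_z, x_new @ W_c

# ...then (hypothetically) feed them to the pretrained generator:
# image = stable_diffusion.decode(z_hat, conditioning=c_hat)
print(z_hat.shape, c_hat.shape)   # (1, 64) (1, 32)
```

If something like this is all it takes, the "no additional training of complex deep-learning models" claim in the abstract makes sense: the only fitted parameters are the two linear maps.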


Based upon the wonders we’ve seen in the domain of artificial intelligence in just the last year, it is increasingly difficult to be stunned by the results being reported, but this does the trick for me. This is something I didn’t think we’d see before 2050, if then.

I wonder if it works on images generated internally by the brain and perceived during dreams.


In a similar vein, researchers from the University of Texas at Austin have created a semantic decoder capable of reconstructing continuous language from fMRI scans: “Semantic reconstruction of continuous language from non-invasive brain recordings”, published May 1, 2023, in Nature Neuroscience.

A brain–computer interface that decodes continuous language from non-invasive recordings would have many scientific and practical applications. Currently, however, non-invasive language decoders can only identify stimuli from among a small set of words or phrases. Here we introduce a non-invasive decoder that reconstructs continuous language from cortical semantic representations recorded using functional magnetic resonance imaging (fMRI). Given novel brain recordings, this decoder generates intelligible word sequences that recover the meaning of perceived speech, imagined speech and even silent videos, demonstrating that a single decoder can be applied to a range of tasks. We tested the decoder across cortex and found that continuous language can be separately decoded from multiple regions. As brain–computer interfaces should respect mental privacy, we tested whether successful decoding requires subject cooperation and found that subject cooperation is required both to train and to apply the decoder. Our findings demonstrate the viability of non-invasive language brain–computer interfaces.

Here we introduce a decoder that takes non-invasive brain recordings made using functional magnetic resonance imaging (fMRI) and reconstructs perceived or imagined stimuli using continuous natural language. To accomplish this, we needed to overcome one major obstacle: the low temporal resolution of fMRI. Although fMRI has excellent spatial specificity, the blood-oxygen-level-dependent (BOLD) signal that it measures is notoriously slow—an impulse of neural activity causes BOLD to rise and fall over approximately 10 s (ref. 13). For naturally spoken English (over two words per second), this means that each brain image can be affected by over 20 words. Decoding continuous language thus requires solving an ill-posed inverse problem, as there are many more words to decode than brain images. Our decoder accomplishes this by generating candidate word sequences, scoring the likelihood that each candidate evoked the recorded brain responses and then selecting the best candidate.
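The generate-candidates, score, select-the-best loop the authors describe is essentially a beam search. Here is a toy sketch of that scheme; the vocabulary, the embedding-average "encoding model", and the simulated recording are all my invented stand-ins for the paper's neural language model and fitted fMRI encoding model.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["the", "dog", "ran", "fast", "home"]   # toy vocabulary

# Stand-in "encoding model": maps a word sequence to a predicted brain
# response vector (here, just the mean of random word embeddings).
EMBED = {w: rng.standard_normal(8) for w in VOCAB}

def predict_response(words):
    return np.mean([EMBED[w] for w in words], axis=0)

def score(words, recorded):
    """Higher is better: how well this candidate explains the recording."""
    return -np.linalg.norm(predict_response(words) - recorded)

# Simulate a recording evoked by a "true" perceived sequence.
true_seq = ["the", "dog", "ran"]
recorded = predict_response(true_seq)

# Beam search: extend each candidate by one word, keep only the best few,
# repeat -- sidestepping the exponential space of word sequences.
beam_width, seq_len = 3, 3
beams = [[]]
for _ in range(seq_len):
    candidates = [b + [w] for b in beams for w in VOCAB]
    candidates.sort(key=lambda c: score(c, recorded), reverse=True)
    beams = candidates[:beam_width]

best = beams[0]
print(best)
```

Because each fMRI image mixes the effects of 20-odd words, the real scoring function has to compare candidates against that temporally blurred signal, which is exactly why the problem is ill-posed and a prior over plausible word sequences (the language model) is needed.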

Popular press article: