Research
My general research interest is in how the brain represents visual information to support our perception and decisions. I have been investigating the computational mechanisms of perceptual decisions and the spatiotemporal structure of the visual representations underlying ensemble perception and object recognition, using a multi-faceted approach that includes psychophysical experiments, computational modeling, EEG and fMRI recordings, and deep neural networks.
I am now fascinated by the potential of deep neural networks as a tool for providing critical insights into the neural representations underlying object and scene recognition. In my ongoing project, I am combining fMRI experiments, deep neural networks, and publicly available large-scale datasets to deepen our understanding of object and scene representations in the brain.
On this page, you will find an overview of both my past work and my ongoing research projects, along with links to papers for further details. If any of these topics interest you, please feel free to contact me; I would be happy to discuss them with you.
Keywords: neural decoding / encoding model / perceptual decision making / ensemble perception / computational modeling / EEG / fMRI / artificial neural network / generative model
Contact: ryuto-yashiro (at) g.ecc.u-tokyo.ac.jp
Peak-at-end rule: how the brain uses visual information to make a perceptual decision
Humans make decisions based on sensory information that fluctuates over time. However, given the limited capacity of our visual system, we cannot attend to every piece of that information when deciding. This naturally leads to the following question: what information do we actually use to make decisions?
Using a simple perceptual decision task, we found that humans overweight outliers occurring later in time and underweight outliers occurring earlier. We also found that a simple evidence accumulation model (a leaky integration model) can account for this tendency. Such time-dependent decision weighting can be described as a "peak-at-end" rule (analogous to the peak-end rule proposed in behavioral economics), which may reflect a general mechanism of perceptual decision making.
Paper: Peak-at-end rule: adaptive mechanism predicts time-dependent decision weighting
If you have a broader interest in the computational mechanisms of perceptual decisions, these papers may also match your interests: Perception and decision mechanisms involved in average estimation of spatiotemporal ensembles
Prospective decision making for randomly moving visual stimuli
Human participants were presented with a sequence of Gabor patterns and asked to judge whether the average orientation over time was tilted clockwise or counterclockwise relative to vertical.
A simple leaky integration model predicts time-dependent decision weighting, with outliers presented earlier underweighted and those presented later overweighted. We found that human observers weighted orientation information at each temporal frame in a manner qualitatively similar to the model's prediction.
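To see why leaky integration yields this weighting profile, here is a minimal sketch (in Python) of a generic leaky accumulator; this is not the exact model from the paper, and the leak rate, frame count, and evidence statistics are illustrative assumptions.

```python
import numpy as np

def leaky_integration(evidence, leak=0.2):
    """Accumulate frame-wise evidence with a constant leak.

    Each frame, the decision variable retains a (1 - leak) fraction of
    its previous value before adding new evidence, so earlier frames
    have decayed more by the time the decision is read out.
    """
    dv = 0.0
    for e in evidence:
        dv = (1.0 - leak) * dv + e
    return dv

# The effective weight of a frame presented t frames before the end is
# (1 - leak)**t, so later frames dominate the final decision variable.
n_frames, leak = 10, 0.2
weights = (1.0 - leak) ** np.arange(n_frames - 1, -1, -1)
print(np.round(weights / weights.sum(), 3))

# Example decision: frame-wise orientation evidence (positive = clockwise)
rng = np.random.default_rng(0)
evidence = rng.normal(loc=0.1, scale=1.0, size=n_frames)
print("clockwise" if leaky_integration(evidence, leak) > 0 else "counterclockwise")
```

With this discounting, an outlier in the last frame shifts the decision variable far more than the same outlier in the first frame, which is the "peak-at-end" pattern described above.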
Temporal dynamics of ensemble perception
The visual system is capable of computing an average of multiple pieces of visual information in the environment (ensemble perception). However, it remains unclear how and when ensemble perception is formed in the brain, because most previous studies on ensemble perception relied solely on behavioral experiments.
To address this question, we decoded the temporal dynamics of orientation representations from EEG signals while human participants judged an average of multiple orientations. The decoded orientation representations showed that orientation ensemble perception forms over approximately 600-700 ms after stimulus onset.
Paper: Decoding time-resolved neural representations of orientation ensemble perception
We used inverted encoding models to decode the representational strength of the average orientation (central row in the colormap) from EEG signals. The average orientation was strongly represented in the EEG signals from 400 to 700 ms after stimulus onset, as shown in the right panel.
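For readers unfamiliar with the technique, below is a minimal, self-contained sketch of a standard inverted encoding model applied to simulated data. The electrode count, basis functions, and winner-take-all readout are illustrative assumptions, not the exact pipeline from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_elec, n_chan, n_train, n_test = 32, 9, 200, 50

def channel_responses(orientations, centers):
    """Idealized tuning curves: rectified cosines over the 180-deg
    orientation circle (the factor of 2 makes 0 and 180 deg equivalent)."""
    d = np.deg2rad(2 * (orientations[:, None] - centers[None, :]))
    return np.clip(np.cos(d), 0.0, None) ** 7        # (n_trials, n_chan)

centers = np.arange(0, 180, 180 // n_chan)           # channel preferred orientations
W_true = rng.normal(size=(n_elec, n_chan))           # unknown electrode weights

# Training: estimate electrode-to-channel weights by least squares
ori_train = rng.uniform(0, 180, n_train)
C_train = channel_responses(ori_train, centers).T    # (n_chan, n_train)
B_train = W_true @ C_train + rng.normal(scale=0.5, size=(n_elec, n_train))
W_hat = B_train @ C_train.T @ np.linalg.pinv(C_train @ C_train.T)

# Testing: invert the model to reconstruct channel responses on held-out trials
ori_test = rng.uniform(0, 180, n_test)
C_test = channel_responses(ori_test, centers).T
B_test = W_true @ C_test + rng.normal(scale=0.5, size=(n_elec, n_test))
C_hat = np.linalg.pinv(W_hat.T @ W_hat) @ W_hat.T @ B_test

decoded = centers[np.argmax(C_hat, axis=0)]          # crude readout per trial
print(np.column_stack([ori_test[:5].round(), decoded[:5]]))
```

Applying the same fit-and-invert step to the EEG pattern at each time point yields a time course of reconstructed channel responses, from which one can track when a given orientation (e.g., the average) becomes represented.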
Understanding the representations of category-selective regions underlying object and scene recognition using encoding models and large-scale datasets
A number of neuroimaging studies have reported visual regions that respond selectively to certain object categories, such as the fusiform face area (FFA) and the parahippocampal place area (PPA). However, what these category-selective regions truly represent remains controversial.
We are currently seeking a deeper understanding of the representational content of these regions by leveraging large-scale neural datasets (the Natural Scenes Dataset). Specifically, using language embeddings derived from the captions of numerous natural images, we construct encoding models that predict the responses of category-selective regions to those images, aiming to uncover which captions would elicit the highest responses in these regions.
Much of the semantic content of a natural image can be captured by a sentence (caption). In recent years, with the advent of large language models, we can extract fixed-length embeddings (left) that encode rich contextual and semantic information, which can be used to predict the responses of category-selective regions (right) to scene images.
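As a concrete illustration of this pipeline, here is a minimal sketch with fully simulated data: caption embeddings serve as regressors, and ridge regression maps them to voxel responses. All dimensions, the simulated weights, and the RidgeCV settings are illustrative assumptions; in practice the embeddings would come from a sentence encoder and the responses from a dataset such as the Natural Scenes Dataset.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, emb_dim, n_voxels = 1000, 384, 100

# Stand-in for fixed-length caption embeddings (one caption per image);
# in practice, the output of a language model applied to each caption.
X = rng.normal(size=(n_images, emb_dim))

# Stand-in for ROI voxel responses to the same images, simulated here as
# a sparse linear map of the embeddings plus noise.
W = rng.normal(size=(emb_dim, n_voxels)) * (rng.random((emb_dim, n_voxels)) < 0.05)
y = X @ W + rng.normal(scale=1.0, size=(n_images, n_voxels))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, y_tr)
print("held-out R^2:", round(model.score(X_te, y_te), 3))
```

Once fitted, the same model can score the embeddings of arbitrary candidate captions, and ranking the predicted responses is one way to ask which descriptions would drive a region most strongly.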