Reward Learning

In ecological environments, many decisions are based not on explicitly shown information but on what has been learnt from experience over time. We investigated how this learning works in a series of experiments across species using functional magnetic resonance imaging (fMRI) and magnetic resonance spectroscopy (MRS).

In macaques, we investigated the mechanisms for tracking surprising reward experiences that should be conducive to new reward learning, identifying OFC

This was in contrast to other types of surprise such as a traditional reward prediction error or spatial surprise.

Reference: Grohn*, Schüffelgen*, Neubert, Bongioanni, Verhagen, Sallet, Kolling*, Rushworth* (2020). Multiple systems in macaques for tracking prediction errors and other types of surprise. PLOS Biology doi: https://doi.org/10.1371/journal.pbio.3000899.

While the use of learnt reward and effort magnitudes is essential for adaptive behaviour, it often needs to be weighed up against other explicitly shown or abstract information. However, although this weighing is essential for high-level behavioural flexibility, we know very little about how it works or what role neurotransmitter balance plays in it. We therefore measured GABA and glutamate levels in dorsal anterior cingulate cortex (dACC) during a reward and effort learning task that combined learnt (magnitudes) and presented (probability) information during binary choices between gambles.

dACC appeared to be concerned with titrating how much a person should be affected by learnt over explicitly shown information. This appeared to be implemented through differences in the balance between GABA and glutamate.
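To make the idea of weighing learnt against explicitly shown information concrete, here is a minimal illustrative sketch. It is not the model used in the study: the delta-rule update, the additive weighting, the softmax choice rule, and all function and parameter names (`learnt_weight`, `learning_rate`, `inverse_temperature`) are assumptions chosen for illustration, and effort is left out for brevity.

```python
import math


def update_magnitude(estimate, outcome, learning_rate=0.3):
    """Delta-rule update (assumed): move the estimate towards the observed outcome."""
    prediction_error = outcome - estimate
    return estimate + learning_rate * prediction_error


def option_value(learnt_magnitude, shown_probability, learnt_weight):
    """Hypothetical additive mix of learnt magnitude and shown probability.

    Both inputs are assumed to be scaled to [0, 1]; learnt_weight in [0, 1]
    sets how strongly the learnt information drives the value.
    """
    return learnt_weight * learnt_magnitude + (1.0 - learnt_weight) * shown_probability


def choice_probability(value_a, value_b, inverse_temperature=5.0):
    """Softmax (logistic) probability of choosing option A over option B."""
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (value_a - value_b)))


# Option A: large learnt magnitude, low shown probability. Option B: the reverse.
for w in (0.8, 0.2):
    value_a = option_value(0.8, 0.3, w)
    value_b = option_value(0.3, 0.8, w)
    print(w, round(choice_probability(value_a, value_b), 3))
```

In this toy setting, an agent with a high `learnt_weight` prefers option A and one with a low `learnt_weight` prefers option B, which is one simple way to picture between-subject differences in how much learnt information is used.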

As we only looked at between-subject correlations of GABA/Glu with behaviour, future research needs to test whether flexible use of learnt reward and effort information can be implemented through dynamic shifts in relative transmitter concentration or whether it is more of a static "brain trait".

Reference: Scholl*, Kolling*, Nelissen, Stagg, Harmer, Rushworth (2017). Excitation and inhibition in anterior cingulate predict use of past experiences. eLife doi: https://doi.org/10.7554/eLife.20365.


Environments in the real world are messy. This means it is essential to concurrently track the good and the bad, and to know what is (or is not) relevant at any given moment. To find out how the brain manages this remarkable feat, we recorded neural activity using fMRI while participants learnt about varying amounts of reward and effort, which the brain tracked prominently. However, sometimes the rewards were “real”, i.e. actually received, while at other times they were only “hypothetical”, because rewards were paid out only in a random subset of trials. Although this information should be irrelevant, it did affect participants' choices: people more often chose options that had just yielded a “real” reward. Although many regions were more active during real than hypothetical reward receipt, activity in different regions might have quite distinct effects. Participants with more activity in vmPFC were more affected by the realness of the reward experience, while increased anterior PFC (aPFC)/frontal polar activity prevented this bias by the irrelevant. Importantly, aPFC carried all decision-relevant information, and more so when there was a risk of interference by the irrelevant. Dorsal anterior cingulate cortex (dACC) was concerned with titrating how much a person should be affected by learnt over explicitly shown information.

Overall, this work suggests a push and pull between different systems trying to drive behaviour in different ways: the frontal pole tries to contextualize reward and avoid bias by the irrelevant, vmPFC attempts to drive behaviour by whatever the agent has actually experienced, and dACC determines the use of learnt reward and effort information.

Reference: Scholl*, Kolling*, Nelissen, Wittmann, Harmer, Rushworth (2015). The Good, the Bad, and the Irrelevant: Neural Mechanisms of Learning Real and Hypothetical Reward and Effort. Journal of Neuroscience doi: https://doi.org/10.1523/JNEUROSCI.0396-15.2015. (*equal contribution)