Vinci Project

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 845176.

Summary of the context and overall objectives of the project

Opportunities to influence the world through one’s own actions are a major predictor of high quality of life in humans and other animals. For example, employees who can choose how to organize their daily tasks show lower levels of stress and longer life expectancy than those without this control. This improved quality of life may result directly from better outcomes due to the choices made, but it also raises the possibility that choice opportunities may themselves have evolved to be desirable. Thus choice-seeking behavior may be motivated by a kind of intrinsic reward (IR, also referred to as cognitive reward) distinct from extrinsic rewards (ER - e.g., food or money), with the former not necessarily leading immediately to the achievement of extrinsic goals. IRs may facilitate survival and well-being by improving long-run ER intake. One proposed mechanism is that behaviors such as choice-seeking promote the search for states that maximize the agent’s ability to survive and reproduce, thereby selecting individuals with an innate bias (or preference) for these types of behaviors. Experimental evidence also supports the idea that choice opportunities are themselves rewarding. For example, in one experiment, human subjects were asked to purchase one of their preferred items. They were first asked to choose between two options: one that yielded a preferred but relatively inexpensive item, and another that yielded the opportunity to choose between a non-preferred item and an item that was similarly preferred but more expensive than the item in the first option. Surprisingly, subjects tended to choose the latter option at extra cost, presumably because it gave them the opportunity to choose. A similar preference for choice opportunities has been measured in humans and other animals such as rats, pigeons and monkeys.
This choice-seeking preference may be related to the broader desire for a “sense of control”, which is mediated by the opportunity to choose and whose impairment has been linked to neuropsychiatric disorders. However, none of these experiments clearly determines how learning and representation of ERs relate to choice-seeking, nor whether choice-seeking changes once ER associations are learned. Despite the importance of choice-seeking and desire for control in our daily lives, how IRs such as choice opportunities are encoded in the brain remains a major unanswered question. This research project seeks to better characterize the behavioral and neural mechanisms involved in the encoding of IRs promoting intrinsic motivation.

Improving control over ER intake builds on a main strategy: choosing the most valuable option to maximize gains. It is therefore unclear whether IRs can be easily dissociated behaviorally from ERs. If they can be, understanding how these different reward types interact would reveal how extrinsic (triggered by ER) and intrinsic (triggered by IR) motivations are combined to guide real-world actions. Evidence suggests that IRs may be encoded by the dopamine (DA) system in a way similar to how DA neurons encode ERs. Midbrain DA neurons send massive projections to the prefrontal cortex (PFC) and the striatum and are central to the coding and learning of ERs. These neurons encode a large range of rewarding experiences and generate “reward prediction errors” to signal unpredicted outcomes and changes from conditioned expected value. Under the neural common currency theory, the DA system may compare expected outcomes on a common value scale in order to evaluate which option is preferable. I hypothesize that IRs are encoded in the DA system in a manner consistent with neural common currency theory; that is, DA activity should consistently reflect subjects’ preferences when IRs are pitted against different ERs. There is suggestive evidence that basal ganglia nuclei receiving dense DA inputs, such as the striatum, are involved in encoding preference for choice and IRs. For example, humans expressing more DARPP-32, a gene linked to DA plasticity, show a stronger bias for choosing items they were free to explore compared to items they were forced to sample. Interestingly, Parkinson’s disease patients show deficits, relative to healthy control subjects, in contexts where different choices of ERs were possible.

Despite evidence suggesting a role of the DA system in the encoding of choice, none of the aforementioned studies systematically dissociated choice availability from ER intake or tested the dedicated neural networks involved in representing these different forms of reward and motivation. Resolving these questions could have broad implications for a better understanding of reward systems and cognitive control in general. We proposed to develop a behavioral task that permits dissociating the value of choice opportunities from ER intake. In the research project, subjects performed two-stage trials where they made an initial decision to accept or reject the opportunity to choose between ERs. We independently manipulated choice-seeking by varying both IRs (choice availability) and ERs, allowing us to quantify the relative contribution of choice availability to behavior under a range of ER contexts. We rigorously explored these different forms of reward in primates (humans and monkeys, two species with a comparable DA system that differs from that of rodents). The goal was to test whether IRs can trigger a distortion of reward intake like a ‘DA bonus’ (i.e. a DA effect not explained by ER expectation), and whether this recruits similar or different neural networks.

The aims of the proposal were two-fold:

Aim 1. Design and test behavioral experiments to characterize primate preference for intrinsic reward, such as choice opportunity, and measure its relative value using a quantitative model.

Aim 2. Identify the neural network and mechanisms involved in encoding of intrinsic reward and investigate the specific function of the DA system in its valuation.


Work performed during the project

Despite the fact that the sense of control is a crucial component of our lives and has been linked to mental and neurological disorders when impaired, its associated neural mechanisms remain poorly understood. Improving control requires making good choices to obtain impending outcomes. Thus contexts offering the opportunity to choose between several options should be preferred by agents over uncertain situations where choice is absent. Choice opportunity could therefore be considered an IR. To explore these questions and address my aims, I defined and pursued different tasks, which I describe below.

We first developed a behavioral task dissociating preference for choice opportunity from ERs and hypothesized that these different rewards are encoded by the DA system in a manner consistent with the common currency theory. This theory postulates that when the DA system needs to evaluate which of several variables is preferable, it must compare them on a common value scale. Thus DA activity should be consistent with subject preferences when IRs and ERs are pitted against each other. In humans and monkeys, we quantified the relative contribution of this IR to the observed behaviors and investigated the neural networks involved in these mechanisms.

The key idea was to use a two-stage design in which we can manipulate IRs and ERs independently in order to characterize the conditions under which IRs influence behavior. Our main design involves subjects making sequential decisions within the two-stage structure. In the first stage, subjects make an initial decision (meta-choice) to accept (free trials) or reject (forced trials) the opportunity to freely choose between fractal images associated with ERs during a second stage. We independently manipulated choice availability, and the two-stage structure allowed us to separate the behavioral influence of IRs (choice) from that of ERs. Each block was divided into a training phase followed by a test phase, ensuring that subjects learned the associations between the different fractal targets and extrinsic reward probabilities before the test phase, in which they expressed their preference for the presence or absence of choice.
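The two-stage trial structure described above can be illustrated with a minimal simulation sketch. It is not the task code used in the project: the target names, the reward probabilities, and the rule that a forced trial assigns a target at random are all assumptions made for illustration.

```python
import random

# Hypothetical reward probabilities for two fractal targets (illustrative only).
REWARD_PROB = {"fractal_A": 0.75, "fractal_B": 0.25}


def run_trial(meta_choice, pick_target, rng):
    """Simulate one two-stage trial and return (target, reward).

    meta_choice: "free" or "forced", the first-stage decision.
    pick_target: callable returning the subject's target on free trials.
    rng: a random.Random instance, so that runs are reproducible.
    """
    if meta_choice == "free":
        # Second stage: the subject chooses between the targets.
        target = pick_target()
    elif meta_choice == "forced":
        # Assumption: on forced trials the computer assigns a target at random.
        target = rng.choice(sorted(REWARD_PROB))
    else:
        raise ValueError("meta_choice must be 'free' or 'forced'")
    # Extrinsic reward is delivered probabilistically for the selected target.
    reward = 1 if rng.random() < REWARD_PROB[target] else 0
    return target, reward


rng = random.Random(0)
print(run_trial("free", lambda: "fractal_A", rng))
print(run_trial("forced", lambda: "fractal_A", rng))
```

Keeping the first-stage decision separate from the second-stage target selection is what lets choice availability be varied without changing the reward probabilities attached to the targets.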

Our results show a clear preference for free over forced trials in humans (n=94): subjects selected free trials in about 70% of cases during the first stage of the task. This preference was independent of the ER accumulated by the subjects. Modeling with reinforcement learning models showed that it was necessary to incorporate an overvaluation of extrinsic rewards obtained from free actions to account for choice-seeking behavior. Behavioral results in monkeys performing a similar task suggested the same pattern.
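As a rough illustration of what "overvaluation of extrinsic rewards obtained from free actions" could look like in a reinforcement learning model, the sketch below uses a simple delta-rule learner in which rewards following a free action are scaled by a factor (1 + omega). The multiplicative form, the parameter names (alpha, beta, omega) and their values are assumptions, not the model fitted in the project.

```python
import math


def softmax_p_first(v_first, v_second, beta):
    """Probability of choosing the first of two options under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (v_first - v_second)))


def update_value(value, reward, alpha, free, omega):
    """Delta-rule update of a learned value.

    Assumption: rewards obtained after a free action are overvalued
    by a multiplicative factor (1 + omega); forced rewards are not.
    """
    effective_reward = reward * (1.0 + omega) if free else reward
    return value + alpha * (effective_reward - value)


def learn_first_stage_values(rewards, alpha=0.1, omega=0.3):
    """Learn the values of the free and forced options from the SAME rewards."""
    v_free, v_forced = 0.0, 0.0
    for r in rewards:
        v_free = update_value(v_free, r, alpha, free=True, omega=omega)
        v_forced = update_value(v_forced, r, alpha, free=False, omega=omega)
    return v_free, v_forced


# Identical extrinsic reward history for both options (hypothetical data).
rewards = [1, 0, 1, 1] * 25
v_free, v_forced = learn_first_stage_values(rewards)
p_free = softmax_p_first(v_free, v_forced, beta=3.0)
print(v_free, v_forced, p_free)
```

Because both options receive the same reward sequence, any preference for the free option in this sketch comes only from omega; setting omega to 0 makes the two values equal and the choice probability 0.5.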

To test whether the DA system is involved in the encoding of IRs, we first asked how deficits of this neural system affect preference for IR. We investigated the performance of Parkinson’s disease (PD) patients (who suffer from a loss of DA neurons) with and without their treatment, using the same task as the one described above. These patients either used DA medication to compensate for their deficits (DT group) or had undergone deep brain stimulation (DBS) surgery of the subthalamic nucleus.

We carried out our main experiment in 44 patients: 24 were tested ON and OFF DT, and 20 were tested ON and OFF DBS one year after they underwent DBS surgery. Since the impaired DA system of PD patients could result in deficits in motivation and extrinsic reward assessment, it was important to first characterize the choices of the PD patients during the second stage of the free trials (i.e. targets with low vs. high probability of extrinsic reward). Here we found that patients accurately selected the most rewarded target during the second stage whether they were ON or OFF treatment. However, the data indicate that, during the first stage, the patients’ overall preference for free trials increased once their treatment was reinstated (i.e. ON > OFF). This set of results points to a key role of the DA system in intrinsic reward assessment and intrinsic motivation.

Finally, we have been developing an experimental set-up that will ultimately allow us, in non-human primates, to investigate specifically the calcium activity of midbrain DA neurons along with simultaneous electrophysiological recordings of neurons associated with the DA system. The goal is to gain insight into the correlation between reward encoding by DA neurons and the associated modulation of downstream neurons involved in motivational and learning mechanisms. The first step was thus to specifically target the dopaminergic system of the primate brain, which we achieved by designing a vector cocktail that allows expression specifically in DA neurons.


Progress beyond the state of the art and potential impacts

Gaining insight into intrinsic reward mechanisms through this project could have broad implications for a better understanding of reward systems, of cognitive control, and of many cognitive dysfunctions, such as motivational deficits linked to an impaired dopaminergic system.

We combine several major innovative approaches. We designed a novel, rigorous behavioral task to observe how primates are intrinsically motivated by choice opportunity and how different neural networks participate in the encoding of intrinsic rewards. More specifically, we have shown that primates prefer choice opportunity irrespective of the amount of extrinsic reward (e.g. money) at stake in the task. We then tested Parkinsonian patients with a selective deficiency of the targeted neural system (dopamine), which allowed a controlled test of our hypotheses about the neural system dedicated to intrinsic reward processing. Finally, this task has been designed to delineate, at the single-cell level in non-human primates, the dynamics of such mechanisms. Indeed, little is known about how dopaminergic reward signals are processed between brain areas in primates. Thus the approach we have started to develop in this project will allow us to specifically target the dopaminergic system and, using state-of-the-art neurophysiology techniques, investigate how this system encodes intrinsic rewards.

Investigating mechanisms of intrinsic reward and intrinsic motivation in primates is a crucial but under-explored question in the field of dopamine and reward systems, and this multidisciplinary approach allows us to address it. Importantly, these systems are of interest to a very large scientific community (e.g. research on decision-making or learning mechanisms) and are also severely impaired in many diseases. This work will provide new insights into primate neurophysiology and also for neurological patients, in whom we show impaired valuation of intrinsic rewards. This could, for example, allow deep brain stimulation neurosurgeons to improve electrode implantation for patients with several types of deficits, including motor but also motivational dysfunctions.