D. Ognibene et al. "Addiction in a Bounded Rational Model: the Role of Exploration and Environment Structure" IMPERFECT DECISION MAKERS: ADMITTING REAL-WORLD RATIONALITY, NIPS 2016 Workshop

Post date: Dec 26, 2016 3:59:20 PM

D. Ognibene, V. G. Fiore, X. Gu "Addiction in a Bounded Rational Model: the Role of Exploration and Environment Structure" IMPERFECT DECISION MAKERS: ADMITTING REAL-WORLD RATIONALITY, NIPS 2016 Workshop

Current Reinforcement Learning (RL) models of addiction are usually framed within the "dual-system theory of decision making". These models often assume that the habitual system, modelled as model-free (MF) RL, plays a key role by overtaking the goal-oriented system, modelled as model-based (MB) RL. They hypothesise that aberrantly strong habits may result from drug-induced biochemical effects hijacking the standard model-free temporal difference (TD) learning mechanism.
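To illustrate what "hijacking" TD learning can mean, the sketch below shows a tabular TD(0) update in which a hypothetical drug term `drug_bonus` is added to the prediction error, and the error is floored at that term so that learning can never fully compensate for it. This is one formulation used in the computational addiction literature, shown here under that assumption. It is not the model used in this work, and `td_update` and `train` are hypothetical names.

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9, drug_bonus=0.0):
    """Tabular TD(0) update of V[s]; returns the prediction error used.

    drug_bonus is a hypothetical, illustrative drug effect (an assumption,
    not the authors' model)."""
    delta = r + gamma * V[s_next] - V[s]
    if drug_bonus > 0:
        # Assumed "hijack": the error is shifted up by the bonus and can
        # never fall below it, so the value keeps being reinforced.
        delta = max(delta + drug_bonus, drug_bonus)
    V[s] += alpha * delta
    return delta


def train(drug_bonus, steps=200):
    """Repeatedly experience 'take' -> reward 1 -> terminal 'end'."""
    V = {"take": 0.0, "end": 0.0}  # 'end' is terminal, its value stays 0
    for _ in range(steps):
        td_update(V, "take", r=1.0, s_next="end", drug_bonus=drug_bonus)
    return V["take"]


print(round(train(0.0), 3))  # converges to the reward itself: 1.0
print(train(0.5) > 10)       # keeps growing: True
```

With no bonus the value converges to the reward; with the bonus the error never drops below `drug_bonus`, so the learned value grows without bound.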

Nonetheless, these models are limited in explaining: 1) addictive behaviours that are not associated with drug intake (e.g. gambling); 2) imaging data showing increased activity, in drug-addicted individuals, in areas usually associated with goal-directed processes; 3) the heterogeneity of drug-seeking behaviours, as habitual or model-free processes seem to account only for compulsive and stereotyped behaviours.

These issues point to a role for the model-based process in addiction. However, its implementation in the brain is still poorly understood, which currently hinders the development of detailed hypotheses about drug-induced malfunctioning. On the other hand, the effects of environment complexity, combined with the computational limits of an agent, have received limited attention in addiction modelling studies.

We present a set of simulations of agents that integrate a bounded model-based component with a standard TD-learning model-free component.
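One way the two components could be combined is sketched below, under several assumptions that the abstract does not state: the model is known and deterministic (`model[s][a] -> (next_state, reward)`), "bounded" means a depth-limited lookahead that ignores everything beyond `depth` steps, the model-free part is tabular Q-learning, and the two valuations are mixed linearly with a weight `w` standing in for the level of influence of the model-based component. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def mb_value(model, s, a, depth, gamma=0.9):
    """Bounded model-based value: depth-limited lookahead (assumed form).
    Rewards beyond `depth` steps are simply ignored."""
    s_next, r = model[s][a]
    if depth <= 1 or not model.get(s_next):  # horizon reached or terminal
        return r
    return r + gamma * max(
        mb_value(model, s_next, b, depth - 1, gamma) for b in model[s_next]
    )


def hybrid_q(model, q_mf, s, w, depth):
    """Linear mix of bounded MB and MF values; w is a hypothetical MB weight."""
    return {a: w * mb_value(model, s, a, depth) + (1 - w) * q_mf[(s, a)]
            for a in model[s]}


def softmax_choice(values, beta, rng):
    """Sample an action with probability proportional to exp(beta * value)."""
    acts = list(values)
    m = max(values.values())  # subtract the max for numerical stability
    weights = [math.exp(beta * (values[a] - m)) for a in acts]
    return rng.choices(acts, weights=weights)[0]


def q_learning_update(q_mf, model, s, a, alpha=0.1, gamma=0.9):
    """Standard model-free TD (Q-learning) update for the MF component."""
    s_next, r = model[s][a]
    future = max((q_mf[(s_next, b)] for b in model.get(s_next, {})), default=0.0)
    q_mf[(s, a)] += alpha * (r + gamma * future - q_mf[(s, a)])


# Toy model: 'left' pays nothing now but leads to a large delayed reward.
model = {
    "S": {"left": ("L", 0.0), "right": ("R", 1.0)},
    "L": {"go": ("T", 5.0)},
    "R": {"go": ("T", 0.0)},
    "T": {},  # terminal
}
q_mf = defaultdict(float)
print(hybrid_q(model, q_mf, "S", w=1.0, depth=1))  # 'right' looks better
print(hybrid_q(model, q_mf, "S", w=1.0, depth=2))  # 'left' looks better
```

With a one-step horizon the planner prefers `right`, whose reward arrives immediately; extending the horizon to two steps reveals the larger delayed reward behind `left`.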

In each environment, addiction-like behaviours, such as the long-lasting, repeated pursuit of sub-optimal goals, appear as a bifurcation that depends on the evolution of the stochastic exploration process. As a result, identically parametrised subjects (i.e. with the same level of influence of the model-based component) in identical environments can show opposite behaviours because of a few initial random choices.
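The protocol behind this claim (same parameters, same environment, different random seeds) can be sketched as follows. As a deliberate simplification, the sketch swaps the paper's multi-state environment and hybrid agent for a hypothetical two-option task and a purely model-free softmax learner; the payoffs, the learning rate `alpha`, the inverse temperature `beta` and the `run_subject` helper are all illustrative assumptions. Whether a given parameter setting actually splits subjects into opposite groups is not guaranteed and has to be checked by running it.

```python
import math
import random


def run_subject(seed, trials=300, alpha=0.3, beta=5.0):
    """Simulate one subject; return the fraction of 'drug' choices in the
    last 100 trials. Task and parameters are hypothetical."""
    rng = random.Random(seed)  # the seed is the only thing that differs
    q = {"drug": 0.0, "healthy": 0.0}
    late_drug = 0
    for t in range(trials):
        m = max(q.values())
        w = {a: math.exp(beta * (v - m)) for a, v in q.items()}
        a = rng.choices(list(w), weights=list(w.values()))[0]
        # Assumed payoffs: the drug pays a reliable 0.8; the healthy option
        # pays more on average (1.0) but is noisy.
        r = 0.8 if a == "drug" else rng.gauss(1.0, 1.5)
        q[a] += alpha * (r - q[a])  # model-free (delta-rule) update
        if t >= trials - 100 and a == "drug":
            late_drug += 1
    return late_drug / 100


# Identically parametrised subjects, differing only in their random seed.
outcomes = [run_subject(seed) for seed in range(20)]
print(outcomes)
```

Because a subject only re-evaluates the noisy option by choosing it, a few unlucky early draws can lower its estimate enough that a low-exploration subject rarely samples it again. Counting how many seeds end near 0 versus near 1 is one way to look for the bifurcation.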

The sub-optimal behaviours are induced by the environment structure, in which a high immediate reward follows drug consumption while a limited punishment is spread over many successive states. This structure affects the accuracy of the bounded model-based computations as well as learning in both the model-based and the model-free processes.
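A worked example of how this structure can mislead a bounded planner: assume (hypothetically) that taking the drug yields +10 followed by 14 steps of -1, while a healthy alternative yields +1 on each of 15 steps, and that the model-based component evaluates each option by summing only its first `depth` rewards. For any depth d up to 15 the drug's truncated return is 11 - d and the healthy one is d, so the drug looks better exactly when d < 5.5. The numbers and helper names are illustrative, not taken from the simulations.

```python
DRUG = [10.0] + [-1.0] * 14  # big immediate reward, then 14 mild punishments
HEALTHY = [1.0] * 15         # modest but steady reward


def truncated_return(rewards, depth, gamma=1.0):
    """Bounded evaluation: discounted sum of only the first `depth` rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards[:depth]))


def preferred(depth, gamma=1.0):
    """Option favoured by a planner that looks `depth` steps ahead."""
    v_drug = truncated_return(DRUG, depth, gamma)
    v_healthy = truncated_return(HEALTHY, depth, gamma)
    return "drug" if v_drug > v_healthy else "healthy"


for d in (1, 3, 5, 6, 10, 15):
    print(d, preferred(d))  # 'drug' for depths 1, 3, 5; 'healthy' from 6 on
```

Up to a depth of 5 the truncated lookahead favours the drug, because most of the punishment lies beyond the horizon; over the full 15 steps the drug nets -4 against +15 for the healthy option.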

The relevance of such explorative processes increases when the environment undergoes substantial changes, which are known to result in high levels of stress. We present preliminary results showing that addicted behaviours are triggered under these specific conditions. This phenomenon cannot be explained by current habit-dominance models. The results may help in modelling new prevention therapies.