Associative learning mechanisms underpinning the transition from recreational drug use to addiction

Lee Hogarth (a), Bernard W Balleine (b), Laura H Corbit (c), Simon Killcross (a)

a. School of Psychology, University of New South Wales, Sydney, NSW, 2052, Australia.
b. Brain & Mind Research Institute, University of Sydney, 100 Mallet St, Camperdown, Sydney, NSW, 2050, Australia.
c. Laura H. Corbit, School of Psychology, Brennan MacCallum Building, University of Sydney, Sydney, NSW, 2006, Australia.

Author for correspondence: Lee Hogarth, School of Psychology, University of New South Wales, Sydney, NSW 2052, Australia. Phone: +61 (0)2 93853038 Fax: +61 (0) 9385 3641 Email: l.hogarth@unsw.edu.au

Short title: Abnormal learning underpinning dependence

Acknowledgements: This work was supported by MRC grant #G0701456 to LH, NHMRC grant #633268 to BB&SK, and NHMRC grant #568872 to SK&BB

Abstract

Learning theory proposes that drug-seeking is a synthesis of multiple controllers. Whereas goal-directed drug-seeking is determined by the anticipated incentive value of the drug, habitual drug-seeking is elicited by stimuli which have formed a direct association with the response. Moreover, drug-paired stimuli can transfer control over separately trained drug-seeking responses by retrieving an expectation of the drug’s identity (specific transfer) or incentive value (general transfer). This review covers outcome devaluation and transfer of stimulus-control procedures in humans and animals which isolate the differential governance of drug-seeking by these four controllers following various degrees of contingent and noncontingent drug exposure. The neural mechanisms underpinning these four controllers are also reviewed.
These studies suggest that although initial drug use is goal-directed, chronic drug exposure confers a progressive loss of control over action selection by specific outcome representations (impaired outcome-devaluation and specific transfer), and a concomitant increase in control over action selection by antecedent stimuli (enhanced habit and general transfer). The prefrontal cortex and mediodorsal thalamus may play a role in this drug-induced transition to behavioural autonomy.

Key words: addiction; learning theory; goal; cue-reactivity; habit.

Introduction

A recurrent theme in addiction theory is that drug-seeking has multiple determinants. Wikler1 argued that the euphoric effects of the drug maintained initial drug use whereas addiction itself stemmed from the emergence of a withdrawal syndrome. Tolerance2 and opponent-process theories3 elaborated this notion of a shift from positive to negative reinforcement. Subsequently, however, the importance of negative reinforcement was questioned by the observation that drug self-administration engages dopamine, the brain substrate of reward,4 and by the lawful relationship between the frequency of drug-seeking and the magnitude of drug reward.5 But by denying the importance of negative reinforcement (but see 6), positive reinforcement theorists were hard pressed to explain the transition between recreational drug use and addiction. Tiffany7 answered this question from a cognitive viewpoint, arguing that drug-seeking may be mediated by desire, or elicited automatically by drug cues, and that the latter controller predominates in addiction. Robinson and Berridge8 made a similar argument from a behavioural neuroscience perspective, stating that drug-seeking may be driven by hedonic anticipation of the drug (liking), or autonomous cue-locked conditioned behaviour (wanting), thus accounting for addicts’ paradoxical continuation of drug use despite their declared desire to quit.
Contemporary addiction theories have elaborated these themes. The behavioural economists have garnered evidence that human drug dependence is a choice recruited by the reinforcement value of the drug,9 but is also accompanied by an inability to utilise knowledge of abstract future consequences in decision making.10 Similarly, animal learning theorists have substantiated evidence that drug self-administration is a function of the reinforcement value of the drug11, 12 but also undergoes a transition to automatic control by drug-paired stimuli.13 Finally, cognitive neuroscientists have shown that drug liking is associated with drug-induced dopamine activation14 and that clinically diagnosed addiction is accompanied by hypofrontality and executive dysfunction.15 The common theme in all of these frameworks, therefore, is that initial drug use is mediated by the drug acting as a positive reinforcer, whereas the transition to clinical dependence is linked to a loss of intentional regulation and concomitant emergence of automatic control over drug-seeking.

Learning theory and addiction

The current review aims to detail this transitional theory of addiction by inspecting human and animal learning research which has tested the differential governance of behaviour at various stages of drug exposure. The ideas developed here were first introduced by Norman White, who drew a link between the role of the striatum in memory and addictive behaviour.16, 17 The formal associative learning account was then outlined by Anthony Dickinson during symposium proceedings from empirical work with natural rewards.18 These ideas were then translated to behavioural neuroscience research with addictive drugs in collaboration with Trevor Robbins and Barry Everitt.19-21 Simultaneously, behavioural neuroscience research continued with natural rewards, which clarified the associative mechanisms outlined here22-24 and which are depicted schematically in Figure 1.
According to this perspective, experience of the drug outcome is encoded separately in terms of its specific sensory correlates or perceptual identity (Oi) and its consummatory, post-ingestive or incentive value (Ov), and these two representations of the drug can differentially enter into associations.25, 26 As a consequence, the agent (person or animal) acquires four forms of associative knowledge.

(1) Goal-directed learning. The agent acquires knowledge of the instrumental contingency between the drug-seeking response and the drug’s identity and value (R-Oiv). Moreover, the representation of the drug’s value is updated by internal states, such as deprivation or satiety, which predict the experienced value of the drug. Consequently, retrieval of the representation of the drug and its current value (Oiv) determines the propensity to select the associated drug-seeking responses from amongst competing outcome choices based on a comparison of their relative values.27 Thus, a higher value drug produces a greater proportion of intentional choice of that outcome from amongst alternative rewards.28

(2) Habit learning. The agent forms an association between external stimuli (S) and the drug-seeking response (R) in proportion to the contingent co-occurrence of these two events prior to drug reinforcement and the reinforcement value of that outcome (Ov).29 This S-R/reinforcement process enables the drug stimulus, when re-encountered, to elicit the drug-seeking response directly without retrieving any representation of the drug outcome. Such habitual drug-seeking accords with the clinical characterisation of addiction as reflecting a loss of intentional regulation of behaviour.

(3) Specific transfer. External stimuli also acquire an association with the drug outcome in accordance with the predictive contingency between these events, enabling stimuli to retrieve a representation of the drug’s identity and/or value.
Retrieval of the outcome’s identity (S-Oi) can, in turn, elicit separately trained instrumental responses that are associated with that same outcome via a bidirectional O-R, or ideomotor, connection (S-Oi-R).30

(4) General transfer. By contrast, retrieval of the outcome’s affective value (S-Ov) elicits a motivational state akin to the drug itself, which exerts a general excitatory effect on prevailing responses controlled by the other associations ((S-Ov)-R).31

The claim made in the current paper is that these various forms of behavioural control interact to determine the propensity to engage in drug-seeking at any given moment. More specifically, we claim that continuing drug exposure impairs retrieval or utilisation of the representation of specific outcome identities (Oi), thus shifting control of action away from knowledge of specific outcomes (R-Oiv and S-Oi-R) and towards the more general control of actions by antecedent stimuli (S-R and (S-Ov)-R). We now turn to empirical evidence for this psychological account of addiction.

[Insert Figure 1 here]

1. Goal-directed drug-seeking

The outcome devaluation procedure provides the principal method for identifying goal-directed control.32 A version of this procedure is presented in Table 1. In this procedure, rats learn that two different lever press responses (R) produce different rewarding outcomes (O). For example, one lever may produce a drug reward such as alcohol or cocaine (O1) whereas the other lever produces an alternative natural reward such as sucrose (O2). The drug is then devalued by pairing it with lithium chloride-induced gastrointestinal sickness, specific satiety, or a related manipulation, such that the value of the drug is diminished. The critical test then comes when the animal is again given the opportunity to press the two levers in an extinction test where the responses no longer produce their respective rewards. The question at stake is whether the animal will reduce responding for the drug outcome (R1<R2).
Because the outcomes are not presented in the extinction test, any such devaluation effect cannot be attributed to S-R/reinforcement (habit) learning, that is, to experience of the drug outcome modulating the capacity of contextual cues to elicit the drug-seeking response. Furthermore, because the procedure contains no stimuli that differentially signal the two outcomes, a devaluation effect cannot be attributed to a change in the capacity of such cues to elicit responding for their associated outcomes (S-Oiv-R). Instead, any reduction in drug choice in the extinction test must be mediated by animals’ integration of knowledge of the R-Oiv contingencies acquired during instrumental training with knowledge of the current low value of the drug outcome (Ov) acquired during the devaluation treatment, which together determine the propensity to select that response. In other words, a devaluation effect in the extinction test demonstrates that drug-seeking is goal-directed in that it is determined by the anticipated reward value of the drug.

[Insert table 1 here]

Two studies illustrate the outcome-devaluation procedure in demonstrating goal-directed control of drug-seeking. In the study by Olmstead et al.33, rats were trained on a seeking-taking chain in which they had to press a seeking lever to gain access to a taking lever, which in turn delivered intravenous cocaine. To test whether the seeking response was goal-directed, the taking lever was extinguished by terminating cocaine delivery. The seeking lever was not present during this extinction training. The fact that this extinction training led to an immediate reduction in rats’ performance of the seeking response in extinction indicated that this response was mediated by knowledge of its consequences, i.e. the low current value of the taking lever. Hutcheson et al.34 employed a similar design.
Training on a seeking-taking chain for heroin was followed by a revaluation treatment in which self-administration via the taking response was experienced in a withdrawal state to establish the high value of heroin in this state. Rats were then again given access to the seeking lever in extinction, and the finding that withdrawal produced an increase in performance of the seeking response indicated that it was goal-directed in that it was mediated by knowledge of the current high value of the heroin outcome.

The outcome-devaluation procedure has also been modified for humans.35, 36 In the concurrent training stage of these experiments, mainly student smokers learned two key press responses, where R1 produced tobacco points and R2 produced chocolate points. Tobacco was then devalued by smoking to satiety or evaluation of smoking health warnings, e.g. ‘smoking causes cancer’,36 or by administration of nicotine nasal spray.35 The finding that tobacco choice in the extinction test was sensitive to these devaluation treatments (R1<R2) indicated that it was goal-directed in being mediated by knowledge of the current value of the drug outcome.

A key observation replicated in these human experiments was that individual variation in level of tobacco dependence was associated with a preferential selection of the tobacco over the chocolate response. Similar preferences have been established in animals11 and human cocaine users,37 confirming the economic theorists’ main contention that drug dependence reflects individual differences in the reinforcement value of the drug.9 The outcome-devaluation procedure qualifies this notion by distinguishing the contribution of goal-directed (R-Oiv) and habitual (S-R) drug-seeking to this drug preference. We know that choice of the drug-seeking response was goal-directed, as it was sensitive to devaluation in the extinction test.
Any residual contribution of S-R learning to this drug preference would be marked by variation in sensitivity to devaluation treatment in the extinction test. As there was no systematic variation across levels of nicotine dependence in sensitivity to devaluation, it may be concluded that preferential tobacco choice was mediated entirely by valuation of the drug as a goal, and not by differential S-R formation. The conclusion, therefore, is that drug-seeking within these parameters is goal-directed, and that level of dependence, at least at this early stage of drug exposure, reflects the valuation of the drug as a specific goal (see 38-42).

2. Habitual drug-seeking

As noted, the outcome devaluation procedure can evaluate the habitual status of instrumental performance32 (see Table 1). Whereas sensitivity of drug-seeking to devaluation in the extinction test (R1<R2) signifies goal-directed control, insensitivity to devaluation in the extinction test (R1=R2) demonstrates that retrieval of the current value of the drug plays no role in drug-seeking. Instead, drug-seeking is deemed to have become habitual, being elicited by contextual stimuli which have acquired a direct S-R association with drug-seeking during instrumental training, without retrieving a representation of the current value of the drug.

Two studies illustrate the use of the outcome-devaluation procedure to demonstrate the habitual status of drug-seeking. In the first study, Dickinson et al.13 trained rats to acquire two instrumental responses, one for alcohol and one for food pellets, before one of these outcomes was devalued by pairing it with lithium chloride-induced sickness. When the rats were again given the opportunity to respond for these outcomes in extinction, it was found that performance of the food-seeking response was reduced by the devaluation treatment, indicating that food-seeking was goal-directed.
By contrast, performance of the alcohol-seeking response was insensitive to devaluation, suggesting that alcohol-seeking had become an S-R habit. A second study used a similar design to confirm that cocaine-seeking was similarly prone to habitual control compared to natural reward-seeking.43

A question arises as to why habitual drug-seeking was established by these two procedures13, 43, whereas goal-directed drug-seeking was found in the earlier designs.33-36 In explaining these divergent results, one might appeal to a number of variables that have been demonstrated to modulate the balance between goal-directed and habitual control, including position of the response within an instrumental sequence or chain,44-46 amount of training,47, 48 number of available responses49 and/or reinforcement value of the outcome.50 The important point made by this literature is that goal-directed and habitual actions exist in a dynamic balance that can be biased in one direction or the other by conditions of training or testing that favour acquisition/expression of the R-O versus S-R association. Our basic argument is that within this complex system, drugs exert a constant pressure in favour of the S-R association by impairing retrieval or utilisation of the specific identity of outcomes.

Corbit et al.51 have recently mapped the progressive transition to habitual control of drug-seeking with extended training. In this study, rats acquired a self-administration response for alcohol, before alcohol was devalued by ad libitum consumption (satiety). Alcohol-seeking was then tested in extinction to evaluate goal-directed control of this behaviour.
The important result was that following two weeks of self-administration training, the response remained sensitive to devaluation, but by eight weeks of training, the response was insensitive to devaluation, suggesting a transition from goal-directed to habitual control had occurred with training (cocaine-seeking shows a similar transition to habit with extended training52). An important additional finding of this study was that noncontingent administration of alcohol was sufficient to accelerate habitual control over natural reward-seeking responses. Thus, not only do drug self-administration responses become habitual, but noncontingent drug exposure renders contemporaneously acquired naturally rewarded instrumental actions habitual.

In humans, a comparable effect of noncontingent alcohol exposure on habitual control has recently been demonstrated.53 Participants were administered 0.4 g/kg of alcohol or placebo before instrumental training with R1 and R2 for chocolate and water respectively. Chocolate was then devalued by ad libitum consumption before choice between R1 and R2 was tested in extinction. The finding that alcohol attenuated goal-directed control over chocolate choice in the extinction test supports the translational relevance of animal models, and suggests that accelerated habit learning can be demonstrated with acute drug dosing.

A key study by Nelson and Killcross54 has revealed that noncontingent drug exposure enhanced habit formation via an effect at instrumental training rather than at the extinction test. They pre-exposed rats to amphetamine for 7 days before a 7-day injection-free period. Instrumental training for sucrose was then undertaken before this outcome was devalued by specific satiety or lithium chloride-induced sickness.
The results from the extinction test indicated that both devaluation treatments failed to modify sucrose-seeking in the amphetamine-exposed rats, suggesting this response had become habitual, whereas placebo rats showed goal-directed control (see also50, 55). Importantly, chronic amphetamine only accelerated habit formation if administered prior to instrumental training, but not if administered after training. Consistent with this, all the aforementioned studies which have shown effects of contingent13, 43, 51 and noncontingent51 drug exposure on habit learning have undertaken drug administration contemporaneously with instrumental training. The implication, therefore, is that during instrumental training the ability of the outcome representation to enter into new learning may be impaired by drug exposure, favouring acquisition of the S-R over the R-Oiv contingencies, but once R-Oiv learning is acquired drug-free, deployment of this knowledge at test is not impaired by drug exposure.

In reconciling the aforementioned studies, one can propose a transitional model of addiction wherein initial drug-seeking is goal-directed,33-36 but following extended training comes under habitual S-R control,13, 43 and contemporaneously acquired natural reward-seeking also comes under habitual control.51, 53, 54 Ultimately, the agent’s behavioural repertoire comes to be dominated by S-R habits.

3. Specific transfer of stimulus control over drug-seeking

The Pavlovian to instrumental transfer procedure is the principal method for demonstrating control over responding by stimuli retrieving a representation of the specific identity of the outcome (Oi) (e.g.56; see Table 2). In this design, rats are given Pavlovian training in which one stimulus (S1) signals drug availability (O1), whereas a second stimulus (S2) signals the availability of an alternative reward, for example sucrose (O2).
Separate instrumental training is then undertaken wherein rats learn that one lever produces the drug (O1) whereas the other lever produces sucrose (O2). Finally, in the Pavlovian to instrumental transfer test, each stimulus is presented for the first time while the two instrumental responses are available in extinction. The question at stake is whether each stimulus will enhance performance of the response with which it shares the same outcome (i.e. S1:R1>R2, S2:R1<R2). Such an outcome-specific transfer effect demonstrates that each stimulus retrieved a representation of its associated outcome, which in turn retrieved the response that was associated with that outcome (S-Oi-R). The effect cannot be attributed to the formation of an S-R association because the stimuli and the responses were never contingently reinforced during training, and furthermore, because the transfer test was conducted in extinction, so no S-R association can form across that period either.

[Insert table 2 here]

There is currently only one demonstration of outcome-specific transfer of stimulus control over drug-seeking per se57 (although there are many demonstrations in natural reward learning58). In this study, mainly student smokers first learned that two arbitrary stimuli (S1 and S2) predicted tobacco points or money, respectively, before learning that two responses (R1 and R2) earned tobacco points and money respectively. In the transfer test, the two stimuli were found to selectively enhance performance of the response that had earned the same outcome. Thus, each stimulus must have retrieved a representation of its associated outcome (points) which in turn elicited the response that had produced that outcome (S-Oi-R).

4. General transfer of stimulus control over drug-seeking

By contrast, in a related animal procedure, Corbit and Janak59 paired S1 and S2 with ethanol or sucrose, respectively, and then trained R1 and R2 with these same outcomes, respectively.
The results showed that the ethanol stimulus enhanced the rate of both R1 and R2 equally above a no-stimulus baseline, indicating that this stimulus exerted a general excitatory effect on instrumental reward-seeking by retrieving the value (S-Ov) rather than identity (S-Oi) of the outcome. By contrast, the sucrose stimulus produced a specific transfer effect, selectively enhancing instrumental responding for sucrose over ethanol, indicating that it had retrieved the outcome’s identity (S-Oi). These data are consistent with the view that drug-associated cues favour general facilitatory effects on appetitive instrumental responses compared to natural reward-paired cues (see also60, 61).

The divergent results of these human and animal transfer studies may be resolved by appealing to Konorski’s view25 that outcomes are encoded separately in terms of their perceptual identity (sensory correlates) and consummatory or incentive value.26 On this view, the tobacco points outcome utilised by Hogarth et al.57 was largely perceptual and minimally consummatory, and so the stimulus paired with this outcome favoured a specific transfer effect which relied on the retrieval of this outcome’s perceptual identity (S-Oi-R). By contrast, the consummatory ethanol outcome employed by Corbit and Janak59 possessed a substantial pharmacological/consummatory effect, and so the stimulus paired with this event favoured a general motivational enhancement based upon retrieval of the outcome value ((S-Ov)-R).
Other studies substantiate this characterisation of the specific and general forms of stimulus control.62 First, the magnitude of the specific transfer effect is determined by the reliability of the S-O contingency in training,63-65 but is insensitive to outcome devaluation.31, 66-68 Importantly, specific transfer effects by drug cues on drug-seeking in humans are similarly insensitive to devaluation achieved by drug satiety, health warnings36, 69 and pharmacotherapy.35 Moreover, the finding that drug cue effects on subjective craving70, 71 and drug-taking69 are similarly autonomous of devaluation by satiety and pharmacotherapy supports the validity of specific (S-Oi-R) transfer effects in addiction. By contrast, general transfer effects are modulated by devaluation of the outcome,31, 72 and cross over to other reinforcers of the same hedonic category.73 Thus, general transfer effects are deemed to be mediated by the stimulus retrieving a representation of the current value (Ov) but not identity (Oi) of the outcome, and as a consequence, the effect is sensitive to changes in motivational state but is not response selective ((S-Ov)-R).

Not only does contingent drug exposure cause drug cues to favour general over specific transfer59, but noncontingent drug exposure may also cause natural reward cues to undergo this same transition. In a recent study, Shiflett et al.74 found that noncontingent exposure to chronic amphetamine administered following Pavlovian and instrumental training, that is, prior to the transfer test, abolished the specific transfer effect and enhanced the general transfer effect. Specifically, rats received Pavlovian training in which S1 predicted chocolate and S2 predicted grain. Instrumental training was then undertaken in which two responses, R1 and R2, earned these same outcomes, respectively. Then, half of the rats were given 7 days of amphetamine administration and the remainder placebo (akin to 54).
Finally, in the transfer test the two stimuli were tested for a specific transfer effect in which each stimulus selectively enhanced responding for the same outcome, or a general transfer effect in which each stimulus enhanced responding for both outcomes equally above a pre-stimulus baseline. The remarkable finding was that amphetamine exposure prior to test abolished the specific transfer effect and enhanced the general transfer effect. A similar enhancement of the general transfer effect produced by natural reward cues on reward-seeking has been found following acute75 and chronic76 amphetamine administered prior to the testing phase, although in these latter studies no attempt was made to assess specific transfer.

Overall, these studies favour a transitional model wherein early in training, drug cues retrieve the drug’s identity and thus produce specific transfer effects.35, 36, 57 Extended drug exposure, however, causes stimuli to lose contact with the drug’s identity and instead make contact with the drug’s value, thus causing a transition from specific to general transfer.59-61 Moreover, contemporaneously acquired natural reward cues also shift contact from their outcome’s identity to its value, causing a comparable transition from specific to general transfer.61, 74-76

Synthesis of psychological studies

The transition to behavioural autonomy depicted across the studies reported here is consistent with a singular impairment in the capacity to retrieve or utilise the specific identity of outcomes as a consequence of drug exposure. This impairment can explain why drug-seeking is initially goal-directed (R-Oiv) and under specific stimulus control (S-Oi-R), but then becomes habitual (S-R) and under general stimulus control ((S-Ov)-R). Whereas the former two controllers require a representation of the specific identity of the drug, the latter two controllers do not.
Moreover, the finding of the same transition in natural reward-seeking responses acquired contemporaneously with drug exposure suggests that the impairment in capacity to represent specific outcomes applies to the entire class of appetitive rewards (it remains to be seen whether aversive outcomes are similarly affected). Finally, this account suggests that stress,77 trait impulsivity,78, 79 conflict,80, 81 hypofrontality82-84 and schizophrenia85, 86 may be linked with drug dependence and relapse because they exacerbate this impairment in capacity to represent/utilise specific outcome identities.

The claim that a single impairment underpins both the loss of goal-directed control and the loss of specific transfer is challenged by a dissociation between these two effects. Specifically, goal-directed control is abolished by chronic amphetamine administered prior to training, but not administered prior to test, suggesting that chronic amphetamine impairs acquisition of response-outcome knowledge during instrumental training but does not directly impair the retrieval/utilisation of outcome identity required for goal-directed control at test.54 By contrast, chronic amphetamine can abolish specific transfer when administered prior to test,74 suggesting that chronic amphetamine can directly abolish retrieval/utilisation of outcome identity required for the specific transfer effect. Identifying a common learning mechanism that operates during both instrumental training and the transfer test to produce the observed transition to behavioural autonomy is arguably crucial for isolating the core psychological pathology in addiction.

Neural basis of action control

The following section reviews animal studies which have examined the neural basis of the four controllers underpinning natural reward and drug-seeking. The purpose of this section is to identify substrates upon which chronic drug exposure might act to produce the transition to autonomy depicted above, i.e.
reduce goal-directed learning and specific transfer, and/or enhance habit learning and general transfer.

1. Neural basis of goal-directed action

Lesions of the prelimbic (PL) region of the prefrontal cortex have been shown to produce precisely the same deficit in goal-directed control as chronic amphetamine.54 That is, lesions of the PL abolish goal-directed control of natural reward-seeking if they occur prior to instrumental training48, 87, 88 but not if they occur prior to test.89 Comparable effects have been found following lesions of the mediodorsal thalamus, which also abolish acquisition90 but not expression91 of goal-directed action. As the mediodorsal thalamus provides the major thalamic input to the PL, it is believed that these two regions form a functional circuit. The correspondence between the effects of PL lesions, mediodorsal thalamic lesions and chronic amphetamine exposure on acquisition of goal-directed control implicates these two brain regions in mediating the effect of drug exposure on the transition to behavioural autonomy.

By contrast, the dorsomedial striatum (DMS) has been shown to be essential for both acquisition92, 93 and expression94 of goal-directed learning. Importantly, post-training DMS inactivation has been shown to abolish goal-directed control of alcohol-seeking, suggesting common control mechanisms underpinning both natural and drug reward goal-directed learning.51 Additionally, lesions of the basolateral amygdala (BLA) abolish sensitivity to outcome-devaluation whether given before95, 96 or after instrumental training.91, 97 Thus the DMS and BLA, in failing to mimic the selective effect of amphetamine on loss of goal-directed learning at acquisition, may not play a direct role in the drug sensitization-induced transition to behavioural autonomy.

2. Neural basis of habitual action

Habitual action, by contrast, is mediated by the dorsolateral striatum (DLS) and infralimbic cortex.
As noted earlier, overtraining instrumental contingencies favours a transition from R-Oiv to S-R control, that is, progressive loss of sensitivity to devaluation in the extinction test.47 However, rats with lesions to the DLS either pre- or post-training fail to develop habitual control and remain sensitive to devaluation irrespective of training, indicating that the DLS is required for the acquisition and expression of habit learning.51, 98, 99 Importantly, post-training inactivation of the DLS has also been shown to abolish expression of habitual cocaine-seeking52 and alcohol-seeking51 following extended training, rendering these behaviours once again goal-directed, and confirming the common control mechanisms underpinning both natural and drug reward habit learning. Additionally, pre-training functional disconnection of the DLS and the central nucleus of the amygdala (CN) has also recently been shown to abolish habitual control of action and restore goal-directed control.29 Finally, lesions of the infralimbic cortex made prior to instrumental training abolish the transition to habit following overtraining.48 Thus, chronic drug exposure might act on any of these regions to promote the dominance over behaviour by S-R habits.

3. Neural basis of specific transfer of stimulus control

The ability of stimuli to transfer selective control over separately trained instrumental responding for the same outcome is abolished by pre-training62 and post-training lesions of the orbitofrontal cortex (OFC),100 by pre-training lesions and post-training inactivation of the nucleus accumbens (NAC) shell,101, 102 by pre-training96, 103, 104 and post-training91 lesions of the BLA, and by pre-training functional disconnection between these two structures.105 In addition, post-training inactivation of the DMS106 and post-training lesions of the mediodorsal thalamus91 also eliminate the selective transfer effect.
Thus, in order to impair specific transfer, chronic drug exposure may act on any of these structures.

4. Neural basis of general transfer of stimulus control

The ability of conditioned stimuli to produce a general excitatory effect on separately trained instrumental responses is abolished by post-training inactivation of the DLS,106 post-training inactivation of the ventral tegmental area (VTA),31, 107 pre- or post-training inactivation of the NAC core,102, 108 and pre-training lesions of the amygdala CN.96 Thus, chronic drug exposure may influence any of these regions to enhance general transfer effects.

Synthesis of behavioural neuroscience studies

PL and mediodorsal thalamic lesions show a striking correspondence with chronic drug exposure in producing behavioural autonomy. Specifically, these lesions abolish acquisition but not expression of goal-directed control,48, 87-91 which matches exactly the effect of chronic amphetamine54 (see also 51 for a related effect with alcohol). However, lesions to the PL do not modify the specific transfer effect,88 whereas post-training lesions of the mediodorsal thalamus do,91 matching the impact of chronic amphetamine.74 Thus, lesions of the mediodorsal thalamus produce precisely the same effect as chronic drug exposure. It is also noteworthy that lesions of the OFC abolish specific transfer62, 100 but not outcome-devaluation,100 indicating that damage to this region alone could not reproduce the exact pattern produced by chronic drug exposure. Thus, although the effect of chronic drug exposure on the transition to behavioural autonomy could be produced by a combination of PL and OFC damage (a view strengthened by addicts’ hypofunction in these regions109, 110), damage to the mediodorsal thalamus alone could impair both forms of behavioural control, and so has the advantage of parsimony.
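As an illustrative aid only, the comparison drawn in this synthesis can be laid out as a small lookup table. The Python sketch below codes each manipulation as abolishing (True) or sparing (False) three effects discussed above: acquisition of goal-directed control, expression of goal-directed control, and specific transfer. All names are hypothetical and the binary coding is a simplifying assumption rather than a result from the cited studies. In particular, the OFC is coded as sparing outcome-devaluation at both acquisition and expression, although the text states only that OFC lesions do not abolish outcome-devaluation, and combined damage is assumed to act as a logical OR of the individual profiles.

```python
from itertools import combinations

# Order of the three coded effects in every profile tuple below.
EFFECTS = ("gd_acquisition", "gd_expression", "specific_transfer")

# True = effect abolished, False = effect spared (simplified coding of the text;
# the OFC goal-directed entries are an assumption, see the lead-in paragraph).
LESIONS = {
    "PL":          (True,  False, False),
    "MD thalamus": (True,  False, True),
    "OFC":         (False, False, True),
    "BLA":         (True,  True,  True),
    "DMS":         (True,  True,  True),
}

# Profile attributed in the text to chronic amphetamine exposure.
CHRONIC_AMPHETAMINE = (True, False, True)


def single_matches(target):
    """Regions whose lesion profile alone equals the target profile."""
    return [region for region, profile in LESIONS.items() if profile == target]


def pair_matches(target):
    """Pairs of regions whose combined (logical OR) profile equals the target."""
    matches = []
    for a, b in combinations(LESIONS, 2):
        combined = tuple(x or y for x, y in zip(LESIONS[a], LESIONS[b]))
        if combined == target:
            matches.append((a, b))
    return matches


print(single_matches(CHRONIC_AMPHETAMINE))
print(("PL", "OFC") in pair_matches(CHRONIC_AMPHETAMINE))
```

Under this coding, the mediodorsal thalamus is the only single region whose profile equals that of chronic amphetamine, while PL plus OFC matches only as a pair, which is the parsimony argument made above.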
Conclusion

To conclude, we propose that initial drug-seeking is goal-directed, tracks the anticipated value of the drug (R-Oiv) and is responsive to specific transfer (S-Oi-R) effects of drug cues. Chronic drug exposure, however, impairs the capacity to retrieve or utilise the specific identity of outcomes, and so produces a transition of behavioural control from goal-directed learning (R-Oiv) and specific transfer (S-Oi-R) to habit (S-R) and general transfer ((S-Ov)-R). This transition occurs in relation to both drug outcomes and natural reward outcomes, resulting in a narrowing of the addict’s behavioural repertoire to general cue excitation of dominant S-R drug habits, with restricted capacity for intentional selection of alternative actions. This associative framework captures the cardinal diagnostic characteristics of heightened drug reinforcement, loss of willed regulation of drug-seeking and restricted engagement with alternative activities. Future research needs to clarify precisely how this transition to autonomy is accelerated by drugs of abuse compared to natural rewards, whether by differences in reward value,50 kinetics,111 neuroadaptations112, 113 or neurotoxicity,114 and precisely how this alters the balance between corticostriatal circuits underpinning the four controllers.48, 115

There are several implications concerning treatment strategy. Consistent with Tiffany’s7 insight, we have argued that goal-directed action and habit exist in a dynamic balance, which may be competitive46 or hierarchical,45 but switching between the two modes apparently can occur within the span of a single response sequence and/or longitudinally with training.
If addiction does reflect a progressive weakening of the role of outcome retrieval/utilisation in the execution of action sequences, allowing drug habits to dominate, then treatments such as expectancy challenge116 and extinction training117, which arguably work by changing the specific representation of the drug, may not provide the optimal strategy. Instead, treatments that enhance the capacity to engage representations of the future, such as working memory training118 combined with provision of alternative reward contingencies,119 may be more efficacious in redirecting addicts from their established habits. Moreover, the finding that the capacity for goal-directed control can be reinstated by manipulations of brain function48, 51, 52 and by uncertainty, which has definable neural substrates,46 suggests that neuropharmacology should complement such learning approaches to install new intentional action choices.

References

1. Wikler, A. 1984. Conditioning factors in opiate addiction and relapse. J. Subst. Abuse Treat. 1: 279-285.
2. Siegel, S. 1989. Pharmacological conditioning and drug effects. In Psychoactive Drugs. Tolerance and Sensitisation. Goudie, A. & M. Emmett-Oglesby, Eds.: 115-180. Humana Press. Clifton, New Jersey.
3. Solomon, R. L. & J. D. Corbit. 1974. An opponent-process theory of motivation: I. Temporal dynamics of affect. Psychol Rev. 81: 119-145.
4. Stewart, J., H. de Wit & R. Eikelboom. 1984. Role of conditioned and unconditioned drug effects in self-administration of opiates and stimulants. Psychol Rev. 63: 251-268.
5. Bickel, W. K., et al. 1991. Behavioral economics of drug self-administration: II. A unit-price analysis of cigarette smoking. J. Exp. Anal. Behav. 55: 145-154.
6. Koob, G. F. & M. Le Moal. 2001. Drug addiction, dysregulation of reward, and allostasis. Neuropsychopharmacology. 24: 97-129.
7. Tiffany, S. T. 1990. A cognitive model of drug urges and drug-use behaviour: Role of automatic and nonautomatic processes. Psychol Rev. 97: 147-168.
8.
Robinson, T. E. & K. C. Berridge. 1993. The neural basis of drug craving: an incentive-sensitization theory of drug addiction. Brain Res. Rev. 18: 247-291.
9. Heyman, G. M. 2009. Addiction: A disorder of choice. Harvard University Press. Cambridge, Massachusetts.
10. MacKillop, J., et al. 2011. Delayed reward discounting and addictive behavior: a meta-analysis. Psychopharmacology. 216: 305-321.
11. Ahmed, S. H. 2010. Validation crisis in animal models of drug addiction: Beyond non-disordered drug use toward drug addiction. Neurosci. Biobehav. Rev. 35: 172-184.
12. Ahmed, S. H. & G. F. Koob. 1998. Transition from moderate to excessive drug intake: Change in hedonic set point. Science. 282: 298-300.
13. Dickinson, A., N. Wood & J. W. Smith. 2002. Alcohol seeking by rats: Action or habit? Q J Exp Psychol B. 55: 331-348.
14. Drevets, W. C., et al. 2001. Amphetamine-induced dopamine release in human ventral striatum correlates with euphoria. Biol Psychiatry. 49: 81-96.
15. Volkow, N. D., J. S. Fowler & G. J. Wang. 2004. The addicted human brain viewed in the light of imaging studies: brain circuits and treatment strategies. Neuropharmacology. 47: 3-13.
16. White, N. M. 1989. A functional hypothesis concerning the striatal matrix and patches: Mediation of S-R memory and reward. Life Sci. 45: 1943-1957.
17. White, N. M. 1996. Addictive drugs as reinforcers: multiple partial actions on memory systems. Addiction. 91: 921-950.
18. Altman, J., et al. 1996. The biological, social and clinical bases of drug addiction: commentary and debate. Psychopharmacology. 125: 285-345.
19. Robbins, T. W. & B. J. Everitt. 1999. Drug addiction: bad habits add up. Nature. 398: 567-570.
20. Everitt, B. J., A. Dickinson & T. W. Robbins. 2001. The neuropsychological basis of addictive behaviour. Brain Res. Rev. 36: 129-138.
21. Everitt, B. J. & T. W. Robbins. 2005. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat. Neurosci. 8: 1481-1489.
22.
Balleine, B. W. & S. B. Ostlund. 2007. Still at the choice-point - Action selection and initiation in instrumental conditioning. In Reward and Decision Making in Corticobasal Ganglia Networks, Vol. 1104: 147-171.
23. de Wit, S. & A. Dickinson. 2009. Associative theories of goal-directed behaviour: a case for animal–human translational models. Psychol Res. 73: 463-476.
24. Killcross, S. & P. Blundell. 2002. Associative representations of emotionally significant outcomes. In Emotional cognition: From brain to behaviour. Oaksford, S. C. M. M., Ed.: 35-73. John Benjamins Publishing Company. Amsterdam, Netherlands.
25. Konorski, J. 1967. Integrative activity of the brain. University of Chicago Press. Chicago.
26. Balleine, B. W. & S. Killcross. 2006. Parallel incentive processing: an integrated view of amygdala function. Trends Neurosci. 29: 272-279.
27. Vlaev, I., et al. 2011. Does the brain calculate value? Trends in Cognitive Sciences. 15: 546-554.
28. Hursh, S. R. & A. Silberberg. 2008. Economic demand and essential value. Psychol Rev. 115: 186-198.
29. Lingawi, N. W. & B. W. Balleine. 2012. Amygdala central nucleus interacts with dorsolateral striatum to regulate the acquisition of habits. J Neurosci. 32: 1073-1081.
30. Hommel, B. in press. Ideomotor action control: On the perceptual grounding of voluntary actions and agents. In Tutorials in action science. Prinz, W., M. Beisert & A. Herwig, Eds. MIT Press. Cambridge, MA.
31. Corbit, L. H., P. H. Janak & B. W. Balleine. 2007. General and outcome-specific forms of Pavlovian-instrumental transfer: the effect of shifts in motivational state and inactivation of the ventral tegmental area. Eur. J. Neurosci. 26: 3141-3149.
32. Dickinson, A. 1985. Actions and Habits - the Development of Behavioral Autonomy. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences. 308: 67-78.
33. Olmstead, M. C., et al. 2001. Cocaine seeking by rats is a goal-directed action. Behav. Neurosci. 115: 394-402.
34. Hutcheson, D. M., et al. 2001. The role of withdrawal in heroin addiction: enhances reward or promotes avoidance? Nat. Neurosci. 4: 943-947.
35. Hogarth, L. 2012. Goal-directed and transfer-cue-elicited drug-seeking are dissociated by pharmacotherapy: Evidence for independent additive controllers. J. Exp. Psychol.: Anim. Behav. Processes. (in press).
36. Hogarth, L. & H. W. Chase. 2011. Parallel goal-directed and habitual control of human drug-seeking: Implications for dependence vulnerability. J. Exp. Psychol.: Anim. Behav. Processes. 37: 261-276.
37. Moeller, S. J., et al. 2009. Enhanced choice for viewing cocaine pictures in cocaine addiction. Biol Psychiatry. 66: 169-176.
38. Fergusson, D. M., et al. 2003. Early reactions to cannabis predict later dependence. Archives of General Psychiatry. 60: 1033-1039.
39. de Wit, H., E. H. Uhlenhuth & C. E. Johanson. 1986. Individual differences in the reinforcing and subjective effects of amphetamine and diazepam. Drug Alcohol Depend. 16: 341-360.
40. Scherrer, J. F., et al. 2009. Subjective effects to cannabis are associated with use, abuse and dependence after adjusting for genetic and environmental influences. Drug Alcohol Depend. 105: 76-82.
41. Stoops, W. W., et al. 2007. The reinforcing, subject-rated, performance, and cardiovascular effects of d-amphetamine: Influence of sensation-seeking status. Addict. Behav. 32: 1177-1188.
42. Pomerleau, O. 1995. Individual differences in sensitivity to nicotine: Implications for genetic research on nicotine dependence. Behav. Genet. 25: 161-177.
43. Miles, F. J., B. J. Everitt & A. Dickinson. 2003. Oral cocaine seeking by rats: Action or habit? Behav. Neurosci. 117: 927-938.
44. Balleine, B. W., et al. 1995. Motivational control of heterogeneous instrumental chains. J Exp Psychol Anim Behav Process. 21: 203-217.
45. Dezfouli, A. & B. W. Balleine. 2012. Habits, action sequences and reinforcement learning. Eur. J. Neurosci. 35: 1036-1051.
46. Daw, N. D., Y. Niv & P. Dayan.
2005. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8: 1704-1711.
47. Dickinson, A., et al. 1995. Motivational control after extended instrumental training. Anim. Learn. Behav. 23: 197-206.
48. Killcross, S. & E. Coutureau. 2003. Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb. Cortex. 13: 400-408.
49. Kosaki, Y. & A. Dickinson. 2010. Choice and contingency in the development of behavioral autonomy during instrumental conditioning. J. Exp. Psychol.: Anim. Behav. Processes. 36: 334-342.
50. Nordquist, R. E., et al. 2007. Augmented reinforcer value and accelerated habit formation after repeated amphetamine treatment. Eur. Neuropsychopharmacol. 17: 532-540.
51. Corbit, L. H., H. Nie & P. H. Janak. in press. Habitual alcohol seeking: time course and the contribution of subregions of the dorsal striatum. Biol Psychiatry.
52. Zapata, A., V. L. Minney & T. S. Shippenberg. 2010. Shift from goal-directed to habitual cocaine seeking after prolonged experience in rats. J. Neurosci. 30: 15457-15463.
53. Hogarth, L., et al. 2012. Acute alcohol impairs human goal-directed action. Biological Psychology. 90: 154-160.
54. Nelson, A. & S. Killcross. 2006. Amphetamine exposure enhances habit formation. J. Neurosci. 26: 3805-3812.
55. Schoenbaum, G. & B. Setlow. 2005. Cocaine makes actions insensitive to outcomes but not extinction: Implications for altered orbitofrontal-amygdalar function. Cereb. Cortex. 15: 1162-1169.
56. Colwill, R. M. & R. A. Rescorla. 1988. Associations between the discriminative stimulus and the reinforcer in instrumental learning. J Exp Psychol Anim Behav Process. 14: 155-164.
57. Hogarth, L., et al. 2007. The role of drug expectancy in the control of human drug seeking. J Exp Psychol Anim Behav Process. 33: 484-496.
58. Holmes, N. M., A. R. Marchand & E. Coutureau. 2010. Pavlovian to instrumental transfer: A neurobehavioural perspective. Neurosci.
Biobehav. Rev. 34: 1277-1295.
59. Corbit, L. H. & P. H. Janak. 2007. Ethanol-associated cues produce general Pavlovian-instrumental transfer. Alcoholism-Clinical and Experimental Research. 31: 766-774.
60. Krank, M. D. 2003. Pavlovian Conditioning With Ethanol: Sign-Tracking (Autoshaping), Conditioned Incentive, and Ethanol Self-Administration. Alcoholism: Clinical and Experimental Research. 27: 1592-1598.
61. Glasner, S. V., J. B. Overmier & B. W. Balleine. 2005. The role of Pavlovian cues in alcohol seeking in dependent and nondependent rats. J. Stud. Alcohol. 66: 53-61.
62. Balleine, B. W., B. K. Leung & S. B. Ostlund. 2011. The orbitofrontal cortex, predicted value, and choice. Ann. N. Y. Acad. Sci. 1239: 43-50.
63. Delamater, A. R. 1995. Outcome-selective effects of intertrial reinforcement in Pavlovian appetitive conditioning with rats. Anim. Learn. Behav. 23: 31-39.
64. Gámez, A. M. & J. M. Rosas. 2005. Transfer of stimulus control across instrumental responses is attenuated by extinction in human instrumental conditioning. International Journal of Psychology & Psychological Therapy. 5: 207-222.
65. Trick, L., L. Hogarth & T. Duka. 2011. Prediction and uncertainty in human Pavlovian to instrumental transfer. Journal of Experimental Psychology: Learning Memory and Cognition. 37: 757-765.
66. Rescorla, R. A. 1994. Transfer of instrumental control mediated by a devalued outcome. Anim. Learn. Behav. 22: 27-33.
67. Holland, P. C. 2004. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. J Exp Psychol Anim Behav Process. 30: 258-258.
68. Colwill, R. M. & R. A. Rescorla. 1990. Effects of reinforcer devaluation on discriminative control of instrumental behavior. J Exp Psychol Anim Behav Process. 16: 40-47.
69. Hogarth, L., A. Dickinson & T. Duka. 2010. The associative basis of cue elicited drug taking in humans. Psychopharmacology. 208: 337-351.
70. Ferguson, S. G. & S. Shiffman. 2009.
The relevance and treatment of cue-induced cravings in tobacco dependence. J. Subst. Abuse Treat. 36: 235-243.
71. Hitsman, B., et al. in press. Dissociable effect of acute varenicline on tonic versus cue-provoked craving in non-treatment motivated heavy smokers. Drug Alcohol Depend.
72. Dickinson, A. & G. R. Dawson. 1987. Pavlovian processes in the motivational control of instrumental performance. Q J Exp Psychol B. 39: 201-213.
73. Mitchell, J. B. & J. Stewart. 1990. Facilitation of sexual behaviors in the male rat in the presence of stimuli previously paired with systemic injections of morphine. Pharmacol. Biochem. Behav. 35: 367-372.
74. Shiflett, M. in press. The effects of amphetamine exposure on outcome-selective Pavlovian-instrumental transfer in rats. Psychopharmacology. 1-10.
75. Wyvell, C. L. & K. C. Berridge. 2000. Intra-accumbens amphetamine increases the conditioned incentive salience of sucrose reward: Enhancement of reward "wanting" without enhanced "liking" or response reinforcement. J. Neurosci. 20: 8122-8130.
76. Wyvell, C. L. & K. C. Berridge. 2001. Incentive sensitization by previous amphetamine exposure: Increased cue-triggered "wanting" for sucrose reward. J. Neurosci. 21: 7831-7840.
77. Schwabe, L., A. Dickinson & O. T. Wolf. 2011. Stress, habits and drug addiction: A psychoneuroendocrinological perspective. Exp Clin Psychopharmacol. 19: 53-63.
78. Hogarth, L. 2011. The role of impulsivity in the aetiology of drug dependence: reward sensitivity versus automaticity. Psychopharmacology. 215: 567-580.
79. Hogarth, L., H. W. Chase & K. Baess. 2012. Impaired goal-directed behavioural control in human impulsivity. Q J Exp Psychol. 65: 305-316.
80. de Wit, S., et al. 2006. Dorsomedial prefrontal cortex resolves response conflict in rats. J. Neurosci. 26: 5224-5229.
81. Ostlund, S. B., N. T. Maidment & B. W. Balleine. 2010. Alcohol-paired contextual cues produce an immediate and selective loss of goal-directed action in rats.
Frontiers in Integrative Neuroscience. 4.
82. Gillan, C. M., et al. 2011. Disruption in the balance between goal-directed behavior and habit learning in obsessive-compulsive disorder. Am. J. Psychiatry. 168: 718-726.
83. Valentin, V., A. Dickinson & J. P. O’Doherty. 2007. Determining the neural substrates of goal-directed learning in the human brain. J Neurosci. 27: 4019-4026.
84. Klossek, U. M. H., J. Russell & A. Dickinson. 2008. The control of instrumental action following outcome devaluation in young children aged between 1 and 4 years. J Exp Psychol Gen. 137: 39-51.
85. Haddon, J. E., et al. 2010. Impaired conditional task performance in a high schizotypy population: Relation to cognitive deficits. The Quarterly Journal of Experimental Psychology. 64: 1-9.
86. Barch, D. M. & A. Ceaser. 2012. Cognition in schizophrenia: core psychological and neural mechanisms. Trends in Cognitive Sciences. 16: 27-34.
87. Balleine, B. W. & A. Dickinson. 1998. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 37: 407-419.
88. Corbit, L. H. & B. W. Balleine. 2003. The role of prelimbic cortex in instrumental conditioning. Behav. Brain Res. 146: 145-157.
89. Ostlund, S. B. & B. W. Balleine. 2005. Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning. J. Neurosci. 25: 7763-7770.
90. Corbit, L. H., J. L. Muir & B. W. Balleine. 2003. Lesions of mediodorsal thalamus and anterior thalamic nuclei produce dissociable effects on instrumental conditioning in rats. Eur. J. Neurosci. 18: 1286-1294.
91. Ostlund, S. B. & B. W. Balleine. 2008. Differential involvement of the basolateral amygdala and mediodorsal thalamus in instrumental action selection. J Neurosci. 28: 4398-4405.
92. Yin, H. H., B. J. Knowlton & B. W. Balleine. 2005. Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci.
22: 505-512.
93. Corbit, L. H. & P. H. Janak. 2010. Posterior dorsomedial striatum is critical for both selective instrumental and Pavlovian reward learning. Eur. J. Neurosci. 31: 1312-1321.
94. Yin, H. H., et al. 2005. The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22: 513-523.
95. Balleine, B. W., A. S. Killcross & A. Dickinson. 2003. The effect of lesions of the basolateral amygdala on instrumental conditioning. J. Neurosci. 23: 666-675.
96. Corbit, L. H. & B. W. Balleine. 2005. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of pavlovian-instrumental transfer. J. Neurosci. 25: 962-970.
97. Johnson, A. W., M. Gallagher & P. C. Holland. 2009. The basolateral amygdala is critical to the expression of Pavlovian and instrumental outcome-specific reinforcer devaluation effects. J Neurosci. 29: 696-704.
98. Yin, H. H., B. J. Knowlton & B. W. Balleine. 2004. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19: 181-189.
99. Yin, H. H., B. J. Knowlton & B. W. Balleine. 2006. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 166: 189-196.
100. Ostlund, S. B. & B. W. Balleine. 2007. Orbitofrontal cortex mediates outcome encoding in pavlovian but not instrumental conditioning. J. Neurosci. 27: 4819-4825.
101. Corbit, L. H., J. L. Muir & B. W. Balleine. 2001. The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. J. Neurosci. 21: 3251-3260.
102. Corbit, L. & B. Balleine. 2011. The general and outcome-specific forms of Pavlovian-Instrumental transfer are differentially mediated by the nucleus accumbens core and shell. J Neurosci. 31: 11786-11794.
103. Blundell, P., G. Hall & S. Killcross. 2001.
Lesions of the basolateral amygdala disrupt selective aspects of reinforcer representation in rats. J. Neurosci. 21: 9018-9026.
104. Holland, P. C. & M. Gallagher. 2003. Double dissociation of the effects of lesions of basolateral and central amygdala on conditioned stimulus-potentiated feeding and Pavlovian-instrumental transfer. Eur. J. Neurosci. 17: 1680-1694.
105. Shiflett, M. W. & B. W. Balleine. 2010. At the limbic–motor interface: disconnection of basolateral amygdala from nucleus accumbens core and shell reveals dissociable components of incentive motivation. Eur. J. Neurosci. 32: 1735-1743.
106. Corbit, L. H. & P. H. Janak. 2007. Inactivation of the lateral but not medial dorsal striatum eliminates the excitatory impact of Pavlovian stimuli on instrumental responding. J Neurosci. 27: 13977-13981.
107. Murschall, A. & W. Hauber. 2006. Inactivation of the ventral tegmental area abolished the general excitatory influence of Pavlovian cues on instrumental performance. Learn. Memory. 13: 123-126.
108. Hall, J., et al. 2001. Involvement of the central nucleus of the amygdala and nucleus accumbens core in mediating Pavlovian influences on instrumental behaviour. Eur. J. Neurosci. 13: 1984-1992.
109. Chase, H. W., et al. 2008. The role of the orbitofrontal cortex in human discrimination learning. Neuropsychologia. 46: 1326-1337.
110. Wilson, S. J., M. A. Sayette & J. A. Fiez. 2004. Prefrontal responses to drug cues: a neurocognitive analysis. Nat. Neurosci. 7: 211-214.
111. Farré, M. & J. Camí. 1991. Pharmacokinetic considerations in abuse liability evaluation. Addiction. 86: 1601-1606.
112. Wickens, J. R., et al. 2007. Dopaminergic mechanisms in actions and habits. J. Neurosci. 27: 8181-8183.
113. Jedynak, J. P., et al. 2007. Methamphetamine-induced structural plasticity in the dorsal striatum. Eur. J. Neurosci. 25: 847-853.
114. Cunha-Oliveira, T., A. C. Rego & C. R. Oliveira. 2008.
Cellular and molecular mechanisms involved in the neurotoxicity of opioid and psychostimulant drugs. Brain Res. Rev. 58: 192-208.
115. Balleine, B. W. & J. P. O'Doherty. 2010. Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology. 35: 48-69.
116. Jones, B. T. & R. M. Young. 2011. Changing alcohol expectancies and self-efficacy expectations. In Handbook of motivational counseling: Goal-based approaches to assessment and intervention with addiction and other problems. Cox, W. M. & E. Klinger, Eds.: 489-504. John Wiley & Sons, Ltd.
117. Bouton, M. E. 2002. Context, ambiguity, and unlearning: sources of relapse after behavioral extinction. Biol Psychiatry. 52: 976-986.
118. Bickel, W. K., et al. 2011. Remember the future: Working memory training decreases delay discounting among stimulant addicts. Biol Psychiatry. 69: 260-265.
119. Quick, S. L., et al. 2011. Loss of alternative non-drug reinforcement induces relapse of cocaine-seeking in rats: Role of dopamine D1 receptors. Neuropsychopharmacology. 36: 1015-1020.

Figure Legends

Figure 1: Experience of the drug outcome is separately encoded in terms of its perceptual identity (Oi) and incentive value (Ov), and establishes learning about (1) the instrumental contingency between drug-seeking response and the drug (R-Oiv); (2) the habitual contingency between drug stimuli and the drug-seeking response (S-R); and (3) the Pavlovian contingency between drug stimuli and the drug (S-Oiv). It is argued that chronic drug exposure generates a progressive impairment in the capacity to retrieve or utilise the specific identity of outcomes (Oi), which causes a transition in behavioural control from the R-Oiv and S-Oi associations to the S-R and S-Ov associations.
That is, addiction reflects a loss of control over behaviour by knowledge of the consequences indexed by outcome-devaluation and specific transfer, in favour of control by antecedent stimuli indexed by devaluation insensitivity and general transfer.