Goal-directed and habitual decision-making: computational modeling in impulsive-compulsive psychiatric disorders
Zsuzsika Sjoerds
[E] sjoerds.zs@gmail.com / @zsjoerds
28th ECNP Congress, Amsterdam, 31-08-2015
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Max-Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”

Two parallel decision-making systems: a computational approach
Goal-directed (“model-based”):
- flexible forward planning
- model of environment
- action-outcome associations
Habitual (“model-free”):
- automatic responses
- stamped in by past reinforcement
- divorced from value of future outcome
Addiction, OCD, binge eating, ADHD, ... (Everitt and Robbins 2005; Voon 2014)
Balleine and O'Doherty 2010, Neuropsychopharmacology; Dolan and Dayan 2013, Neuron

Overview
Studies on goal-directed & habitual learning:
1. Vulnerability factors
   - impulsivity
   - stress
2. Alcohol dependence
3. Obsessive-compulsive disorder

Paradigms to measure the two systems
(more examples: Table 1 in Sjoerds et al., 2014, Front Psychiatry)
- Selective outcome devaluation: habitual behavior is not influenced by outcome devaluation (Valentin, Dickinson & O’Doherty, 2007, J Neurosci)
- Stimulus-response instrumental learning: S-O-R contingencies; devaluation; slip-of-action (De Wit & Dickinson, 2007, J Exp Psychol)
- Reversal learning: flexible adjustment in a changing environment (O’Doherty et al., 2001, Nat Neurosci; Cools et al., 2002, J Neurosci; Den Ouden et al., 2013, Neuron)
- Sequential decision making: forward planning in a stepwise decision tree; computational modeling approach (Daw et al. 2011, Neuron)

Computational modeling: reinforcement learning / reward-related decision making
A system (person/computer) learns actions to maximize rewards/positive outcomes and to avoid punishments/negative outcomes.
Updating of values based on prediction error:
- Action value: Q
- Prediction error δ: obtained reward (r) minus expected reward (Q)
- Updated per trial (t), with learning rate (α)
$\delta_t = r_t - Q_t$
$Q_{t+1} = Q_t + \alpha \, \delta_t$
Sutton & Barto, 1998. Reinforcement Learning: An Introduction, MIT Press

Sequential decision-making
[Figure: two-stage task (Stage 1, Stage 2); transitions labeled 70% common and 30% rare; trial timing labels 0s, +<2s, +3s, +1.5s; outcome measure: first-stage stay probability]
ω = weighting parameter (↑ model-based, ↓ model-free)
Daw et al. 2011, Neuron; Deserno et al. 2015, PNAS

Sequential decision-making vulnerability: impulsivity (BIS, Barratt Impulsiveness Scale)
High-impulsive: n=24, mean BIS=74; low-impulsive: n=26, mean BIS=50
[Figure: % repeating choice after rewards vs. non-rewards, by group; ω (omega) by group; parameter estimates for model-free minus model-based in MPFC, IFG/OFC and VS, high vs. low impulsive; * marks significant differences]
p < 0.05 whole-brain corrected
Deserno et al., in press, Translational Psychiatry

Sequential decision-making vulnerability: stress
Cortisol stress response & decision-making
- main effect of stress: p = 0.023, η² = 0.155
- stress x reward x state: p > 0.6
Otto et al., 2013, PNAS
N = 39 healthy males, within-subject design
Radenbach, Reiter et al. 2015, Psychoneuroendocrinology

Sequential decision-making vulnerability: stress
Acute stress and chronic stress interact to decrease model-based control.
[Figure: groups split by life-time stress: low vs. high]
N = 39 healthy controls, within-subject design
Radenbach, Reiter et al. 2015, Psychoneuroendocrinology

Goal-directed learning in alcohol dependence
‘Fabulous Fruit Game’: S-R-O goal-directed and S-R habit-based learning
- phase 1: learn S-R-O associations
- phase 2: assess R-O strength
De Wit et al., 2009, J Neurosci
Results (HC = healthy controls, AD = alcohol dependence):
- No group difference (* p<0.05 above chance)
- Main group effect: p=0.017
- Goal-directed learning, HC > AD: VMPFC (Z=3.45 [x=-4, y=58, z=18] and Z=3.29 [x=12, y=60, z=-5]); anterior putamen (Z=3.63 [x=-27, y=5, z=3])
Sjoerds et al., 2013, Translat Psychiat.

Reversal learning task
Adaptation to an uncertain, changing environment
[Figure: trial structure (stimulus, action, reward) across task phases: pre-reversal (stable), reversal, post-reversal (stable); model labels: single update, double update; chosen vs. unchosen stimulus]

Reduced behavioral adaptation in alcohol-dependent patients
[Figure panels: model-based score (reward x state interaction); winning model to explain behavior; learning rate for the unchosen stimulus]
3-way interaction, reward x state x group: p < 0.05
Healthy controls n=35; alcohol dependence n=43
Reiter et al., under review; Sebold et al. 2014, Neuropsychobiology

Reduced goal-directed learning signal in AD
- Double-update δ (goal-directed): main effect in MPFC and VS; parameter estimates at X = -8, Y = 62, Z = 12
- Single-update δ (habitual): MPFC
Controls > AD, p < .05 FWE whole-brain corrected
Healthy controls n=35; alcohol dependence n=43
HC n=35, AD n=34
Reiter et al., under review

OCD: a bias towards learning habits?
Gillan et al., 2011, Am. J. Psychiatry; Voon et al. 2014, Molecular Psychiatry

OCD: a bias towards learning habits?
[Figure: % repeating choice after reward vs. no reward, by group; ω (omega) by group]
Voxel-based morphometry: group * ω interaction, parameter estimates in VS (x=-16, y=7, z=9); t=4.50, p < 0.05, small-volume corrected for bilateral nucleus accumbens
Healthy controls n=30; OCD patients n=28
Golz, Sjoerds et al., in preparation

Behavioral adaptation in OCD
Computational modeling
[Figure: exceedance probability for the Single Update, Hybrid and Double Update models; HC N=35, OCD N=29; task phases: pre-reversal, reversal, post-reversal]
Modeling parameters: no group differences (p’s > 0.5)
- Medication status (SSRIs)
- Cognitive capacities
- fMRI correlates
Healthy controls n=35; OCD patients n=29
Sjoerds, Lüttgau et al., in preparation

Conclusions I
- Vulnerability factors (impulsivity, stress) influence the balance between model-based and model-free control, but the profile is qualitatively different from that in patients.
- Impulsivity is associated with more model-free choices, but lower model-based activity in the OFC.
- Stress reactivity and chronic stress play a role in mediating the relationship between acute stress and decision-making.
- These results stimulate new insights into the pathogenesis of various psychiatric diseases involving stress, impulsivity and attenuated model-based control.

Conclusions II
- Alcohol use disorders repeatedly show reduced goal-directed/model-based choices and associated neural signatures in MPFC, even across various paradigms.
- How do these patterns of (monetary) goal-directed vs. habit learning relate to established alcohol seeking/use habits?
- Obsessive-compulsive disorder has been reported to show reduced model-based (or increased model-free?) choices, which we have so far not replicated. Heterogeneity by compulsion type, medication and cognition needs further study.

Collaborators and funding
VU University Medical Center, Amsterdam: Prof. Dick J. Veltman; Prof. Brenda W.J.H. Penninx; Prof. Aartjan T.F. Beekman
Academic Medical Center, University of Amsterdam: Prof. Wim van den Brink; Prof. Damiaan Denys; Judy Luigjes, PhD
University of Amsterdam: Sanne de Wit, PhD
University of Cambridge: Prof. Trevor W. Robbins
Monash University, Melbourne: Prof. Murat Yücel
Klinik und Poliklinik für Psychiatrie und Psychotherapie, Universität Leipzig, Ambulanz für Zwangserkrankungen: Prof. Katarina Stengler; Sebastian Olbrich, MD
Rubicon Grant for young researchers, Z. Sjoerds: 2014-2016 (#2014/05563/ALW), Netherlands Organization for Scientific Research

Thank you!
Max-Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”: Florian Schlagenhauf, MD, group leader; Lorenz Deserno, MD; Martin Panitz, MD; Andrea Reiter, PhD student; Tilmann Wilbertz, cand. med.; Christoph Radenbach, cand. med.; Martin Huss, cand. med.; Lennart Lüttgau, BSc; Laura Golz, BSc; Karoline Hudl, BSc
Department of Neuropsychology: Jan Schreiber, PhD
Leibniz-Institute for Neurobiology, Magdeburg: Prof. Hans-Jochen Heinze, Director
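
Worked example: prediction-error update
The update rule from the computational modeling slide (δt = rt - Qt, then Qt+1 = Qt + α * δt) can be illustrated with a minimal Python sketch. It is not the model code used in the studies presented here. The function names, the starting value Q = 0 and the learning rate α = 0.5 are hypothetical choices for illustration. The "double update" variant is one possible formulation and an assumption: the slides name single-update and double-update models but do not give their equations. Here the unchosen option is updated toward the opposite outcome, which presumes rewards coded as +1 and -1.

```python
def delta_rule_update(q, reward, alpha):
    """Apply one prediction-error update to a single action value.

    Returns the updated value and the prediction error.
    """
    delta = reward - q               # delta_t = r_t - Q_t
    return q + alpha * delta, delta  # Q_{t+1} = Q_t + alpha * delta_t


def update_pair(q_chosen, q_unchosen, reward, alpha, double_update=False):
    """Update the values of a two-option task after one trial.

    Single update: only the chosen option learns from the reward.
    Double update (ASSUMPTION, not specified on the slides): the unchosen
    option is also updated, toward the opposite outcome (-reward), which
    presumes rewards coded as +1 (win) and -1 (loss).
    """
    q_chosen, _ = delta_rule_update(q_chosen, reward, alpha)
    if double_update:
        q_unchosen, _ = delta_rule_update(q_unchosen, -reward, alpha)
    return q_chosen, q_unchosen


def simulate(rewards, alpha=0.5, q0=0.0):
    """Track one action value across a sequence of trial rewards."""
    q, trace = q0, []
    for r in rewards:
        q, delta = delta_rule_update(q, r, alpha)
        trace.append((q, delta))
    return trace


if __name__ == "__main__":
    # Two rewarded trials followed by one unrewarded trial.
    for trial, (q, delta) in enumerate(simulate([1, 1, 0]), start=1):
        print(f"trial {trial}: delta = {delta:+.3f}, Q = {q:.3f}")
    # trial 1: delta = +1.000, Q = 0.500
    # trial 2: delta = +0.500, Q = 0.750
    # trial 3: delta = -0.750, Q = 0.375
```

With a higher learning rate the value tracks recent outcomes more closely; with a lower one it averages over a longer reward history.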