Goal-directed and habitual decision-making

Goal-directed and habitual decision-making
computational modeling in impulsive-compulsive psychiatric disorders
Zsuzsika Sjoerds
[E] sjoerds.zs@gmail.com / @zsjoerds
28th ECNP Congress, Amsterdam, 31-08-2015
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Max-Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”
ECNP, Amsterdam, 31-08-2015
Max Planck Institute for Human Cognitive and Brain Sciences
Two parallel decision-making systems
a computational approach
goal-directed (“model-based”)
- flexible forward planning
- model of environment
- action-outcome associations
habitual (“model-free”)
- automatic responses
- stamped in by past reinforcement
- divorced from value of future outcome
Addiction, OCD, binge eating, ADHD, ... (Everitt and Robbins 2005; Voon 2014)
Balleine and O'Doherty 2010, Neuropsychopharmacology; Dolan and Dayan 2013, Neuron
Overview
Studies on goal-directed & habitual learning:
1. Vulnerability factors
- impulsivity
- stress
2. Alcohol dependence
3. Obsessive-compulsive disorder
Paradigms to measure the two systems
more examples: Table 1 in Sjoerds et al., 2014 Front Psychiatry
- Selective outcome devaluation
Habitual behavior not influenced by outcome devaluation
Valentin, Dickinson & O’Doherty, 2007, J Neurosci
- Stimulus-response instrumental learning
S-O-R contingencies; devaluation; slip-of-action
De Wit & Dickinson, 2007, J Exp Psychol
- Reversal learning
Flexible adjustment in a changing environment
O’Doherty et al., 2001, Nat Neurosci; Cools et al., 2002, J Neurosci; Den Ouden et al., 2013, Neuron
- Sequential decision making
Forward planning in a stepwise decision-tree
Computational modeling approach
Daw et al. 2011, Neuron
Computational modeling:
reinforcement learning / reward-related decision making
A system (person/computer) learns actions to
maximize rewards/positive outcomes,
and avoid punishments/negative outcomes
Updating of values based on prediction error
Prediction error δ: obtained reward (r) minus expected reward (Q)
$\delta_t = r_t - Q_t$
Action value Q: updated per trial (t), with learning rate (α)
$Q_{t+1} = Q_t + \alpha \, \delta_t$
Sutton & Barto, 1998. Reinforcement Learning: An Introduction, MIT Press
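The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration of the delta rule only; the function names, the starting value of 0, and the example learning rate are assumptions made for the example, not values from the talk.

```python
def update_value(q, reward, alpha):
    """Apply one trial of the delta rule and return (new_q, delta)."""
    delta = reward - q          # prediction error: obtained minus expected reward
    new_q = q + alpha * delta   # move the action value toward the obtained reward
    return new_q, delta


def learn(rewards, alpha=0.1, q0=0.0):
    """Run the delta rule over a sequence of rewards (hypothetical helper).

    Returns a list of (Q after the trial, prediction error on the trial).
    """
    q = q0
    history = []
    for r in rewards:
        q, delta = update_value(q, r, alpha)
        history.append((q, delta))
    return history
```

With a higher learning rate the value tracks recent outcomes more closely; with a lower one it averages over a longer history. For example, `learn([1, 1], alpha=0.5)` moves Q from 0 to 0.5 and then to 0.75, while the prediction error shrinks from 1.0 to 0.5.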
Sequential decision-making
[Figure: two-stage task. A Stage 1 choice leads to a Stage 2 state via a common (70%) or rare (30%) transition; timing labels 0s, +<2s, +3s, +1.5s. Outcome measure: first-stage stay probability.]
ω = weighting parameter (↑ model-based, ↓ model-free)
Daw et al. 2011, Neuron
Deserno et al. 2015, PNAS
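How the weighting parameter ω trades off the two systems can be sketched as follows. The sketch assumes the hybrid scheme described in Daw et al. 2011: model-based first-stage values are computed from the 70%/30% transition structure and the best available second-stage value, and the net value is an ω-weighted mixture of model-based and model-free values. The state and action labels and all function names are hypothetical.

```python
# Transition structure of the task: each first-stage action leads to its
# "common" second-stage state with p = 0.7 and to the other state with p = 0.3.
# Action labels "A"/"B" and state labels "S1"/"S2" are placeholders.
TRANSITIONS = {
    "A": {"S1": 0.7, "S2": 0.3},
    "B": {"S1": 0.3, "S2": 0.7},
}


def model_based_values(q_stage2):
    """Forward planning: expected best second-stage value per first-stage action.

    q_stage2 maps each second-stage state to a dict of action values.
    """
    return {
        action: sum(p * max(q_stage2[state].values()) for state, p in probs.items())
        for action, probs in TRANSITIONS.items()
    }


def hybrid_values(q_model_free, q_stage2, omega):
    """Mix model-based and model-free first-stage values with weight omega."""
    q_mb = model_based_values(q_stage2)
    return {
        action: omega * q_mb[action] + (1 - omega) * q_model_free[action]
        for action in q_model_free
    }
```

With ω = 1 choices follow the planned (model-based) values only, and with ω = 0 they follow the cached (model-free) values only. Intermediate values of ω give a graded balance.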
Sequential decision-making
vulnerability: impulsivity (BIS)
Groups: high-impulsive (n=24, mean BIS=74), low-impulsive (n=26, mean BIS=50)
[Figure: % repeating choice after rewards - non rewards, low-impulsive vs. high-impulsive; parameter estimates of omega, high vs. low]
[Figure: model-free minus model-based, parameter estimates in MPFC, IFG/OFC, and VS, high vs. low; p < 0.05 whole-brain corrected]
Deserno et al., In Press, Translational Psychiatry
Sequential decision-making
vulnerability: stress
cortisol stress response & decision-making
main effect of stress: p = 0.023, η² = 0.155
stress x reward x state: p > 0.6
Otto et al., 2013 PNAS
N = 39 healthy males, within-subject design
Radenbach, Reiter et al. 2015, Psychoneuroendocrinology
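The reward-by-state terms reported here rest on first-stage stay probabilities. A minimal sketch of that analysis follows, assuming that "state" denotes whether the previous trial had a common or rare transition. The trial format (a list of dicts with keys `choice`, `rewarded`, `common`) and the function names are hypothetical and do not reproduce the authors' analysis code.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """Probability of repeating the first-stage choice, split by the previous
    trial's outcome (rewarded or not) and transition type (common or rare)."""
    counts = defaultdict(lambda: [0, 0])  # (rewarded, common) -> [stays, total]
    for prev, cur in zip(trials, trials[1:]):
        key = (prev["rewarded"], prev["common"])
        counts[key][0] += int(cur["choice"] == prev["choice"])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


def model_free_score(p):
    """Main effect of reward: staying after reward regardless of transition."""
    return (p[(True, True)] + p[(True, False)]) - (p[(False, True)] + p[(False, False)])


def model_based_score(p):
    """Reward x transition interaction: the effect of reward flips after rare transitions."""
    return (p[(True, True)] - p[(True, False)]) - (p[(False, True)] - p[(False, False)])
```

A purely model-free learner repeats rewarded choices whatever the transition, so only the reward main effect is non-zero. A purely model-based learner stays after rewarded common and unrewarded rare transitions, which shows up as the interaction.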
Sequential decision-making
vulnerability: stress
Acute stress and chronic stress interact to decrease model-based control
[Figure legend: lifetime stress, low vs. high]
N = 39 healthy controls, within-subject design
Radenbach, Reiter et al. 2015, Psychoneuroendocrinology
Goal-directed learning in alcohol dependence
‘Fabulous Fruit Game’: S-R-O goal-directed and S-R habit-based learning
phase 1: learn S-R-O associations
phase 2: assess R-O strength
De Wit et al., 2009 J Neurosci
Goal-directed learning: HC > AD
- No group difference, p<0.05 above chance
- Main group effect: p=0.017
- VMPFC (Z=3.45 [x=-4, y=58, z=18] & Z=3.29 [x=12, y=60, z=-5])
- Anterior putamen (Z=3.63 [x=-27, y=5, z=3])
Sjoerds et al., 2013 Translat Psychiat.
Reversal learning task
Adaptation to an uncertain, changing environment
[Figure: trial sequence (stimulus, action, reward) across pre-reversal (stable), reversal, and post-reversal (stable) phases; value updates for the chosen and unchosen stimulus under single update and double update]
Reduced behavioral adaptation in alcohol-dependent patients
- 3-way interaction: reward x state x group, p < 0.05
- model-based score (reward x state interaction)
- winning model to explain behavior
- learning rate, unchosen stimulus
Healthy controls n=35
Alcohol dependence n=43
Reiter et al. under review
Sebold et al. 2014, Neuropsychobiology
Reduced goal-directed learning signal in AD
[Figure: main effects of double-update δ (goal-directed) and single-update δ (habitual); labeled regions MPFC and VS; parameter estimates at X = -8, Y = 62, Z = 12; contrast Controls > AD, p < .05 FWE whole-brain corrected]
Healthy controls n=35
Alcohol dependence n=43
HC n=35, AD n=34
Reiter et al. under review
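The two prediction-error signals contrasted here come from two learning rules that can be sketched as below. This is an illustration under stated assumptions, not the authors' model code: outcomes are assumed to be coded +1 (reward) and -1 (punishment), the double-update rule is assumed to move the unchosen stimulus toward the opposite outcome (-r), and the separate `alpha_unchosen` parameter and all function names are hypothetical.

```python
def single_update(q, chosen, reward, alpha):
    """Single update: only the chosen stimulus learns from the outcome."""
    q = dict(q)                      # copy so the caller's values stay untouched
    delta = reward - q[chosen]       # prediction error for the chosen stimulus
    q[chosen] += alpha * delta
    return q, delta


def double_update(q, chosen, unchosen, reward, alpha, alpha_unchosen=None):
    """Double update: the unchosen stimulus is also updated, toward -reward.

    Assumes anti-correlated outcomes: what is good for one stimulus is bad
    for the other. alpha_unchosen defaults to alpha.
    """
    if alpha_unchosen is None:
        alpha_unchosen = alpha
    q, delta = single_update(q, chosen, reward, alpha)
    delta_unchosen = -reward - q[unchosen]   # fictive error for the unchosen stimulus
    q[unchosen] += alpha_unchosen * delta_unchosen
    return q, delta, delta_unchosen
```

Under the double-update rule, a punishment for the chosen stimulus raises the value of the alternative straight away. This lets an agent that uses the task structure switch faster after a reversal than one that learns only about what it chose.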
OCD: a bias towards learning habits?
Gillan et al., 2011, Am. J. Psychiatry
Voon et al. 2014, Molecular Psychiatry
OCD: a bias towards learning habits?
[Figure: % repeating choice after reward vs. no reward, healthy controls (n=30) and OCD patients (n=28); parameter estimates of omega]
Voxel-based morphometry: group * ω interaction in VS (x=-16, y=7, z=9), bilateral nucleus accumbens; t=4.50, p < 0.05 small volume corrected
Golz, Sjoerds et al., in preparation
Behavioral adaptation in OCD
Computational modeling
[Figure: exceedance probability (y-axis, 0 to 0.7) for the Single Update, Hybrid, and Double Update models; HC N=35, OCD N=29]
Modeling parameters: no group differences (p’s > 0.5)
[Figure: behavior across pre-reversal, reversal, and post-reversal phases; healthy controls n=35, OCD patients n=29]
- Medication status (SSRIs)
- Cognitive capacities
- fMRI correlates
Sjoerds, Lüttgau et al., in preparation
Conclusions I
- Vulnerability factors (impulsivity, stress) influence the balance between
model-based and model-free control, but the profile is qualitatively
different from that in patients.
- Impulsivity associated with more model-free choices, but lower
model-based activity in the OFC
- Stress reactivity and chronic stress play a role in mediating the
relationship between acute stress and decision-making
- These results stimulate new insights into the pathogenesis of various
psychiatric diseases involving stress, impulsivity and attenuated
model-based control
Conclusions II
- Alcohol use disorders repeatedly show reduced goal-directed/model-based
choices and associated neural signatures in MPFC, even across various paradigms.
- How do these patterns of (monetary) goal-directed vs habit learning
relate to established alcohol seeking/use habits?
- Obsessive-compulsive disorder has been reported to show reduced model-based
(or increased model-free?) choices, which we have so far not replicated.
However, heterogeneity by compulsion type, medication, and cognition needs
further study.
VU University Medical Center, Amsterdam
Prof. Dick J. Veltman
Prof. Brenda W.J.H. Penninx
Prof. Aartjan T.F. Beekman
Academic Medical Center, University of Amsterdam
Prof. Wim van den Brink
Prof. Damiaan Denys
Judy Luigjes, PhD
University of Amsterdam
Sanne de Wit, PhD
University of Cambridge
Prof. Trevor W. Robbins
Monash University, Melbourne
Prof. Murat Yücel
Klinik und Poliklinik für Psychiatrie und Psychotherapie
Universität Leipzig, Ambulanz für Zwangserkrankungen
Prof. Katarina Stengler
Sebastian Olbrich, MD
Rubicon Grant for young researchers
Z. Sjoerds: 2014-2016 (#2014/05563/ALW)
Netherlands Organization for Scientific Research.
Thank you!
Max-Planck Fellow Group “Cognitive and Affective
Control of Behavioral Adaptation”
Florian Schlagenhauf, MD, group leader
Lorenz Deserno, MD
Martin Panitz, MD
Andrea Reiter, PhD student
Tilmann Wilbertz, cand. med.
Christoph Radenbach, cand. med.
Martin Huss, cand. med.
Lennart Lüttgau, BSc
Laura Golz, BSc
Karoline Hudl, BSc
Department of neuropsychology
Jan Schreiber, PhD
Leibniz-Institute for Neurobiology, Magdeburg
Prof. Hans-Jochen Heinze, Director