Active inference and epistemic value
Karl Friston, Francesco Rigoli, Dimitri Ognibene, Christoph Mathys,
Thomas FitzGerald and Giovanni Pezzulo
Abstract
We offer a formal treatment of choice behaviour based on the premise that agents minimise the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (intrinsic) value. Minimising expected free energy is therefore equivalent to maximising extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximising information gain or intrinsic value, i.e., reducing uncertainty about the causes of valuable outcomes. The resulting scheme resolves the exploration-exploitation dilemma: epistemic value is maximised until there is no further information gain, after which exploitation is assured through maximisation of extrinsic value. This is formally consistent with the Infomax principle, generalising formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (KL) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. We focus on the basic theory, illustrating the minimisation of expected free energy using simulations. A key aspect of this minimisation is the similarity of precision updates and dopaminergic discharges observed in conditioning paradigms.
Premise
- All agents minimize free energy (under a generative model)
- All agents possess prior beliefs (preferences)
- Free energy is minimized when priors are (actively) realized
- All agents believe they will minimize (expected) free energy
Set-up and definitions: active inference

Definition: Active inference rests on the tuple $(\Omega, P, Q, R, S, A, U)$:
- A finite set of observations $\Omega$
- A finite set of actions $A$
- A finite set of hidden states $S$
- A finite set of control states $U$
- A generative process $R(\tilde o, \tilde s, \tilde a) = \Pr(\{o_0,\dots,o_t\} = \tilde o,\ \{s_0,\dots,s_t\} = \tilde s,\ \{a_0,\dots,a_{t-1}\} = \tilde a)$ over observations $\tilde o \in \Omega$, hidden states $\tilde s \in S$ and action $\tilde a \in A$
- A generative model $P(\tilde o, \tilde s, \tilde u \mid m) = \Pr(\{o_0,\dots,o_T\} = \tilde o,\ \{s_0,\dots,s_T\} = \tilde s,\ \{u_t,\dots,u_T\} = \tilde u)$ over observations, hidden $\tilde s \in S$ and control $\tilde u \in U$ states, with parameters $\theta$
- An approximate posterior $Q(\tilde s, \tilde u) = \Pr(\{s_0,\dots,s_T\} = \tilde s,\ \{u_t,\dots,u_T\} = \tilde u)$ over hidden and control states with sufficient statistics $(\vec s, \vec\pi)$, where $\pi \in \{1,\dots,K\}$ is a policy that indexes a sequence of control states $(\tilde u \mid \pi) = (u_t,\dots,u_T \mid \pi)$

Perception-action cycle (world-agent figure):
- Generative process (world): hidden states $s_t \in S$ generate observations $o_t \in \Omega$
- Action (agent to world): $\Pr(a_t = u_t) = Q(u_t \mid \pi)$, with $a_t \in A$
- Perception (world to agent): $(\vec s_t, \vec\pi) = \arg\min F(\tilde o, \vec s, \vec\pi)$

Free energy:
$$F(\tilde o, \vec s, \vec\pi) = E_Q[-\ln P(\tilde o, \tilde s, \tilde u \mid m)] - H[Q(\tilde s, \tilde u)] = -\ln P(\tilde o \mid m) + D[Q(\tilde s, \tilde u) \,\|\, P(\tilde s, \tilde u \mid \tilde o)]$$
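As a concrete illustration of the definition above, the following minimal Python/NumPy sketch evaluates the variational free energy for a toy discrete model and checks the equivalent surprise-plus-divergence form; the particular numbers (two hidden states, a 2x2 likelihood) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Toy discrete model (illustrative values, not from the slides):
# two hidden states, two possible observations.
A = np.array([[0.9, 0.2],        # likelihood P(o | s); columns sum to 1
              [0.1, 0.8]])
D = np.array([0.5, 0.5])          # prior P(s)

o = 0                             # observed outcome index
Q = np.array([0.7, 0.3])          # approximate posterior Q(s)

# Free energy: F = E_Q[-ln P(o, s)] - H[Q(s)]
joint = A[o, :] * D               # P(o, s) for the observed o
F = np.sum(Q * -np.log(joint)) - (-np.sum(Q * np.log(Q)))

# Equivalent form: F = -ln P(o) + KL[Q(s) || P(s | o)]
evidence = joint.sum()            # P(o)
posterior = joint / evidence      # P(s | o)
F_alt = -np.log(evidence) + np.sum(Q * np.log(Q / posterior))

print(F, F_alt)                   # the two expressions agree
```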
An example:

[Figure: a Markov decision process with control states $u_1$ (reject or stay) and $u_2$ (accept or shift) and five hidden states, including a low offer and a high offer; transitions among hidden states are labelled with probabilities $p$, $q$ and $r$.]

Control states: $P(u_{t+1} \mid u_t, s_t)$
Hidden states: $P(s_{t+1} \mid s_t, u_t)$

$$P(s_{t+1} \mid s_t, u_t = 0) = \begin{bmatrix} p & 0 & 0 & 0 & 0 \\ q & 0 & 0 & 0 & 0 \\ r & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \qquad P(s_{t+1} \mid s_t, u_t = 1) = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$
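A small sketch, assuming illustrative values for $p$, $q$ and $r$, that builds these two transition matrices and propagates a belief over hidden states one step forward; everything apart from the matrices themselves is an assumption for the example.

```python
import numpy as np

p, q, r = 0.6, 0.3, 0.1            # illustrative values; only p + q + r = 1 is required

# Transition matrices B(u): entry [i, j] is P(s_{t+1} = i | s_t = j, u_t = u)
B0 = np.array([[p, 0, 0, 0, 0],    # u = 0: reject or stay
               [q, 0, 0, 0, 0],
               [r, 1, 1, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1]])

B1 = np.array([[0, 0, 0, 0, 0],    # u = 1: accept or shift
               [0, 0, 0, 0, 0],
               [0, 0, 1, 1, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0]])

assert np.allclose(B0.sum(0), 1) and np.allclose(B1.sum(0), 1)  # columns are distributions

s = np.array([1.0, 0, 0, 0, 0])     # belief: currently in the first hidden state
print(B0 @ s)                       # predicted state distribution after staying
print(B1 @ s)                       # predicted state distribution after accepting
```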
The (normal form) generative model

$$P(\tilde o, \tilde s, \tilde u, \gamma \mid \tilde a, m) = P(\tilde o \mid \tilde s)\, P(\tilde s \mid \tilde a)\, P(\tilde u \mid \gamma)\, P(\gamma \mid m)$$

Likelihood:
$P(\tilde o \mid \tilde s) = P(o_0 \mid s_0) P(o_1 \mid s_1) \cdots P(o_t \mid s_t)$, with $P(o_t \mid s_t) = \mathbf{A}$

Empirical priors (hidden states):
$P(\tilde s \mid \tilde a) = P(s_t \mid s_{t-1}, a_t) \cdots P(s_1 \mid s_0, a_1)\, P(s_0 \mid m)$, with $P(s_{t+1} \mid s_t, u_t) = \mathbf{B}(u_t)$

Empirical priors (control states):
$P(\tilde u \mid \gamma) = \sigma(\gamma \cdot (\mathbf{Q}_{t+1} + \cdots + \mathbf{Q}_T))$, with $\mathbf{Q}_\tau(\pi) = E_{Q(o_\tau, s_\tau \mid \pi)}[\ln P(o_\tau, s_\tau \mid \pi) - \ln Q(s_\tau \mid \pi)]$

Full priors:
$P(o_\tau \mid m) = \mathbf{C}$, $\quad P(s_0 \mid m) = \mathbf{D}$, $\quad P(\gamma \mid m) = \Gamma(\alpha, \beta)$

[Figure: the corresponding graphical model, in which hidden states $s_{t-1}, s_t, s_{t+1}$ generate observations $o_{t-1}, o_t$ via $\mathbf{A}$, transitions are mediated by $\mathbf{B}$ and past actions $a_0,\dots,a_{t-1}$, and future control states $u_t,\dots,u_T$ depend on the precision $\gamma$.]
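The following sketch assembles the four components (A, B, C, D) for a small hypothetical model as NumPy arrays, just to make the bookkeeping concrete; the dimensions and numbers are assumptions for illustration, not the matrices used in the paper.

```python
import numpy as np

# A hypothetical, small model just to make the bookkeeping concrete
n_states, n_obs, n_controls = 4, 3, 2
rng = np.random.default_rng(0)

def norm_cols(M):
    """Normalise columns so each column is a probability distribution."""
    return M / M.sum(axis=0, keepdims=True)

A = norm_cols(rng.random((n_obs, n_states)))       # likelihood P(o_t | s_t)
B = [norm_cols(rng.random((n_states, n_states)))   # transitions P(s_{t+1} | s_t, u)
     for _ in range(n_controls)]
C = norm_cols(rng.random((n_obs, 1))).ravel()      # prior preferences over outcomes P(o | m)
D = np.array([1.0, 0.0, 0.0, 0.0])                 # initial-state prior P(s_0 | m)
alpha, beta = 8.0, 1.0                             # Gamma prior on precision, P(gamma | m)

# One step of prediction under control u = 1, starting from the initial state prior
s_pred = B[1] @ D          # predicted hidden-state distribution
o_pred = A @ s_pred        # predicted outcome distribution
print(o_pred, o_pred.sum())
```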
Priors over policies

Prior beliefs about policies:
$$\ln P(\tilde u \mid \gamma) = \gamma \cdot (\mathbf{Q}_{t+1}(\pi) + \cdots + \mathbf{Q}_T(\pi))$$

Expected free energy:
$$\begin{aligned}
\mathbf{Q}_\tau(\pi) &= E_{Q(o_\tau, s_\tau \mid \pi)}[\ln P(o_\tau, s_\tau \mid \pi)] + H[Q(s_\tau \mid \pi)] \\
&= E_{Q(o_\tau, s_\tau \mid \pi)}[\ln Q(s_\tau \mid o_\tau, \pi) + \ln P(o_\tau \mid m) - \ln Q(s_\tau \mid \pi)] \\
&= \underbrace{E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]}_{\text{extrinsic value}} + \underbrace{E_{Q(o_\tau \mid \pi)}\big[D[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big]}_{\text{epistemic value}} \\
&= \underbrace{-E_{Q(s_\tau \mid \pi)}[H[P(o_\tau \mid s_\tau)]]}_{\text{predicted ambiguity}}\; \underbrace{-\, D[Q(o_\tau \mid \pi) \,\|\, P(o_\tau \mid m)]}_{\text{predicted divergence}}
\end{aligned}$$

Special cases:

- Bayesian surprise and Infomax. In the absence of prior beliefs about outcomes:
$$\mathbf{Q}_\tau(\pi) = E_{Q(o_\tau \mid \pi)}\big[D[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big] = D[Q(s_\tau, o_\tau \mid \pi) \,\|\, Q(s_\tau \mid \pi)\,Q(o_\tau \mid \pi)]$$
(Bayesian surprise; equivalently, predicted mutual information)

- KL or risk-sensitive control. In the absence of ambiguity:
$$\mathbf{Q}_\tau(\pi) = E_{Q(s_\tau \mid \pi)}[\ln P(s_\tau \mid \pi) - \ln Q(s_\tau \mid \pi)] = -D[Q(s_\tau \mid \pi) \,\|\, P(s_\tau \mid \pi)]$$
(predicted divergence)

- Expected utility theory. In the absence of posterior uncertainty or risk:
$$\mathbf{Q}_\tau(\pi) = E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]$$
(extrinsic value)
The quality of a policy corresponds to (negative) expected free energy:
$$\mathbf{Q}_\tau(\pi) = E_{Q(o_\tau, s_\tau \mid \pi)}[\ln P(o_\tau, s_\tau \mid \pi)] + H[Q(s_\tau \mid \pi)]$$

Generative model of future states: $P(o_\tau, s_\tau \mid \pi) = Q(s_\tau \mid o_\tau, \pi)\, P(o_\tau \mid m)$
Future generative model of states: $P(o_\tau, s_\tau \mid \pi) = P(o_\tau \mid s_\tau, \pi)\, P(s_\tau \mid m)$

Prior preferences (goals) over future outcomes: $\mathbf{C}(o_\tau \mid m) = \ln P(o_\tau \mid m)$
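The sketch below evaluates the quality of a policy in the two equivalent ways given above (extrinsic plus epistemic value, and negative ambiguity minus divergence) for a single future time step; the likelihood, predicted states and preferences are illustrative assumptions.

```python
import numpy as np

# Illustrative quantities for one future time step under a policy pi
A = np.array([[0.8, 0.1, 0.5],     # likelihood P(o | s): 2 outcomes x 3 hidden states
              [0.2, 0.9, 0.5]])
Qs = np.array([0.2, 0.3, 0.5])      # predicted states Q(s_tau | pi)
C = np.array([0.7, 0.3])            # prior preferences P(o_tau | m)

Qo = A @ Qs                          # predicted outcomes Q(o_tau | pi)
Qs_post = (A * Qs) / Qo[:, None]     # posterior Q(s_tau | o_tau, pi) for each outcome

# Extrinsic value + epistemic value
extrinsic = Qo @ np.log(C)
epistemic = np.sum(Qo * np.sum(Qs_post * np.log(Qs_post / Qs), axis=1))

# Equivalent: negative predicted ambiguity minus predicted divergence
ambiguity = -np.sum(Qs * np.sum(A * np.log(A), axis=0))
divergence = np.sum(Qo * np.log(Qo / C))

print(extrinsic + epistemic, -ambiguity - divergence)   # the two forms match
```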
Minimising free energy

The mean field partition:
$$Q(\tilde s, \tilde u, \gamma \mid \boldsymbol\mu) = Q(s_0 \mid \vec s_0) \cdots Q(s_T \mid \vec s_T)\, Q(u_t, \dots, u_T \mid \vec\pi)\, Q(\gamma \mid \hat\gamma), \qquad Q(\gamma \mid \hat\gamma) = \Gamma(\alpha, \hat\beta)$$

And variational updates:
$$\begin{aligned}
Q(s_t) &\propto \exp(E_{Q/s_t}[\ln P(\tilde o, \tilde s, \tilde u, \gamma \mid m)]) \\
Q(\pi) &\propto \exp(E_{Q/\pi}[\ln P(\tilde o, \tilde s, \tilde u, \gamma \mid m)]) \\
Q(\gamma) &\propto \exp(E_{Q/\gamma}[\ln P(\tilde o, \tilde s, \tilde u, \gamma \mid m)])
\end{aligned}$$

where $E_{Q/x}[\cdot]$ denotes an expectation under the approximate posterior over all factors other than $x$.
Variational updates: functional anatomy

[Figure: the variational updates mapped onto functional anatomy, with perception assigned to occipital cortex, evaluation of policies (expected free energy) to prefrontal cortex, forward sweeps over future states to hippocampus, precision to midbrain (dopaminergic) projections to striatum, and action selection to motor cortex and striatum.]

Perception: $\vec s_t = \sigma(\ln \mathbf{A} \cdot o_t + \ln(\mathbf{B}(a_{t-1})\, \vec s_{t-1}))$

Action selection: $\vec\pi = \sigma(\hat\gamma \cdot \mathbf{Q})$, with $\Pr(a_t = u_t) = Q(u_t \mid \pi)$

Precision: $\hat\gamma = \alpha / \hat\beta$, where $\hat\beta = \beta - \vec\pi \cdot \mathbf{Q}$

Expected free energy (quality of policies):
$$\mathbf{Q}(\pi) = \mathbf{Q}_{t+1}(\pi) + \cdots + \mathbf{Q}_T(\pi)$$
$$\mathbf{Q}_\tau(\pi) = \vec 1 \cdot (\mathbf{A} \circ \ln \mathbf{A})\, \vec s_\tau(\pi) - (\ln \mathbf{A}\vec s_\tau(\pi) - \ln \mathbf{C}_\tau) \cdot \mathbf{A}\vec s_\tau(\pi)$$
(the first term is the negative predicted ambiguity; the second is the negative predicted divergence)

Forward sweeps over future states: $\vec s_\tau(\pi) = \mathbf{B}(u_{\tau-1} \mid \pi) \cdots \mathbf{B}(u_t \mid \pi)\, \vec s_t$
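A minimal sketch of how the coupled policy and precision updates could be iterated, assuming illustrative policy qualities and Gamma hyperparameters; the fixed-point iteration between policy beliefs and expected precision is the point of interest, and all numerical values are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Qualities (negative expected free energies) of three policies: illustrative numbers
Q = np.array([-4.0, -2.0, -1.0])
alpha, beta = 8.0, 1.0

# Iterate the coupled updates for policy beliefs and expected precision
gamma = alpha / beta
for _ in range(16):
    pi = softmax(gamma * Q)            # action selection: pi = sigma(gamma * Q)
    gamma = alpha / (beta - pi @ Q)    # precision: gamma = alpha / (beta - pi . Q)

print(pi, gamma)
```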
The (T-maze) problem

[Figure: a T-maze with a central starting location, two baited arms (one containing the reward) and a lower (cue) arm; the agent entertains posterior beliefs about hidden states, $Q(\tilde s, \tilde u \mid \vec s, \vec\pi)$, and prior beliefs about control.]

Generative model

Prior beliefs about control:
$$P(\tilde u \mid o, \gamma) = \sigma(\gamma \cdot \mathbf{Q}(\pi)), \qquad \mathbf{Q}_\tau(\pi) = \underbrace{E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]}_{\text{extrinsic value}} + \underbrace{E_{Q(o_\tau \mid \pi)}\big[D[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big]}_{\text{epistemic value}}$$

Control states $u = u_t, \dots, u_T$: move to one of the four locations.

Hidden states $s = s_{\text{location}} \otimes s_{\text{context}}$: four locations by two contexts (reward in one or the other baited arm).

Observations $o = o_{\text{location}} \otimes o_{\text{stimulus}}$: the current location together with a conditioned stimulus (CS), an unconditioned (rewarding) stimulus (US) or a neutral stimulus (NS).

Likelihood $P(o_t \mid s_t) = \mathbf{A}$: at the baited arms the rewarding stimulus is observed with probability $a$ (and the unrewarded stimulus with probability $1 - a$), while the stimulus at the cue location discloses the context.

Transitions $P(s_{t+1} \mid s_t, u_t) = \mathbf{B}(u_t)$: each control state moves the agent to the corresponding location, unless it already occupies one of the (absorbing) baited arms; the context does not change within a trial, e.g.
$$\mathbf{B}(u_t = 2) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \otimes \mathbf{I}_2$$

Prior preferences over outcomes $P(o \mid m) = \mathbf{C}$: outcomes at the baited arms are assigned a relative (log) utility $c$ in favour of the rewarding stimulus.

Initial state $P(s_0 \mid m) = \mathbf{D} = \begin{bmatrix}1 & 0 & 0 & 0\end{bmatrix}^T \otimes \begin{bmatrix}\tfrac{1}{2} & \tfrac{1}{2}\end{bmatrix}^T$: the agent starts at the centre and both contexts are equally likely.
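As a sketch of how such a factorised state space can be put together, the snippet below builds the initial-state prior D and a simple "move to location u" transition using Kronecker products; the paper's exact likelihood matrices are not reproduced, so treat the details as placeholders.

```python
import numpy as np

n_locations, n_contexts = 4, 2
I_c = np.eye(n_contexts)

# Initial-state prior D: start at the centre (location 0), both contexts equally likely
D = np.kron(np.array([1.0, 0, 0, 0]), np.array([0.5, 0.5]))   # length 8

def B(u):
    """Transition matrix for 'move to location u'; the context never changes.
    (The paper additionally makes the baited arms absorbing; omitted here.)"""
    B_loc = np.zeros((n_locations, n_locations))
    B_loc[u, :] = 1.0                     # every location maps to the chosen one
    return np.kron(B_loc, I_c)            # act on the (location x context) state

s1 = B(3) @ D                             # move to the cue location on the first step
print(s1)                                 # mass at the cue location, under both contexts
```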
Comparing different schemes

[Figure: performance (success rate, %) as a function of prior preference (0 to 1) for the different schemes (labelled FE, KL, EU and DA).]

$$\mathbf{Q}(\pi) = E_{Q(o, s \mid \pi)}[\ln P(o \mid s) - \ln Q(o \mid \pi) + \ln P(o \mid m)]$$

Expected utility retains only the $\ln P(o \mid m)$ (utility) term, KL control retains $-\ln Q(o \mid \pi) + \ln P(o \mid m)$ (risk and utility), and expected free energy retains all three terms; accordingly, only expected free energy is sensitive to both risk and ambiguity.
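The sketch below scores a policy under the three objectives, using the toy quantities from the earlier decomposition example to make explicit which terms each scheme keeps; all numbers are illustrative assumptions.

```python
import numpy as np

# Toy predictive quantities for one policy (illustrative values)
A = np.array([[0.8, 0.1, 0.5],   # likelihood P(o | s)
              [0.2, 0.9, 0.5]])
Qs = np.array([0.2, 0.3, 0.5])    # predicted states Q(s | pi)
C = np.array([0.7, 0.3])          # prior preferences P(o | m)
Qo = A @ Qs                       # predicted outcomes Q(o | pi)

utility = Qo @ np.log(C)                                  # E[ln P(o | m)]
risk = -np.sum(Qo * np.log(Qo))                           # E[-ln Q(o | pi)]
loglik = np.sum(Qs * np.sum(A * np.log(A), axis=0))       # E[ln P(o | s)] (negative ambiguity)

Q_eu = utility                    # expected utility: utility term only
Q_kl = risk + utility             # KL control: equals -D[Q(o | pi) || P(o | m)]
Q_fe = loglik + risk + utility    # expected free energy (quality): all three terms

print(Q_eu, Q_kl, Q_fe)
```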
Simulating conditioned responses (in terms of precision updates)

[Figure: precision updates and simulated dopaminergic responses. Panel titles: "Precision updates", "Simulated (US) responses", "Simulated (CS & US) responses" and "Dopamine responses"; axes show precision or rate against peristimulus time (sec).]
Expected precision and value

[Figure: expected precision $\hat\gamma = \alpha / (\beta - \vec\pi \cdot \mathbf{Q})$ as a function of expected value $\vec\pi \cdot \mathbf{Q}$; precision increases as expected value increases.]

Changes in expected precision reflect changes in expected value: cf. dopamine and reward prediction error.
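A short sketch, assuming alpha = 8 and beta = 1 as illustrative hyperparameters, that tabulates the mapping from expected value to expected precision implied by the update above.

```python
import numpy as np

alpha, beta = 8.0, 1.0                        # illustrative Gamma hyperparameters
expected_value = np.linspace(-8.0, -0.5, 6)   # pi . Q over a range of (negative) values

expected_precision = alpha / (beta - expected_value)
for v, g in zip(expected_value, expected_precision):
    print(f"expected value {v:5.2f} -> expected precision {g:4.2f}")
```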
Simulating conditioned responses

[Figure: simulated dopaminergic (precision) responses as rates over peristimulus time (sec). Upper panels (Preference (utility)): responses for c = 1, c = 0 and c = 2, with a = 0.5. Lower panels (Uncertainty): responses for a = 0.5, a = 0.7 and a = 0.9, with c = 2.]
Learning as inference: hierarchical augmentation of the state space

Control states $u = u_t, \dots, u_T$

Hidden states $s = s_{\text{maze}} \otimes s_{\text{location}} \otimes s_{\text{context}}$: the hidden states are augmented with a maze factor, and the likelihood, transition and preference matrices are replicated accordingly:
$$\mathbf{A} = [\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(1)}], \qquad \mathbf{B}(i) = \begin{bmatrix} \mathbf{B}^{(1)}(j_1^i) & & \\ & \ddots & \\ & & \mathbf{B}^{(1)}(j_4^i) \end{bmatrix}, \qquad \mathbf{C} = \mathbf{C}^{(1)}, \qquad j = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \\ 1 & 2 & 4 & 3 \\ 1 & 4 & 3 & 2 \end{bmatrix}$$

Bayesian belief updating between trials, of conserved (maze) states:
$$\vec s_0 = \mathbf{E}\, \vec s_T, \qquad \mathbf{E} = P(s_0 \mid s_T, m) = ((1 - e)\, \mathbf{I}_4 + e) \otimes (\mathbf{D}^{(1)} \cdot \vec 1_8^T)$$

[Figure: the T-maze problem replicated over mazes, whose states are conserved between trials.]
Learning as inference: exploration and exploitation

[Figure: exploration and exploitation. Panels: "Performance and uncertainty" (performance and uncertainty, %, against the number of trials, 1 to 8); "Average dopaminergic response" (precision against variational updates); "Simulated dopaminergic response" (spike rate against variational updates).]
Summary

- Optimal behaviour can be cast as a pure inference problem, in which valuable outcomes are defined in terms of prior beliefs about future states.
- Exact Bayesian inference (perfect rationality) cannot be realised physically, which means that optimal behaviour rests on approximate Bayesian inference (bounded rationality).
- Variational free energy provides a bound on Bayesian model evidence that is optimised by bounded rational behaviour.
- Bounded rational behaviour requires (approximate Bayesian) inference on both hidden states of the world and (future) control states. This mandates beliefs about action (control) that are distinct from action per se, beliefs that entail a precision.
- These beliefs can be cast in terms of minimising the expected free energy, given current beliefs about the state of the world and future choices.
- The ensuing quality of a policy entails epistemic value and expected utility, which account for exploratory and exploitative behaviour respectively.
- Variational Bayes provides a formal account of how posterior expectations about hidden states of the world, policies and precision depend upon each other, and may provide a metaphor for message passing in the brain.
- Beliefs about choices depend upon expected precision, while beliefs about precision depend upon the expected quality of choices.
- Variational Bayes induces distinct probabilistic representations (functional segregation) of hidden states, control states and precision, and highlights the role of reciprocal message passing. This may be particularly important for expected precision, which is required for optimal inference about hidden states (perception) and control states (action selection).
- The dynamics of precision updates and their computational architecture are consistent with the physiology and anatomy of the dopaminergic system.