seminars on meta-cognition, 2012–2013

November 29th 4:30 – 6:00pm Old Library
Karl Friston
Meta-cognition, prediction, precision
(Discussant, Andreas Roepstorff, Aarhus)
Abstract
Predictive coding models and the free-energy principle suggest that cortical activity in sensory brain areas reflects the precision of
prediction errors and not just the sensory evidence or prediction errors per se. If we assume that neuronal activity encodes a
probabilistic representation of the world that optimizes free-energy in a Bayesian fashion, then, because free-energy bounds surprise
or the (negative) log-evidence for internal models of the world, this optimization can be regarded as evidence accumulation or
(generalized) predictive coding. Crucially, both predictions about the state of the world generating sensory data and the precision of
those data have to be optimized. In other words, we have to make predictions (test hypotheses) about the content of the sensorium
and predict our confidence in those hypotheses. I hope to demonstrate this meta-representational aspect of inference using
simulations of visual searches and action selection, to illustrate its nature and promote discussion about its role in higher-order
cognition.
The basic idea: active inference and free energy
Beliefs about beliefs: beliefs about uncertainty
Beliefs about beliefs: beliefs about precision and agency
“Objects are always imagined as being present in the field of vision as
would have to be there in order to produce the same impression on
the nervous mechanism” - von Helmholtz
Hermann von Helmholtz
Richard Gregory
Geoffrey Hinton
From the Helmholtz machine to the
Bayesian brain and self-organization
Thomas Bayes
Richard Feynman
Hermann Haken
What is the difference between a snowflake and a bird?
[Figure: phase boundary as a function of temperature]
…a bird can act (to avoid surprises)
The basic ingredients
External (hidden) states in the world: $\dot{\psi} = f(\psi, a) + \omega_x$
Sensations: $s = g(\psi, a) + \omega_s$
Fluctuations: $\omega$
Internal states of the agent (posterior expectations): $\mu = \arg\min_{\mu} F(s, \mu)$
Action: $a = \arg\min_{a} F(s, \mu)$
What we need to explain: how do we minimise the dispersion of sensory states (homoeostasis)?
$-\int \ln p(s(t) \mid m)\, dt \propto H[p(s \mid m)]$
The principle of least free energy (minimising surprise)
$F(s, \mu, m) = -\ln p(s \mid m) + D_{KL}[q(\psi \mid \mu) \,\|\, p(\psi \mid s)]$
$\quad = E_q[-\ln p(\psi, s)] - H[q(\psi \mid \mu)]$
Bayesian inference
Maximum entropy principle
Ergodic theorem
$\int dt\, F(t) \ge -\int dt\, \ln p(s(t) \mid m) \propto H[p(s \mid m)]$
The principle of least action
Self organisation
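A minimal numerical sketch of the free-energy bound on this slide: free energy written as expected energy minus entropy equals surprise plus a KL divergence, so it can never fall below surprise and touches it when q is the exact posterior. The two-state world and all probabilities below are made-up illustrative values, not taken from the talk.

```python
import numpy as np

# Hypothetical toy model: two hidden states psi and one observed sensation s.
prior = np.array([0.7, 0.3])        # p(psi), illustrative numbers
likelihood = np.array([0.2, 0.9])   # p(s | psi) for the observed s, illustrative

joint = prior * likelihood          # p(psi, s)
evidence = joint.sum()              # p(s)
posterior = joint / evidence        # p(psi | s)
surprise = -np.log(evidence)        # -ln p(s)


def free_energy(q):
    """F = E_q[-ln p(psi, s)] - H[q] for a discrete belief q."""
    q = np.asarray(q, dtype=float)
    energy = -(q * np.log(joint)).sum()
    entropy = -(q * np.log(q)).sum()
    return energy - entropy


def kl(q, p):
    """KL divergence D_KL[q || p] between two discrete distributions."""
    q = np.asarray(q, dtype=float)
    return (q * np.log(q / p)).sum()


q_guess = np.array([0.5, 0.5])      # an arbitrary (suboptimal) belief
F_guess = free_energy(q_guess)      # exceeds surprise by KL[q_guess || posterior]
F_exact = free_energy(posterior)    # equals surprise: the bound is tight
```

Changing `q_guess` to any other valid belief leaves the gap `F_guess - surprise` equal to `kl(q_guess, posterior)`, which is the sense in which minimising free energy performs approximate Bayesian inference.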
How can we minimize surprise (prediction error)?
Prediction error = sensations − predictions
Action: change sensations
Perception: change predictions
…action and perception minimise free energy
Action as inference – the “Bayesian thermostat”
Posterior distribution: $p(\psi \mid s)$
Prior distribution: $p(\psi)$
Likelihood distribution: $p(s \mid \psi)$
[Figure: posterior, prior and likelihood over temperature (axis ticks 20 to 120), with sensations $s$ and action $a(t)$]
Perception: $\mu = \arg\min_{\mu} F(s, \mu, \eta) = \arg\min_{\mu} \left[ \Pi_s (s(a) - g(\mu))^2 + \Pi_\mu (\mu - \eta)^2 \right]$
Action: $a = \arg\min_{a} F(s, \mu, \eta) = \arg\min_{a} \left[ \Pi_s (s(a) - g(\mu))^2 + \Pi_\mu (\mu - \eta)^2 \right]$
Sensations: $s = g(\psi) + \omega$
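The thermostat idea can be sketched as two gradient descents on the same precision-weighted squared errors: perception moves the expectation, action moves the sensation. This is only an illustration under loud assumptions: the sensory mapping is taken to be the identity, action is assumed to add directly to an ambient temperature (`s = ambient + a`), noise is omitted, and the function name, set point and step size are all hypothetical.

```python
def simulate_thermostat(eta=60.0, ambient=20.0, pi_s=1.0, pi_mu=1.0,
                        step=0.1, n_steps=2000):
    """Gradient descent on 0.5 * [pi_s * (s - mu)**2 + pi_mu * (mu - eta)**2].

    mu  : posterior expectation of temperature (perception)
    a   : heating action; assumed to act as s = ambient + a (no noise)
    eta : prior expectation, playing the role of the set point
    """
    mu, a = ambient, 0.0
    for _ in range(n_steps):
        s = ambient + a                  # assumed sensory consequence of action
        err_s = pi_s * (s - mu)          # precision-weighted sensory prediction error
        err_mu = pi_mu * (mu - eta)      # precision-weighted prior prediction error
        d_mu = err_s - err_mu            # perception: -dF/dmu
        d_a = -err_s                     # action: -dF/da, using ds/da = 1
        mu += step * d_mu
        a += step * d_a
    return mu, ambient + a, a


mu, s, a = simulate_thermostat()
```

With the defaults, the expectation and the sensed temperature both settle at the prior expectation of 60, and the action settles at 40: the agent never represents a goal separately from its prior, it simply acts until sensations match what it expects.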
How might the brain minimise free energy (prediction error)?
External (hidden) states in the world
Sensations
Fluctuations
Internal states of the agent (posterior expectations): $\mu = \arg\min_{\mu} F(s, \mu)$
Action: $a = \arg\min_{a} F(s, \mu)$
…by using predictive coding (and reflexes)
Free energy minimisation
$s = g(x, v, a) + \omega_v$
$\dot{x} = f(x, v, a) + \omega_x$
$\dot{a} = -\partial_a F(s, \mu)$
$\dot{\mu} = D\mu - \partial_\mu F(s, \mu)$
Generative model
$s = g^{(1)}(x^{(1)}, v^{(1)}) + \omega_v^{(1)}$
$\dot{x}^{(1)} = f^{(1)}(x^{(1)}, v^{(1)}) + \omega_x^{(1)}$
$v^{(i-1)} = g^{(i)}(x^{(i)}, v^{(i)}) + \omega_v^{(i)}$
$\dot{x}^{(i)} = f^{(i)}(x^{(i)}, v^{(i)}) + \omega_x^{(i)}$
Predictive coding with reflexes
$\xi_v^{(i)} = \Pi_v^{(i)} \varepsilon_v^{(i)} = \Pi_v^{(i)} (\mu_v^{(i-1)} - g^{(i)}(\mu_x^{(i)}, \mu_v^{(i)}))$
$\xi_x^{(i)} = \Pi_x^{(i)} \varepsilon_x^{(i)} = \Pi_x^{(i)} (D\mu_x^{(i)} - f^{(i)}(\mu_x^{(i)}, \mu_v^{(i)}))$
$\dot{\mu}_v^{(i)} = D\mu_v^{(i)} - \partial_v \varepsilon^{(i)} \cdot \xi^{(i)} - \xi_v^{(i+1)}$
$\dot{\mu}_x^{(i)} = D\mu_x^{(i)} - \partial_x \varepsilon^{(i)} \cdot \xi^{(i)}$
$\dot{a} = -(\partial_a \varepsilon_v^{(1)}) \cdot \xi_v^{(1)}$
From models to perception
A simple hierarchy
[Figure: hierarchy with levels (1) to (3) of hidden causes $v^{(i)}$ and hidden states $x^{(i)}$; sensory input $s = v^{(0)}$ at the lowest level; an inward error stream and an outward prediction stream]
Generative model
$Dx^{(i)} = f^{(i)}(x^{(i)}, v^{(i)}) + \omega_x^{(i)}$
$v^{(i-1)} = g^{(i)}(x^{(i)}, v^{(i)}) + \omega_v^{(i)}$
Model inversion (inference)
Expectations:
$\dot{\mu}_v^{(i)} = D\mu_v^{(i)} - \partial_v \varepsilon^{(i)} \cdot \xi^{(i)} - \xi_v^{(i+1)}$
$\dot{\mu}_x^{(i)} = D\mu_x^{(i)} - \partial_x \varepsilon^{(i)} \cdot \xi^{(i)}$
Predictions:
$g^{(i)} = g^{(i)}(\mu_x^{(i)}, \mu_v^{(i)})$
$f^{(i)} = f^{(i)}(\mu_x^{(i)}, \mu_v^{(i)})$
Prediction errors:
$\xi_v^{(i)} = \Pi_v^{(i)} \varepsilon_v^{(i)} = \Pi_v^{(i)} (\mu_v^{(i-1)} - g^{(i)})$
$\xi_x^{(i)} = \Pi_x^{(i)} \varepsilon_x^{(i)} = \Pi_x^{(i)} (D\mu_x^{(i)} - f^{(i)})$
David Mumford
Predictive coding with reflexes
Action: $\dot{a} = -\partial_a s \cdot \xi_v^{(1)}$
[Figure: oculomotor signals and proprioceptive input through the reflex arc (pons); retinal input through the geniculate to occipital (visual) cortex. Labels: Action, Perception, Attention]
Prediction error (superficial pyramidal cells), bottom-up or forward:
$\xi_v^{(i)} = \Pi_v^{(i)} \varepsilon_v^{(i)} = \Pi_v^{(i)} (\mu_v^{(i-1)} - g^{(i)}(\mu_x^{(i)}, \mu_v^{(i)}))$
$\xi_x^{(i)} = \Pi_x^{(i)} \varepsilon_x^{(i)} = \Pi_x^{(i)} (D\mu_x^{(i)} - f^{(i)}(\mu_x^{(i)}, \mu_v^{(i)}))$
Conditional predictions (deep pyramidal cells), top-down or backward:
$\dot{\mu}_v^{(i)} = D\mu_v^{(i)} - \partial_v \varepsilon^{(i)} \cdot \xi^{(i)} - \xi_v^{(i+1)}$
$\dot{\mu}_x^{(i)} = D\mu_x^{(i)} - \partial_x \varepsilon^{(i)} \cdot \xi^{(i)}$
Biological agents resist the second law of thermodynamics
They must minimize their average surprise (entropy)
They minimize surprise by suppressing prediction error (free-energy)
Prediction error can be reduced by changing predictions (perception)
Prediction error can be reduced by changing sensations (action)
Perception entails recurrent message passing in the brain to optimize predictions
Action makes predictions come true (and minimizes surprise)
Beliefs about beliefs: beliefs about uncertainty
Perception as hypothesis testing – action as experiments
But how do we think action will change our beliefs?
Searching, salience and saccades
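Where do I expect to look?
Perception: $\mu = \arg\min_{\mu} \left[ \Pi_s (s(a) - g(\mu))^2 + \Pi_\mu (\mu - \eta)^2 \right]$
Action: $a = \arg\min_{a} \left[ \Pi_s (s(a) - g(\mu))^2 + \Pi_\mu (\mu - \eta)^2 \right]$
Prior expectations: $\eta = \arg\min\ ?$
Sensations: $s = g(\psi) + \omega$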
Sampling the world to minimise uncertainty
$H(S, \Psi) = H(S \mid m) + H(\Psi \mid S)$
$\quad = E_t[-\ln p(s(t) \mid m)] + E_t[H(\Psi \mid S = s(t))]$, with $\psi(t) \in \Psi$ and $s(t) \in S$
First term: free energy principle. Second term: minimise uncertainty.
$\eta(t) = \arg\min_{\eta} \{ H[q(\psi \mid \mu, \eta)] \}$
Salience: $S(\eta) = -H[q(\psi \mid \mu, \eta)]$
[Figure panels: stimulus; visual input; salience]
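A minimal sketch of salience as negative uncertainty, using a swapped-in discrete approximation: for each candidate fixation the expected entropy of the posterior over hidden states is computed, and the most salient location is the one that minimises it. The two hypotheses, the two locations, the likelihood tables and the function names are all hypothetical; the talk's simulations use continuous states and are not reproduced here.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution, ignoring zeros."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def salience(belief, likelihoods):
    """Negative expected posterior entropy for each candidate location.

    belief      : shape (n_states,), current belief q(psi)
    likelihoods : shape (n_locations, n_outcomes, n_states), p(o | psi, location)
    """
    scores = []
    for lik in likelihoods:
        joint = lik * belief            # p(o, psi) if we looked here
        p_o = joint.sum(axis=1)         # predicted outcome probabilities
        expected_h = 0.0
        for o, po in enumerate(p_o):
            if po > 0:
                expected_h += po * entropy(joint[o] / po)
        scores.append(-expected_h)
    return np.array(scores)


belief = np.array([0.5, 0.5])           # two competing hypotheses
likelihoods = np.array([
    [[0.5, 0.5], [0.5, 0.5]],           # location 0: outcome says nothing about psi
    [[0.9, 0.1], [0.1, 0.9]],           # location 1: outcome discriminates hypotheses
])
scores = salience(belief, likelihoods)
best = int(np.argmax(scores))           # index of the most salient location
```

Here the uninformative location leaves the full uncertainty of ln 2 in place, so the informative location is selected. In this sense a saccade acts like an experiment chosen for its expected power to resolve uncertainty.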
Perception as hypothesis testing – saccades as experiments
sampling
External (hidden) states in the world: $\dot{\psi} = f(\psi, a) + \omega_x$
Sensations: $s = g(\psi, a) + \omega_s$
Fluctuations: $\omega$
Internal states of the agent (posterior expectations): $\mu = \arg\min_{\mu} F(s, \mu)$
Prior expectations: $\eta = \arg\min_{\eta} H[q(\psi \mid \mu, \eta)]$
Action: $a = \arg\min_{a} F(s, \mu)$
[Figure: Saccadic eye movements. Labelled regions: frontal eye fields; parietal (where); visual cortex; fusiform (what); pulvinar salience map $S(\eta)$; superior colliculus; oculomotor reflex arc driven by action $a$, with sensations $s_p$ and $s_q$]
Saccadic fixation and salience maps
[Figure panels: Action (EOG); Hidden (oculomotor) states; Visual samples; Posterior belief; Conditional expectations about hidden (visual) states; And corresponding percept. Horizontal axes: time (ms), 200 to 1400]
Beliefs about beliefs: beliefs about precision
If beliefs cause movement, how can I move when sensory
evidence compels me to believe that I am not moving?
Sensory attenuation, illusions and agency
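The puzzle above can be illustrated with a precision-weighted average of two Gaussian sources of evidence: a prior belief that I am moving and sensory evidence that I am not. This is only a sketch under stated assumptions: a single scalar state, static inference, and illustrative log-precisions (6 for the prior, 8 for unattenuated sensations, reduced by a hypothetical attenuation of 6). The function name is made up for this example.

```python
import math


def posterior_expectation(prior_mean, log_pi_prior, sensation, log_pi_sensory):
    """Posterior mean for a Gaussian prior and a Gaussian likelihood.

    Each source is weighted by its precision (inverse variance),
    given here as a log-precision.
    """
    pi_prior = math.exp(log_pi_prior)
    pi_sens = math.exp(log_pi_sensory)
    return (pi_prior * prior_mean + pi_sens * sensation) / (pi_prior + pi_sens)


intended, felt = 1.0, 0.0    # prior belief "I am moving" vs. evidence "I am not"

# Precise sensations dominate: the belief in movement is largely overridden.
without_attenuation = posterior_expectation(intended, 6.0, felt, 8.0)

# Attenuated sensory precision: the prior belief survives and can drive reflexes.
with_attenuation = posterior_expectation(intended, 6.0, felt, 8.0 - 6.0)
```

Without attenuation the expectation stays near the sensory evidence (about 0.12); with attenuation it stays near the intended movement (about 0.98), which is why transiently lowering sensory precision is needed for beliefs to be realised through action.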
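Making your own sensations
Generative process
$s = \begin{bmatrix} s_p \\ s_s \end{bmatrix} = \begin{bmatrix} x_i \\ x_i + v_e \end{bmatrix} + \omega_s$
$\dot{x} = \dot{x}_i = \sigma(a) - \tfrac{1}{4} x_i + \omega_x$
$\omega_s \sim N(0, e^{-8} I)$, $\omega_x \sim N(0, e^{-8} I)$
Generative model
$s = \begin{bmatrix} s_p \\ s_s \end{bmatrix} = \begin{bmatrix} x_i \\ x_i + x_e \end{bmatrix} + \omega_s$
$\dot{x} = \begin{bmatrix} \dot{x}_i \\ \dot{x}_e \end{bmatrix} = \begin{bmatrix} v_i - \tfrac{1}{4} x_i \\ v_e - \tfrac{1}{4} x_e \end{bmatrix} + \omega_x$
$v = \begin{bmatrix} v_i \\ v_e \end{bmatrix} = \omega_v$
$\omega_s \sim N(0, e^{-\pi} I)$, with $\pi = 8 - \gamma \cdot \sigma(x_i + v_i)$
$\omega_x \sim N(0, e^{-4} I)$, $\omega_v \sim N(0, e^{-6} I)$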
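[Figure: labelled regions and pathways: prefrontal cortex; sensorimotor cortex; thalamus; motor reflex arc; descending predictions; descending modulation; ascending prediction errors. Variables shown: hidden states $x$, hidden causes $v$, sensations $s_s$ and $s_p$, action $a$]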
High sensory attenuation
[Figure panels: prediction and error ($s_s$, $s_p$); hidden states ($x_i$, $x_e$); hidden causes ($v_i$, $v_e$); perturbation and action ($a$). Horizontal axes: time (bins), 5 to 30]
Low sensory attenuation
[Figure panels: prediction and error; hidden states; hidden causes; perturbation and action. Horizontal axes: time (bins), 5 to 30]
Sensory attenuation
Force matching illusion
[Two sets of figure panels: prediction and error; hidden states; hidden causes; perturbation and action. Horizontal axes: time (bins), 10 to 60]
Failures of sensory attenuation, with compensatory increases in
non-sensory precision
[Figure: simulated and empirical (Shergill et al) force matching. Horizontal axes: external (target) force, 0 to 3; vertical axes: self-generated (matched) force, 0 to 3]
A failure of sensory attenuation and delusions of control
[Figure panels: prediction and error; hidden states; hidden causes; perturbation and action. Horizontal axes: time (bins), 10 to 60]
Thank you
And thanks to collaborators:
Rick Adams
Andre Bastos
Sven Bestmann
Jean Daunizeau
Mark Edwards
Harriet Brown
Lee Harrison
Stefan Kiebel
James Kilner
Jérémie Mattout
Rosalyn Moran
Will Penny
Klaas Stephan
And colleagues:
Andy Clark
Peter Dayan
Jörn Diedrichsen
Paul Fletcher
Pascal Fries
Geoffrey Hinton
James Hopkins
Jakob Hohwy
Henry Kennedy
Paul Verschure
Florentin Wörgötter
And many others
Searching to test hypotheses – life as an efficient experiment
$H(S, \Psi) = H(S \mid m) + H(\Psi \mid S)$
$\quad = E_t[-\ln p(s(t) \mid m)] + E_t[H(\Psi \mid S = s(t))]$
First term: free energy principle. Second term: minimise uncertainty.
$\eta(t) = \arg\min_{\eta} \{ H[q(\psi \mid \mu, \eta)] \}$
Time-scale
Free-energy minimisation leading to…
[Time-scale axis: $10^{-3}$ s, $10^{0}$ s, $10^{3}$ s, $10^{6}$ s, $10^{15}$ s]
Perception and Action: The optimisation of neuronal and neuromuscular activity to suppress prediction errors (or free-energy) based on generative models of sensory data.
Learning and attention: The optimisation of synaptic gain and efficacy over seconds to hours, to encode the precisions of prediction errors and causal structure in the sensorium. This entails suppression of free-energy over time.
Neurodevelopment: Model optimisation through activity-dependent pruning and maintenance of neuronal connections that are specified epigenetically.
Evolution: Optimisation of the average free-energy (free-fitness) over time and individuals of a given class (e.g., conspecifics) by selective pressure on the epigenetic specification of their generative models.