
Exploring a sampling-based framework for
probabilistic representation and computation
in the cortex
József Fiser
Brandeis University
Pietro Berkes
Brandeis University
Máté Lengyel, Gergő Orbán
University of Cambridge
Outline
 Challenges of the standard model of vision in the cortex
 Proposal: the sampling hypothesis
 How viable the sampling model is
 Empirical evidence explained
Challenges
 Behavioral: In many cases, animals' and humans' perception, action, and learning are best captured by models that assume probabilistic representation and computation in the brain
 Physiological: High level of structured spontaneous activity
in the cortex that questions the feed forward nature of
information processing and the role of ongoing activity
 Coding: High trial-to-trial variability of cell responses even in primary sensory cortices that strongly interferes with reliable neural computation. Systematic variations of neural response variance are not explained by current theories.
The sampling hypothesis
The cortex performs a probabilistic computation for inference
and learning and to achieve this it uses a sampling-based
neural representation in which spontaneous activity represents
the prior information used for inference.
 Probabilistic computation: Represents and computes with
probability distributions in order to handle uncertainty.
 Sampling-based: Distributions are represented by samples.
 Spontaneous = prior: Spontaneous activity is not noise.
Logic of sampling-based representation
[Figure: across trials/time, the joint firing rates of neurons 1 and 2 trace out samples from the joint distribution over variables 1 and 2 (e.g. orientation and intensity of a feature)]
Functional role of spontaneous activity
 Brightness coding in the primary visual cortex is factorized
(Rossi et al. Science 1996)
 Spontaneous activity =
samples from the prior
Formalizing the relationship between
SA and EA
 Activity in the absence of input stimuli x represents samples from the prior P(y): this is the spontaneous activity.
 After learning, the prior matches the average posterior over natural scenes:

P(y) = ∫ P(y|x) P_data(x) dx

so that P_SA(y) ≈ ⟨P_EA(y|x)⟩_P_data(x), where P_SA is the distribution of spontaneous activity, P_EA that of evoked activity, and P_data(x) the distribution of natural scenes.
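The identity above can be checked numerically; a minimal sketch with a toy discrete model (all numbers hypothetical), showing that averaging the posterior over stimuli drawn from the data distribution recovers the prior:

```python
import numpy as np

# Toy check of P_SA(y) = sum_x P_EA(y|x) P_data(x):
# 3 hidden causes y, 4 stimuli x, hypothetical probabilities.
rng = np.random.default_rng(0)
prior_y = np.array([0.5, 0.3, 0.2])          # P(y)
lik = rng.dirichlet(np.ones(4), size=3)      # P(x|y), one row per cause
joint = prior_y[:, None] * lik               # P(y, x)
p_data = joint.sum(axis=0)                   # P(x) = sum_y P(y, x)
posterior = joint / p_data                   # P(y|x), one column per stimulus
avg_posterior = (posterior * p_data).sum(axis=1)  # sum_x P(y|x) P(x)
print(np.allclose(avg_posterior, prior_y))   # True: average posterior = prior
```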
Evidence of sampling based probabilistic
representation in the awake cortex
Visual
Data collection
• Awake, head-restrained ferrets
• 4 age groups (P30, P45, P90, P120)
• Multi-electrode recording with 200 µm spacing from layers 2/3 of V1
• Complete dark (spont. activity)
• Drifting grating
• Natural scene movie
• Random noise
Results I
 SA and EA in the Movie condition converge with age
 In the adult animal, they do not differ significantly
Results II
 The match between SA and EA is specific to natural scenes
Results III
 The match between SA and EA is not specific to the visual
modality
FAQ
Why sampling just cannot be the answer
 It is not clear how the neural circuitry can do it
 It requires way too much time to collect a sufficient number of samples
 It must break down if the input is non-stationary
 Limited number of samples must lead to bias in
learning
Well, let’s see…
How can sampling be implemented in neural networks?
Quite naturally
 Gibbs sampling
 Hamiltonian Monte Carlo – Langevin sampling:
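As a concrete illustration of the first point, a minimal Gibbs sampler for a correlated bivariate Gaussian: two variables (stand-ins for two "neurons") alternately update conditioned on each other, and their joint trajectory samples the target distribution. Parameters are hypothetical.

```python
import numpy as np

# Gibbs sampling for a standard bivariate Gaussian with correlation rho.
def gibbs_bivariate_gaussian(rho, n_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = y = 0.0
    samples = np.empty((n_samples, 2))
    sd = np.sqrt(1.0 - rho**2)           # conditional std of each variable
    for t in range(n_samples):
        x = rng.normal(rho * y, sd)      # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, sd)      # y | x ~ N(rho*x, 1 - rho^2)
        samples[t] = (x, y)
    return samples

s = gibbs_bivariate_gaussian(rho=0.8)
print(np.corrcoef(s[1000:].T)[0, 1])     # close to 0.8 after burn-in
```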
A simple model of cue combination based on sampling
 Task: given auditory and visual information where is the target?
 Hamiltonian Monte Carlo – Langevin sampling:
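A sketch of this cue-combination setup using Langevin sampling, with hypothetical cue means and variances: the posterior over target position is Gaussian with precision-weighted mean, and noisy gradient steps on the log-posterior produce samples whose mean approaches that optimal estimate.

```python
import numpy as np

# Auditory and visual cues give Gaussian likelihoods for target position.
mu_a, var_a = 2.0, 1.0        # auditory cue (hypothetical)
mu_v, var_v = 0.0, 0.25       # visual cue, more reliable (hypothetical)
post_var = 1.0 / (1/var_a + 1/var_v)          # posterior variance = 0.2
post_mu = post_var * (mu_a/var_a + mu_v/var_v)  # precision-weighted mean = 0.4

rng = np.random.default_rng(1)
x, eps, samples = 0.0, 0.01, []
for t in range(50000):
    grad = -(x - post_mu) / post_var                      # d/dx log posterior
    x += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()   # Langevin step
    samples.append(x)
print(np.mean(samples[5000:]))  # close to post_mu = 0.4
```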
How many samples are needed for accurate estimation?
Surprisingly few
 With 1 sample, the variance of the estimate is just twice the
optimal (ML) variance
 Sampling might be optimized by the cortex via minimizing the
autocorrelation of successive samples:
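The twice-optimal claim for a single sample can be checked by simulation; a sketch assuming a Gaussian posterior centered on the optimal estimate (all numbers hypothetical):

```python
import numpy as np

# If the estimate is one posterior sample rather than the posterior mean,
# its error is (estimate error) + (sampling scatter), doubling the MSE.
rng = np.random.default_rng(2)
theta, s, n_trials = 0.0, 1.0, 200000
theta_hat = rng.normal(theta, s, n_trials)      # optimal (ML) estimate per trial
one_sample = rng.normal(theta_hat, s)           # a single posterior sample
mse_opt = np.mean((theta_hat - theta) ** 2)     # about s^2
mse_samp = np.mean((one_sample - theta) ** 2)   # about 2 s^2
print(mse_samp / mse_opt)                       # close to 2
```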
What happens with non-static input?
Things actually get better, not worse
 As long as the internal model is correct, the model will have no
problem tracking the signal
 Why is this happening? No need for burn-in, plus tighter prior
Can we learn with such a small number of samples?
Yes…
 Learning tuning curve positions and variances
 Results
Sampling can be the answer
✔ It is not clear how the neural circuitry can do it
✔ It requires way too much time to collect a sufficient number of samples
✔ It must break down if the input is non-stationary
✔ A limited number of samples must lead to bias in learning
Experimental results
explained
Simulations
Receptive fields
 u: membrane potential
 Firing rate: r_i = gain · [u_i − threshold]^1.2
 Spike count: n_i ~ Poisson(r_i)
 Trained on 80,000 patches of size 8×8 from the van Hateren image database
 Maximum-likelihood learning using Expectation Maximization
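A sketch of the response model described above (membrane potential through a rectified power-law gain, feeding a Poisson spike generator); the gain constant and time window are hypothetical:

```python
import numpy as np

# Rate model: r = gain * [u - threshold]_+ ^ 1.2, spikes ~ Poisson(r * T).
def spike_counts(u, threshold=0.0, gain=10.0, window=0.1, seed=0):
    rng = np.random.default_rng(seed)
    rate = gain * np.clip(u - threshold, 0.0, None) ** 1.2  # rectified power law
    return rng.poisson(rate * window)                       # Poisson spike counts

u = np.array([-0.5, 0.2, 1.0, 2.0])   # membrane potentials (arbitrary units)
print(spike_counts(u))                # sub-threshold u gives zero counts
```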
Specific questions
 Can we reproduce our KL results?
 Does the model agree on characteristics of variability
measured in neural activity?
 Contrast dependence of variability
 Stimulus specificity of variability
 Onset effects of variability
 Stimulus specificity of co-variability
Sanity check
 Sampling reproduces the KL divergence results
Contrast dependent variability of u
[Figure: posterior P(z|x) over membrane potentials (u1, u2) under blank vs. high-contrast stimuli]
 Sampling gives a natural link between contrast and variance
Stimulus dependent variability of u
and the effect of stimulus onset
[Simulation: membrane-potential variability over time (ms) for preferred and non-preferred stimuli]
 Sampling can reproduce both the stimulus independence of
variability-change and the quenching effect of stimulus onset
Stimulus dependent co-variability of u
[Figure: experiment vs. simulation (Yu & Ferster 2010 Neuron)]
 Sampling produces a reduced co-variability of the membrane
potential at stimulus presentation just as it was found in vivo
Stimulus dependent co-variability of u
[Figure: experiment vs. simulation for cell pairs with similar vs. dissimilar orientation preference (Yu & Ferster 2010 Neuron)]
 Sampling can reproduce these correlational effects, too
Conclusions
• Sampling-based probabilistic schemes are viable alternatives for neural representation in the cortex, providing a unified framework to investigate learning and instantaneous perception
• Measures based on comparing SA and EA support the idea that the cortex uses a sampling-based representation, and that SA has the functional role of providing prior information for probabilistic perceptual inferences
• Sampling-based representations corroborate the idea that, rather than using mostly first moments, cortical coding is strongly related to signal covariances and higher-order moments of the neural signal
Thank you !
The standard view on vision
 The fundamental method of information processing in the
visual system can be described as
• deterministic computation with added neural noise
• feed-forward
• based on mean firing rates of cells
 The function of the primary visual cortex in this framework is
• restricted to represent only visual information
• to create a faithful representation of the sensory
stimulus with a neural code that is as sparse and
independent as possible
Challenges
 Behavioral: In many cases, animals' and humans' perception, action, and learning are best captured by models that assume probabilistic representation and computation in the brain
Life is uncertain and ambiguous…
 2D to 3D
[Figure: ambiguous stimulus s; both the orientation and the intensity of the stimulus are uncertain]
 Proposal
The brain encodes both the
value and the uncertainty of the
stimulus during perception.
Probabilistic approaches in …
 Classical conditioning (Courville et al. TICS 2006)
 Perceptual processes (Kersten et al. Ann. Rev. Psych. 2006)
 Visuo-motor coordination (Körding & Wolpert Nature 2004)
 Cue combination (Atkins et al. Vis Res 2001; Ernst & Banks Nature
2002)
 Decision making (Trommershäuser et al. TICS 2008)
 High-level cognitive processes (Griffiths & Tenenbaum TICS 2006)
 Visual statistical learning (Orban et al. PNAS 2008)
Is the neural computation deterministic?
Probabilistic computation in the brain

Evidence for probabilistic computation:
• Inference, behavioral: Atkins et al., 2001; Ernst & Banks, 2002; Körding & Wolpert, 2004; Weiss et al., 2002
• Inference, neural: Platt & Glimcher, 1999; Yang & Shadlen, 2007; Kepecs et al., 2008
• Learning, behavioral: Dayan et al., 2000; Tenenbaum et al., 2006; Orbán et al., 2008
• Learning, neural: ?
Challenges
 Behavioral: In many cases, animals' and humans' perception, action, and learning are best captured by models that assume probabilistic representation and computation in the brain
 Physiological: High level of structured spontaneous activity
in the cortex that questions the feed forward nature of
information processing and the role of ongoing activity
Spontaneous activity in the awake brain
Is it really noise?
Example: Stimulus onset quenches trial-to-trial
variability
(Churchland et al., Nat. Neurosci. 2010)
Why?
The sampling hypothesis
Probabilistic representation in the cortex

Parametric:
• Probabilistic Population Codes (PPCs)
• Predecessors: kernel density estimation, distributional population codes

Sampling-based:
• Earlier proposals: Lee & Mumford 2003; Hoyer & Hyvärinen 2003
• Implicit examples: Boltzmann machine, Helmholtz machine
• Special cases: Olshausen & Field 1996; Karklin & Lewicki 2009
A simple model of the early visual system
 e.g. Olshausen & Field 1996
Sparseness and independence
 probabilistic formulation
Perception is an inference process which is always based on
internally available prior information
Looking for supporting evidence
Predictions
With accumulating visual experience, the distribution of
spontaneous activity should become increasingly similar to
the distribution of evoked activity averaged over natural
stimuli.
 Probability of exhibiting a given neural activity pattern should
be identical
 Probability of transitioning between particular patterns should
also match
 Similarity should be specific to natural scenes but not to
other stimulus ensembles
Data analysis
KL(P_Movie ‖ P_Spont) = Σ_x P_Movie(x) log [ P_Movie(x) / P_Spont(x) ]
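The same divergence can be computed from empirical activity-pattern probabilities; a sketch with a hypothetical pseudocount to regularize patterns never seen in one condition:

```python
import numpy as np

# KL(P_movie || P_spont) over discrete activity patterns.
def kl_divergence(p_movie, p_spont, eps=1e-10):
    p = np.asarray(p_movie, float) + eps   # pseudocount for unseen patterns
    q = np.asarray(p_spont, float) + eps
    p, q = p / p.sum(), q / q.sum()        # renormalize
    return np.sum(p * np.log(p / q))

print(kl_divergence([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))  # 0 for identical
print(kl_divergence([0.7, 0.2, 0.1], [0.4, 0.4, 0.2]))  # > 0 otherwise
```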
Results II
 Both SA and EAMovie become increasingly correlated
 Spatial correlations are similar between SA and EAMovie
Control (factorized): P̃(y) = ∏_{i=1}^{16} P(y_i)
Results III
 Temporal correlations are similar between SA and EAMovie
State-transition distributions: P(y_{t+Δ} | y_t)
Control (no temporal structure): P̃(y_{t+Δ} | y_t) = P(y_{t+Δ})
Evidence of sampling based probabilistic
representation in the awake cortex
Auditory
Data collection
• Awake, head-restrained ferrets
• One age group (adults)
• Single-electrode single-unit recording in A1
• Complete silence (spont. activity)
• White noise (0–20 kHz)
• Natural speech
Shamma lab, University of Maryland
Stimulus dependence of spike count correlations
 Sampling can reproduce these correlational effects, too
Evidence against sparseness and independence
• No support for active sparsification or increase of independence in the cortex
Relation between MP, mean, variance and Fano
 MP (membrane potential) variance decreases while both SC (spike count) mean and variance increase; because the mean increases more strongly, the Fano factor still decreases
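The arithmetic behind this statement, with hypothetical numbers: the Fano factor is variance over mean, so it falls whenever the mean grows faster than the variance.

```python
# Spike-count statistics before and after stimulus onset (hypothetical).
pre_mean, pre_var = 2.0, 3.0       # spontaneous
post_mean, post_var = 6.0, 7.5     # evoked: both mean and variance increased
fano_pre = pre_var / pre_mean      # 1.5
fano_post = post_var / post_mean   # 1.25 < 1.5: Fano factor decreases
print(fano_pre, fano_post)
```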
Standard models of visual recognition
Neural bases:
(Hubel and Wiesel, 1962)
System:
(Ohki et al., 2006)
Example: Cue combination
Atkins, Fiser & Jacobs VisRes 2001; Ernst & Banks Nature 2002
Is the neural computation deterministic?
Example 2: Decision making
 Trommershäuser, Maloney & Landy TICS 2008
Probabilistic inference and learning are coupled
A fact
If instantaneous perception is a process of
probabilistic inference, learning also must be
probabilistic.
Probabilistic inference and learning
Part 3
Evidence
Probabilistic representation in the cortex: neural evidence

• Inference, behavioral: Atkins et al., 2001; Ernst & Banks, 2002; Körding & Wolpert, 2004; Weiss et al., 2002
• Inference, neural: Platt & Glimcher, 1999; Yang & Shadlen, 2007; Kepecs et al., 2008
• Learning, behavioral: Dayan et al., 2000; Tenenbaum et al., 2006; Orbán et al., 2008
• Learning, neural: ?
Alternative representational schemes
Parametric:
• Probabilistic Population Codes (PPCs)
• Predecessor: kernel density estimation

Sampling-based:
• Implicit examples: Boltzmann machine, Helmholtz machine
• Special cases: Olshausen & Field 1996; Karklin & Lewicki 2009
Practical issues in probabilistic representation
[Figure: stimulus s; representing its distribution parametrically (PPC) vs. with samples]
Comparing the two schemes

                                    PPC                           Sampling-based
neurons correspond to               parameters                    variables
network dynamics required           deterministic                 stochastic
(beyond the first layer)                                          (self-consistent)
representable distributions         must correspond to a          can be arbitrary
                                    particular parametric form
critical factor in accuracy of      number of neurons             time allowed for sampling
encoding a distribution
instantaneous representation of     complete: the whole           partial: a sequence of
uncertainty                         distribution is represented   samples is required
                                    at any time
number of neurons needed for        scales exponentially with     scales linearly with the
representing multimodal             the number of dimensions      number of dimensions
distributions
implementation of learning          unknown                       well-suited
Logic of the two types of representation
[Figure: PPC: at each instant, the firing rates of neurons 1, 2, 3, … encode parameters of the distribution over variables; sampling-based: across trials/time, the joint activity of neurons 1 and 2 traces out samples from the joint distribution over variables 1 and 2 (e.g. orientation and intensity of features)]
Bottom line
 Sampling-based probabilistic representational schemes are promising alternatives, but there is no neural evidence supporting the hypothesis that the cortex uses such a scheme
 In order to obtain such evidence we take a detour…
Part 4
The role of spontaneous activity in visual coding
• Raichle-lab: humans, entire cortex, fMRI
- Fox & Raichle Nature Rev Neu. 2007; Vincent et al. Nature 2007; Fox et al. Nat. Neu.
2005
• Grinvald-lab: cat, visual cortex, optical imaging
- Kenet et al. Nature 2004; Tsodyks et al. Science 1999; Arieli et al. Science 1996
• Buzsáki-lab: rat, hippocampus, tetrode recording
- Buzsáki & Draguhn Science 2004; Harris et al. Nature 2003; Leinekugel et al. Science
2002;
• McCormick-lab: ferret, cat, rat slices
- Shu et al. Nature 2003; Shu et al. J. Neurosci 2003; McCormick et al. Cereb. Cortex
2003
• Yuste & Ferster labs: in vivo in vitro single cell recording, 2-photon Ca imaging
- Ikegaya et al. Science 2004; Anderson et al. Nat. Neuro 2000; Cossart et al. Nature
2003
Spontaneous activity in the awake brain
(Fiser et al. Nature 2004)
• Awake, head-restrained ferrets
• 3 age groups (P30, P45, P90)
• Multi-electrode recording from V1
• Three interleaved trial types:
• complete dark (spontaneous act.)
• random noise stimuli
• natural scene movie
The MATRIX
Goal: Compare neural activity under the three different conditions
Spontaneous activity in the awake brain
(Fiser et al. Nature 2004)
Results:
• A systematic change in the correlational structure with age
• Minimal difference between spontaneous and evoked activities
Framework
 Not perceptual learning in the classic sense, but representational learning of the structure of the visual world through detecting "suspicious coincidences" à la Barlow
– Classical perceptual learning: supervised or reinforcement learning; a signal-detection problem
– Representational learning: unsupervised learning; a combinatorial problem
 Not higher cognition or concept learning, but visual perception
 A plausible method for infants and animals, too
Experiments with pairs
 Inventory of base-pairs composed of shape elements
 Test: 2-AFC, base-pair vs. non-base pair
 Non-base element in either novel or familiar grid position
[Figure: pair inventory; test performance for non-base pairs with novel vs. familiar element positions]
 Subjects automatically extracted both position-dependent
and position-independent statistics from the scenes
Frequency-balanced inventory
[Figure: sample scene; inventory with rare and frequent base pairs, plus frequency-balanced non-base pairs]
Frequency-balanced results
 Subjects automatically preferred the rare but more predictable pairs
 At the same time, they were also sensitive to single element
appearance frequency
Embedded inventory
How do we learn hierarchical structures?
Method:
 Two base-pairs
 Two base-quadruples
 Six elements per scene
 Testing embedded and not
embedded base-pairs
against non-base pairs
Sample scene
Embedded results
[Figure: preference in 1st and 2nd test rounds vs. chance]
• significant preference for base quads and non-embedded pairs
• no significant preference for embedded pairs
Results with balanced appearance of each element
• Experiment 1: p < 0.007
Results with frequency-balanced non-base pairs
• Experiment 2, pairs: p < 0.012
• Experiment 3, singles: n.s.
Model performances I
Basic experiments with pairs
 Frequency-counting models fail
Model performances II
Embedded experiment with triplets
 All three of the naïve statistical models fail
[Figure: sample scene built from elements A–L]
Summary
 Correct prediction without any parameter tweaking
Part 3
The capacity of Working Memory
[Diagram: external world → visual stimulus → working memory ↔ long-term memory]
What is the effect of long-term memory on working memory?
Now we have a tool to investigate this!
Issues with the capacity of WM
 WM is defined as a limited-capacity memory buffer
 Capacity of WM is traditionally given as the maximum number of
‘chunks’ stored in WM
 There is a controversy around the capacity of WM: 4 or 2 or 1?
Luck & Vogel, Nature (1997)
Olsson & Poom, PNAS (2005)
Idea: Long-term memory is used to compress data in WM
[Diagram: external world → visual stimulus → encoding signal → working memory, whose capacity depends on LTM; learning builds a predictive distribution in long-term memory, which via the source coding theorem determines the description length (DL) of the encoding signal]
• Likely stimulus → short encoding signal: b a a
• Unlikely stimulus → long encoding signal: a a b d b a c
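The source-coding idea can be made concrete: under a predictive distribution p, the description length of a stimulus is about −log2 p(stimulus), so likely (well-chunked) stimuli get short codes. A sketch with hypothetical probabilities:

```python
import math

# Shannon code length in bits under a predictive distribution.
def description_length_bits(p):
    return -math.log2(p)

likely = 0.25          # e.g. a familiar chunk (hypothetical probability)
unlikely = 0.25 ** 3   # e.g. three independent elements
print(description_length_bits(likely))    # 2.0 bits
print(description_length_bits(unlikely))  # 6.0 bits
```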
Experimental paradigm
Familiarization
Change detection task
2AFC task
?
vs.
• Familiarization → learning the predictive distribution
• Change detection task → compression / description length
• 2AFC task → assessing learning efficiency
Key idea: modulating DL by building scenes from different chunks
Overall performance of humans
[Figure: overall performance of naïve and trained groups on the 2AFC and change-detection (CD) tasks against the true inventory]
Overall performance of humans: Effect of set size
Learning decouples set size and description length
Controlling for set size
Controlling for DL
Performance change of trained subjects relative to naive subjects
Low-DL trials
Intermediate-DL trials
High-DL trials
Conditioning eliminates most of set size dependence
Controlling for DL
Controlling for set size
Part 4
The role of spontaneous activity in visual coding
An abstract computational model of representational learning
• Probabilistic representation of data and uncertainty
• Use of a generative model of the world
A partial model of visual coding in the brain with issues
• Ongoing activity, trial-by-trial response variability
• Context dependency, attention, illusions
Can we reconcile these issues?
A Bayesian generative framework for the visual system
 Generative modeling hypothesis: the visual cortex
implements an approximate inference in a generative
model of the visual input.
[Diagram: causes, objects (x), with prior p(x) generate pixels (y) through p(y|x); learning adjusts the generative model]
Representation = probability distributions
Recognition = inference: p(x|y) ∝ p(y|x) p(x)
 Sampling hypothesis: neural activity represents Monte Carlo samples from the probability distribution over possible causes, p(x|y)
Top-down effects
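Recognition as inference can be illustrated on a toy discrete model (hypothetical numbers): the posterior over two candidate causes of an observed pixel pattern y is the normalized product of likelihood and prior.

```python
import numpy as np

# p(x|y) ∝ p(y|x) p(x) for two candidate causes.
prior = np.array([0.8, 0.2])          # p(x): cause 1 common, cause 2 rare
likelihood = np.array([0.1, 0.9])     # p(y|x) for the observed pattern y
posterior = likelihood * prior
posterior /= posterior.sum()          # normalize over causes
print(posterior)                      # rare cause wins: approximately [0.31, 0.69]
```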
Confirmation
 Similarity measured by the Kullback–Leibler divergence between the distributions of spontaneous and evoked activity:

KL(P_Movie ‖ P_Spont) = Σ_x P_Movie(x) log [ P_Movie(x) / P_Spont(x) ]

 Results:
The probabilistic model
Familiarization/learning:
Testing/calculating predictive probabilities
Graphical model:
Likelihood:
Model posterior:
Choice probability:
Computational framework
Sigmoid Belief Network
Computational framework
Associative learner
Graphical model:
Likelihood:
Model posterior:
Choice probability:
Embedded pairs
Bayesian model comparison
Compare evidences for inventories:
Automatic Occam’s razor
possible data sets
experienced data set
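The automatic Occam's razor can be sketched with toy numbers: a model that spreads its probability over many possible data sets assigns less evidence to the experienced data set than one that concentrates on it.

```python
# Evidence comparison for two hypothetical inventories (toy numbers).
datasets = ["d1", "d2", "d3", "d4"]
p_concentrated = {"d1": 0.7, "d2": 0.1, "d3": 0.1, "d4": 0.1}  # predicts d1
p_spread = {d: 0.25 for d in datasets}                          # predicts anything
observed = "d1"  # the experienced data set
print(p_concentrated[observed] > p_spread[observed])  # True: sharper model wins
```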
Overall performance of humans: Effect of set size
Fitting 2-AFC:
How about temporal correlations?
• Adults can rapidly learn to segment (and
group) sequences of shapes based solely on
statistical information (joint and/or transitional
probabilities)
(Fiser & Aslin JEP LM&C 2002)
• Infants can do the same
(Fiser, McCrink & Aslin, in preparation)