Generative models and receptive fields

Projects:
1. Predictive coding in balanced spiking networks
(Erwan Ledoux).
2. Using Canonical Correlation Analysis (CCA) to
analyse neural data (David Schulz).
3. Encoding and Decoding in the Auditory System
(Izzett Burak Yildiz).
4. Quadratic programming of tuning curves: a
theory for tuning curve shape (Ralph
Bourdoukan).
5. The Bayesian synapse: A theory for synaptic
short term plasticity (Sophie Deneve).
Projects:
1. Choose a project. Send email to
sophie.deneve@ens.fr
2. Once the project is assigned, make an appointment
with your advisor as soon as possible (before April 17).
3. Plan another meeting with your advisor (mid-May).
4. Prepare the oral presentation (June 5).
Emphasis on pedagogy, context, and clarity; results not so important.
The efficient coding hypothesis
Predicting sensory receptive fields
Schematics of the visual system
The retina
Center-surround RFs
Hubel and Wiesel
V1 orientation selective cell
Hubel and Wiesel model
How are receptive fields measured?
It is a linear regression problem
Solution:
$w = \langle s\, s^T \rangle^{-1} \langle s\, r \rangle$
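This regression view is easy to check numerically. Below is a minimal sketch (all sizes, names, and the noise level are hypothetical) that recovers a linear receptive field from simulated white-noise stimuli using the solution above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical white-noise experiment: T stimulus frames of D pixels each,
# and one noisy response per frame from an unknown linear filter w_true.
T, D = 5000, 64
S = rng.standard_normal((D, T))                 # stimuli, one column per frame
w_true = rng.standard_normal(D)                 # the "true" receptive field
r = w_true @ S + 0.1 * rng.standard_normal(T)   # responses

# Least-squares solution: w = <s s^T>^{-1} <s r>
w_hat = np.linalg.solve(S @ S.T, S @ r)

print(np.corrcoef(w_hat, w_true)[0, 1])         # close to 1
```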
Receptive fields of V1 simple cells
Optimal sensory coding?
The notion of surprise
The entropy of a distribution
Minimal and maximal entropy
Maximizing information transfer
Conditional entropy H(Y|X): Surprise about Y when one knows X
H  Y | X    p  x  H Y | X  x 
xA
With:
H Y | X  x     p  y | x  log  p  y | x  
yB
Or more shortly:
H Y | X   

xA, yB
p  x, y  log  p  y | x  
Maximizing information transfer
Mutual information between X and Y:
$I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$
Maximizing information
Mutual information between X and Y:
$I(X, Y) = H(X) - H(X \mid Y)$
[Diagram: information channel from X to Y, with response-distribution plots.]
Maximize $H(X)$: a narrow response distribution is boring, a broad one is interesting. …or… minimize $H(X \mid Y)$: the posterior $p(x \mid y)$ should be precise, not unreliable.
Sensory system as information channel
Maximizing information transfer
Mutual information between s and r:
$I(r, s) = H(s) - H(s \mid r)$  (generative models)
$= H(r) - H(r \mid s)$  (analysis models)
With no noise, $H(r \mid s)$ is fixed, so maximize $H(r)$.
Maximizing information transfer
Distribution of responses
Entropy maximization
Infomax activation function
An example in the fly
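A minimal numerical sketch of the idea behind the fly example (the gamma-distributed "contrasts" are a hypothetical stand-in for the real input statistics): if the activation function is the cumulative distribution of the inputs, the output distribution is uniform, i.e. maximum entropy for a bounded response.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stimulus distribution (stand-in for natural contrasts).
stim = rng.gamma(shape=2.0, scale=1.0, size=100_000)

# Infomax activation function: the empirical CDF of the stimulus itself.
xs = np.sort(stim)
cdf = np.arange(1, xs.size + 1) / xs.size
response = np.interp(stim, xs, cdf)   # pass each stimulus through the CDF

# The response distribution is (approximately) uniform on [0, 1]:
counts, _ = np.histogram(response, bins=10, range=(0.0, 1.0))
print(counts / stim.size)             # roughly 0.1 in every bin
```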
But: neurons cannot have any activation function!
Information maximization
Two neurons
Each neuron maximizing its own entropy
Entropy of a 2D distribution
Two neurons
Entropy maximization = Independent component analysis
Entropy maximization, 2 neurons
Independent component analysis, N neurons
Application: visual processing
Transformation of the visual input
Entropy maximization
Entropy maximization
Weights learnt by ICA (image patch)
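A sketch of this kind of experiment, assuming a recent scikit-learn; the random "images" below are only a placeholder, and the localized, oriented (Gabor-like) filters of this slide emerge only when real natural images are used:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)

# Placeholder for natural images; load real photographs to obtain
# Gabor-like, oriented features.
images = rng.standard_normal((10, 256, 256))

def sample_patches(img, n=500, size=16):
    # Sample n random size-by-size patches and flatten them.
    H, W = img.shape
    out = []
    for _ in range(n):
        i, j = rng.integers(0, H - size), rng.integers(0, W - size)
        out.append(img[i:i + size, j:j + size].ravel())
    return out

patches = np.array([p for img in images for p in sample_patches(img)])
patches -= patches.mean(axis=1, keepdims=True)  # remove each patch's mean

# ICA learns a complete set of maximally independent filters.
ica = FastICA(n_components=64, whiten="unit-variance", max_iter=500)
ica.fit(patches)
filters = ica.components_.reshape(-1, 16, 16)   # one 16x16 filter per row
```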
The distribution of natural images
Geometric interpretation of ICA
First stages of visual processing
The efficient coding hypothesis
Limitations of ICA
Works only once… Great! …and then what?
Limitations of ICA
Complete basis. Number of features = Number of pixels
Limitations of ICA
Bottleneck
Number of optic nerve fibers << number of retinal receptors
Maximizing information transfer
Mutual information between s and r:
$I(r, s) = H(s) - H(s \mid r)$: $H(s)$ is fixed; minimize $H(s \mid r)$, the reconstruction error (generative models).
$= H(r) - H(r \mid s)$: $H(r \mid s)$ is fixed (no noise); maximize $H(r)$ (analysis models).
Maximizing information
Mutual information between s and r:
$I(r, s) = H(s) - H(s \mid r)$
[Diagram: channel from s to r.]
$H(s)$ is fixed, so minimize $H(s \mid r)$: the posterior $p(s \mid r)$ should be precise, not unreliable. That is, r must predict the sensory input as well as possible.
Generative model
[Diagram: hidden causes h1 … h5 generate the sensory inputs s1, s2, s3.]
The hidden causes are independent, with prior $p(h)$, and generate the input:
$s_i = \sum_j \Gamma_{ij} h_j + \text{noise}$
$I(s, h) = H(s) - H(s \mid h)$
Find the dictionary of features, $\Gamma$, minimizing $H(s \mid h)$.
The Gaussian Distribution
Minimize mean squared error
Generative model, recognition model
[Diagram: hidden causes h1 … h5 generate inputs s1, s2, s3; responses r1 … r5 recognize them.]
Generate: independent prior $p(h)$, with $s_i = \sum_j \Gamma_{ij} h_j + \mathcal{N}(0, \sigma^2)$.
Recognize: $r_k = \hat h_k$, minimizing the entropy, i.e. the expected reconstruction error:
$\hat s_i = \sum_j \Gamma_{ij} r_j$, with $s_i - \hat s_i$ the reconstruction error.
Separate the problem in two (a sketch in code follows this list):
• Start with some random dictionary $\Gamma^*$.
• Given the current sensory input s and the dictionary $\Gamma^*$, estimate the hidden state r.
• Given the current state estimate r and the sensory input s, update $\Gamma^*$ to minimize the reconstruction error.
• Repeat until convergence.
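A compact sketch of this alternating scheme (sizes, learning rate, and the random data are arbitrary; inference here is plain least squares, with the MAP version developed on the next slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes: D pixels, K features, T training patches.
D, K, T = 64, 32, 1000
S = rng.standard_normal((D, T))        # stand-in for (whitened) image patches
Gamma = rng.standard_normal((D, K))    # random initial dictionary
Gamma /= np.linalg.norm(Gamma, axis=0)

eta = 0.05
for step in range(100):
    # 1. Given s and the current dictionary, estimate the hidden states
    #    (plain least squares here; MAP inference with a prior comes later).
    R = np.linalg.lstsq(Gamma, S, rcond=None)[0]
    # 2. Given the state estimates, nudge the dictionary to reduce the
    #    reconstruction error.
    S_hat = Gamma @ R
    Gamma += eta * (S - S_hat) @ R.T / T
    Gamma /= np.linalg.norm(Gamma, axis=0)   # keep feature norms fixed
```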
How to estimate r = ĥ?
[Diagram: h1 … h5 generate s1, s2, s3; r1 … r5 recognize.]
Maximum a posteriori (MAP):
$r = \arg\max_h p(h \mid s, \Gamma)$
How to estimate r = ĥ?
Bayes' rule:
$p(h \mid s, \Gamma) = \dfrac{p(s \mid h, \Gamma)\, p(h)}{p(s)}$
[Diagram: generate / recognize network.]
$r = \arg\max_h \log\bigl(p(s \mid h, \Gamma)\, p(h)\bigr) = \arg\max_h \bigl[\log p(s \mid h, \Gamma) + \log p(h)\bigr]$
Reconstruction error and MAP
Normal distribution: $p(s_i \mid h) = \mathcal{N}\bigl(\sum_j \Gamma_{ij} h_j,\; \sigma^2\bigr)$, where $\sigma^2$ is the variance of the pixel noise.
Prior cost: $\Phi(h_k) = -\log p(h_k)$
The minus log posterior is equivalent to a reconstruction error with a cost:
$-\log p(h \mid s, \Gamma) = \dfrac{1}{2\sigma^2} \sum_i \Bigl(s_i - \sum_j \Gamma_{ij} h_j\Bigr)^2 + \sum_k \Phi(h_k)$
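The intermediate step, written out (my expansion; terms that do not depend on h are collected into the constant):

```latex
\begin{aligned}
-\log p(h \mid s, \Gamma)
  &= -\log p(s \mid h, \Gamma) - \log p(h) + \log p(s) \\
  &= \frac{1}{2\sigma^2} \sum_i \Bigl(s_i - \sum_j \Gamma_{ij} h_j\Bigr)^2
     + \sum_k \Phi(h_k) + \text{const.}
\end{aligned}
```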
Minimize reconstruction error
$\hat s_i = \sum_j \Gamma_{ij} r_j$
(reconstructed sensory input $\hat s$; neural responses $r$; dictionary of features $\Gamma$)
$r = \arg\min_r \Bigl[\sum_i (s_i - \hat s_i)^2 + \sum_k \Phi(r_k)\Bigr]$
(reconstruction error + penalty or cost)
How to estimate r = ĥ?
[Diagram: generate / recognize network.]
Maximize the log posterior probability by gradient ascent on r:
$\dot r_j = -\dfrac{\partial}{\partial r_j} \Bigl[\sum_i (s_i - \hat s_i)^2 + \sum_k \Phi(r_k)\Bigr]$
which gives the network dynamics
$\dfrac{dr}{dt} = \Gamma^T s - \Gamma^T \Gamma r - \Phi'(r)$
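As a sketch of these dynamics (the cost derivative dphi is left generic, and the 1/(2σ²) factor is absorbed into the step size):

```python
import numpy as np

def infer_r(s, Gamma, dphi, dt=0.1, n_steps=200):
    """Gradient dynamics for MAP inference:
    dr/dt = Gamma^T s - Gamma^T Gamma r - Phi'(r)."""
    r = np.zeros(Gamma.shape[1])
    for _ in range(n_steps):
        r += dt * (Gamma.T @ s - Gamma.T @ Gamma @ r - dphi(r))
    return r

# Example with a quadratic cost Phi(r) = r^2 / 2, i.e. Phi'(r) = r:
# r = infer_r(s, Gamma, dphi=lambda r: r)
```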
How to update the dictionary
[Diagram: generate / recognize network.]
Minimize the mean-squared error:
$\Delta\Gamma_{ij} \propto -\dfrac{\partial}{\partial \Gamma_{ij}} \sum_i (\hat s_i - s_i)^2$
which gives the local, Hebbian-like update
$\Delta\Gamma_{ij} \propto r_j \, (s_i - \hat s_i)$
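The corresponding learning step, as a one-line sketch (eta is an arbitrary learning rate):

```python
import numpy as np

def update_dictionary(Gamma, s, r, eta=0.01):
    # Hebbian-like update Delta Gamma_ij ∝ r_j (s_i - s_hat_i): each weight
    # moves with the product of the response and the reconstruction error.
    s_hat = Gamma @ r
    return Gamma + eta * np.outer(s - s_hat, r)
```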
Generative model, recognition model
[Diagram: generate / recognize network.]
1. Find the most probable hidden states r.
2. Update $\Gamma$ to minimize the MSE.
What prior to use? Sparse coding
Cost = number of neurons with non-zero responses.
[Plots: two candidate priors p(h), one marked good, one marked bad.]
Many cortical neurons are near-silent…
Sparse responses of an edge detector
[Plot: distribution of responses $r_i$.]
Sparse prior: $p(h_k) \propto \exp(-\lambda \lvert h_k \rvert)$
Elementary features found by sparse coding
$p(h) \propto e^{-\lambda \lvert h \rvert}$
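With this Laplacian prior, the MAP cost is $\Phi(h) = \lambda \lvert h \rvert$. One standard way to handle its non-smooth derivative (a choice of mine, not shown on the slides) is the soft-threshold, or proximal, step, which produces exact zeros, i.e. genuinely silent neurons:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal step for the L1 cost lam * |x| induced by the prior
    # p(h) ∝ exp(-lam |h|): small values become exactly zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

print(soft_threshold(np.array([-0.5, 0.05, 1.2]), lam=0.1))
# [-0.4  0.   1.1]
```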
Limitation of the sparse coding approach applied to sensory RFs
[Diagram: h1 … h5 generate s1, s2, s3; r1 … r5 recognize.]
The "predictive fields" of the generative model, $\hat s = \Gamma h$, and the "receptive fields" of the recognition model, $\hat r = w s$, are different!
Receptive fields depend on stimulus type
Carandini et al, J Neurosci 2005
Responses to natural scenes are poorly predicted by the RF.
STRF: [spectro-temporal receptive field; axes: frequency f versus time t]
Machens CK, Wehr MS, Zador AM. J Neurosci. 2004