Neural Coding Through
The Ages
February 1, 2002
Albert E. Parker
Complex Biological Systems
Department of Mathematical Sciences
Center for Computational Biology
Montana State University
Outline
• Introduction
• The Problem
• Approaches in Neural Encoding
  – Spike Count Coding (Adrian and Zotterman 1926)
  – Poisson Model (Fatt and Katz 1952)
  – Wiener/Volterra Series (1930, 1958)
• Approaches in Neural Decoding
  – Linear Methods
    • Linear Reconstruction (Rieke et al. 1997)
    • Vector Method (Georgopoulos et al. 1983)
    • Optimal Linear Estimator (Abbott and Salinas 1994)
  – Gaussian Model (de Ruyter van Steveninck and Bialek 1988)
  – Metric Space (Victor and Purpura 1996)
We want to understand the neural code.
We seek an answer to the question:
How does neural activity represent information about environmental stimuli?
“The little fly sitting in the fly’s brain trying to fly the fly”
Looking for the dictionary to the neural code …
[Diagram: stimulus X(·), via encoding, produces response Y(t); decoding maps the response back to the stimulus]
… but the dictionary is not deterministic!
Given a stimulus, an experimenter observes many different neural responses (spike trains):
[Figure: one stimulus X(·) eliciting four different spike trains Yi(t) | X(·), i = 1, 2, 3, 4]
Neural coding is stochastic!!
Similarly, neural decoding is stochastic:
[Figure: one spike train Y(t) consistent with many stimuli Xi(·) | Y(t), i = 1, 2, …, 9]
Probability Framework
environmental stimuli X(·)  →  neural responses Y(t)
encoder: P(Y(t) | X(·))
decoder: P(X(·) | Y(t))
Information Theory tells us that if the relationship between X and Y can be modeled as an optimal communication channel, then a coding scheme needs to be stochastic on a fine scale and almost deterministic on a large scale.
[Figure: the joint probability P(X(·), Y(t)) over environmental stimuli X and neural responses Y, showing areas of high probability]
How to determine a coding scheme?
There are two methodologies in this search:
• encoding (determining P(Y|X))
• decoding (determining P(X|Y))
As we search for a coding scheme, we proceed in the spirit of John Tukey:
"It is better to be approximately right than exactly wrong."
Neural encoding …
Spike Count Coding (Adrian and Zotterman 1926)
Encoding as a non-linear process: the response tuning curve
[Figure: response amplitude vs. stimulus amplitude]
• Hanging weights from a muscle (Adrian 1926). Stimulus amplitude: mass in grams.
• Moving a pattern across the visual field of a blowfly (de Ruyter van Steveninck and Bialek 1988). Stimulus amplitude: average velocity in a 200 ms window.
In Spike Count Coding, the response amplitude is the (spike count)/time.
Spike Count Coding
[Figure: the encoder P(Y|X), a distribution of spikes/time at each stimulus amplitude, together with the marginals P(Y) and P(X)]
An experimenter repeats each stimulus many times to estimate the encoder P(Y | X).
And now you can get P(X | Y) from Bayes' rule:
P(X | Y) = P(Y | X) P(X) / P(Y)
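To make this concrete, here is a minimal sketch (Python; the spike counts and the uniform stimulus prior are made up for illustration) of estimating the encoder from repeated trials and inverting it with Bayes' rule:

```python
import numpy as np

# Hypothetical data: spike counts observed over repeated presentations of
# each of several stimulus amplitudes.
counts_per_stimulus = {
    0.5: np.array([2, 3, 2, 4, 3]),
    1.0: np.array([5, 6, 5, 7, 6]),
    2.0: np.array([9, 8, 10, 9, 11]),
}

max_count = 12
stimuli = sorted(counts_per_stimulus)
p_x = np.full(len(stimuli), 1.0 / len(stimuli))   # assume a uniform stimulus prior

# Estimate the encoder P(Y|X) as a normalized histogram for each stimulus.
p_y_given_x = np.zeros((len(stimuli), max_count + 1))
for i, x in enumerate(stimuli):
    hist = np.bincount(counts_per_stimulus[x], minlength=max_count + 1)
    p_y_given_x[i] = hist / hist.sum()

# Bayes' rule: P(X|Y) = P(Y|X) P(X) / P(Y).
p_y = p_x @ p_y_given_x                           # marginal P(Y)
with np.errstate(invalid="ignore", divide="ignore"):
    p_x_given_y = (p_y_given_x * p_x[:, None]) / p_y[None, :]

y_observed = 6
print(dict(zip(stimuli, p_x_given_y[:, y_observed])))  # decoder for this count
```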
Spike Count Coding: the directional tuning curve (Miller, Jacobs and Theunissen 1991)
[Figure: response tuning curves for the 4 interneurons in the cricket cercal sensory system. The preferred directions are orthogonal to each other.]
Pros
• Some neural systems do seem to encode certain stimulus attributes by (number of spikes)/time.
• Can work well if the stimulus space is small (e.g. when coding direction in the cricket cercal sensory system). (Abbott, 2001)
Cons
• Counting spikes/time neglects the temporal pattern of the spikes of the neural response, which potentially decreases the information conveyed. Moreover, known short behavioral decision times imply that many neurons make use of just a few spikes.
• Sensory systems respond to stimulus attributes that are very complex. That is, the space of possible stimuli for some systems is a very large (infinite-dimensional) space. Hence, it is not feasible to present all possible stimuli in experiment.
Poisson Model (Fatt and Katz 1952)
[Figures: electrical multi-site stimulation of chicken retina in vitro (Stett et al. 2000); P-type afferent responses in the electric fish to a transdermal potential stimulus (Xu, Payne and Nelson 1996)]
These (normalized) histograms give the probability per unit time of firing given that x(·) occurred: r[t | X(·) = x(·)].
Poisson Model
If we assume that the spikes are independent of each other given a stimulus X(·), then we can model P(Y | X) as an inhomogeneous (time-dependent) Poisson process:
• If Y is the (spike count)/time, then P(Y | X(·)) = Poisson(∫ r[t | X(·)] dt).
• If Y(t) is a spike train with N spikes at times t1, …, tN in [0, T], then P(Y(t) | X(·)) is Poisson-like:

P(Y(t) | X(·)) = (1/N!) ∏_{i=1}^{N} r[t_i | X(·)] · exp( −∫_0^T r[t | X(·)] dt )
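For illustration, a minimal sketch of drawing spike trains from an inhomogeneous Poisson process by thinning; the rate function and its parameters are invented for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

def rate(t):
    # Hypothetical stimulus-driven firing rate r[t | X(.)] in spikes/s.
    return 20.0 + 15.0 * np.sin(2 * np.pi * 4.0 * t)

def sample_spike_train(rate_fn, T, r_max):
    """Draw one spike train on [0, T] by thinning a homogeneous
    Poisson process of rate r_max >= rate_fn(t) for all t."""
    n = rng.poisson(r_max * T)                    # candidate spike count
    candidates = np.sort(rng.uniform(0.0, T, n))
    keep = rng.uniform(0.0, 1.0, n) < rate_fn(candidates) / r_max
    return candidates[keep]

spikes = sample_spike_train(rate, T=2.0, r_max=35.0)
print(len(spikes), "spikes in 2 s")
```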
Poisson Model
When is the Poisson model a good one to try? Examine the mean and variance of the spike counts/time given each stimulus. For a Poisson process they should be equal:
[Figure: variance vs. mean of spike counts (Abbott, 2001)]
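A quick sketch of this diagnostic on simulated counts (the rates and trial numbers are arbitrary): the Fano factor, variance/mean, should be near 1 if the Poisson assumption holds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical spike counts: 200 repeated trials (rows) for each of
# three stimulus conditions (columns) with different mean rates.
counts = rng.poisson(lam=[3.0, 8.0, 15.0], size=(200, 3))

mean = counts.mean(axis=0)
var = counts.var(axis=0, ddof=1)
fano = var / mean          # close to 1 for a Poisson process
print("mean:", mean, "variance:", var, "Fano factor:", fano)
```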
Pros
• We have an explicit form of P(Y|X).
• A Poisson process is a basic, well-studied process.
Cons
• Counting spikes/time neglects the temporal pattern of the spikes of the neural response.
• Assuming that spikes are independent neglects the refractory period of a neuron. This period must be 'small' compared to the mean ISI in order for the Poisson model to be appropriate. To deal with this:
  – Berry and Meister (1998) have proposed a Poisson model that includes the refractory period.
  – Emory Brown (2001) uses a mixture of Gamma and inverse Gaussian models.
• The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiments to estimate r[t | X(·)].
Wiener / Volterra Series (1930, 1958)
• The Taylor series for a function y = g(x):
  y(x) = y(x0) + y′(x0)(x − x0) + ½ y″(x0)(x − x0)² + …
       = f0 + f1(x − x0) + f2(x − x0)² + …
• The Volterra series is the analog of the Taylor series for a functional Y(t) = G[X(t)]:
  Y(t) = f0 + ∫dτ1 f1(τ1) X(t − τ1) + ∫∫dτ1 dτ2 f2(τ1, τ2) X(t − τ1) X(t − τ2) + …
How to compute the { fi }?
• Wiener reformulated the Volterra series so that the new coefficient functions, or kernels, could be measured from experiment.
• There are theorems that assure that this series, with sufficiently many terms, provides a complete description for a broad class of systems.
• The first Wiener kernel is proportional to the cross-correlation of stimulus and response:
  f1 = ⟨XY⟩ / S_X
  This is proportional to the spike-triggered average when Y(t) is a spike train.
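A minimal sketch of measuring the first kernel from data, assuming a white-noise stimulus so that S_X is flat and f1 reduces (up to a constant) to the spike-triggered average; the simulated encoder and all parameter values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 20_000)          # Gaussian white-noise stimulus, 1 ms bins
# Hypothetical encoder: spike probability follows a filtered, rectified stimulus.
drive = np.convolve(x, np.exp(-np.arange(50) / 10.0), mode="full")[: len(x)]
spikes = rng.uniform(size=len(x)) < np.clip(0.02 * drive, 0.0, 1.0)

# With white noise, the first Wiener kernel is proportional to the
# spike-triggered average: the mean stimulus in a window before each spike.
lags = 100                                # look back 100 bins
spike_idx = np.nonzero(spikes)[0]
spike_idx = spike_idx[spike_idx >= lags]
sta = np.mean([x[i - lags:i] for i in spike_idx], axis=0)
print("STA computed from", len(spike_idx), "spikes; kernel length", len(sta))
```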
Wiener / Volterra Series
Rieke et al. 1997: recordings from H1 of the fly. Constructing the neural response (here, Y is the firing rate) from the first Wiener kernel …
[Figure: actual response vs. predicted response]
… seems to be able to capture slow modulations in the firing rate.
Pros
• Computing the first Wiener kernel is inexpensive.
• Not much data is required.
Cons
• Although it is theoretically possible to compute many terms in the Wiener series, low-order approximations, of just f0 and f1 for example, don't work well in practice (i.e. coding is NOT linear).
• The Wiener series is for a continuous function Y(t). This is fine when Y(t) = r[t | X(·)], the firing rate. But how do we construct a series to model the discrete spiking of neurons?
• The Wiener series gives a specific Y(t) | X(·). What is P(Y | X)? In principle, one can do a lot of repeated experiments to estimate P(Y | X). In practice, the preparation dies before enough data is collected.
[Figure: slice of a fly brain]
Neural Decoding
Why Decoding?
• Encoding looks non-linear in many systems. Maybe decoding is linear and hence easier.
• It is conceivably easier to estimate P(X|Y) over an ensemble of responses {Y}, since the {Y} live in a much smaller space than the {X}.
Linear Reconstruction Method (Rieke et al. 1997)
• Consider a linear Volterra approximation, now mapping Y back to X:
  X_est(t) = ∫ K1(τ) Y(t − τ) dτ = Σi K1(t − ti)
  if we represent the discrete spike train as Y(t) = Σi δ(t − ti), where the i-th spike occurs at ti.
• How to determine K1? Minimize the mean squared error:
  min_K ⟨ ∫ |X_observed(t) − Σi K(t − ti)|² dt ⟩_X
• The solution in the Fourier domain is
  K1 = F⁻¹[ ⟨ F[X(ω)] Σj exp(iωtj) ⟩ / ⟨ |Σj exp(iωtj)|² ⟩ ]
  where the numerator is the Fourier transform of the average stimulus surrounding a spike, and the denominator is the power spectrum of the spike train.
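In discrete time this filter is easy to compute with FFTs. A minimal single-trial sketch, with two simplifying assumptions: the trial average ⟨·⟩ is dropped, and a small eps regularizes frequencies where the spike train has no power.

```python
import numpy as np

def fit_reconstruction_filter(x, y, eps=1e-6):
    """Least-squares estimate of K1 in the Fourier domain: the
    stimulus/spike-train cross-spectrum over the spike-train power spectrum."""
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    return np.fft.irfft(X * np.conj(Y) / (np.abs(Y) ** 2 + eps), n=len(x))

def reconstruct(y, k1):
    # x_est(t) = sum_i K1(t - t_i): circular convolution of the spikes with K1
    return np.fft.irfft(np.fft.rfft(y) * np.fft.rfft(k1), n=len(y))

# Hypothetical usage: y is a binned 0/1 spike train aligned with stimulus x.
rng = np.random.default_rng(5)
x = rng.normal(size=4096)
y = (rng.uniform(size=4096) < 0.05).astype(float)
k1 = fit_reconstruction_filter(x, y)
x_est = reconstruct(y, k1)
```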
Linear Reconstruction Method
Rieke et al. 1997: recordings from H1 of the fly. Reconstructing the stimulus with the linear filter K1:
[Figure: actual stimulus vs. predicted stimulus]
Pros
• It's cheap.
• The temporal pattern of the spikes is considered.
• Even though encoding is non-linear (hence the failure of the linear Wiener approximation), decoding for some neurons seems linear.
Cons
• Only one neuron is modeled.
• No explicit form of P(X|Y).
Other Linear Methods
For populations of neurons.
Assumptions:
1. Yi = number of spikes from neuron i in a time window.
2. X(t) is randomly chosen and continuously varying.
• Vector Method (Georgopoulos et al. 1983)
  X_est(t) = Σi Yi Ci, where Ci is the preferred stimulus for neuron i.
• Optimal Linear Estimator (OLE) (Abbott and Salinas 1994)
  X_est(t) = Σi Yi Di, where Di is chosen so that
  ⟨ ∫dt |X_observed(t) − Σi Yi Di|² ⟩_{Y,X} is minimized,
  which gives Di = Σj Q⁻¹ij Lj, where
  Lj is the center of mass of the tuning curve for cell j, and
  Qij is the correlation of Yi and Yj.
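A side-by-side sketch of the two estimators on a simulated population loosely modeled on the cercal system; the half-cosine tuning curves, gains, and trial counts are all invented, and Lj is estimated as the correlation ⟨Yj X⟩:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population: 4 neurons with preferred directions at
# 45, 135, 225, 315 degrees and half-cosine tuning of the mean count.
prefs = np.deg2rad([45.0, 135.0, 225.0, 315.0])
C = np.stack([np.cos(prefs), np.sin(prefs)], axis=1)   # preferred stimuli C_i

def mean_counts(theta, gain=10.0):
    return gain * np.clip(np.cos(theta - prefs), 0.0, None)

# Simulate many trials at random directions.
thetas = rng.uniform(0.0, 2 * np.pi, 2000)
Y = rng.poisson(np.array([mean_counts(t) for t in thetas]))  # trials x neurons
X = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)       # unit stimulus vectors

# Vector method: x_est = sum_i Y_i C_i
x_vec = Y @ C

# OLE: D = Q^-1 L with Q_ij = <Y_i Y_j> and L_j = <Y_j X>.
Q = (Y.T @ Y) / len(Y)
L = (Y.T @ X) / len(Y)
D = np.linalg.solve(Q, L)
x_ole = Y @ D

def mean_angle_error_deg(x_est):
    err = np.arctan2(x_est[:, 1], x_est[:, 0]) - thetas
    return np.rad2deg(np.abs(np.arctan2(np.sin(err), np.cos(err)))).mean()

print("vector method error (deg):", mean_angle_error_deg(x_vec))
print("OLE error (deg):", mean_angle_error_deg(x_ole))
```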
Other Linear Methods: the cricket cercal sensory system (Abbott and Salinas 1994)
[Figure: difference between the stimulus reconstructed by the Vector and OLE methods and the true stimulus presented]
Evidence suggests that the cricket can code direction with an accuracy of up to 5 degrees. This data suggests that these algorithms decode direction as well as the cricket does in this experiment.
Pros
• Vector Method
  – It's cheap.
  – This method is ideal when the tuning curve is a (half) cosine.
  – Has small error if the Ci are orthogonal.
• OLE
  – Has the smallest average MSE of all linear methods over a population of neurons.
Cons
• Vector Method
  – It is not always obvious what the preferred stimulus Ci is for generic stimuli.
  – Does not work well if the Ci are not uniformly distributed (orthogonal).
  – Requires a lot of neurons in practice.
• Counting spikes/time neglects the temporal pattern of the spikes of the neural response Y(t).
• No explicit form of P(X|Y)!
Gaussian Model (de Ruyter van Steveninck and Bialek 1988)
• In experiment, let X(t) be a randomly chosen, continuously varying stimulus (Gaussian white noise).
• Approximate P(X|Y) with a Gaussian whose mean_{X|Y} and covariance_{X|Y} are computed from data.
[Figure: Rieke et al. 1997, recordings from H1 of the fly, showing mean_{X|Y} for a response Y(t)]
Pros
• The temporal pattern of the spikes is considered.
• We have an explicit form for P(X|Y). Why should P(X|Y) be Gaussian? This choice is justified by Jaynes' maximum entropy principle: of all models that satisfy a given set of constraints, choose the one that maximizes the entropy. For a fixed mean and covariance, the Gaussian is the maximum entropy model.
Cons
• An inordinate amount of data is required to obtain good estimates of covariance_{X|Y=y} over all observed y(t).
  – One way to deal with the problem of not having enough data is to cluster the responses together and estimate a Gaussian model for each response cluster, as in the sketch below.
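A minimal sketch of that cluster-wise fit; the stimulus snippets and cluster labels are random stand-ins, since the clustering itself is a separate step:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: stimulus snippets (rows) paired with a discrete
# response-cluster label for each trial (responses already clustered).
snippets = rng.normal(size=(500, 40))        # 500 trials, 40-sample windows
labels = rng.integers(0, 3, size=500)        # 3 response clusters

# For each cluster, the maximum-entropy model with the observed first and
# second moments is the Gaussian N(mean_{X|Y}, cov_{X|Y}).
models = {}
for k in np.unique(labels):
    xs = snippets[labels == k]
    models[k] = (xs.mean(axis=0), np.cov(xs, rowvar=False))

mu, Sigma = models[0]
print("cluster 0:", mu.shape, Sigma.shape)   # (40,), (40, 40)
```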
Metric Space Approach (Victor and Purpura 1996)
We desire a rigorous decoding method that …
• estimates P(X|Y);
• takes the temporal structure of the spikes of the neural responses Y(t) into account;
• deals with the insufficient-data problem by clustering the responses.
[Figure: Abbott, 2001]
Assumptions:
• The stimuli X1, X2, …, XC must each be repeated multiple times.
• There are a total of T neural responses: Y1, …, YT.
Metric Space Approach
The Method:
1. Given two spike trains, Yi and Yj, the distance between them is defined by the metric D[q](Yi, Yj): the minimum cost required to transform Yi into Yj via a path of elementary steps (see the sketch below):
   a. adding or deleting a spike (cost = 1)
   b. shifting a spike in time by Δt (cost = q·|Δt|)
   – 1/q is a measure of the temporal precision of the metric.
   – D[q=0](Yi, Yj) is just the difference in the number of spikes between the spike trains Yi and Yj. Decoding based on this metric is just counting spikes.
   – D[q=∞](Yi, Yj) requires infinitely precise timing of the spikes.
   [Figure: transforming spike train Yi into Yj by elementary steps]
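The minimum transformation cost can be computed with a standard edit-distance style dynamic program. A short sketch (a straightforward implementation, not Victor and Purpura's original code):

```python
import numpy as np

def victor_purpura(t1, t2, q):
    """Victor-Purpura spike train distance D[q].
    t1, t2: sorted arrays of spike times; q: cost per unit time shift."""
    n, m = len(t1), len(t2)
    # G[i, j] = distance between the first i spikes of t1 and first j of t2.
    G = np.zeros((n + 1, m + 1))
    G[:, 0] = np.arange(n + 1)   # deleting i spikes costs i
    G[0, :] = np.arange(m + 1)   # inserting j spikes costs j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i, j] = min(
                G[i - 1, j] + 1,                                   # delete a spike
                G[i, j - 1] + 1,                                   # insert a spike
                G[i - 1, j - 1] + q * abs(t1[i - 1] - t2[j - 1]),  # shift a spike
            )
    return G[n, m]

print(victor_purpura(np.array([0.01, 0.05]), np.array([0.012, 0.30]), q=100.0))
```

With q = 0 the recursion reduces to the difference in spike counts, matching the limit described above.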
Metric Space Approach
The Method (continued):
2. Let r1, r2, …, rC be response classes. Let N be the classification matrix, with rows indexed by the stimuli X1, …, XC and columns by the response classes r1, …, rC, initialized to all zeros.
3. Suppose that Yi was elicited by Xα. Assign Yi to response class rβ if
   ⟨ D[q](Yi, Y)^z ⟩^{1/z}, averaged over the Y elicited by Xβ,
   is the minimum over all Xk for k = 1, …, C.
4. Increment Nα,β by 1. (For example, if Y1 was elicited by X1 and is assigned to class r3, then N1,3 becomes 1.)
5. Repeat steps 3 and 4 for Yi for i = 1, …, T.
Metric Space Approach
After repeating the process for all Y elicited by X1 (X1 was presented 20 times, eliciting 20 neural responses), the classification matrix N is:

        r1   r2   r3   r4   r5
  X1     3   11    3    2    1
  X2     0    0    0    0    0
  X3     0    0    0    0    0
  X4     0    0    0    0    0
  X5     0    0    0    0    0
Metric Space Approach
Then for all Y elicited by X2 (also presented 20 times, eliciting 20 neural responses):

        r1   r2   r3   r4   r5
  X1     3   11    3    2    1
  X2     5   10    3    2    0
  X3     0    0    0    0    0
  X4     0    0    0    0    0
  X5     0    0    0    0    0
Metric Space Approach
… until we have repeated the process for all Y, including the ones elicited by XC. In this example, the T = 100 responses (there were 20 neural responses elicited by each stimulus) were quantized, or clustered, into 5 classes:

        r1   r2   r3   r4   r5
  X1     3   11    3    2    1
  X2     5   10    3    2    0
  X3     1    1   15    1    2
  X4     1    0    4    2   13
  X5     2    3    2    5    8

N: the classification matrix
Metric Space Approach
Note that by normalizing the columns of the matrix N, we get the decoder P(X|r). Decode a neural response Y(t) by looking up its response class r in the normalized matrix:

         r1    r2    r3    r4    r5
  X1    .25   .44   .11   .17   .04
  X2    .42   .40   .11   .17   .00
  X3    .08   .04   .56   .08   .08
  X4    .08   .00   .15   .17   .54
  X5    .17   .12   .07   .42   .33

P(X | r)
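The bookkeeping from N to the decoder is just a column normalization. A sketch using the counts from this example:

```python
import numpy as np

# The classification matrix from the example: rows are stimuli X1..X5,
# columns are response classes r1..r5 (counts of assignments).
N = np.array([
    [3, 11,  3,  2,  1],
    [5, 10,  3,  2,  0],
    [1,  1, 15,  1,  2],
    [1,  0,  4,  2, 13],
    [2,  3,  2,  5,  8],
], dtype=float)

# Normalizing each column gives the decoder P(X | r).
P_x_given_r = N / N.sum(axis=0, keepdims=True)
print(np.round(P_x_given_r, 2))

# Decoding a response: look up its class, read off that column.
r = 2                                    # the response fell in class r3 (0-indexed)
print("P(X | r3) =", np.round(P_x_given_r[:, r], 2))
```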
Pros
• The responses are clustered together.
• P(X|r) estimates P(X|Y).
• Considers the temporal pattern of the spikes.
• Minimizing the cost function D[q] is intuitively a nice way to quantify jitter in the spike trains. In information theory, this type of cost function is called a distortion function.
• What to choose for q and z? The values that maximize the transmitted information from stimulus to response.
Cons
• D[q] imposes our assumptions of what is important in the structure of spike trains (namely, that shifts and spike insertions/deletions are important).
• The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiment.
So we're looking for a decoding algorithm that …
• produces an estimate of P(X|Y) as well as of X(·)|Y(t);
• considers the temporal structure of the spike trains Y(t);
• makes no assumptions about the linearity of decoding;
• does not require that all stimuli be presented; that is, X(t) ought to be randomly chosen and continuously varying (such as a GWN stimulus);
• considers a population of neurons;
• deals with the problem of never having enough data by clustering the neural responses.
Tune in next week, on Friday at the CBS seminar, to see how our method deals with these issues.