Neural Coding Through The Ages February 1, 2002 Albert E. Parker Complex Biological Systems Department of Mathematical Sciences Center for Computational Biology Montana State University Outline Introduction The Problem Approaches in Neural Encoding – Spike Count Coding – Poisson Model – Wiener/Volterra series (Adrian and Zotterman 1926) (Fatt and Katz 1952) (1930, 1958) Approaches in Neural Decoding – Linear Methods • Linear Reconstruction (Reike et al 1997) • Vector Method (Georgopoulos et al 1983) • Optimal Linear Estimator (Abbot and Salinas 1994) – Gaussian Model – Metric Space (de Ruyter van Steveninck and Bialek 1988) (Victor and Purpura 1996) We want to understand the neural code. We seek an answer to the question: How does neural activity represent information about environmental stimuli? “The little fly sitting in the fly’s brain trying to fly the fly” Looking for the dictionary to the neural code … encoding stimulus X() response Y(t) decoding … but the dictionary is not deterministic! Given a stimulus, an experimenter observes many different neural responses (spike trains): X() Yi(t)| X() i = 1, 2, 3, 4 … but the dictionary is not deterministic! Given a stimulus, an experimenter observes many different neural responses (spike trains): X() Yi(t)| X() i = 1, 2, 3, 4 Neural coding is stochastic!! Similarly, neural decoding is stochastic: Y(t) Xi()|Y(t) i = 1, 2, … , 9 Probability Framework encoder: P(Y(t) |X()) environmental stimuli neural responses X() Y(t) decoder: P(X()|Y(t)) Information Theory tells us that if the relationship between X and Y can be modeled as an optimal communication channel, then … P(X(),Y(t)) neural responses Areas of high probability A coding scheme needs to be stochastic on a fine scale and almost deterministic on a large scale. Y X environmental stimuli How to determine a coding scheme? There are 2 methodologies in this search: encoding (determining P(Y|X)) decoding (determining P(X|Y)) As we search for a coding scheme, we proceed in the spirit of John Tukey: It is better to be approximately right than exactly wrong. Neural encoding …. Spike Count Coding (Adrian and Zotterman 1926) Encoding as a non-linear process: the response tuning curve Response Amplitude Stimulus Amplitude • Hanging weights from a muscle ( Adrian 1926) Stimulus Amplitude: mass in grams • Moving a pattern across the visual field of a blowfly (de Ruyter van Steveninck and Bialek, 1988) Stimulus Amplitude: average velocity (omms/s) in 200ms window In Spike Count Coding, response amplitude is the (spikes count)/time Spike Count Coding P(Y |X) spikes/time Stimulus Amplitude An experimenter repeats each stimulus many times to estimate the encoder P(Y | X). Spike Count Coding P(Y |X) spikes/time P(Y) Stimulus Amplitude P(X) And now you can get P(X | Y) from Bayes rule 1 P( X | Y ) P(Y | X ) P( X ) P(Y ) Spike Count Coding the directional tuning curve (Miller, Jacobs and Theunissen 1991) The response tuning curves for the 4 interneurons in the cricket cercal sensory system. The preferred directions are orthogonal to each other. Cons • Counting spikes/time neglects the temporal pattern of the spikes of the neural response, which – Potentially decreases the information conveyed – Known short behavioral decision times imply that many neurons make use of just a few spikes • Sensory systems respond to stimulus attributes that are very complex. That is, the space of possible stimuli for some systems is a very large (infinite dimensional) space. Hence, it is not feasible to present all possible stimuli in experiment. • Some neural systems do seem to encode certain stimulus attributes by (number spikes)/time. • Can work well if the stimulus space is small (e.g. when coding direction in cricket cercal sensory system). Abbot, 2001 Pros Poisson Model Electrical multi-site stimulation of chicken retina in vitro: (Fatt and Katz 1952) P-type afferent responses in the electric fish to transdermal potential stimulus (Xu, Payne and Nelson 1996) (Stett et al., 2000) These (normalized) histograms give the probability per unit time of firing given that x(t) occurred: r[t |X()=x()] Poisson Model If we assume that the spikes are independent from each other given a stimulus X(), then we can model P(Y | X) as an inhomogenuous (dependent on time) Poisson process: If Y is the (spike count)/time, then P(Y | X()) = Poisson( r[t |X()] dt) If Y(t) is a spike train, then P(Y(t) |X()) = Poisson-like(r[t |X()]) T 1 N r[ti | X ( )] exp r[t | X ( )] N ! i 1 0 Poisson Model Abbot, 2001 When is the Poisson model a good one to try? Examine the mean and variance of the spike counts/time given each stimulus. For a Poisson process, they should be equal: Pros • Have an explicit form of P(Y|X) • A Poisson process is a basic, well studied process Cons (Stett et al., 2000) • Counting spikes/time neglects the temporal pattern of the spikes of the neural response. • Assuming that spikes are independent neglects the refractory period of a neuron. This period must be ‘small’ compared to mean ISI in order for Poisson model to be appropriate. To deal with this: – Berry and Meister (1998) have proposed a Poisson model that includes the refractory period. – Emory Brown (2001) uses a mixture of Gamma and inverse Gaussian models. • The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiments to estimate r(t | X()). Wiener / Volterra Series • (1930, 1958) The Taylor series for a function y = g(x): y(x) = y(x0) + y’(x0 ) (x - x0) + ½ y’’ (x0)(x - x0)2 + … = f0 + f1(x - x0) + f2(x - x0)2 + … • The Volterra series is the analog of a Taylor series for a functional Y(t) = G[X(t)]: Y(t) = f0 + d1 f1(1)X(t - 1) + d1 d2 f2(1, 2)X(t - 1) X(t - 2) + … How to compute { fi } ?? • Wiener reformulated the Volterra series in a way so that the new coefficient functions or kernels could be measured from experiment. • There are theorems that assure that this series with sufficiently many terms provides a complete description for a broad class of systems. • The first Wiener kernel is proportional to the cross correlation of stimulus and response: f1 = <XY>/SX This is proportional to the spike triggered average when Y(t) is a spike train. Wiener / Volterra Series Reike et al. 1997: recordings from H1 of the fly Constructing the neural response (here, Y is the firing rate) from the first Wiener kernel … Actual response Predicted response seems to be able to capture slow modulations in the firing fate. Pros • Computing the first Wiener kernel is inexpensive. • Not much data is required. Cons Slice of a fly brain • Although it is theoretically possible to compute many terms in the Wiener series, practical low order approximations, of just f0 and f1 for example, don’t work well in practice (i.e. coding is NOT linear). • Wiener series is for a continuous function Y(t). This is fine when Y(t) = r[t |X()], the firing rate. But how do we construct a series to model the discrete spiking of neurons? • The Wiener series gives a specific Y(t) | X(). What is P(Y | X)? In principle, one can do a lot of repeated experiments to estimate P(Y | X). In practice, the preparation dies on you before enough data is collected. Neural Decoding Why Decoding? • Encoding looks non-linear in many systems. Maybe decoding is linear and hence easier. • It is conceivably easier to estimate P(X|Y) over an ensemble of responses {Y}, since {Y} live in a much smaller space than the {X}. Linear Reconstruction Method (Reike et al 1997) • Consider a linear Volterra approximation (X Y): X(t)= K1()Y(t - ) d = i K1(t – ti ) if we represent the discrete spike train as Y(t) = i (t – ti ), where the ith spike occurs at ti. • How to determine K1 ? Minimize the mean squared error: minK() |Xobserved(t) - i K(t – ti )|2dt X • F [ X ( )] exp( it j ) j K1 F 1 | exp( it j ) |2 j Fourier transform of average stimulus surrounding a spike Power spectrum of the spike train Linear Reconstruction Method Reike et al. 1997: recordings from H1 of the fly Reconstructing the stimulus with the linear filter K1: Actual stimulus Predicted stimulus Pros • It’s cheap. • The temporal pattern of the spikes is considered. • Even though encoding is non-linear (and hence the failure of the Wiener linear approximation), decoding for some neurons seems linear. Cons • Only one neuron is modeled. • No explicit form of P(X|Y) Other Linear Methods For populations of neurons Assumptions: 1. Yi = number of spikes from neuron i in a time window. 2. X(t) is randomly chosen and continuously varying. • Vector Method (Georgopoulos et al 1983) X(t) = i Yi Ci where Ci is the preferred stimulus for neuron i. • Optimal Linear Estimator (OLE) (Abbot and Salinas 1994) X(t) = i Yi Di where Di is chosen so that dt |Xobserved(t) - i Yi Di |2 Y X is minimized so that Di = j Qij-1 Lj where Lj is center of mass of the tuning curve for cell i Qij is the correlation of Yi and Yj (Abbot and Salinas 1994) cricket cercal sensory system Other Linear Methods Difference between the stimulus reconstructed by the Vector and OLE methods and the true stimulus presented. Evidence suggests that the cricket can code direction with an accuracy of up to 5 degrees. This data suggests that these algorithms decode direction as well as the cricket does in this experiment. Pros • Vector Method – It’s cheap – This method is ideal when the tuning curve is a (half) cosine. – Has small error if Ci are orthogonal • OLE – Has smallest average MSE of all linear methods over a population of neurons Cons • Vector Method – It is not always obvious what the preferred stimulus Ci is for generic stimuli. – Does not work well if the Ci are not uniformly distributed (orthogonal) – Requires a lot of neurons in practice • Counting spikes/time neglects the temporal pattern of the spikes of the neural response Y(t). • No explicit form of P(X|Y)! Gaussian Model (de Ruyter van Steveninck and Bialek 1988) • In experiment, let X(t) be a randomly chosen, continuously varying (GWN) • Approximate P(X|Y) with a Gaussian with meanX|Y and covarianceX|Y computed from data. Reike et al. 1997 recordings from H1 of the fly meanX|Y Y(t) Pros • The temporal pattern of the spikes is considered. • We have an explicit form for P(X|Y). Why should P(X|Y) be Gaussian? This choice is justified by Jaynes maximum entropy principle: of all models that satisfy a given set of constraints, choose the one that maximizes the entropy. For a fixed mean and covariance, the Gaussian is the maximum entropy model. Cons • An inordinate amount of data is required to obtain good estimates of covarianceX|Y=y over all observed y(t). – One way to deal with the problem of not having enough data is to cluster the responses together and estimate a gaussian model for each response cluster. Metric Space Approach (Victor and Purpura 1996) We desire a rigorous decoding method that … • Estimates P(X|Y). • Takes the temporal structure of the spikes of the neural responses Y(t) into account. • Deals with the insufficient data problem by clustering the responses. Abbot, 2001 Assumptions: • The stimuli, X1, X2, … , XC , must be repeated multiple times. • There are a total of T neural responses: Y1 , …, YT . Metric Space Approach The Method: 1. Given two spike trains, Yi and Yj , the distance between them is defined by the metric D[q](Yi , Yj), the minimum cost required to transform Yi into Yj via a path of elementary steps: a . adding or deleting a spike (cost = 1) b. shifting a spike in time by t (cost = q ·|t|) – 1/q is a measure of the temporal precision of the metric. – D[q=0](Yi , Yj) is just the difference in the number of spikes between the spike trains Yi and Yj. Decoding based on this metric is just counting spikes. Yi – D[q=](Yi , Yj) gives infinitesimally precise timing of the spikes. Yj Metric Space Approach The Method: 1. Given two spike trains, Yi and Yj , the distance between them is defined by the metric D[q](Yi , Yj), the minimum cost required to transform Yi into Yj via a path of elementary steps: a . adding or deleting a spike (cost = 1) b. shifting a spike in time by t (cost = q ·|t|) – 1/q is a measure of the temporal precision of the metric. – D[q=0](Yi , Yj) is just the difference in the number of spikes between the spike trains Yi and Yj. Decoding based on this metric is just counting spikes. Yi – D[q=](Yi , Yj) gives infinitesimally precise timing of the spikes. Yj Metric Space Approach 2. Let r1, r2, … rC be response classes. Metric Space Approach Let r1, r2, … rC be response classes. Let N be the classification matrix. r r1 r2 r3 r4 r5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X1 X2 X3 X4 X5 X N: the classification matrix 2. Metric Space Approach Let r1, r2, … rC be response classes. Let N be the classification matrix. Suppose that Y1 was elicited by X1. Assign Y1 to response class r3 if D[q](Y1 , Y)z Y elicited by X_ 3 1/z is the minimum over all Xk for k = 1, … , C. r1 r2 r3 r4 r5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X1 X2 X3 X4 X5 N: the classification matrix 2. 3. Metric Space Approach 4. Let r1, r2, … rC be response classes. Let N be the classification matrix. Suppose that Y1 was elicited by X1. Assign Y1 to response class r3 if D[q](Y1 , Y)z Y elicited by X_ 3 1/z is the minimum over all Xk for k = 1, … , C. Increment N1, 3 by 1. r1 r2 r3 r4 r5 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X1 X2 X3 X4 X5 N: the classification matrix 2. 3. Metric Space Approach 4. 5. Let r1, r2, … rC be response classes. Let N be the classification matrix. Suppose that Yi was elicited by X. Assign Yi to response class r if D[q](Yi , Y)z Y elicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. Increment N, by 1. Repeat steps 3 and 4 for Yi for i = 1, … , T. r1 r2 r3 r4 r5 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X1 X2 X3 X4 X5 N: the classification matrix 2. 3. Metric Space Approach 2. 3. 4. 5. Let r1, r2, … rC be response classes. Let N be the classification matrix. Suppose that Yi was elicited by X. Assign Yi to response class r if D[q](Yi , Y)z Y elicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. Increment N, by 1. Repeat steps 3 and 4 for Yi for i = 1, … , T. r1 r2 r3 r4 3 11 3 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 X1 was presented 20 times eliciting 20 neural responses r5 1 0 0 0 0 X1 X2 X3 X4 X5 N: the classification matrix After repeating the process for all Y elicited by X1 … Metric Space Approach 2. 3. 4. 5. Let r1, r2, … rC be response classes. Let N be the classification matrix. Suppose that Yi was elicited by X. Assign Yi to response class r if D[q](Yi , Y)z Y elicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. Increment N, by 1. Repeat steps 3 and 4 for Yi for i = 1, … , T. r1 r2 r3 r4 3 11 3 5 10 3 0 0 0 0 0 0 0 0 0 2 2 0 0 0 r5 1 0 0 0 0 X1 X2 X3 X4 X5 N: the classification matrix Then for all Y elicited by X2 … X2 was presented 20 times eliciting 20 neural responses Metric Space Approach 2. 3. 4. 5. Let r1, r2, … rC be response classes. Let N be the classification matrix. Suppose that Yi was elicited by X. Assign Yi to response class r if D[q](Yi , Y)z Y elicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. Increment N, by 1. Repeat steps 3 and 4 for Yi for i = 1, … , T. r r1 r2 r3 r4 3 11 3 5 10 3 1 1 15 1 0 4 2 3 2 r5 2 1 2 0 1 2 2 13 5 8 X1 X2 X3 X4 X5 X N: the classification matrix Until we have repeated the process for all Y, including the ones elicited by XC. In this example, the T=100 responses (there were 20 neural responses elicited by each stimulus) were quantized or clustered into 5 classes. Metric Space Approach Note that by normalizing the columns of the matrix N, we get the decoder P(X|r). Decode a neural response Y(t) by looking up its response class r in the r1 r2 r3 r4 r5 .25 .44 .11 .17 .04 .42 .40 .11 .17 0 .08 .04 .56 .08 .08 .08 0 .15 .17 .54 .17 .12 .07 .42 .33 X1 X2 X3 X4 X5 P(X | r) normalized matrix N: Pros • • • • The responses are clustered together. P(X|r) estimates P(X|Y). Considers the temporal pattern of the spikes. Minimizing the cost function D[q] is intuitively a nice way to quantify jitter in the spike trains. In information theory, this type of cost function is called a distortion function. • What to choose for q and z ? The values that maximize the transmitted information from stimulus to response. Cons • D[q] imposes our assumptions of what is important in the structure of spike trains (namely that shifts and spike insertions/deletions are important). • The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiment. So we’re looking for a decoding algorithm that … Produces an estimate of P(X|Y) as well as of X()|Y(t). Considers the temporal structure of the spike trains Y(t). Makes no assumptions about the linearity of decoding. Does not require that all stimuli be presented. That is, X(t) ought to be randomly chosen and continuously varying (such as a GWN stimulus). Considers a population of neurons. Deals with the problem of never having enough data by clustering the neural responses. Tune in next week, on Friday at the CBS seminar, to see how our method deals with these issues.