Neural Coding and Decoding

Albert E. Parker
Center for Computational Biology
Department of Mathematical Sciences
Montana State University

Collaborators: Alexander Dimitrov, Tomas Gedeon, John P. Miller, Zane Aldworth

Problem: Determine a coding scheme: How does neural ensemble activity represent information about sensory stimuli?

Our Approach:
• Construct a model using Probability and Information Theory.
• Optimize the model to cluster the neural responses, which gives an approximation of a coding scheme given the available data.
• Apply our method to the cricket cercal system.

Neural Coding and Decoding

Goal: What conditions must a coding scheme satisfy?

Demands:
• An animal needs to recognize the same object on repeated exposures. Coding has to be deterministic at this level.
• The code must deal with uncertainties introduced by the environment and neural architecture. Coding is by necessity stochastic at this finer scale.

Major Problem: The search for a coding scheme requires large amounts of data.

How to determine a coding scheme?

Idea: Model a part of a neural system as a communication channel using Information Theory. This model enables us to:
• Meet the demands of a coding scheme:
  o Define a coding scheme as a relation between stimulus and neural response classes.
  o Construct a coding scheme that is stochastic on the finer scale yet almost deterministic on the classes.
• Deal with the major problem:
  o Use whatever quantity of data is available to construct coarse but optimally informative approximations of the coding scheme.
  o Refine the coding scheme as more data becomes available.
• Investigate the cricket cercal sensory system.

A Stochastic Map

[Figure: input X -> Q(Y|X) -> output Y]

The relationship between X and Y is completely described by the conditional probability Q.
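As a small illustration of the stochastic map slide, a conditional probability Q(Y|X) on finite sets can be stored as one probability row per input. The sketch below is only a toy: the labels x1, x2, y1, y2, y3 and all numbers are hypothetical and do not come from the cricket data.

```python
# Hypothetical toy channel: 2 stimuli, 3 responses. All numbers are illustrative.
Q = {
    "x1": {"y1": 0.7, "y2": 0.2, "y3": 0.1},
    "x2": {"y1": 0.1, "y2": 0.3, "y3": 0.6},
}
p_x = {"x1": 0.5, "x2": 0.5}  # assumed input distribution


def is_stochastic_map(Q, tol=1e-9):
    """Check that every row Q(.|x) is a probability distribution."""
    return all(
        all(v >= 0.0 for v in row.values())
        and abs(sum(row.values()) - 1.0) <= tol
        for row in Q.values()
    )


def output_distribution(p_x, Q):
    """Marginal of the output: p(y) = sum_x p(x) Q(y|x)."""
    p_y = {}
    for x, px in p_x.items():
        for y, q_yx in Q[x].items():
            p_y[y] = p_y.get(y, 0.0) + px * q_yx
    return p_y


p_y = output_distribution(p_x, Q)
```

Together with p(x), the rows of Q determine the joint distribution p(x,y) = p(x) Q(y|x), which is the sense in which Q describes the relationship between X and Y.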
Realizations of X and Y in neural coding

[Figure: a stimulus sequence X=x is mapped by Q(Y=y|X=x) to a response sequence Y=y]

Determining Stimulus/Response Classes

Given a joint probability p(X,Y):

[Figure: p(X,Y) plotted over stimulus sequences (X) and response sequences (Y); labels 1 to 4]

Stimulus and Response Classes

[Figure: same axes, stimulus sequences (X) and response sequences (Y), showing the distinguishable stimulus/response classes 1 to 4]

Information Theoretic Quantities

A quantizer or encoder, Q, relates the environmental stimulus, X, to the neural response, Y, through a process called quantization. In general, Q is a stochastic map

  $Q(y|x): X \to Y$

The reproduction space Y is a quantization of X. This can be repeated: let Yf be a reproduction of Y, so there is a quantizer

  $q(y_f|y): Y \to Y_f$

Use mutual information to measure the degree of dependence between X and Yf:

  $I(X;Y_f) = \sum_{x,y,y_f} q(y_f|y)\, p(x,y) \log \frac{\sum_{y'} q(y_f|y')\, p(x,y')}{p(x) \sum_{y'} p(y')\, q(y_f|y')}$

Use conditional entropy to measure the self-information of Yf given Y:

  $H(Y_f|Y) = -\sum_{y,y_f} p(y)\, q(y_f|y) \log q(y_f|y)$

The Model

Problem: To determine a coding scheme between X and Y requires large amounts of data.

Idea: Determine the coding scheme between X and Yf, a clustering (reproduction) of Y, such that Yf preserves as much information (mutual information) with X as possible and the self-information (entropy) of Yf|Y is maximized. That is, we are searching for an optimal mapping (quantizer)

  $q^*(y_f|y): Y \to Y_f$

that satisfies these conditions.

Justification: Jaynes' maximum entropy principle, which states that of all the quantizers that satisfy a given set of constraints, choose the one that maximizes the entropy.

Equivalent Optimization Problems

Maximum entropy: maximize $F(q(y_f|y)) = H(Y_f|Y)$ constrained by $I(X;Y_f) \ge I_o$. Io determines the informativeness of the reproduction.

Deterministic annealing (Rose, '98): maximize $F(q(y_f|y)) = H(Y_f|Y) + \beta I(X;Y_f)$. Small β favors maximum entropy; large β favors maximum I(X;Yf).
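The mutual information, the conditional entropy and the annealing objective F can be evaluated directly once p(x,y) and a quantizer q(yf|y) are given as tables. The following is a minimal sketch in plain Python, not the authors' implementation. The function names and the 2 x 4 joint distribution are hypothetical, and natural logarithms are assumed.

```python
import math


def quantized_joint(p_xy, q):
    """p(x, y_f) = sum_y q(y_f|y) p(x, y), with p_xy[x][y] and q[y][f]."""
    n_f = len(q[0])
    return [
        [sum(q[y][f] * row[y] for y in range(len(row))) for f in range(n_f)]
        for row in p_xy
    ]


def mutual_information(p_xy, q):
    """I(X;Y_f) in nats for the reproduction induced by q."""
    p_xf = quantized_joint(p_xy, q)
    p_x = [sum(row) for row in p_xf]
    p_f = [sum(col) for col in zip(*p_xf)]
    total = 0.0
    for i, row in enumerate(p_xf):
        for f, val in enumerate(row):
            if val > 0.0:  # 0 log 0 is taken as 0
                total += val * math.log(val / (p_x[i] * p_f[f]))
    return total


def conditional_entropy(p_xy, q):
    """H(Y_f|Y) = -sum_{y,f} p(y) q(f|y) log q(f|y), in nats."""
    p_y = [sum(col) for col in zip(*p_xy)]
    return -sum(
        p_y[y] * v * math.log(v)
        for y in range(len(q))
        for v in q[y]
        if v > 0.0
    )


def objective(p_xy, q, beta):
    """Annealing objective F(q) = H(Y_f|Y) + beta * I(X;Y_f)."""
    return conditional_entropy(p_xy, q) + beta * mutual_information(p_xy, q)


# Hypothetical joint distribution: 2 stimuli (rows), 4 responses (columns).
p_xy = [[0.20, 0.20, 0.05, 0.05],
        [0.05, 0.05, 0.20, 0.20]]
q_det = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # deterministic
q_unif = [[0.5, 0.5] for _ in range(4)]                   # maximum entropy
```

In this toy case the uniform quantizer attains the largest H(Yf|Y) but carries no information about X, while the deterministic quantizer has zero conditional entropy and positive I(X;Yf). This is the trade-off that β controls.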
Augmented Lagrangian with Newton CG line search

Implicit solution:

  $q(y_f|y) = \frac{e^{\beta \nabla_q D_I / p(y)}}{\sum_{y_f} e^{\beta \nabla_q D_I / p(y)}}$

where $\nabla_q D_I$ is the gradient of the information term with respect to q(yf|y).

Simplex Algorithm: maximize I(X;Yf) over the vertices of the constraint space.

Application to synthetic data (p(X,Y) is known)

[Figure: results on synthetic data; panel label: Random clusters]

Application to Real Data (the probabilities p(y) and p(x,y) are NOT known)

• p(x,y) cannot be estimated directly for rich stimulus sets: there is not enough data.
• I(X;Yf) = H(X) - H(X|Yf). Only H(X|Yf) depends on the quantizer q(yf|y), so an upper bound on H(X|Yf) produces a lower bound on I(X;Yf).
• $H(X|Y_f) = E_{y_f} H(X|Y_f = y_f)$ is bounded above by the entropy of a Gaussian:

  $H(X|Y_f) \le H_G(X|Y_f) = E_{y_f} H_G(X|y_f) = E_{y_f} \frac{1}{2} \log\left( (2\pi e)^{|X|} \det C_{X|y_f} \right)$

  where |X| is the dimension of the stimulus and $C_{X|y_f}$ is the conditional covariance of the stimulus:

  $C_{X|y_f} = \sum_x p(x|y_f)\,(x - \mu_{y_f})(x - \mu_{y_f})^T = E_{y|y_f}\left[ C_{X|y} + \mu_y \mu_y^T \right] - \left( E_{y|y_f}\, \mu_y \right)\left( E_{y|y_f}\, \mu_y \right)^T$

  Here $\mu_y$ and $\mu_{y_f}$ are the conditional means of the stimulus. The last expression is written explicitly as a function of the quantizer (through p(y|yf)). We estimate the means and covariance matrices.

The Optimization Problem for Real Data

Maximum entropy: maximize $F(q(y_f|y)) = H(Y_f|Y)$ constrained by $H(X) - H_G(X|Y_f) \ge I_o$. Io determines the informativeness of the reproduction.

Modeling the cricket cercal sensory system as a communication channel

[Figure: Signal -> Nervous system (communication channel)]

Wind Stimulus and Neural Response in the cricket cercal system

[Figure, panel Y: neural responses (over a 30 minute recording) caused by a white noise wind stimulus, shown for a 12 ms window; these are all doublets. Time in ms; at T=0, the first spike occurs.]
[Figure, panel X: some of the air current stimuli preceding one of the neural responses. Axis: T, ms.]

Quantization: A quantizer is any map f: Y -> Yf from Y to a reproduction space Yf with finitely many elements. Quantizers can be deterministic, $y_f = f(y)$, or probabilistic, $q(y_f|y)$.

[Figure: quantizations of Y into Yf; panel label: refined]

Applying the algorithm to cricket sensory data.

[Figure: responses Y grouped into classes of Yf; one panel with Class 1 and Class 2, one panel with Class 1, Class 2 and Class 3]

Conclusions

We
• model a part of the neural system as a communication channel.
• define a coding scheme through relations between classes of stimulus/response pairs.
  - Coding is probabilistic on the individual elements of X and Y.
  - Coding is almost deterministic on the stimulus/response classes.

To recover such a coding scheme, we
• propose a new method to quantify neural spike trains.
  - Quantize the response patterns to a small finite space (Yf).
  - Use information theoretic measures to determine optimal quantizer for a fixed reproduction size.
  - Refine the coding scheme by increasing the reproduction size.
• present preliminary results with cricket sensory data.
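The step "determine optimal quantizer for a fixed reproduction size" can be illustrated with the deterministic annealing objective F = H(Yf|Y) + β I(X;Yf) from the Equivalent Optimization Problems slide. Setting the gradient of F to zero under the normalization of q gives q(yf|y) proportional to exp(β Σ_x p(x|y) log(p(x|yf)/p(x))), which the sketch below iterates as a fixed-point update. This is an assumption-laden toy, not the authors' algorithm: the function names, the 2 x 4 joint distribution, the β schedule and the slightly asymmetric starting quantizer are all hypothetical, and the perturbation and bifurcation tracking of a full annealing scheme are omitted.

```python
import math


def update_quantizer(p_xy, q, beta, eps=1e-300):
    """One fixed-point step for F = H(Y_f|Y) + beta * I(X;Y_f).

    p_xy[x][y] is the joint distribution, q[y][f] the current quantizer.
    New q(f|y) is a softmax over f of beta * sum_x p(x|y) log(p(x|f)/p(x)).
    """
    n_x, n_y, n_f = len(p_xy), len(q), len(q[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[x][y] for x in range(n_x)) for y in range(n_y)]
    p_xf = [[sum(q[y][f] * p_xy[x][y] for y in range(n_y))
             for f in range(n_f)] for x in range(n_x)]
    p_f = [sum(p_xf[x][f] for x in range(n_x)) for f in range(n_f)]
    new_q = []
    for y in range(n_y):
        scores = []
        for f in range(n_f):
            s = 0.0
            for x in range(n_x):
                if p_xy[x][y] > 0.0:
                    # eps guards against log(0) and division by zero
                    p_x_given_f = max(p_xf[x][f], eps) / max(p_f[f], eps)
                    s += (p_xy[x][y] / p_y[y]) * math.log(p_x_given_f / p_x[x])
            scores.append(beta * s)
        m = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        new_q.append([w / z for w in weights])
    return new_q


def anneal(p_xy, q0, betas, steps_per_beta=20):
    """Iterate the update at each beta of an increasing schedule."""
    q = q0
    for beta in betas:
        for _ in range(steps_per_beta):
            q = update_quantizer(p_xy, q, beta)
    return q


# Hypothetical data: responses 0,1 favor stimulus 0; responses 2,3 favor stimulus 1.
p_xy = [[0.20, 0.20, 0.05, 0.05],
        [0.05, 0.05, 0.20, 0.20]]
q0 = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.4, 0.6]]  # slightly asymmetric start
q_final = anneal(p_xy, q0, betas=[5.0, 20.0])
```

On this toy distribution the quantizer becomes almost deterministic as β grows, grouping the responses into two classes. Refining the coding scheme then corresponds to rerunning with a larger number of columns in q.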