Improved characterization of neural and behavioral response properties using a point-process state-space framework
Anna Alexandra Dreyer
Harvard-MIT Division of Health Sciences and Technology
Speech and Hearing Bioscience and Technology Program
Neurostatistics Research Laboratory, MIT
PI: Emery Brown, M.D., Ph.D.
September 27, 2007

Action potentials as binary events
• Action potentials (spikes) are binary events
• Cells use the timing and frequency of action potentials to communicate with neighboring cells
• Most cells emit action potentials spontaneously, in the absence of stimulation
• Models should therefore begin with the spikes themselves to describe the response most accurately
Figure from the laboratory of Mark Ungless

Point Process Framework: Definition of the Conditional Intensity Function
• Given a recording interval [0,T), the counting process N(t) represents the number of spikes that have occurred on the interval [0,t).
• A point-process model is completely characterized by its conditional intensity function (CIF), which defines the instantaneous firing rate at every point in time:
λ(t | H(t)) = lim_{Δ→0} Pr[N(t+Δ) − N(t) = 1 | H(t)] / Δ
• where H(t) represents the autoregressive (spiking) history up to time t.
Brown et al., 2003; Daley and Vere-Jones, 2003; Brown, 2005

Joint Probability of Spiking (Likelihood Function)
• Discretize the interval [0,T) into B bins of width Δ.
• As Δ becomes increasingly small, λ(t_b | Ψ, H_b)Δ, where Ψ are the model parameters and H_b is the autoregressive history up to bin b, approaches the probability of seeing one event in a bin of width Δ.
• If Δ is chosen small enough that the probability of seeing more than one event per bin approaches 0, the joint probability can be written as a product of independent Bernoulli events (Truccolo et al., 2005):
Pr[N_{1:B} | Ψ] = ∏_{b=1}^{B} [λ(t_b | H_b, Ψ)Δ]^{N_b} [1 − λ(t_b | H_b, Ψ)Δ]^{1−N_b} + o(J)
• where o(J) represents the probability of seeing two or more events in an interval (t_{b−1}, t_b].
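The discretized Bernoulli likelihood above translates directly into a few lines of code. Below is a minimal sketch that evaluates its logarithm for a binned spike train; the function name, the array names, and the toy homogeneous 50 spikes/s intensity are illustrative choices, not part of the original analysis.

```python
import numpy as np

def bernoulli_loglik(spikes, lam, dt):
    """Log of prod_b [lam_b*dt]^n_b [1 - lam_b*dt]^(1-n_b).

    spikes : 0/1 spike indicator per bin (N_b)
    lam    : conditional intensity lambda(t_b | H_b, Psi), spikes/s, per bin
    dt     : binwidth Delta in seconds (small enough that lam*dt < 1)
    """
    p = lam * dt                      # probability of one spike in each bin
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    n = np.asarray(spikes, dtype=float)
    return np.sum(n * np.log(p) + (1 - n) * np.log(1 - p))

# Toy example: simulate 1 s of a homogeneous 50 spikes/s process in 1 ms bins,
# then score it under the true intensity.
rng = np.random.default_rng(0)
dt = 0.001
lam = np.full(1000, 50.0)
spikes = (rng.random(1000) < lam * dt).astype(int)
print(bernoulli_loglik(spikes, lam, dt))
```

In practice λ(t_b | H_b, Ψ) would come from the fitted state-space model rather than being constant; the point of the sketch is only that, once the binwidth is small, the likelihood is an ordinary product of Bernoulli terms.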
Truccolo et al., 2005

An example of using point-process models to analyze auditory data: Experimental Paradigm
• Recordings of action potentials in response to 19 stimulus levels, over multiple repetitions of each stimulus
• Need to develop an encoding model that characterizes the responses at each stimulus level as well as the noise in the system
• Inference: find the lowest stimulus level for which the response exceeds the system noise
• Given new responses from the same cell, decode the stimulus level from which the response originated
Data from Lim and Anderson (2006)

Modeling example of cortical response across stimulus levels
• Response characteristics include:
– autoregressive components
– temporal and rate-dependent elements
• Typical models do NOT capture these autoregressive components
• To have adequate goodness-of-fit and predictive power, the model must capture these elements of the raw data

Point process state-space framework
• Instantaneous firing intensity model: the firing intensity in each Δ = 1 ms bin b is modeled as a function of the past spiking history, H_{l,k,bΔ}, and of the stimulus effect.
• Observation equation:
λ_{l,k}(bΔ | θ_l, γ, H_{l,k,bΔ}) = exp(Σ_{r=1}^{R} θ_{l,r} g_r(bΔ)) · exp(Σ_{j=1}^{J} γ_j n_{l,k,b−j})
where λ_{l,k} is the conditional firing intensity, the first factor is the stimulus effect, and the second is the past spiking history effect.
• State equation:
θ_{l+1,r} = θ_{l,r} + ε_{l+1,r}
where ε_{l+1,r} is a Gaussian random vector.
Computational methods developed with G. Czanner, U. Eden, E. Brown

Encoding and Decoding Methodology
• Estimation/encoding/inference
– the Expectation-Maximization algorithm is used
– Monte Carlo techniques estimate confidence bounds for the stimulus effect
• Goodness-of-fit
– KS plots and autocorrelation of rescaled spike times
• Decoding and response property inference

Expectation-Maximization algorithm
• Used to compute maximum likelihood (ML) parameter estimates in statistical models with hidden variables or missing data.
• The algorithm consists of two steps:
– an expectation (E) step, in which the expectation of the complete-data likelihood is computed
– a maximization (M) step, in which that expectation is maximized over the parameters
• As the algorithm iterates, the initial parameter estimates are successively improved until they converge to the maximum likelihood estimator.
Dempster et al., 1977; McLachlan and Krishnan, 1997; Pawitan, 2001

SS-GLM model of stimulus effect
• The level-dependent stimulus effect captures many phenomena seen in the data:
– increase of spiking with level
– spread of excitation in time
(Figure: stimulus effect in spikes/s as a function of level number and time since stimulus onset in ms)
• It removes the effect of autoregressive history, which is a property of the system, not of the stimulus.

Threshold inference based on all trials and all levels
• Define threshold as the first stimulus level for which we can be reasonably (>0.95) certain that the response at that level differs from the noise and continues to differ at higher stimulus levels.
• For this example, threshold is level 8.
• Compare to the common rate-level function methodology, which gives level 11.
Dreyer et al., 2007; Czanner et al., 2007

Goodness-of-fit assessment
• The KS plot lies close to the 45-degree line, indicating uniformity of the rescaled spike times.
• The autocorrelation plot shows that the Gaussian-transformed rescaled spike times are essentially uncorrelated, suggesting independence.
• In contrast, the KS plots for the underlying rate-based models show a very poor fit to the data.
Johnson & Kotz, 1970; Brown et al., 2002; Box et al., 1994

Decoding based on a single trial
• Decode new data using the estimated encoding parameters (θ̂, γ̂).
• Given a spike train n_l*, estimate the likelihood that it came from any stimulus s_{l'} in the encoding model:
Lik(s_{l'} | n_l*) = ∏_{b=1}^{B} [λ̂(b | H_b, θ̂_{l'})Δ]^{n_b} [1 − λ̂(b | H_b, θ̂_{l'})Δ]^{1−n_b}
• Calculate the likelihood for all stimuli s_{1:L}:
Lik(s_{1:L} | n_l*) = [Lik(s_1 | n_l*), Lik(s_2 | n_l*), …, Lik(s_L | n_l*)]
• Take the most likely level as the decoded stimulus:
ŝ = argmax_{s_{1:L}} Lik(s_{1:L} | n_l*)

Single-trial threshold inference using ML decoding is more sensitive around threshold than an ROC analysis based on spike counts.
• The area under the ROC curve specifies the probability that, when two responses are drawn, one from a lower level and one from a higher level, the algorithm assigns the larger value to the draw from the higher level.

Decoding across multiple trials improves performance.

Neural Model Conclusions
• This methodology has potential for characterizing the behavior of any noisy system where separating signal from noise is important for predicting responses to future stimuli.

Bayesian techniques: an alternative to frequentist estimation
• Use Bayesian sampling techniques to:
– estimate behavioral responses to auditory stimuli
– apply the methodology used for auditory encoding models to learning experiments, to discover the neural mechanisms that encode behavioral learning in the basal ganglia.
In collaboration with B. Pfingst, A. Smith, A. Graybiel, E.
Brown

Bayesian sampling methodology
• Goal: compute the posterior probability density of the parameters and the hidden state given the data:
p(Ψ, x | N) = p(Ψ) p(x | Ψ) p(N | Ψ, x) / p(N)
where p(Ψ) is the prior, p(x | Ψ) p(N | Ψ, x) is the complete-data likelihood, p(N) is the normalization, and p(Ψ, x | N) is the posterior.
• Use Markov chain Monte Carlo (MCMC) methods to compute the posterior probability by simulating stationary Markov chains.
• MCMC methods provide an approximate posterior probability density for the parameters.
• Credible intervals (the Bayesian analogue of confidence intervals for unknown parameters) can be computed for the parameter estimates.
Gilks et al., 1996; Congdon, 2003

References
• Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
• Brown EN. Theory of point processes for neural systems. In: Chow CC, Gutkin B, Hansel D, Meunier C, Dalibard J, eds. Methods and Models in Neurophysics. Paris: Elsevier, 2005, Chapter 14: 691-726.
• Brown EN, Barbieri R, Eden UT, Frank LM. Likelihood methods for neural data analysis. In: Feng J, ed. Computational Neuroscience: A Comprehensive Approach. London: CRC, 2003, Chapter 9: 253-286.
• Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput. 2002, 14: 325-346.
• Congdon P. Applied Bayesian Modelling. Chichester, UK: John Wiley and Sons, 2003.
• Czanner G, Dreyer AA, Eden UT, Wirth S, Lim HH, Suzuki W, Brown EN. Dynamic models of neural spiking activity. IEEE Conference on Decision and Control, 2007 Dec 12.
• Daley D, Vere-Jones D. An Introduction to the Theory of Point Processes. 2nd ed. New York: Springer-Verlag, 2003.
• Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39(1): 1-38.
• Dreyer AA, Czanner G, Eden UT, Lim HH, Anderson DJ, Brown EN.
Enhanced auditory neural threshold detection using a point-process state-space model analysis. Computational and Systems Neuroscience Conference (COSYNE), February 2007.
• Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. New York: Chapman and Hall/CRC, 1996.
• Johnson NL, Kotz S. Distributions in Statistics: Continuous Univariate Distributions. New York: Wiley, 1970.
• Lim HH, Anderson DJ. Auditory cortical responses to electrical stimulation of the inferior colliculus: implications for an auditory midbrain implant. J Neurophysiol. 2006, 96(3): 975-988.
• McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. New York: John Wiley & Sons, 1997.
• Pawitan Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. New York: Oxford University Press, 2001.
• Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J Neurophysiol. 2005, 93: 1074-1089.