Homework 1 Solution (sketch)

Ngoc Mai Tran
Last updated: December 22, 2015
1.
Prove the following lemma. (Lecture reference: section 2.1, on deriving the
Wiener-Hopf equation).
Lemma 1 (Correlation equation) Suppose x(·) and y(·) are two zero-mean random processes, jointly stationary of order two. Let h be such that
\[
  \hat y(T + \lambda \mid T) = \int_{-\infty}^{T} h(T, v)\, x(v)\, dv
\]
is the linear least squares estimate for y(T + λ). Then h is time-invariant, and satisfies
\[
  C_{yx}(t + \lambda) = C_{xy}(-(t + \lambda)) = \int_{0}^{\infty} h(v)\, C_{xx}(t - v)\, dv, \qquad t > 0, \tag{1}
\]
where \(C_{yx}(t + \lambda) = E(y(t + \lambda) x(0))\) and \(C_{xx}(t - v) = E(x(t - v) x(0))\).
Proof: By the orthogonality property, the estimation error is orthogonal to the data:
\[
  y(t + \lambda) - \hat y(t + \lambda \mid t) \perp x(\sigma), \qquad \sigma \le t.
\]
Then
\[
  E\big(y(t + \lambda) x(\sigma)\big)
  = E\Big(\int_{-\infty}^{t} h(t, \tau)\, x(\tau)\, d\tau \; x(\sigma)\Big)
  = \int_{-\infty}^{t} h(t, \tau)\, E\big(x(\tau) x(\sigma)\big)\, d\tau.
\]
Rewriting in terms of the auto- and cross-correlation functions, we get
\[
  C_{yx}(t + \lambda - \sigma) = \int_{-\infty}^{t} h(t, \tau)\, C_{xx}(\tau - \sigma)\, d\tau, \qquad \sigma \in (-\infty, t),
\]
and, shifting \(t \mapsto t + \sigma\) and then substituting \(\tau \mapsto t + \sigma - \tau\),
\[
  C_{yx}(t + \lambda) = \int_{0}^{\infty} h(t + \sigma, t + \sigma - \tau)\, C_{xx}(t - \tau)\, d\tau, \qquad t \in (0, \infty).
\]
Since \(C_{yx}(t + \lambda)\) and \(C_{xx}(t - \tau)\) do not depend on σ, we conclude that \(h(t + \sigma, t + \sigma - \tau)\) also does not depend on σ. So h is time-invariant, i.e. \(h(t + \sigma, t + \sigma - \tau) = h(0, -\tau) =: h(\tau)\) (taking σ = −t). So h solves
\[
  C_{yx}(t + \lambda) = \int_{0}^{\infty} h(\tau)\, C_{xx}(t - \tau)\, d\tau, \qquad t > 0.
\]
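The discrete-time analogue of equation (1) can be checked numerically. The sketch below uses synthetic signals and an assumed finite filter length L (both our own choices, not from the lecture): it estimates the correlations empirically and solves the resulting Toeplitz system for h.

```python
import numpy as np

# Discrete analogue of the Wiener-Hopf equation (1):
#   C_yx(t + lam) = sum_v h(v) C_xx(t - v),   t = 0, ..., L-1,
# solved for a causal filter of assumed finite length L.
rng = np.random.default_rng(0)
T, L = 20000, 8

x = rng.standard_normal(T)                      # white, zero-mean input
h_true = np.array([1.0, 0.6, 0.3, 0.1])         # "unknown" causal filter
y = np.convolve(x, h_true)[:T] + 0.1 * rng.standard_normal(T)

def xcorr(a, b, maxlag):
    """Empirical C_ab(k) = E[a(t + k) b(t)] for k = 0, ..., maxlag."""
    return np.array([np.mean(a[k:] * b[:T - k]) for k in range(maxlag + 1)])

C_xx = xcorr(x, x, L - 1)
C_yx = xcorr(y, x, L - 1)                       # lam = 0 here

# Toeplitz system A h = C_yx with A[t, v] = C_xx(t - v) = C_xx(|t - v|)
A = np.array([[C_xx[abs(t - v)] for v in range(L)] for t in range(L)])
h_est = np.linalg.solve(A, C_yx)                # close to h_true, padded with zeros
```

With a white input, C_xx is nearly a multiple of the identity, so ĥ ≈ Ĉ_yx/σ² directly; this is the same simplification the spike-triggered-average estimator in problem 2 exploits.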
2.
The MATLAB file c1p8.mat at http://www.gatsby.ucl.ac.uk/~dayan/book/exercises/c1/data/c1p8.mat contains data collected and provided by Rob de Ruyter van Steveninck from a fly H1 neuron responding to an approximate white-noise visual motion stimulus. Data were collected for 20 minutes at a sampling rate of 500 Hz. In the file, rho is a vector that gives the sequence of spiking events or non-events at the sampled times (every 2 ms): when an element of rho is one, this indicates the presence of a spike at the corresponding time; zero means no spike. The variable stim gives the sequence of stimulus values at the sampled times.
Suppose we want to fit the causal linear filter model
r̂(t|t) = h ∗ s(t)
to the data, where r : [0, T ] → R is the spike rate, s : [0, T ] → R is the stimulus.
Code and plots: to be uploaded on the website. We sketch the solution below.
Plot your estimate of h.
Here we have N = 1 (one trial) and λ = 0 (filtering). By example 6 in lecture, an estimator for h based on the spike-triggered average is
\[
  \hat h(t) = \frac{\hat C_{rs}(t)\,\langle r \rangle}{\sigma^2},
\]
where \(\langle r \rangle = n/T\) is the average spike rate, \(\sigma^2 = \frac{1}{T}\sum_i (s(i) - \bar s)^2\) with \(\bar s = \frac{1}{T}\sum_i s(i)\), and \(\hat C_{rs}(t) = \frac{1}{T}\sum_{j=1}^{n} s(t_j - t)\).
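A sketch of this estimator in code, with synthetic data standing in for the variables rho and stim from c1p8.mat (the filter length L, baseline rate, and "true" filter are all made up for illustration; the estimate matches the true filter in shape, up to overall normalization conventions):

```python
import numpy as np

rng = np.random.default_rng(1)
T, L = 100000, 100                        # number of 2 ms bins, filter length

stim = rng.standard_normal(T)             # approximate white-noise stimulus
h_true = np.exp(-np.arange(L) / 10.0) / 100.0
p_spike = np.clip(0.05 + np.convolve(stim, h_true)[:T], 0.0, 1.0)
rho = (rng.random(T) < p_spike).astype(int)          # synthetic spike train

n = rho.sum()
r_avg = n / T                             # <r> = n/T (spikes per bin)
s_bar = stim.mean()
sigma2 = np.mean((stim - s_bar) ** 2)     # stimulus variance

spike_bins = np.nonzero(rho)[0]
spike_bins = spike_bins[spike_bins >= L]  # keep spikes with a full stimulus history
C_rs = np.array([stim[spike_bins - t].sum() / T for t in range(L)])
h_hat = C_rs * r_avg / sigma2             # the estimator stated above
```

Plotting h_hat against h_true should show the same exponential shape; the overall scale depends on the normalization conventions of the lecture's example 6.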
Generate spike sequences according to the (inhomogeneous) Poisson point
process with your value for r.
First we compute r̂ = h ∗ s. Then generate a spike train ρ by \(\rho_t \sim \mathrm{Bernoulli}(\hat r(t)\,\Delta t)\), with the ρ_t independent across bins (Δt = 2 ms); for small bins this approximates an inhomogeneous Poisson process with rate r̂.
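In code, with a made-up rate trace standing in for r̂ = h ∗ s, the Bernoulli thinning step looks like:

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 0.002                                       # 2 ms bins, as in the data
t = np.arange(0.0, 10.0, dt)
r_hat = 20.0 * (1.0 + np.sin(2.0 * np.pi * t))   # illustrative rate in Hz, >= 0

p = np.clip(r_hat * dt, 0.0, 1.0)                # per-bin spike probability
rho_syn = (rng.random(t.size) < p).astype(int)   # one synthetic spike train
# expected count over 10 s is mean(r_hat) * 10 = 200 spikes
```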
Compute the average correlation between your synthetic spikes and the
observed spikes
For large M, generate spike sequences ρ¹, …, ρᴹ according to the above model. Let ρᵒ be the observed spike sequence. For each λ ∈ {0, 1, …, T}, compute
\[
  C_{\rho\rho^o}(\lambda)
  = \frac{1}{T}\sum_t E\big(\rho(t + \lambda)\big)\,\rho^o(t)
  \approx \frac{1}{T}\sum_t \frac{1}{M}\sum_m \rho^m(t + \lambda)\,\rho^o(t).
\]
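A minimal Monte Carlo version of this computation, with toy spike probabilities standing in for the fitted model:

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 5000, 50
p = 0.1 * rng.random(T)                   # shared per-bin spike probabilities

rho_obs = (rng.random(T) < p).astype(int)        # stand-in for the observed train
rho_syn = (rng.random((M, T)) < p).astype(int)   # M synthetic trains

def avg_corr(lag):
    # (1/T) sum_t (1/M) sum_m rho^m(t + lag) * rho^o(t)
    return (rho_syn[:, lag:] * rho_obs[:T - lag]).sum(axis=1).mean() / T

C = np.array([avg_corr(k) for k in range(100)])
# C[0] exceeds the large-lag baseline because the trains share the same p(t)
```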
Do cross-validation Split the data into 5 time segments of equal length, fit your
model on 4 of them using steps 1 and 2 above, and do prediction on the remaining
segment using step 3. The 80% of data you use to fit the model is called the training
set. The remaining 20% is called the test set. Report the overall average correlation
for the test sets.
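The split into contiguous time segments can be sketched as follows (the array length is a placeholder; the fit and prediction steps are the ones described above):

```python
import numpy as np

T = 10000                                   # total number of time bins (placeholder)
segments = np.array_split(np.arange(T), 5)  # 5 contiguous segments of equal length

for k, test_idx in enumerate(segments):
    train_idx = np.concatenate([s for i, s in enumerate(segments) if i != k])
    # fit h on train_idx (steps 1-2), compute the average correlation on
    # test_idx (step 3), then average the test correlations over the 5 folds
    assert len(train_idx) + len(test_idx) == T
```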
3b. The paper http://bethgelab.org/media/publications/BerensEtAl2012.pdf describes an experiment performed on neurons in the visual cortex of monkeys. The goal was to study the population code for orientation in the visual cortex.
Describe the experiment:
The stimuli were static sine-wave gratings (kept constant over a trial). There were 17 sessions. In each session there were 8 different stimulus orientations and two different contrasts, with 10 to 85 trials collected per stimulus condition. The experiment was performed on two alert (i.e. not anesthetized) monkeys. In each session between 6 and 20 neurons were recorded, and spike trains were binned at a resolution of 10 ms. The goal is to decode orientation from the spike trains of the population of neurons.
Apply the logistic regression model to a single neuron. Suppose that the vector X[t] in equation (1) of the paper is just a single random variable X ∈ R, so
\[
  P(\theta = \theta_1 \mid X) = \frac{1}{1 + e^{-wX - w_0}},
\]
for w, w₀ ∈ R. Suppose that the number of spikes X is Poisson with mean f(θ), where f is the neuron's tuning curve. That is,
\[
  P(X = x \mid \theta) = \frac{e^{-f(\theta)} f(\theta)^x}{x!}.
\]
Give a formula for w in terms of f(θ₁) and f(θ₂).
Assume a uniform prior distribution, that is, P(θ = θ₁) = P(θ = θ₂) = 1/2. Note that
\[
  P(\theta = \theta_2 \mid X = x) = 1 - P(\theta = \theta_1 \mid X = x) = \frac{e^{-wx - w_0}}{1 + e^{-wx - w_0}}.
\]
So by the model,
\[
  \frac{P(\theta_1 \mid x)}{P(\theta_2 \mid x)} = e^{wx + w_0}.
\]
On the other hand,
\begin{align*}
  \frac{P(\theta_1 \mid x)}{P(\theta_2 \mid x)}
  &= \frac{P(x \mid \theta_1)\, P(\theta_1)}{P(x \mid \theta_2)\, P(\theta_2)}
    && \text{by Bayes' rule} \\
  &= \frac{P(x \mid \theta_1)}{P(x \mid \theta_2)}
    && \text{by the uniform prior} \\
  &= e^{-f(\theta_1) + f(\theta_2)} \left( \frac{f(\theta_1)}{f(\theta_2)} \right)^{x}
    && \text{by the Poisson model.}
\end{align*}
Thus,
\[
  w_0 = f(\theta_2) - f(\theta_1), \qquad w = \log f(\theta_1) - \log f(\theta_2).
\]
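A quick numerical check of these formulas, with made-up tuning-curve values f(θ₁), f(θ₂): the Bayes posterior under the Poisson model should coincide exactly with the logistic form using the derived w and w₀.

```python
import math

f1, f2 = 12.0, 5.0                         # illustrative f(theta_1), f(theta_2)
w = math.log(f1) - math.log(f2)            # derived weight
w0 = f2 - f1                               # derived intercept

def poisson_pmf(x, mu):
    return math.exp(-mu) * mu ** x / math.factorial(x)

for x in range(20):
    p1, p2 = poisson_pmf(x, f1), poisson_pmf(x, f2)
    bayes = p1 / (p1 + p2)                 # posterior with uniform prior
    logistic = 1.0 / (1.0 + math.exp(-w * x - w0))
    assert abs(bayes - logistic) < 1e-12   # agree to numerical precision
```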
How does this compare to equation (2) of the paper?
Equation (2) seems dubious. The main problem is that the authors do not define what f(θ) means. In some papers in the literature, f(θ) would denote our log f(θ), but even with this change of variable, equation (2) seems incorrect. They also omit the constant term w₀ when writing down the logistic regression. It could also be that they did not use the uniform prior (which is very unlikely, however). I could not find a derivation in the referenced Ma et al. paper. Moral of the story: define your notation and check your references!
3/ What are the tuning curves used in the paper?
There are two populations of neurons, indexed by p = 1, 2. From the reference Berens et al., the cosine-like tuning curve for orientation is
\[
  f_i(\theta) = \lambda_1 + \lambda_2 \left( \tfrac{1}{2} + \tfrac{1}{2} \cos(\theta - \phi_i) \right)^{k_p},
\]
where θ is the stimulus orientation, φ_i is the preferred orientation of neuron i, k_p is a parameter that controls the width of the tuning curves for population p, and λ₁ and λ₂ are normalization constants that set f_i(θ) to be in the range [5, 50].
The tuning curve for contrast for neuron i in population p is
\[
  f_i(c) = \frac{c^n}{c_p^n + c^n},
\]
where c_p is the 50% contrast level (called the semisaturation contrast) for population p.
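Both curves, with illustrative parameter values (k_p, φ_i, n, and c_p are made up here; λ₁, λ₂ are chosen to map the orientation curve into [5, 50]):

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 361)
phi_i, k_p = np.pi / 2.0, 4.0                       # illustrative values

base = (0.5 + 0.5 * np.cos(theta - phi_i)) ** k_p   # in [0, 1]
lam1, lam2 = 5.0, 45.0                              # map into [5, 50]
f_theta = lam1 + lam2 * base                        # orientation tuning curve

c = np.linspace(0.0, 1.0, 101)
n, c_p = 2.0, 0.3                    # exponent and semisaturation contrast
f_c = c ** n / (c_p ** n + c ** n)   # equals 1/2 at c = c_p by construction
```

Larger k_p narrows the orientation curve around φ_i, and larger n sharpens the transition of the contrast curve around c_p.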
4/ Why is the statistic d′_i in equation (4) of the paper a measure of the
decoding performance of neuron i?
Let X_i be the number of spikes of neuron i. The higher d′_i is, the further apart the means of the distributions P(X_i | θ = θ₁) and P(X_i | θ = θ₂) are relative to their spread, and thus the easier it is to discriminate θ₁ from θ₂ based on X_i. The statistic d′_i is called the discriminability.
5/ The authors defined the parameter d′ of the population decoder via
1 − Φ(d′/2) = classification error of the population decoder.
Suppose the population only has a single neuron, so d′ = d′_i as given in equation (4). In this case, under what assumption(s) would 1 − Φ(d′/2) be the classification error of the decoder?
Let X be the number of spikes of the neuron. Under the assumption that P(X|θ₁) and P(X|θ₂) are Gaussian with means µ₁ and µ₂ and a common variance σ², 1 − Φ(d′/2) is the classification error of the ML decoder, with d′ = |µ₁ − µ₂|/σ. (Prove this by deriving the ML decoder for this setup.)
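A sketch of that derivation, writing σ for the common standard deviation and assuming µ₁ > µ₂: with equal priors, the ML decoder thresholds at the midpoint of the two means, and by symmetry both conditional error probabilities are equal.

```latex
% ML decoder: decide theta_1 iff X > (mu_1 + mu_2)/2.
\[
  P(\text{error} \mid \theta_1)
  = P\Big(X < \tfrac{\mu_1 + \mu_2}{2} \,\Big|\, \theta_1\Big)
  = \Phi\Big(\frac{(\mu_1 + \mu_2)/2 - \mu_1}{\sigma}\Big)
  = \Phi(-d'/2) = 1 - \Phi(d'/2),
  \qquad d' = \frac{\mu_1 - \mu_2}{\sigma}.
\]
```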
6/ Consider the results in Figure 1B-E. Suppose that the logistic regression decoder is what is implemented in the monkey's brain. How long does it take for the monkey's brain to distinguish between two orientations? Does it find the task easier with high or low contrast? Is there much variation between sessions?
It takes ≈ 80 ms for the monkey to reach 75% of peak discrimination accuracy. Peak performance was reached after ≈ 120 ms, at ≈ 80% correct, depending on ∆θ, the difference in orientation. The task is easier with high contrast and harder with low contrast, but not by much.
7/ Read the Discussion section. Give a summary of the authors' opinions on their own questions, which are the following:
• How do the parameters (i.e., the vector w) of the instantaneous decoder change over time?
• How contrast-invariant is the population code?
• Are correlations important for decoding?
• Is the Poisson assumption in neural coding reasonable for this data?
In this paper, the authors considered orientation decoding under different contrasts
by a population of neurons in V1 of the macaque monkey. They fitted the monkey’s
performance using a logistic regression decoder. They found that:
• the decoder parameters remain largely constant over time.
• the decoder parameters also remain largely the same under different contrasts. Decoders fitted on one contrast level and used on another achieve ≈ 80% of the optimal performance. A single neuron's tuning response is known to be independent of contrast, a property called contrast invariance. This study suggests that the population response is also contrast-invariant.
• the decoder is fitted under the assumption that the noise in different neurons is uncorrelated. In this study, the noise correlation between neurons was found to be ≈ 0.01. The decoder performed quite well, suggesting that the uncorrelated-noise assumption is reasonable in this case.
• From Ma et al. 2010, the 'Poisson-like' assumption here refers to the assumption that the distribution of the population firing rate r, given orientation θ and contrast c, is of the form
\[
  P(r \mid \theta, c) = \varphi(r, c)\, e^{h(\theta) \cdot r}.
\]
For example, if the neurons are independent and, for neuron i, P(r_i | θ, c) is Poisson with mean depending only on θ, then the joint distribution of the firing rates for the population fits the above form. Hence all distributions of the above form are called 'Poisson-like' (terrible terminology). Here c is viewed as a nuisance parameter. In Poisson-like models, such parameters do not affect the θ-dependence of the log-likelihood, so one can still do MLE as usual. This study found that a 'Poisson-like' model is a good fit for their data.
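The nuisance-parameter claim can be seen directly from the Ma et al. form, in which the leading factor φ depends on r and c but not on θ:

```latex
\[
  \log P(r \mid \theta, c) = \log \varphi(r, c) + h(\theta) \cdot r,
\]
% the only theta-dependent term, h(theta) . r, does not involve c, so
% maximizing the log-likelihood over theta is unaffected by the nuisance c.
```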