Homework 1 Solution (sketch)

Ngoc Mai Tran
Last updated: December 22, 2015
1.
Prove the following lemma. (Lecture reference: section 2.1, on deriving the
Wiener-Hopf equation).
Lemma 1 (Correlation equation) Suppose x(·) and y(·) are two zero-mean random processes, jointly stationary of order two. Let h be such that
\[
  \hat y(T + \lambda \mid T) = \int_{-\infty}^{T} h(T, v)\, x(v)\, dv
\]
is the linear least squares estimate for y(T + λ). Then h is time-invariant, and satisfies
\[
  C_{yx}(t + \lambda) = C_{xy}(-(t + \lambda)) = \int_{0}^{\infty} h(v)\, C_{xx}(t - v)\, dv, \qquad t > 0, \tag{1}
\]
where \(C_{yx}(t + \lambda) = E(y(t + \lambda) x(0))\) and \(C_{xx}(t - v) = E(x(t - v) x(0))\).
Proof: By the orthogonality property, the estimation error is orthogonal to the data:
\[
  y(t + \lambda) - \hat y(t + \lambda \mid t) \perp x(\sigma), \qquad \sigma \le t.
\]
Then
\[
  E\big(y(t + \lambda) x(\sigma)\big)
  = E\Big(\int_{-\infty}^{t} h(t, \tau)\, x(\tau)\, d\tau \; x(\sigma)\Big)
  = \int_{-\infty}^{t} h(t, \tau)\, E\big(x(\tau) x(\sigma)\big)\, d\tau.
\]
Rewriting in terms of the auto- and cross-correlation functions, we get
\[
  C_{yx}(t + \lambda - \sigma) = \int_{-\infty}^{t} h(t, \tau)\, C_{xx}(\tau - \sigma)\, d\tau, \qquad \sigma \in (-\infty, t),
\]
and, shifting \(t \mapsto t + \sigma\) and then substituting \(\tau \mapsto t + \sigma - \tau\),
\[
  C_{yx}(t + \lambda) = \int_{0}^{\infty} h(t + \sigma, t + \sigma - \tau)\, C_{xx}(t - \tau)\, d\tau, \qquad t \in (0, \infty).
\]
Since \(C_{yx}(t + \lambda)\) and \(C_{xx}(t - \tau)\) do not depend on σ, we conclude that \(h(t + \sigma, t + \sigma - \tau)\) also does not depend on σ. So h is time-invariant, i.e. \(h(t + \sigma, t + \sigma - \tau) = h(0, -\tau) =: h(\tau)\) (taking σ = −t). So h solves
\[
  C_{yx}(t + \lambda) = \int_{0}^{\infty} h(\tau)\, C_{xx}(t - \tau)\, d\tau, \qquad t > 0.
\]
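The discrete-time analogue of equation (1) can be checked numerically. The sketch below uses synthetic signals and an assumed finite filter length L (both our own choices, not from the lecture): it estimates the correlations empirically and solves the resulting Toeplitz system for h.

```python
import numpy as np

# Discrete analogue of the Wiener-Hopf equation (1):
#   C_yx(t + lam) = sum_v h(v) C_xx(t - v),   t = 0, ..., L-1,
# solved for a causal filter of assumed finite length L.
rng = np.random.default_rng(0)
T, L = 20000, 8

x = rng.standard_normal(T)                      # white, zero-mean input
h_true = np.array([1.0, 0.6, 0.3, 0.1])         # "unknown" causal filter
y = np.convolve(x, h_true)[:T] + 0.1 * rng.standard_normal(T)

def xcorr(a, b, maxlag):
    """Empirical C_ab(k) = E[a(t + k) b(t)] for k = 0, ..., maxlag."""
    return np.array([np.mean(a[k:] * b[:T - k]) for k in range(maxlag + 1)])

C_xx = xcorr(x, x, L - 1)
C_yx = xcorr(y, x, L - 1)                       # lam = 0 here

# Toeplitz system A h = C_yx with A[t, v] = C_xx(t - v) = C_xx(|t - v|)
A = np.array([[C_xx[abs(t - v)] for v in range(L)] for t in range(L)])
h_est = np.linalg.solve(A, C_yx)                # close to h_true, padded with zeros
```

With a white input, C_xx is nearly a multiple of the identity, so ĥ ≈ Ĉ_yx/σ² directly; this is the same simplification the spike-triggered-average estimator in problem 2 exploits.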
2.
The MATLAB file c1p8.mat at http://www.gatsby.ucl.ac.uk/~dayan/book/exercises/c1/data/c1p8.mat contains data collected and provided by Rob de Ruyter van Steveninck from a fly H1 neuron responding to an approximate white-noise visual motion stimulus. Data were collected for 20 minutes at a sampling rate of 500 Hz. In the file, rho is a vector that gives the sequence of spiking events or non-events at the sampled times (every 2 ms): when an element of rho is one, this indicates the presence of a spike at the corresponding time; zero means no spike. The variable stim gives the sequence of stimulus values at the sampled times.
Suppose we want to fit the causal linear filter model
r̂(t|t) = h ∗ s(t)
to the data, where r : [0, T ] → R is the spike rate, s : [0, T ] → R is the stimulus.
Code and plots: to be uploaded on the website. We sketch the solution below.
Plot your estimate of h.
Here we have N = 1 (one trial) and λ = 0 (filtering). By example 6 in lecture, an estimator for h based on the spike-triggered average is
\[
  \hat h(t) = \frac{\hat C_{rs}(t)\,\langle r \rangle}{\sigma^2},
\]
where \(\langle r \rangle = n/T\) is the average spike rate, \(\sigma^2 = \frac{1}{T}\sum_i (s(i) - \bar s)^2\) with \(\bar s = \frac{1}{T}\sum_i s(i)\), and \(\hat C_{rs}(t) = \frac{1}{T}\sum_{j=1}^{n} s(t_j - t)\).
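A sketch of this estimator in code, with synthetic data standing in for the variables rho and stim from c1p8.mat (the filter length L, baseline rate, and "true" filter are all made up for illustration; the estimate matches the true filter in shape, up to overall normalization conventions):

```python
import numpy as np

rng = np.random.default_rng(1)
T, L = 100000, 100                        # number of 2 ms bins, filter length

stim = rng.standard_normal(T)             # approximate white-noise stimulus
h_true = np.exp(-np.arange(L) / 10.0) / 100.0
p_spike = np.clip(0.05 + np.convolve(stim, h_true)[:T], 0.0, 1.0)
rho = (rng.random(T) < p_spike).astype(int)          # synthetic spike train

n = rho.sum()
r_avg = n / T                             # <r> = n/T (spikes per bin)
s_bar = stim.mean()
sigma2 = np.mean((stim - s_bar) ** 2)     # stimulus variance

spike_bins = np.nonzero(rho)[0]
spike_bins = spike_bins[spike_bins >= L]  # keep spikes with a full stimulus history
C_rs = np.array([stim[spike_bins - t].sum() / T for t in range(L)])
h_hat = C_rs * r_avg / sigma2             # the estimator stated above
```

Plotting h_hat against h_true should show the same exponential shape; the overall scale depends on the normalization conventions of the lecture's example 6.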
Generate spike sequences according to the (inhomogeneous) Poisson point
process with your value for r.
First we compute r̂ = h ∗ s. Then generate a spike train ρ by \(\rho_t \sim \mathrm{Bernoulli}(\hat r(t)\,\Delta t)\), with the ρ_t independent across bins (Δt = 2 ms); for small bins this approximates an inhomogeneous Poisson process with rate r̂.
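In code, with a made-up rate trace standing in for r̂ = h ∗ s, the Bernoulli thinning step looks like:

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 0.002                                       # 2 ms bins, as in the data
t = np.arange(0.0, 10.0, dt)
r_hat = 20.0 * (1.0 + np.sin(2.0 * np.pi * t))   # illustrative rate in Hz, >= 0

p = np.clip(r_hat * dt, 0.0, 1.0)                # per-bin spike probability
rho_syn = (rng.random(t.size) < p).astype(int)   # one synthetic spike train
# expected count over 10 s is mean(r_hat) * 10 = 200 spikes
```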
Compute the average correlation between your synthetic spikes and the
observed spikes
For large M, generate spike sequences ρ¹, …, ρᴹ according to the above model. Let ρᵒ be the observed spike sequence. For each λ ∈ {0, 1, …, T}, compute
\[
  C_{\rho\rho^o}(\lambda)
  = \frac{1}{T}\sum_t E\big(\rho(t + \lambda)\big)\,\rho^o(t)
  \approx \frac{1}{T}\sum_t \frac{1}{M}\sum_m \rho^m(t + \lambda)\,\rho^o(t).
\]
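A minimal Monte Carlo version of this computation, with toy spike probabilities standing in for the fitted model:

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 5000, 50
p = 0.1 * rng.random(T)                   # shared per-bin spike probabilities

rho_obs = (rng.random(T) < p).astype(int)        # stand-in for the observed train
rho_syn = (rng.random((M, T)) < p).astype(int)   # M synthetic trains

def avg_corr(lag):
    # (1/T) sum_t (1/M) sum_m rho^m(t + lag) * rho^o(t)
    return (rho_syn[:, lag:] * rho_obs[:T - lag]).sum(axis=1).mean() / T

C = np.array([avg_corr(k) for k in range(100)])
# C[0] exceeds the large-lag baseline because the trains share the same p(t)
```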
Do cross-validation Split the data into 5 time segments of equal length, fit your
model on 4 of them using steps 1 and 2 above, and do prediction on the remaining
segment using step 3. The 80% of data you use to fit the model is called the training
set. The remaining 20% is called the test set. Report the overall average correlation
for the test sets.
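The split into contiguous time segments can be sketched as follows (the array length is a placeholder; the fit and prediction steps are the ones described above):

```python
import numpy as np

T = 10000                                   # total number of time bins (placeholder)
segments = np.array_split(np.arange(T), 5)  # 5 contiguous segments of equal length

for k, test_idx in enumerate(segments):
    train_idx = np.concatenate([s for i, s in enumerate(segments) if i != k])
    # fit h on train_idx (steps 1-2), compute the average correlation on
    # test_idx (step 3), then average the test correlations over the 5 folds
    assert len(train_idx) + len(test_idx) == T
```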
3b. The paper http://bethgelab.org/media/publications/BerensEtAl2012.pdf describes an experiment performed on neurons in the visual cortex of monkeys. The goal was to study the population code for orientation in the visual cortex.
Describe the experiment:
The stimuli were static sine-wave gratings (kept constant over a trial). There were 17 sessions. In each session there were 8 different stimulus orientations and two different contrasts, with 10 to 85 trials collected per stimulus condition. The experiment was performed on two alert (i.e. not anesthetized) monkeys. In each session between 6 and 20 neurons were recorded, and spike trains were binned at a resolution of 10 ms. The goal is to decode orientation from the spike trains of the population of neurons.
Apply the logistic regression model to a single neuron. Suppose that the vector X[t] in equation (1) of the paper is just a single random variable X ∈ R, so
\[
  P(\theta = \theta_1 \mid X) = \frac{1}{1 + e^{-wX - w_0}},
\]
for w, w₀ ∈ R. Suppose that the number of spikes X is Poisson with mean f(θ), where f is the neuron's tuning curve. That is,
\[
  P(X = x \mid \theta) = \frac{e^{-f(\theta)} f(\theta)^x}{x!}.
\]
Give a formula for w in terms of f(θ₁) and f(θ₂).
Assume a uniform prior distribution, that is, P(θ = θ₁) = P(θ = θ₂) = 1/2. Note that
\[
  P(\theta = \theta_2 \mid X = x) = 1 - P(\theta = \theta_1 \mid X = x) = \frac{e^{-wx - w_0}}{1 + e^{-wx - w_0}}.
\]
So by the model,
\[
  \frac{P(\theta_1 \mid x)}{P(\theta_2 \mid x)} = e^{wx + w_0}.
\]
On the other hand,
\begin{align*}
  \frac{P(\theta_1 \mid x)}{P(\theta_2 \mid x)}
  &= \frac{P(x \mid \theta_1)\, P(\theta_1)}{P(x \mid \theta_2)\, P(\theta_2)}
    && \text{by Bayes' rule} \\
  &= \frac{P(x \mid \theta_1)}{P(x \mid \theta_2)}
    && \text{by the uniform prior} \\
  &= e^{-f(\theta_1) + f(\theta_2)} \left( \frac{f(\theta_1)}{f(\theta_2)} \right)^{x}
    && \text{by the Poisson model.}
\end{align*}
Thus,
\[
  w_0 = f(\theta_2) - f(\theta_1), \qquad w = \log f(\theta_1) - \log f(\theta_2).
\]
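A quick numerical check of these formulas, with made-up tuning-curve values f(θ₁), f(θ₂): the Bayes posterior under the Poisson model should coincide exactly with the logistic form using the derived w and w₀.

```python
import math

f1, f2 = 12.0, 5.0                         # illustrative f(theta_1), f(theta_2)
w = math.log(f1) - math.log(f2)            # derived weight
w0 = f2 - f1                               # derived intercept

def poisson_pmf(x, mu):
    return math.exp(-mu) * mu ** x / math.factorial(x)

for x in range(20):
    p1, p2 = poisson_pmf(x, f1), poisson_pmf(x, f2)
    bayes = p1 / (p1 + p2)                 # posterior with uniform prior
    logistic = 1.0 / (1.0 + math.exp(-w * x - w0))
    assert abs(bayes - logistic) < 1e-12   # agree to numerical precision
```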
How does this compare to equation (2) of the paper?
Equation (2) seems dubious. The main problem is that the authors do not define what f(θ) means. In some papers in the literature, f(θ) would denote our log f(θ), but even with this change of variable, equation (2) seems incorrect. They also omit the constant term w₀ when writing down the logistic regression. It could also be that they did not use the uniform prior (which is very unlikely, however). I could not find a derivation in the referenced Ma et al. paper. Moral of the story: define your notation and check your references!
3/ What are the tuning curves used in the paper?
There are two populations of neurons, indexed by p = 1, 2. From the reference Berens et al., the cosine-like tuning curve for orientation is
\[
  f_i(\theta) = \lambda_1 + \lambda_2 \left( \tfrac{1}{2} + \tfrac{1}{2} \cos(\theta - \phi_i) \right)^{k_p},
\]
where θ is the stimulus orientation, φ_i is the preferred orientation of neuron i, k_p is a parameter that controls the width of the tuning curves for population p, and λ₁ and λ₂ are normalization constants that set f_i(θ) to be in the range [5, 50].
The tuning curve for contrast for neuron i in population p is
\[
  f_i(c) = \frac{c^n}{c_p^n + c^n},
\]
where c_p is the 50% contrast level (called the semisaturation contrast) for population p.
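Both curves, with illustrative parameter values (k_p, φ_i, n, and c_p are made up here; λ₁, λ₂ are chosen to map the orientation curve into [5, 50]):

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 361)
phi_i, k_p = np.pi / 2.0, 4.0                       # illustrative values

base = (0.5 + 0.5 * np.cos(theta - phi_i)) ** k_p   # in [0, 1]
lam1, lam2 = 5.0, 45.0                              # map into [5, 50]
f_theta = lam1 + lam2 * base                        # orientation tuning curve

c = np.linspace(0.0, 1.0, 101)
n, c_p = 2.0, 0.3                    # exponent and semisaturation contrast
f_c = c ** n / (c_p ** n + c ** n)   # equals 1/2 at c = c_p by construction
```

Larger k_p narrows the orientation curve around φ_i, and larger n sharpens the transition of the contrast curve around c_p.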
4/ Why is the statistic d′_i in equation (4) of the paper a measure of the
decoding performance of neuron i?
Let X_i be the number of spikes of neuron i. The higher d′_i is, the further apart the means of the distributions P(X_i | θ = θ₁) and P(X_i | θ = θ₂) are relative to their spread, and thus the easier it is to discriminate θ₁ from θ₂ based on X_i. The statistic d′_i is called the discriminability.
5/ The authors defined the parameter d′ of the population decoder via
1 − Φ(d′/2) = classification error of the population decoder.
Suppose the population only has a single neuron, so d′ = d′_i as given in equation (4). In this case, under what assumption(s) would 1 − Φ(d′/2) be the classification error of the decoder?
Let X be the number of spikes of the neuron. Under the assumption that P(X|θ₁) and P(X|θ₂) are Gaussian with means µ₁ and µ₂ and a common variance σ², 1 − Φ(d′/2) is the classification error of the ML decoder, with d′ = |µ₁ − µ₂|/σ. (Prove this by deriving the ML decoder for this setup.)
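A sketch of that derivation, writing σ for the common standard deviation and assuming µ₁ > µ₂: with equal priors, the ML decoder thresholds at the midpoint of the two means, and by symmetry both conditional error probabilities are equal.

```latex
% ML decoder: decide theta_1 iff X > (mu_1 + mu_2)/2.
\[
  P(\text{error} \mid \theta_1)
  = P\Big(X < \tfrac{\mu_1 + \mu_2}{2} \,\Big|\, \theta_1\Big)
  = \Phi\Big(\frac{(\mu_1 + \mu_2)/2 - \mu_1}{\sigma}\Big)
  = \Phi(-d'/2) = 1 - \Phi(d'/2),
  \qquad d' = \frac{\mu_1 - \mu_2}{\sigma}.
\]
```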
6/ Consider the results in Figure 1B-E. Suppose that the logistic regression decoder is what is implemented in the monkey's brain. How long does it take for the monkey's brain to distinguish between two orientations? Does it find the task easier with high or low contrast? Is there much variation between sessions?
It takes ≈ 80 ms for the monkey to reach 75% of peak discrimination accuracy. Peak performance was reached after ≈ 120 ms, at ≈ 80% correct, depending on ∆θ, the difference in orientation. The task is easier with high contrast and harder with low contrast, but not by much.
7/ Read the Discussion section. Give a summary of the authors' opinions on their own questions, which are the following:
• How do the parameters (i.e., the vector w) of the instantaneous decoder change over time?
• How contrast-invariant is the population code?
• Are correlations important for decoding?
• Is the Poisson assumption in neural coding reasonable for this data?
In this paper, the authors considered orientation decoding under different contrasts
by a population of neurons in V1 of the macaque monkey. They fitted the monkey’s
performance using a logistic regression decoder. They found that:
• the decoder parameters remain largely constant over time.
• the decoder parameters also remain largely the same under different contrasts. Decoders fitted on one contrast level and used on another achieve ≈ 80% of the optimal performance. A single neuron's tuning response is known to be independent of contrast, a property called contrast invariance. This study suggests that the population response is also contrast-invariant.
• the decoder is fitted under the assumption that the noise in different neurons is uncorrelated. In this study, the noise correlation between neurons was found to be ≈ 0.01. The decoder performed quite well, suggesting that the uncorrelated-noise assumption is reasonable in this case.
• From Ma et al. 2010, the 'Poisson-like' assumption here refers to the assumption that the distribution of the population firing rate r, given orientation θ and contrast c, is of the form
\[
  P(r \mid \theta, c) = \varphi(r, c)\, e^{h(\theta) \cdot r}.
\]
For example, if the neurons are independent and, for neuron i, P(r_i | θ, c) is Poisson with mean depending only on θ, then the joint distribution of the firing rates for the population fits the above form. Hence all distributions of the above form are called 'Poisson-like' (terrible terminology). Here c is viewed as a nuisance parameter. In Poisson-like models, such parameters do not affect the θ-dependence of the log-likelihood, so one can still do MLE as usual. This study found that a 'Poisson-like' model is a good fit for their data.
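The nuisance-parameter claim can be seen directly from the Ma et al. form, in which the leading factor φ depends on r and c but not on θ:

```latex
\[
  \log P(r \mid \theta, c) = \log \varphi(r, c) + h(\theta) \cdot r,
\]
% the only theta-dependent term, h(theta) . r, does not involve c, so
% maximizing the log-likelihood over theta is unaffected by the nuisance c.
```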