Estimating mutual information Kenneth D. Harris 25/3/2015

Entropy
H(X) = −Σ_i p(X = x_i) log2 p(X = x_i)
• Number of bits needed to communicate 𝑋, on average
Mutual information
Number of bits saved communicating X if you know Y
Number of bits saved communicating Y if you know X
𝐼 𝑋; π‘Œ = 𝐻 𝑋 − 𝐻 𝑋 π‘Œ
=𝐻 π‘Œ −𝐻 π‘Œ 𝑋
= 𝐻 𝑋 + 𝐻 π‘Œ − 𝐻 𝑋, π‘Œ
• If 𝑋 = π‘Œ, 𝐼 𝑋; π‘Œ = 𝐻 𝑋 = 𝐻 π‘Œ
• If 𝑋 and π‘Œ are independent, 𝐼 𝑋; π‘Œ = 0
𝐻 𝑋, π‘Œ = 𝐻 𝑋 + 𝐻 π‘Œ 𝑋 = 𝐻 π‘Œ + 𝐻 𝑋 π‘Œ
𝐻 𝑋 π‘Œ ≤ 𝐻(𝑋)
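These identities can be checked numerically on a small joint distribution; a minimal sketch (the example distribution is arbitrary, not from the slides):

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a probability vector; 0 log 0 taken as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# An arbitrary example joint distribution p(X, Y): rows index X, columns Y
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

H_xy = entropy_bits(p_xy.ravel())
H_x_given_y = H_xy - entropy_bits(p_y)   # chain rule: H(X,Y) = H(Y) + H(X|Y)

# Two of the expressions for I(X;Y) agree
I_a = entropy_bits(p_x) - H_x_given_y                  # H(X) - H(X|Y)
I_b = entropy_bits(p_x) + entropy_bits(p_y) - H_xy     # H(X)+H(Y)-H(X,Y)
print(I_a, I_b)
```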
“Plug in” measure
• Compute the histogram of X and Y to estimate p(X, Y)
• Estimate

I = Σ_{X,Y} p(X, Y) log2 [ p(X, Y) / (p(X) p(Y)) ]

• Biased upward
No information
• X and Y are independent random binary variables
• True information is zero
• But the sampled histogram is rarely exactly

  .25  .25
  .25  .25

  so the plug-in estimate is almost always positive
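The estimator and its upward bias fit in a short simulation; `plugin_mi_bits` is an illustrative helper name, and the bias magnitude in the comment is the standard first-order approximation, not a figure from the slides:

```python
import numpy as np

def plugin_mi_bits(x, y):
    """Plug-in mutual information in bits from the joint histogram of
    two binary (0/1) sequences."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)                # joint count histogram
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0                              # convention: 0 log 0 = 0
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz]))

rng = np.random.default_rng(0)
# Independent random bits: true information is zero, but the sampled
# histogram is rarely exactly [.25 .25; .25 .25], so the estimate is
# almost always positive (expected bias roughly 1/(2N ln 2) bits here).
estimates = [plugin_mi_bits(rng.integers(0, 2, 100), rng.integers(0, 2, 100))
             for _ in range(1000)]
print(np.mean(estimates))
```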
Bias correction methods
• Not always perfect
• Only use them if you truly understand how they work!
Panzeri et al, J Neurophys 2007
Cross-validation
• Mutual information measures how many bits I save telling you about the spike train, if we both know the stimulus
• Or how many bits I save telling you the stimulus, if we both know the spike train
• We agree on a code based on the training set
• How many bits do we save on the test set? (might be negative)
Strategy
• Use the training set to estimate p(Spikes|Stimulus) and p(Spikes)
• Compute, over the test set,

I = (1/N) Σ_test { [−log2 p(Spikes)] − [−log2 p(Spikes|Stimulus)] }

The first term is the codeword length when we don't know the stimulus; the second, the codeword length when we do.
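A sketch of this strategy for discrete spike counts. The add-one smoothing of the training-set histograms is an assumption (any density estimate would do), and all names are illustrative:

```python
import numpy as np

def cross_validated_info(stim_train, n_train, stim_test, n_test, max_count):
    """Cross-validated information (bits/trial) about discrete spike counts.

    The training set fixes a code: p(n) and p(n|stimulus), here estimated
    as add-one-smoothed histograms over counts 0..max_count (an assumed
    choice).  On the test set we measure the mean bits saved per trial by
    knowing the stimulus; this can be negative if the training-set code
    fits the test set poorly."""
    # Marginal code p(n)
    p_n = np.bincount(n_train, minlength=max_count + 1) + 1.0
    p_n /= p_n.sum()
    # Conditional code p(n|s), one histogram per stimulus
    p_ns = {}
    for s in np.unique(stim_train):
        h = np.bincount(n_train[stim_train == s], minlength=max_count + 1) + 1.0
        p_ns[s] = h / h.sum()
    saved = [-np.log2(p_n[n]) + np.log2(p_ns[s][n])
             for s, n in zip(stim_test, n_test)]
    return np.mean(saved)
```

Note that because the code is frozen on the training set, this estimate cannot be biased upward by overfitting; it errs on the conservative side.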
This underestimates information
• Can show the expected bias is the negative of the plug-in bias
Two choices:
• Predict stimulus from spike train(s)
• Predict spike train(s) from stimulus
Predicting spike counts
• Single cell
p(n|S) ~ Poisson(μ_S)
p(n) = Poisson(μ)

I = (1/N_test) Σ_test log2 [ p(n|S) / p(n) ]

where p(n|S) / p(n) is the likelihood ratio.
Problem: variance is higher than Poisson
Solution: use generalized Poisson or negative binomial distribution
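A minimal sketch of the single-cell Poisson version using SciPy's Poisson pmf; estimating μ_S as the per-stimulus training-set mean count is an assumption (any rate model could be plugged in), and the names are illustrative:

```python
import numpy as np
from scipy.stats import poisson

def poisson_info_bits(stim_train, n_train, stim_test, n_test):
    """Cross-validated information (bits/trial): mean log2 likelihood
    ratio p(n|S)/p(n) under Poisson count models fit on the training set."""
    mu = max(n_train.mean(), 1e-9)               # overall mean count
    mu_s = {s: max(n_train[stim_train == s].mean(), 1e-9)
            for s in np.unique(stim_train)}      # per-stimulus mean counts
    log_lr = [poisson.logpmf(n, mu_s[s]) - poisson.logpmf(n, mu)
              for s, n in zip(stim_test, n_test)]
    return np.mean(log_lr) / np.log(2)           # natural log -> bits

rng = np.random.default_rng(1)
stim = np.tile([0, 1], 200)
n = rng.poisson(np.where(stim == 0, 1.0, 10.0))  # rate depends on stimulus
print(poisson_info_bits(stim[:200], n[:200], stim[200:], n[200:]))
```

Swapping `poisson.logpmf` for a negative binomial log-pmf fit on the training set is the natural fix when counts are overdispersed.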
Unit of measurement
“Information theory is probability theory with logs taken to base 2”
• Bits / stimulus
• Bits / second (bits/stimulus divided by stimulus length)
• Bits / spike (bits/second divided by mean firing rate)
• High bits/second => dense code
• High bits/spike => sparse code.
Bits per stimulus and bits per spike
A spike on half of the stimuli:
• 1 bit if spike, 1 bit if no spike
• 1 bit/stimulus
• 0.5 spikes/stimulus
• 2 bits/spike

A spike on 1 of 64 stimuli:
• log2 64 = 6 bits if spike, log2 (64/63) ≈ 0.02 bits if no spike
• 0.02 × (63/64) + 6 × (1/64) ≈ 0.12 bits/stimulus
• 1/64 spikes/stimulus
• 7.4 bits/spike
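The sparse-code arithmetic can be checked directly; a sketch, where 63/64 and 1/64 are the probabilities of the no-spike and spike outcomes:

```python
import numpy as np

# Sparse example: the cell spikes on 1 of 64 stimuli
bits_if_spike = np.log2(64)                    # 6 bits
bits_if_no_spike = np.log2(64 / 63)            # ~0.02 bits
bits_per_stim = (1 / 64) * bits_if_spike + (63 / 64) * bits_if_no_spike
bits_per_spike = bits_per_stim / (1 / 64)      # divide by spikes/stimulus
print(round(bits_per_stim, 2), round(bits_per_spike, 1))  # 0.12 7.4
```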
Measuring sparseness with bits/spike
Sakata and Harris, Neuron 2009
Continuous time
log p({t_s} | λ(t)) = Σ_s log λ(t_s) − ∫ λ(t) dt + const

• λ(t) is the intensity function
• If λ = 0 when there is a spike, this is −∞
• Must make sure predictions are never too close to 0
• Compare against λ(t) = λ_0, where λ_0 is the training-set mean rate
Itskov et al, Neural computation 2008
Likelihood ratio
log p({t_s} | λ(t)) = Σ_s log λ(t_s) − ∫ λ(t) dt + const

log p({t_s} | λ_0) = Σ_s log λ_0 − ∫ λ_0 dt + const

log [ p({t_s} | λ(t)) / p({t_s} | λ_0) ] = Σ_s log [ λ(t_s) / λ_0 ] − ∫ [ λ(t) − λ_0 ] dt
Constants cancel! Good thing, since they are both infinite.
Remember these are natural logs. To get bits, divide by log(2) .
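In discrete time bins this likelihood ratio is easy to evaluate. A sketch, where `lam` holds the predicted rate in each bin and `spike_bins` the bin index of each spike (both names illustrative):

```python
import numpy as np

def info_bits_per_second(lam, lam0, spike_bins, dt):
    """Log2 likelihood ratio of an intensity prediction lam(t) against a
    constant rate lam0, per unit time:
        sum_s log[lam(t_s)/lam0] - integral [lam(t) - lam0] dt,
    divided by log(2) to convert natural logs to bits."""
    lam = np.maximum(lam, 1e-9)        # predictions must never reach 0
    log_lr = (np.sum(np.log(lam[spike_bins] / lam0))
              - np.sum((lam - lam0) * dt))
    return log_lr / np.log(2) / (len(lam) * dt)   # bits per second
```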
Predicting firing rate from place
πœ† 𝑑 =𝑓 𝐱 𝑑
𝑓 π‘₯ =
π‘†π‘π‘–π‘˜π‘’πΆπ‘œπ‘’π‘›π‘‘π‘€π‘Žπ‘ ∗ 𝐾 + πœ– 𝑓
π‘‚π‘π‘π‘€π‘Žπ‘ ∗ 𝐾 + πœ–
• Cross-validation finds the best smoothing width
• Without cross-validation, the least smoothing appears best
Harris et al, Nature 2003
Comparing different predictions
Harris et al, Nature 2003