Investigating attention in complex visual search

Discrete approximation with binning
Given a continuous underlying distribution, the probability of an event in bin j is
$$P_j = \frac{\int_{B_j} e^{\sum_k \theta_k \phi_{jk}}\,dx}{\sum_i \int_{B_i} e^{\sum_k \theta_k \phi_{ik}}\,dx} \tag{S1}$$
Equation 5 can be considered a piecewise-uniform approximation to (S1). Here we will assume that the regressors, φ, vary smoothly over the sampling space and are continuously differentiable. In this situation, the log-likelihood function derived from (S1) is
$$\ell(\theta) = \sum_t \left[\sum_j y_j(t)\,\log\!\left(\int_{B_j} e^{\sum_k \theta_k \phi_{jk}(x,t)}\,dx\right) - \log\!\left(\sum_i \int_{B_i} e^{\sum_k \theta_k \phi_{ik}(x,t)}\,dx\right)\right] \tag{S2}$$
where y_j(t) is 1 when a fixation falls in the jth bin in time interval t and 0 otherwise. Eq. S2 gives an exact likelihood for the underlying distribution despite the fact that the dependent measure has been discretely sampled; that is, estimation based on this likelihood should not introduce any particular bias. Rather, binning discards information, imposing limits on the space of models that can be distinguished, such as the standard Nyquist limits on the spatial bandwidth of the underlying distribution.
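As a hypothetical numerical sketch of (S1), the bin probabilities can be obtained by quadrature of the exponential intensity within each bin, followed by normalization across bins. The regressors, parameters, and bin edges below are illustrative choices, not those of the study:

```python
import numpy as np

# Illustrative 1-D version of Eq. S1: P_j is the integral of
# exp(sum_k theta_k * phi_k(x)) over bin B_j, normalized over all bins.
# Midpoint-rule quadrature; all concrete choices here are hypothetical.

def bin_probabilities(theta, phi_fns, edges, n_quad=400):
    integrals = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        dx = (hi - lo) / n_quad
        x = lo + (np.arange(n_quad) + 0.5) * dx          # midpoints within B_j
        log_intensity = sum(t * f(x) for t, f in zip(theta, phi_fns))
        integrals.append(np.exp(log_intensity).sum() * dx)
    p = np.asarray(integrals)
    return p / p.sum()

theta = [1.0, -0.5]                       # illustrative parameters
phi = [lambda x: x, lambda x: x ** 2]     # smooth, differentiable regressors
P = bin_probabilities(theta, phi, np.linspace(-2.0, 2.0, 9))
print(P.sum())                            # 1.0 up to floating-point error
```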
The log of the integral in both terms of (S2) is the partition function for the
distribution of point events conditioned on the domain of integration. A key property of
the partition function is that the nth-order derivative with respect to a parameter gives the
nth-order cumulant within bin Bj for the associated regressor (Agresti, 2014). The first
cumulant is simply the conditional expectation:
π›»πœƒ log (∫ 𝑒 πœƒ
𝑇 πœ™(π‘₯,𝑑)
𝑑π‘₯) = ⟨πœ™(π‘₯, 𝑑)⟩𝐡
𝐡
where ⟨…⟩_B denotes expectation within the domain B. The gradient of the likelihood function can therefore be expressed succinctly as
π›»πœƒ β„“ = ∑ (𝑦𝑗 (𝑑) − πœ‹Μ‚π‘— (𝑑)) ⟨πœ™(π‘₯, 𝑑)⟩𝐡𝑗
𝑗,𝑑
(S3)
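The cumulant property above is easy to check numerically. The sketch below verifies, by finite differences, that the θ-gradient of the log partition function equals the within-bin expectation ⟨φ⟩_B; the regressor, bin, and parameter value are illustrative:

```python
import numpy as np

# Finite-difference check of the first-cumulant property: the gradient of
# log ∫_B exp(θ φ(x)) dx with respect to θ equals ⟨φ(x)⟩_B under the
# exponentially tilted density. One regressor, one bin, simple Riemann
# quadrature; all concrete choices are illustrative.

phi = lambda x: np.sin(x)
x = np.linspace(0.5, 1.5, 4001)          # grid over the bin B = [0.5, 1.5]

def log_partition(theta):
    # log of the bin integral, up to a θ-independent constant
    return np.log(np.exp(theta * phi(x)).mean())

theta0, h = 0.8, 1e-5
grad_fd = (log_partition(theta0 + h) - log_partition(theta0 - h)) / (2 * h)

w = np.exp(theta0 * phi(x))
expectation = (w * phi(x)).sum() / w.sum()   # ⟨φ⟩_B under the tilted density
print(abs(grad_fd - expectation))            # agrees to finite-difference error
```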
Standard maximum-likelihood estimation is a matter of finding where (S3) vanishes. Of note, the estimate therefore depends only on the bin partition functions and their first derivatives (the within-bin expectations ⟨φ(x,t)⟩_{B_j}). Likelihood-based estimation will therefore be unable to distinguish between models that assume the same values across all bins, a limitation imposed by discrete sampling. Included among such models is a piecewise-constant one, in which the within-bin value of the regressor is φ_{B_j} = ⟨φ(x,t)⟩_{B_j} and each bin has a unique intercept. This idealized model is a yardstick against which realizable models can be measured.
Realizable models are constrained by the fact that we avoid granting each bin a separate
intercept, which would otherwise violate the assumption of continuity, and that we are
limited to numerical approximations in evaluating the integral in the partition function.
Deviations from the ideal model introduce error into the parameter estimate. This error has two sources: the substitution of φ_j = ⟨φ(x,t)⟩_{B_j} with some proxy value φ_j* within the same range as ⟨φ(x,t)⟩_{B_j}, such as the value of φ(x,t) at the center of the bin, and the suppression of the bin intercepts. We may treat the weighted difference between φ_j* and the true expectation, ε_j^T = y_j(⟨φ⟩_j − φ_j*) − (π̂_j⟨φ⟩_j − π̂_j*φ_j*), as a random error.
To quantify the magnitude of the error in a statistically meaningful way, we use the expected deviation of the log likelihood from its maximum, which represents the relative goodness of fit of the estimate under the approximation. Heuristically, this quantifies how much worse the estimate becomes with binning error by how much less well the data support it. Using the second-order expansion of the log-likelihood function, ℓ(θ) ≈ ℓ(θ̂) + (1/2) δθᵀ ℓ″(θ̂) δθ, the error of the estimate can be related to the log-likelihood gradient as

$$\delta\theta \approx \ell''^{-1}(\hat{\theta})\,\ell'(\theta)$$

so that

$$\delta\ell \approx \ell(\theta) - \ell(\hat{\theta}) = \ell'(\theta)^T\, \ell''^{-1}(\hat{\theta})\,\ell'(\theta) \tag{S4}$$
The binning approximation causes the gradient to deviate from zero by ∑_j ε_j, hence

$$\delta\ell \approx \sum_j \epsilon_j^T\, \ell''^{-1}(\hat{\theta}) \sum_k \epsilon_k \tag{S5}$$
Under a first-order approximation, error is assumed to scale with the spatial derivatives of φ multiplied by the corresponding dimensions of the bin, or ε_j ∝ ∇_x φ(x_j*) Δ_j. Along with the simplifying assumptions that the ε's are independent and that E[ε_j²] = π_j(⟨φ⟩_j − φ_j*)(⟨φ⟩_j − φ_j*)ᵀ + O(π_j²) ≈ π_j(⟨φ⟩_j − φ_j*)², this gives an expectation for δℓ
$$E(\delta\ell) \approx \sum_j \pi_j\, \Delta_j^T Q_j \Delta_j = \sum_j \rho_j\, \Pi(\Delta_j)\, \Delta_j^T Q_j \Delta_j \tag{S6}$$
where

$$Q_j = \nabla_x \phi(x_j^*)\; \ell''^{-1}(\hat{\theta})\; \nabla_x \phi^T(x_j^*) \tag{S7}$$

is a matrix determined by the spatial gradient of the regressors and the expected Fisher information matrix, and μ_j = Π(Δ_j) denotes the product of the elements of Δ_j. In the case of two dimensions, solving for the optimum gives

$$\Delta_j \propto \frac{1}{\sqrt{\rho_j}}\, v_j \qquad \text{and} \qquad \left(v_j^{(-1)} v_j^T - \left(2 + N_d^{-1}\right) I\right) Q_j\, v_j = 0$$

where the elements of v_j^{(-1)} are the inverses of the corresponding elements of Δ_j.
For a more detailed discussion of optimal binning, the reader is referred to the
literature on signal block quantization (Du, Faber, & Gunzburger, 1999; Gersho, 1979;
Lloyd, 1982; Panter & Dite, 1951). An essential difference in the present case is that the aim of optimization is improving model parameter estimates rather than minimizing signal distortion. For this reason, the likelihood-derived cost function, Eq. S6, depends on the expected Fisher information and the spatial gradients of the regressors by way of the matrix Q_j.
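To make the connection to that literature concrete, here is a minimal 1-D sketch of the Lloyd (1982) iteration, which alternates nearest-level assignment with centroid updates to reduce mean squared distortion; the data and level count are illustrative:

```python
import numpy as np

# Minimal 1-D Lloyd quantizer: (1) assign each sample to the nearest
# level, (2) move each level to the centroid of its cell; repeat.
# Gaussian data and 8 levels are illustrative choices.

rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)
levels = np.linspace(-2.0, 2.0, 8)       # initial quantization levels

for _ in range(50):
    idx = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
    for k in range(len(levels)):
        if np.any(idx == k):             # leave empty cells unchanged
            levels[k] = samples[idx == k].mean()

idx = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
distortion = np.mean((samples - levels[idx]) ** 2)
print(round(distortion, 4))              # mean squared quantization error
```

Here distortion itself is the objective; in the binning problem above, the analogous objective is the likelihood deficit of Eq. S6.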
Supplementary References
Agresti, A. (2014). Categorical data analysis. John Wiley & Sons.
Du, Q., Faber, V., & Gunzburger, M. (1999). Centroidal Voronoi tessellations: Applications and algorithms. SIAM Review, 41(4), 637–676.
Gersho, A. (1979). Asymptotically optimal block quantization. IEEE Transactions on Information Theory, 25(4), 373–380.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Panter, P., & Dite, W. (1951). Quantization distortion in pulse-count modulation with nonuniform spacing of levels. Proceedings of the IRE, 39(1), 44–48.