Discrete approximation with binning

Given a continuous underlying distribution, the probability of an event in bin j is

    p_j = \frac{\int_{B_j} e^{\sum_i \beta_i f_i}\, dx}{\sum_k \int_{B_k} e^{\sum_i \beta_i f_i}\, dx}    (S1)

Equation 5 can be considered a piecewise-uniform approximation to (S1). Here we assume that the regressors, f_i, vary smoothly over the sampling space and are continuously differentiable. The log-likelihood function derived from (S1) is

    \ell(\beta) = \sum_t \Bigl[ \sum_j y_j(t)\, \log\Bigl( \int_{B_j} e^{\sum_i \beta_i f_i(x,t)}\, dx \Bigr) - \log\Bigl( \sum_k \int_{B_k} e^{\sum_i \beta_i f_i(x,t)}\, dx \Bigr) \Bigr]    (S2)

where y_j(t) is 1 when a fixation falls in the jth bin in time interval t and 0 otherwise. Eq. S2 gives an exact likelihood for the underlying distribution even though the dependent measure has been discretely sampled; that is, estimation based on this likelihood should not introduce any particular bias. Rather, binning discards information, imposing limits on the space of models that can be distinguished, such as the standard Nyquist limits on the spatial bandwidth of the underlying distribution.

The log of the integral in both terms of (S2) is the partition function for the distribution of point events conditioned on the domain of integration. A key property of the partition function is that its nth-order derivative with respect to a parameter gives the nth-order cumulant of the associated regressor within bin B_j (Agresti, 2014). The first cumulant is simply the conditional expectation,

    \nabla_\beta \log\Bigl( \int_{B} e^{\sum_i \beta_i f_i(x,t)}\, dx \Bigr) = \langle f(x,t) \rangle_B

where ⟨…⟩_B denotes expectation within the domain B. The gradient of the likelihood function can therefore be expressed succinctly as

    \nabla_\beta \ell = \sum_{j,t} \bigl( y_j(t) - \hat{p}_j(t) \bigr)\, \langle f(x,t) \rangle_{B_j}    (S3)

Standard maximum-likelihood estimation is a matter of finding where (S3) vanishes. Of note, the estimate therefore depends only on the bin partition functions and their first derivatives, the within-bin expectations ⟨f(x,t)⟩_{B_j}. Likelihood-based estimation will therefore be unable to distinguish between models that assume the same values of these quantities across all bins, a limitation imposed by discrete sampling. Included among such models is a piecewise-constant model in which the within-bin value of the regressor is f_{B_j} = ⟨f(x,t)⟩_{B_j} and each bin has its own intercept. This idealized model is a yardstick against which realizable models can be measured. Realizable models are constrained by the fact that we avoid granting each bin a separate intercept, which would otherwise violate the assumption of continuity, and by the fact that we are limited to numerical approximations when evaluating the integral in the partition function.

Deviations from the ideal model introduce error in the parameter estimate. This error has two sources: the substitution of f_j = ⟨f(x,t)⟩_{B_j} with some proxy value f_j^* within the same range as ⟨f(x,t)⟩_{B_j}, such as the value of f(x,t) at the center of the bin, and the suppression of the bin intercepts. We may treat the weighted difference between f_j^* and the true expectation,

    \epsilon_j = y_j \bigl( \langle f \rangle_j - f_j^* \bigr) - \bigl( \hat{p}_j \langle f \rangle_j - \hat{p}_j^* f_j^* \bigr),

as a random error. To quantify the magnitude of this error in a statistically meaningful way, we use the expected deviation of the log likelihood from its maximum, which represents the relative goodness of fit of the estimate under the approximation. Heuristically, this quantifies how much worse the estimate becomes with binning error by how much less well the data support it.
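To make the estimation concrete, the sketch below fits the binned likelihood for a toy one-dimensional problem: it evaluates the within-bin partition functions by numerical quadrature, uses Eq. S2 as the objective, and Eq. S3 as its analytic gradient. The setup (a single sinusoidal regressor that does not vary with t, uniform bins, synthetic fixations) and all function names are hypothetical and chosen only for illustration; this is not the code used in the analysis.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
J, T = 8, 500                                        # number of bins, time intervals
edges = np.linspace(0.0, 1.0, J + 1)                 # bin boundaries B_j
xg = np.linspace(0.0, 1.0, 4001)                     # fine grid for numerical integration
dx = xg[1] - xg[0]
f = np.sin(2 * np.pi * xg)                           # assumed regressor f(x) on the grid
bin_of = np.clip(np.searchsorted(edges, xg, side="right") - 1, 0, J - 1)

# Simulate fixations from the continuous model, then bin them into y_j(t) indicators.
beta_true = 1.5
dens = np.exp(beta_true * f)
dens /= dens.sum()
x_fix = rng.choice(xg, size=T, p=dens)
j_fix = np.clip(np.searchsorted(edges, x_fix, side="right") - 1, 0, J - 1)
y = np.zeros((T, J))
y[np.arange(T), j_fix] = 1.0

def bin_partition(beta):
    """Within-bin partition functions Z_j = integral over B_j of exp(beta * f(x)) dx."""
    w = np.exp(beta * f)
    return np.array([w[bin_of == j].sum() * dx for j in range(J)])

def bin_means(beta):
    """First cumulants <f(x)>_{B_j}: within-bin expectations of the regressor."""
    w = np.exp(beta * f)
    return np.array([(w * f)[bin_of == j].sum() / w[bin_of == j].sum() for j in range(J)])

def negloglik(b):
    Z = bin_partition(b[0])
    return -np.sum(y @ np.log(Z) - np.log(Z.sum()))            # Eq. S2

def neg_grad(b):
    Z = bin_partition(b[0])
    p_hat = Z / Z.sum()
    return -np.array([np.sum((y - p_hat) @ bin_means(b[0]))])  # Eq. S3

fit = minimize(negloglik, x0=np.zeros(1), jac=neg_grad, method="BFGS")
print("estimated beta:", fit.x[0])                   # should land near beta_true
```

Because Eq. S3 involves only the within-bin expectations, replacing bin_means with the value of f at each bin center reproduces the proxy-value approximation f_j^* discussed above.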
Using the second-order expansion of the log-likelihood function, \ell(\beta) \approx \ell(\hat{\beta}) + \tfrac{1}{2}\,\delta\beta^{\mathsf{T}} \ell''(\hat{\beta})\, \delta\beta, the error of the estimate can be related to the log-likelihood gradient as \delta\beta \approx \ell''^{-1}(\hat{\beta})\, \ell'(\beta), so that

    \delta\ell \approx \ell(\beta) - \ell(\hat{\beta}) = \ell'(\beta)^{\mathsf{T}}\, \ell''^{-1}(\hat{\beta})\, \ell'(\beta)    (S4)

The binning approximation causes the gradient to deviate from zero by \sum_j \epsilon_j, hence

    \delta\ell \approx \Bigl( \sum_j \epsilon_j \Bigr)^{\mathsf{T}} \ell''^{-1}(\hat{\beta}) \Bigl( \sum_j \epsilon_j \Bigr)    (S5)

Under a first-order approximation, the error is assumed to scale with the spatial derivatives of f multiplied by the corresponding dimensions of the bin, \epsilon_j \propto \nabla_x f(x_j^*)\, \Delta x_j. Together with the simplifying assumptions that the \epsilon_j are independent and that

    E[\epsilon_j^2] = p_j \bigl( \langle f \rangle_j - f_j^* \bigr) \bigl( \langle f \rangle_j - f_j^* \bigr) + O(p_j^2) \approx p_j \bigl( \langle f \rangle_j - f_j^* \bigr)^2,

this gives an expectation for \delta\ell,

    E(\delta\ell) \approx \sum_j p_j\, \Delta x_j^{\mathsf{T}} M_j\, \Delta x_j = \sum_j V_j\, \lambda(x_j)\, \Delta x_j^{\mathsf{T}} M_j\, \Delta x_j    (S6)

where

    M_j = \nabla_x f(x_j^*)\, \ell''^{-1}(\hat{\beta})\, \nabla_x f^{\mathsf{T}}(x_j^*)    (S7)

is a matrix determined by the spatial gradient of the regressors and the expected Fisher information matrix, \lambda(x_j) is the local event density, and V_j = \Pi(\Delta x_j) denotes the product of the elements of \Delta x_j (the bin volume), so that p_j \approx \lambda(x_j) V_j. In the case of two dimensions, solving for the optimum gives

    \Delta x_j \propto \frac{1}{\sqrt{\lambda_j}}\, v_j^{(-1)}   and   \bigl( v_j - (2 + \lambda_j^{-1})\, I \bigr) M_j\, v_j = 0

where the elements of v_j^{(-1)} are the inverses of the corresponding elements of \Delta x_j.

For a more detailed discussion of optimal binning, the reader is referred to the literature on signal block quantization (Du, Faber, & Gunzburger, 1999; Gersho, 1979; Lloyd, 1982; Panter & Dite, 1951). An essential difference in the present case is that the aim of optimization is improving model parameter estimates rather than minimizing signal distortion. For this reason the likelihood-derived cost function, Eq. S6, depends on the expected Fisher information and the spatial gradients of the regressors through the matrix M_j. A rough numerical sketch of this cost is given after the reference list.

Supplementary References

Agresti, A. (2014). Categorical data analysis. John Wiley & Sons.
Du, Q., Faber, V., & Gunzburger, M. (1999). Centroidal Voronoi tessellations: applications and algorithms. SIAM Review, 41(4), 637-676.
Gersho, A. (1979). Asymptotically optimal block quantization. IEEE Transactions on Information Theory, 25(4), 373-380.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129-137.
Panter, P., & Dite, W. (1951). Quantization distortion in pulse-count modulation with nonuniform spacing of levels. Proceedings of the IRE, 39(1), 44-48.
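As a rough numerical illustration of the cost in Eq. S6 (a sketch under simplifying assumptions, not the procedure used in the analysis), the snippet below evaluates a reduced version of that cost for rectangular binnings of a unit square. It assumes a single hypothetical regressor f(x1, x2), a uniform fixation density, and a scalar h_inv standing in for the inverse expected Fisher information, so that M_j in Eq. S7 reduces to h_inv times the outer product of the regressor gradient; all names are illustrative.

```python
import numpy as np

def s6_cost(n1, n2, f_grad, lam, h_inv=1.0, extent=1.0):
    """Sum p_j * (dx_j^T M_j dx_j) over an n1-by-n2 grid of bins covering [0, extent]^2."""
    w1, w2 = extent / n1, extent / n2                      # bin dimensions dx_j
    c1 = (np.arange(n1) + 0.5) * w1                        # bin centers along x1
    c2 = (np.arange(n2) + 0.5) * w2                        # bin centers along x2
    x1, x2 = np.meshgrid(c1, c2, indexing="ij")
    g1, g2 = f_grad(x1, x2)                                # spatial gradient of f at bin centers
    quad = h_inv * (g1 * w1 + g2 * w2) ** 2                # dx_j^T M_j dx_j with M_j from Eq. S7
    p = lam(x1, x2) * (w1 * w2)                            # p_j ~ local density times bin volume
    return float(np.sum(p * quad))

# Hypothetical regressor varying steeply along x1 only, uniform density:
# for a fixed budget of 64 bins, compare three aspect ratios.
f_grad = lambda x1, x2: (4.0 * np.cos(4.0 * x1), np.zeros_like(x2))
lam = lambda x1, x2: np.ones_like(x1)
for n1, n2 in [(4, 16), (8, 8), (16, 4)]:
    print(f"{n1:2d} x {n2:2d} bins -> E(delta_l) proxy = {s6_cost(n1, n2, f_grad, lam):.5f}")
```

With the regressor gradient concentrated along x1, the 16 x 4 partition yields the smallest cost in this toy setting, consistent with the point above: for a fixed bin budget, bins should be finer where the regressors change most rapidly, weighted by the expected information.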