Work-in-Progress CHI 2012, May 5–10, 2012, Austin, Texas, USA

Modeling Dwell-based Eye Pointing at Two-dimensional Targets

Xinyong Zhang
School of Information, Renmin University of China
Beijing, 100872, P.R. China
x.y.zhang@ruc.edu.cn

Wenxin Feng
School of Information, Renmin University of China
Beijing, 100872, P.R. China
fengwenxin@gmail.com

Hongbin Zha
Key Lab of Machine Perception, MOE, Peking University
Beijing, 100871, P.R. China
zha@cis.pku.edu.cn

Abstract
Zhang et al. (2010) proposed a performance model for dwell-based eye pointing. However, their model was based on a specific circular target condition and cannot predict the performance of acquiring conventional rectangular targets. In this paper, we extend their one-dimensional model to two-dimensional (2D) target conditions. Through an experiment, we evaluate different model candidates to identify the most appropriate one. The index of difficulty we redefine for 2D eye pointing (IDeye) properly reflects the asymmetrical impact of target width and height, and consequently the IDeye model can accurately predict the performance for 2D targets (R2 > 0.9). According to the results of our study, the new model can provide more useful design implications for gaze-based interactions.

Keywords
Gaze input; 2D eye pointing task; modeling.

ACM Classification Keywords
H.5.2 [Information Interfaces and Presentation]: User Interfaces - evaluation, theory and methods; H.1.2 [Models and Principles]: User/machine Systems - human factors.

General Terms
Experimentation; Human Factors; Theory.

Copyright is held by the author/owner(s).
CHI 2012, May 5–10, 2012, Austin, TX, USA.
ACM 978-1-4503-1016-1/12/05.

Introduction

Figure 1. (a) A head mounted eye tracker, Eyelink II, used as the pointing device.
The software and hardware configurations of the experiment were similar to those in Zhang et al.'s work [7, 8]. The task was presented on a 19-inch CRT display at 1024×768 resolution. Twenty participants (8 females and 12 males, with an average age of 24) successfully completed the experiment. (b) The dwelling area of the unstable eye cursor. When the cursor is inside the target area, its positions are sampled at a rate of 25 Hz and used to calculate three measures: ∆d, which denotes the diameter when the area is viewed as a circle, and ∆w and ∆h, which denote the width and the height, respectively, when the area is treated as a rectangle. For each trial, two further measures are recorded: 1) eye movement time (EMT), measured approximately from the start of the trial to the moment the cursor first enters the target, and 2) eye pointing time (EPT), measured from the start of the trial to the moment the cursor has continuously dwelt on the target for the given duration (800 ms) needed to activate it.

Recently, Zhang et al. proposed a new performance model for eye pointing to replace Fitts' law [8]. Rather than using Fitts' law as the theoretical foundation, they used a novel concept to define a new index of difficulty (ID) for dwell-based eye pointing. The corresponding model can be expressed as follows:

$$T = a + b \times ID_{eye} = a + b \times \frac{e^{\lambda A}}{W - \mu} \tag{1}$$

where λ and μ are two empirical constants that reflect the inherent features of eye movements, and a and b are two regression coefficients. The variables A and W denote movement distance and target size (diameter), respectively. The fraction term is defined as the index of difficulty of the eye pointing task (IDeye). Zhang et al. carefully verified this model in different situations and compared it with Fitts' law and other existing models, justifying its reliability and advantages.
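As a minimal sketch of Equation 1, the following Python functions compute IDeye and the predicted time for a circular target. The function names and the default values of `lam` and `mu` are illustrative assumptions rather than estimates from any experiment, and the coefficients `a` and `b` must come from a regression on observed data.

```python
import math


def id_eye(amplitude, width, lam=0.0005, mu=11.0):
    """Index of difficulty of Equation 1: exp(lam * A) / (W - mu).

    lam and mu are placeholder values chosen for illustration only.
    """
    if width <= mu:
        # The model is undefined once the target is no larger than the
        # jitter threshold mu.
        raise ValueError("target width must exceed the threshold mu")
    return math.exp(lam * amplitude) / (width - mu)


def predict_time(amplitude, width, a, b, lam=0.0005, mu=11.0):
    """Predicted time T = a + b * IDeye, in the same unit as a and b."""
    return a + b * id_eye(amplitude, width, lam, mu)
```

Because A enters only through the slowly growing factor exp(λA), doubling the distance changes the prediction far less than shrinking W toward μ does.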
However, they had not yet confirmed the model's validity for general 2D targets. Targets in commonly used user interfaces, such as buttons, icons and tool bars, have two properties: width (W) and height (H). In this paper, therefore, we extend Zhang et al.'s work to clarify the effects of both W and H on dwell-based eye pointing.

Previous work of modeling eye pointing
Although a number of studies investigated the validity of Fitts' law for eye pointing, no research specifically addressed the issue of modeling eye pointing until Zhang et al.'s work. They indicated that the definition of IDeye could be interpreted from the perspective of information theory. Fitts was the acknowledged pioneer who applied information theory to aimed movement and proposed the well-known Fitts' law [2]. The form used in HCI is the Shannon formulation, proposed by MacKenzie [4]:

$$T = a + b \times ID = a + b \log_2(A/W + 1) \tag{2}$$

With respect to a communication system, the capacity (C) of the signal transmission channel from transmitter to receiver is defined by Shannon's theorem 17 [5]:

$$C = B \log_2(S/N + 1) \tag{3}$$

where B denotes channel bandwidth, S is the transmitter's signal power, and N is the noise power. As can be seen, the definition of ID is an analogy to C. Zhang et al. hypothesized that "the logarithmic function can be considered to be the 'operator' of the encoding and decoding process from the transmitter to the receiver" [8]. Since the nervous signals relevant to eye pointing are transmitted only within the cranial nerve system, they viewed that system as both the transmitter and the receiver in the pseudo communication system they assumed. Thus, they hypothesized that the processes of encoding and decoding were unnecessary in the assumed communication system. In addition, they employed a concept of "signal enhancement" to account for the effect of effortless saccadic eye movements.
They defined the "channel capacity" (C') of the information transmission in eye pointing as follows:

$$C' = B(S + S_0 + N)/N \tag{4}$$

Compared with Shannon's theorem 17, Equation 4 lacks the logarithmic function and contains a constant component S0, which denotes the gain of signal enhancement. Zhang et al. did not provide a mathematical derivation for the definition of C', but logically, it is reasonable.

Using a methodology similar to that of analogizing the Fitts ID to the classical definition of channel capacity [4], Zhang et al. initially defined IDeye based on C' as follows:

$$ID_{eye} = \frac{A + A_0 + W}{W} \tag{5}$$

where A0, a distance constant, is the analogy to S0. They hypothesized that A << A0, because A contributes little to eye movement time as reported in various experiments [6, 3, 7], and that W < A in general. Thus, they simplified IDeye to the following forms:

$$ID_{eye} = \frac{A + A_0}{W} = \frac{A_0 (1 + A/A_0)}{W} \approx \frac{A_0 e^{A/A_0}}{W} \Rightarrow ID_{eye} = \frac{e^{\lambda A}}{W} \tag{6}$$

where λ, an empirical constant, replaces 1/A0. Taking the instability of the eye cursor [7] into consideration, they introduced another empirical constant, μ, which is subtracted from W in the denominator to compensate for the negative effect of the unstable cursor. This means that if the target size is close to the threshold μ, it will be impossible to perform the task. Equation 1 expresses their final definition of IDeye. Zhang et al. found that the grand mean of the observed diameters of the areas where the eye cursor was jittering inside the target (see Figure 1b) could be directly used as the empirical constant μ. In other words, μ is a measurable and meaningful parameter in practice. As for λ, it was a small decimal (varying from .0003 to .0007), directly reflecting the feature of saccades that movement distance has relatively little impact on movement time.

Figure 2. (a) Movement directions of the eye cursor. The arrows just denote the directions, without showing the real experiment interface. (b) Target appearances. As can be seen, the center of the target is marked with a small circle. The trial-start button is presented as a small solid circle (whose diameter is 32 pixels), but in fact its effective diameter is 120 pixels. The subject is instructed to focus on the two small circles in turn and avoid chasing the visible cross cursor. According to our investigation [9], these feedback forms and instructions would not significantly bias the stability of the cursor but could make the subject feel more comfortable when acquiring the target.

New model candidates for 2D eye pointing
Among the efforts to predict movement time depending on both W and H, the weighted ℓp-norm methodology used by Accot and Zhai [1] is an effective way to integrate W and H into the definition of ID. Equation 7 expresses the general form of their definition, which is denoted by IDaz in this paper.

$$ID_{az} = \log_2\left(\left[\omega \left(\frac{A}{W}\right)^r + \eta \left(\frac{A}{H}\right)^r\right]^{1/r} + 1\right) \tag{7}$$

where ω and η are two weight parameters, and r denotes the norm order (r ≥ 1). Accot and Zhai examined the accuracy of IDaz under different parameterization schemes and found that the weighted Euclidean model (i.e. r = 2 and ω = 1), expressed in Equation 8, was the most appropriate form.

$$ID_{az} = \log_2\left(\sqrt{(A/W)^2 + \eta (A/H)^2} + 1\right) \tag{8}$$

Their work provides very useful inspiration. We can readily use the weighted ℓp-norm methodology to extend the original IDeye to 2D eye pointing.
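Equations 7 and 8 can be sketched in Python as follows. This is an illustration only: the function names and the default weights are hypothetical choices, not values reported by Accot and Zhai [1].

```python
import math


def id_az(amplitude, width, height, omega=1.0, eta=1.0, r=2.0):
    """Weighted l_p-norm index of difficulty (Equation 7), in bits."""
    if r < 1:
        raise ValueError("the norm order r must be at least 1")
    w_term = omega * (amplitude / width) ** r
    h_term = eta * (amplitude / height) ** r
    return math.log2((w_term + h_term) ** (1.0 / r) + 1.0)


def id_az_euclidean(amplitude, width, height, eta=1.0):
    """Weighted Euclidean special case (Equation 8): r = 2 and omega = 1."""
    return id_az(amplitude, width, height, omega=1.0, eta=eta, r=2.0)
```

With `eta=0` the height term vanishes and the expression reduces to the one-dimensional Shannon formulation of Equation 2.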
At the same time, we hypothesize that the whole impact of the target depends on both dimensions, and that if one dimension's impact increases, the other dimension's impact decreases equivalently. Thus, we can let ω + η = 1. With this revision, we directly redefine the index of difficulty for 2D eye pointing as:

$$ID_{eye} = e^{\lambda A} \left(\frac{\omega}{(W - \mu)^r} + \frac{1 - \omega}{(H - \mu)^r}\right)^{1/r} \tag{9}$$

where the weight ω is in the range [0, 1]. This definition retains all the properties of IDaz [1] except scale independence [8]. In addition, it has the following properties:

Backward Compatibility. Regardless of ω and r, the redefined IDeye is equivalent to the original form in Equation 1 when the target is a square or a circle. In other words, the original form is a special case of IDeye.

Duality of the thresholds of H and W. When either H or W gradually decreases toward the threshold (μ), the difficulty of the task tends to infinity.

Complementarity of H and W. When one target dimension's impact increases, the other dimension's impact decreases equivalently, and vice versa.

In the new definition of IDeye, apart from μ, three parameters (ω, λ and r) remain to be determined. As Table 1 presents, we produced eight parameterization schemes, i.e. eight model candidates. When ω = 0.5, W and H have equal impact. When r = 2, we also call the corresponding expression (Equation 10) the Euclidean model.

$$T = a + b \times e^{\lambda A} \sqrt{\frac{\omega}{(W - \mu)^2} + \frac{1 - \omega}{(H - \mu)^2}} \tag{10}$$

Experiment: 2D eye pointing task
This experiment was to verify the accuracy of the eight model candidates and find the most suitable one. Meanwhile, we used Equation 8, with μ subtracted from W and H, as a baseline for comparison.

Task and procedure
The task was to acquire a target by dwell-based eye pointing. Figure 1 illustrates the gaze input device and describes the measures in the task. For each trial, a trial-start button appeared at a randomly chosen one of the predefined positions on the screen. The subject first needed to look at the button for a very short time to explicitly begin the trial. The target then immediately appeared at a given position, at a certain distance in the diagonal direction, and the trial-start button disappeared at the same time. The subject was then asked to fixate on the target as soon as possible. If the cursor continuously dwelt on the target up to the dwell time threshold (800 ms) within 5000 ms, a successful trial was recorded. Otherwise, an error was recorded. Each unsuccessful trial could be repeated at most 5 times before the experiment was suspended to recalibrate the device. At the end of each trial, the target disappeared and the start button reappeared at a different position for the next trial. Figure 2 shows the gaze movement directions and the target appearances.

Design
We employed a repeated measures within-subject design. There were four explicit experimental factors: target width W (40, 56, 90, 160 pixels), height H (40, 56, 90, 160 pixels), gaze amplitude A (300, 600, 900 pixels) and eye cursor control method CCM (iSR vs. iSR')¹. For each method, there were 48 combinations of task conditions (4W × 4H × 3A). These combinations, with 2 trials each, composed a block, within which the trials were presented in random order. Each subject performed 7 blocks of trials under each of the iSR and iSR' conditions.

¹ iSR' is a revision of iSR [7]. We expected that it could improve the human performance of eye pointing by making the cursor more stable [8]. Unfortunately, the two methods did not differ significantly on any of the measures we used. The details of iSR' are not presented due to the page limitation.

Figure 3. EMT (a) and EPT (b), in ms, by ln(W/H), with separate series labeled Vertical and Horizontal. As can be seen, when W or H gradually increased, EMT and EPT would decrease in general.

Results
We excluded the error trials and the outliers, totaling about 4.5% of the whole data, from our analyses. The measures ∆d, ∆w and ∆h were only used to determine the empirical constant μ, and the overall error rate was very low (2.4%). Therefore, here we only present the effects of the different factors on EMT and EPT.

Table 1. Summary of model fitting for the mean data. Each cell gives the EPT result followed by the EMT result (EPT / EMT). An asterisk denotes a free parameter; a hyphen marks a parameter that was preset or does not apply.

Model   r   ω     a                b               λ               ω or η (est.)   r (est.)      R2
IDeye   1   0.5   972.1 / 164.1    111.5 / 70.1    .0012 / .0013   -               -             .885 / .886
IDeye   1   *     969.1 / 163.6    114.6 / 70.5    .0011 / .0013   0.35 / 0.40     -             .939 / .904
IDeye   2   0.5   989.9 / 158.4    110.5 / 69.8    .0011 / .0013   -               -             .893 / .887
IDeye   2   *     957.5 / 158.2    113.7 / 70.3    .0011 / .0013   0.33 / 0.39     -             .946 / .905
IDeye   ∞   0.5   991.7 / 179.7    154.0 / 94.1    .0012 / .0015   -               -             .810 / .806
IDeye   ∞   *     977.4 / 173.6    164.9 / 99.0    .0012 / .0014   0.43 / 0.45     -             .887 / .843
IDeye   *   0.5   961.3 / 159.8    112.1 / 70.8    .0011 / .0013   -               1.64 / 1.47   .895 / .890
IDeye   *   *     958.7 / 159.4    115.2 / 71.2    .0011 / .0013   0.34 / 0.40     1.65 / 1.49   .948 / .908
Eq. 8   -   -     456.3 / -112.2   195.7 / 122.1   -               3.38 / 1.90     -             .877 / .895

Eye Movement Time (EMT)
There was no significant learning effect in this experiment (F(6,114) = 0.56, p = .762).
All three factors, W (F(3,57) = 107.36, p < .001), H (F(3,57) = 114.45, p < .001) and A (F(2,38) = 381.65, p < .001), had a significant main effect on EMT, and the interaction effects W×A (F(6,114) = 12.52, p < .001), H×A (F(6,114) = 12.58, p < .001) and W×H (F(9,171) = 2.72, p < .01) were also significant. For each factor that significantly affected EMT, post hoc pairwise comparisons indicated that all the differences between levels were significant. EMT was positively correlated with A, but negatively correlated with both W and H. As depicted in Figure 3a, when W or H gradually increased, EMT decreased in general. The significant interaction between W and H implied that the two did not affect EMT independently of each other.

Eye Pointing Time (EPT)
EPT was also significantly affected by W (F(3,57) = 72.83, p < .001), H (F(3,57) = 194.38, p < .001) and A (F(2,38) = 234.51, p < .001), with a significant interaction effect between each pair of them, including W×H (F(9,171) = 2.40, p < .05). All the differences between levels of these factors were significant except that between the target width levels of 90 and 160 pixels. As Figure 3b shows, the correlations of EPT with W and H were similar to those of EMT.

Model fitting
Before fitting the models to the data, we calculated the grand means of ∆d, ∆w and ∆h and found that the last two were both close to 11.0 pixels. When viewing the dwelling area of the cursor as a square instead of a circle to determine the value of μ, we found that Equation 10 had higher prediction accuracy (R2 > 0.98) in the special case of acquiring square targets. Thus, we set μ = 11 when fitting the models to the mean EPT data and set μ = 0 for the mean EMT data, since the dwell duration is excluded from EMT. In addition, to bring the values of IDeye and IDaz into a similar range, we multiplied the former by 80, based on Equation 10.
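The fitting setup described above can be sketched as follows, under loudly stated assumptions: the grid search over ω, the fixed λ, and all function names are choices made for this sketch and are not the authors' procedure; the "observed" times are synthetic values generated from the model itself with hypothetical parameters, not experimental data.

```python
import math


def id_eye_2d(A, W, H, omega, lam=0.001, mu=11.0, r=2.0, scale=80.0):
    """Equation 9 multiplied by the scale factor of 80 mentioned in the text.

    lam is held fixed here instead of being estimated (sketch assumption).
    """
    if min(W, H) <= mu:
        raise ValueError("W and H must both exceed mu")
    w_term = omega / (W - mu) ** r
    h_term = (1.0 - omega) / (H - mu) ** r
    return scale * math.exp(lam * A) * (w_term + h_term) ** (1.0 / r)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def fit_omega(conditions, times, steps=100):
    """Grid-search omega in [0, 1]; keep the value with the highest R^2."""
    best = None
    for i in range(steps + 1):
        omega = i / steps
        ids = [id_eye_2d(A, W, H, omega) for A, W, H in conditions]
        a, b, r2 = fit_line(ids, times)
        if best is None or r2 > best[3]:
            best = (omega, a, b, r2)
    return best


# The 48 task conditions of the design (3 amplitudes x 4 widths x 4 heights).
sizes = (40, 56, 90, 160)
conditions = [(A, W, H) for A in (300, 600, 900) for W in sizes for H in sizes]

# Synthetic, noise-free times from hypothetical parameters (NOT real data).
true_omega, true_a, true_b = 0.35, 950.0, 110.0
times = [true_a + true_b * id_eye_2d(A, W, H, true_omega)
         for A, W, H in conditions]

omega_hat, a_hat, b_hat, r2 = fit_omega(conditions, times)
```

Because the model is linear in a and b once ω, λ, r and μ are fixed, each grid point needs only a simple linear regression. For EMT, the text sets μ = 0, which corresponds to calling `id_eye_2d` with `mu=0.0`.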
Table 1 summarizes the results of model fitting. As can be seen, IDeye resulted in higher R2, especially when ω was treated as a free parameter. More importantly, the estimates of the regression coefficient a were unreasonable in the models based on Equation 8. For EPT, the estimate was smaller than the specified dwell time (800 ms), and for EMT it was even negative. These estimates could lead to a negative prediction of EMT, or a prediction of EPT smaller than 800 ms, when the task is very easy, which is impossible in reality. Therefore, IDeye was more suitable for 2D eye pointing.

With respect to the different parameterizations of ω and r, it was better to set ω free than to preset ω = 0.5. At the same time, compared with the results obtained when r was set free, the Euclidean model (Equation 10) was already good enough. According to the estimates of λ in different situations, it seems that we could fix it at .001 and still reach satisfactory results.

We plotted R2 as a function of both ω and r. As can be seen in Figure 4, this function has a convex curved surface. We found that the Euclidean model was very close to the "optimal solution", i.e. the model whose R2-ω curve contains the global maximum of R2. Therefore, we finally give the most suitable model in Equation 11.

$$T = a + b \times 80 e^{0.001 A} \sqrt{\frac{\omega}{(W - \mu)^2} + \frac{1 - \omega}{(H - \mu)^2}} \tag{11}$$

Discussion and conclusions
We have verified that the Euclidean model is the most suitable extension. As Table 1 shows, when r = 1, IDeye could mathematically fit the data well enough, but this would imply that W and H affected eye pointing time independently of each other. This did not agree with the finding that there was a significant interaction effect between W and H in the experiment. Furthermore, if IDeye employed only one of the target dimensions (i.e. ω = 1 or 0), the correlation between IDeye and EPT as well as EMT would be weak, as Figure 4 demonstrates.

Our work also uncovered that the impacts of W and H were asymmetrical. More importantly, unlike the asymmetry in hand pointing [1, 10], it was target height H that had the greater impact on the human performance of eye pointing, because the estimates of ω were less than 0.5. This informs designers that it is more efficient to elongate target height to facilitate dwell-based eye pointing.

Figure 4. IDeye model's fitness (R2) as a function of ω and r. The convexity of the curved surfaces indicates that the estimates in Table 1 are reliable. This implies that setting r = 2 is suitable for data fitting because the summit of the corresponding R2-ω curve is close to the global maximum of R2.

To summarize, the following redefined Euclidean form of IDeye can accurately (R2 > 0.9) express the difficulty of pointing at arbitrary 2D rectangular targets.

$$ID_{eye} = e^{\lambda A} \sqrt{\frac{\omega}{(W - \mu)^2} + \frac{1 - \omega}{(H - \mu)^2}}$$

The weight parameter ω, varying in the range of (0.25, 0.55), can properly capture the asymmetrical impact of target width and height on eye pointing, with the latter exceeding the former in general.

Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61070144).

References
[1] Accot, J. and Zhai, S. Refining Fitts' law models for bivariate pointing. In Proc. CHI 2003, ACM Press (2003), 193-200.
[2] Fitts, P.M. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, Vol.47 (1954), 381-391.
[3] Gajos, K.Z., Wobbrock, J.O. and Weld, D.S. Automatically generating user interfaces adapted to users' motor and vision capabilities. In Proc. UIST 2007, ACM Press (2007), 231-240.
[4] MacKenzie, I.S. Fitts' law as a research and design tool in human-computer interaction. Human-Computer Interaction, Vol.7 (1992), 91-139.
[5] Shannon, C.E. A mathematical theory of communication. The Bell System Technical Journal, Vol.27 (1948), 379-423, 623-656.
[6] Sibert, L. and Jacob, J. Evaluation of eye gaze interaction. In Proc. CHI 2000, ACM Press (2000), 281-288.
[7] Zhang, X., Ren, X. and Zha, H. Improving eye cursor's stability for eye pointing tasks. In Proc. CHI 2008, ACM Press (2008), 525-534.
[8] Zhang, X., Ren, X. and Zha, H. Modeling dwell-based eye pointing target acquisition. In Proc. CHI 2010, ACM Press (2010), 2083-2092.
[9] Zhang, X., Feng, W. and Zha, H. Effects of different visual feedback forms on eye cursor's stabilities. In LNCS Vol. 6775, HCII 2011, Springer (2011), 273-282.
[10] Zhang, X., Zha, H. and Feng, W. Extending Fitts' law to account for the effects of movement direction on 2D pointing. To appear in Proc. CHI 2012.