Modeling Dwell-based Eye Pointing at Two-dimensional Targets Abstract Work-in-Progress

advertisement
Work-in-Progress
CHI 2012, May 5–10, 2012, Austin, Texas, USA
Modeling Dwell-based Eye Pointing at
Two-dimensional Targets
Xinyong Zhang
Abstract
School of Information, Renmin
Zhang et al. (2010) proposed a performance model for
dwell-based eye pointing. However, their model was
based on a specific circular target condition, without the
ability to predict the performance of acquiring
conventional rectangular targets. In this paper, we
extend their one-dimensional model to two-dimensional
(2D) target conditions. Carrying out an experiment, we
evaluate the abilities of different model candidates to
find out the most appropriate one. The new index of
difficulty we redefine for 2D eye pointing (IDeye) can
properly reflect the asymmetrical impact of target width
and height, and consequently the IDeye model can
accurately predict the performance for 2D targets (R2 >
0.9). According to the results of our study, the new
model can provide more useful design implications for
gaze-based interactions.
University of China
Beijing, 100872, P.R. China
x.y.zhang@ruc.edu.cn
Wenxin Feng
School of Information, Renmin
University of China
Beijing, 100872, P.R. China
fengwenxin@gmail.com
Hongbin Zha
Key Lab of Machine Perception,
MOE, Peking University
Beijing, 100871, P.R. China
zha@cis.pku.edu.cn
Keywords
Gaze input; 2D eye pointing task; modeling.
ACM Classification Keywords
Copyright is held by the author/owner(s).
H.5.2 [Information Interfaces and Presentation]: User
Interfaces - evaluation, theory and methods; H.1.2
[Models and Principles]: User/machine Systems –
human factors.
CHI 2012, May 5–10, 2012, Austin, TX, USA.
ACM 978-1-4503-1016-1/12/05.
General Terms
Experimentation; Human Factors; Theory.
1751
Work-in-Progress
CHI 2012, May 5–10, 2012, Austin, Texas, USA
Introduction
Target Area
△h
△w
C
△d
Cursor Positions
(a)
(b)
Figure 1. (a) A head mounted eye
tracker, Eyelink II, used as the
pointing device. The software and
hardware configurations of the
experiment were similar to those in
Zhang et al.'s work [7, 8]. The task
was presented on a 19-inch CRT
display at 1024×768 resolution.
Twenty participants (8 females and 12
males, with the average age of 24)
successfully completed this
experiment. (b) The dwelling area of
the unstable eye cursor. When the
cursor is inside of the target area, its
positions will be sampled at the rate of
25 Hz and used to calculate three
measures: ∆d, which denotes the
diameter if viewing the area as a circle,
∆w and ∆h, which denote the width and
the height, respectively, if treating this
area as a rectangle. For each trial, the
following two measures will also be
recorded: 1) eye movement time
(EMT), which is approximately
measured from the start of the trial to
the moment the cursor is entering the
target for the first time, and 2) Eye
pointing time (EPT), which is
measured from the start to the
moment the cursor has continuously
dwelt on the target for a given duration
(800 ms) to activate it.
Recently, Zhang et al. proposed a new performance
model instead of Fitts’ law for eye pointing [8]. Not
using Fitts' law as the theoretical foundation to
construct performance models but using a novel
concept, they defined a new index of difficulty (ID) for
dwell-based eye pointing. The corresponding model can
be expressed as follows:
T = a + b × ID eye = a + b ×
e λA
W −μ
(1)
where the symbols λ and μ are two empirical constants
that reflect the inherent features of eye movements,
and a and b are two regression coefficients. The
variables A and W denote movement distance and
target size (diameter), respectively. The fraction term
is defined as the index of difficulty of eye pointing task
(IDeye). Zhang et al. carefully verified this model in
different situations and compared it with Fitts' law and
other existing models, justifying its reliability and
advantages. However, they had not yet confirmed this
model's validity for general 2D targets.
The targets, such as buttons, icons and tool bars, in
commonly used user interface have two properties, i.e.
width (W) and height (H). In this paper, therefore, we
extend Zhang et al.'s work to clarify the effects of both
factors W and H on dwell-based eye pointing.
Previous work of modeling eye pointing
Although a number of studies investigated the validity
of Fitts' law for eye pointing, there was no research
specifically addressed the issue of modeling eye
pointing until Zhang et al.'s work. They indicated that
the definition of IDeye could be interpreted from the
perspective of information theory.
Fitts was the acknowledged pioneer who applied the
information theory to aimed movement and proposed
the well-known Fitts' law [2]. Its form used in HCI is
the Shannon formulation expressed as follows:
T = a + b × ID = a + b log2 ( A / W + 1)
(2)
This form was proposed by MacKenzie [4]. With respect
to a communication system, the capacity (C) of signal
transmission channel from transmitter to receiver is
defined by Shannon theorem 17 [5]:
C = B log2 (S / N + 1)
(3)
where B denotes channel bandwidth, S is transmitter's
signal power, and N means noise power. As can be
seen, the definition of ID is an analogy to C.
Zhang et al. hypothesized that “the logarithmic function
can be considered to be the `operator' of the encoding
and decoding process from the transmitter to the
receiver” [8]. Since the nervous signals relevant to eye
pointing is transmitted only in the cranial nerve system,
they viewed it as both the transmitter and the receiver
in the pseudo communication system they assumed.
Thus, they hypothesized that the processes of encoding
and decoding were unnecessary in the assumed
communication system. In addition, they employed a
concept of “signal enhancement” to account for the
effect of effortless saccadic eye movements. They
defined the “channel capacity” (C') of the information
transmission in eye pointing as follows:
C ′ = B(S + S 0 + N ) / N
(4)
Comparing Equation 4 with Shannon's theorem 17,
there are the absence of the logarithmic function and
the existence of a constant component S0, which
denotes the gain of signal enhancement. Zhang et al.
1752
Work-in-Progress
CHI 2012, May 5–10, 2012, Austin, Texas, USA
H
Targe
A
New model candidates for 2D eye pointing
did not provide a mathematical derivation for the
definition of C', but logically, it is reasonable.
W
+
Using the similar methodology of analogizing the Fitts
ID to the classical definition of channel capacity [4],
Zhang et al. initially defined IDeye based on C' as follows:
Trail-start
(a)
IDeye =
A + A0 + W
W
(5)
ID az
Vertical
where A0, as a distance constant, is the analogy to S0.
They hypothesized that A << A0 due to the little
contribution rate of A to eye movement time as
reported in various experiments [6, 3, 7], and that
W<A in general. Thus, they simplified IDeye to the
following forms:
Horizontal
1+
ID eye = A0
(b)
Figure 2. (a) Movement directions
of the eye cursor. The arrows just
denote the directions, without
showing the real experiment
interface. (b) Target appearances.
As can be seen, the center of the
target is marked with a small circle.
The trial start button is presented as
a small solid circle (whose diameter is
32 pixels), but in fact its effective
diameter is 120 pixels. The subject is
instructed to focus on the two small
circles in turn and avoid chasing the
visible cross cursor. According to our
investigation [9], these feedback
forms and instructions would not
significantly bias the stability of the
cursor but could make the subject
feel better to acquire the target.
A W
A
+
A0 A 0
e A0
e λA
≈ A0
⇒ ID eye =
W
W
W
Among the efforts to predict movement time depending
on both W and H, the methodology of weighted ℓp-norm
used by Accot and Zhai [1] is an effective solution to
integrate W and H together into the definition of ID.
Equation 7 expresses the general form of their
definition, which is denoted using IDaz in this paper.
(6)
where λ, as an empirical constant, is the replacement of
1/A0. Taking the instability of eye cursor [7] into
consideration, they introduced another empirical
constant –μ into the denominator W to compensate for
the negative effect of the unstable cursor. This means
that if target size is close to the threshold μ, it will be
impossible to perform the task. Equation 1 expresses
their final definition of IDeye. Zhang et al. found that the
grand mean of the observed diameters of the areas
where the eye cursor was jittering inside the target
(see Figure 1b) could be directly used as the empirical
constant μ. In other words, μ is a measurable and
meaningful parameter in practice. With respect to λ, it
was a small decimal (varied from .0003 to .0007), directly
reflecting the feature of saccades that movement
distance relatively has little impact on movement time.
r
r 1/ r
⎛⎡
⎞
⎛ A⎞ ⎤
⎛ A⎞
⎜
= log 2 ⎜ ⎢ω ⎜ ⎟ + η ⎜ ⎟ ⎥ + 1⎟⎟
⎜ ⎣⎢ ⎝ W ⎠
⎟
⎝ H ⎠ ⎦⎥
⎝
⎠
(7)
where ω and η are two weighted parameters, and r
denotes the norm order (r ≥ 1). Accot and Zhai
examined the accuracy of IDaz for different
parameterization schemes and found out that the
weighted Euclidean model (i.e. r = 2 and ω = 1), as
Equation 8 expresses, was the most appropriate form.
(8)
ID az = log 2 ( ( A / W ) 2 + η ( A / H ) 2 + 1)
Their work provides us very useful inspirations. We can
easily use the methodology of weighted ℓp-norm to
extend the original IDeye for 2D eye pointing. At the
same time, we hypothesize that the whole impact of
the target depends on both dimensions and if one
dimension's impact increases, the other dimension's
impact will decrease equivalently. Thus, we can let ω+η
= 1. With this revision, we directly redefine the index of
difficulty for 2D eye pointing as:
ID eye = e
λA
⎛
ω
1−ω
⎜⎜
+
r
(H − μ )r
⎝ (W − μ )
⎞
⎟⎟
⎠
1/ r
(9)
where the weight ω is in the range of [0, 1]. This
definition retains all the properties of IDaz [1] except for
that of scale independency [8]. At the same time, there
are additional properties as follows:
1753
Work-in-Progress
CHI 2012, May 5–10, 2012, Austin, Texas, USA
ƒ
Backward Compatibility. Regardless of ω and r, the
redefined IDeye is equivalent to the original form in
Equation 1 when the target is a square or a circle. In
other words, the original form is a special case of IDeye.
600 EMT (ms)
550
Vertical
500
Horizontal
45
40
400
350 40
56
250
W
90
90
160 160
H
(a)
2
1
1.5
0
0.5
-1
-0.5
-1.5
-2
200
1850 EPT (ms)
1650
Vertical
Horizontal
40
1450
1250
56
56
W
90 H
90160
160
2
1
1.5
0
0.5
-1
-0.5
850
-1.5
ω
(W − μ ) 2
+
1−ω
(H − μ )2
Design
We employed a repeated measures within-subject
design. There were four explicit experimental factors:
target width W (40, 56, 90, 160 pixels), height H (40,
56, 90, 160 pixels), gaze amplitude A (300, 600, 900
pixels) and eye cursor control method CCM (iSR vs.
iSR') 1 . For each method, there were 48 combinations of
task conditions (4W × 4H × 3A). All these
combinations, with 2 trials for each, composed a block,
within which the trials were presented in a random
order. The subject needed to perform 7 blocks of trials
under both of iSR and iSR' conditions.
(10)
Experiment: 2D eye pointing task
(b)
-2
In the new definition of IDeye, except for μ, there are
still three parameters (ω , λ and r) needed to be
determined. As Table 1 presents, we produced eight
parameterization schemes, i.e. eight model candidates.
When ω = 0.5, it means that W and H have equal
impact. When r = 2, we also call the corresponding
expression (Equation 10) Euclidean model.
T = a + b × e λA
40
1050
ƒ
Duality of the thresholds of H and W. When either
H or W is gradually decreased to a threshold (μ), the
difficulty of the task will tend to infinity.
ƒ
Complementarity of H and W. When one target
dimension's impact increases, the other dimension's
impact will decrease equivalently, and vise versa.
56
300
distance in the diagonal direction, with the concurrent
disappearance of the trial-start button. Then, the
subject was asked to fixate on the target as soon as
possible. If the cursor could continuously dwell on the
target up to the threshold of dwell time (800 ms) within
5000 ms, a successful trial would be recorded.
Otherwise, an error would be recorded. For each of the
unsuccessful trials, it could be repeated for 5 times at
most before suspending the experiment to recalibrate
the device. At the end of each trial, the start button
reappeared at a different position for next trial, with
the target disappearing again. Figure 2 shows the gaze
movement directions and the target appearances.
ln(W/H)
Figure 3. EMT and EPT by
ln(W/A). As can be seen, when
W or H gradually increased, EMT
and EPT would decrease in
general.
This experiment was to verify the accuracies of the
eight model candidates and find out the most suitable
one. Meanwhile, we used Equations 8, with subtracting
μ from W and H, as a baseline for comparisons.
Task and procedure
The task was to acquire a target by dwell-based eye
pointing. Figure 1 illustrates the gaze input device and
describes the measures in the task. For each trial, there
was a trial-start button randomly appearing at one of
the predefined positions on the screen. At first, the
subject needed to look at the button for a very short
time to explicitly begin a trial so that the target
immediately appeared at a given position with certain
Results
We excluded the error trials and the outliers, totaling
about 4.5% of the whole data, from our analyses. The
measure ∆d, ∆w and ∆h were only used to determine
the empirical constant μ and the overall error rate was
1
iSR’ is a revision of iSR [7]. We expected that it could improve
the human performance of eye pointing by making the cursor
more stable [8]. Unfortunately, these two methods did not
result in significant differences for all the measures we used.
The details of iSR’ are not presented due to the page limitation.
1754
Work-in-Progress
CHI 2012, May 5–10, 2012, Austin, Texas, USA
very low (2.4%). Therefore, here we only present the
effects of the different factors on EMT and EPT.
r
ω
a
972.1
164.1
969.1
*
163.6
989.9
0.5
158.4
2
957.5
*
158.2
991.7
0.5
179.7
∞
977.4
*
173.6
961.3
0.5
159.8
*
958.7
*
159.4
0.5
1
b
λ
ω, η
r
R2
111.5
70.1
114.6
70.5
110.5
69.8
113.7
70.3
154.0
94.1
164.9
99.0
112.1
70.8
115.2
71.2
.0012
.0013
.0011
.0013
.0011
.0013
.0011
.0013
.0012
.0015
.0012
.0014
.0011
.0013
.0011
.0013
—
—
0.35
0.40
—
—
0.33
0.39
—
—
0.43
0.45
—
—
0.34
0.40
—
—
—
—
—
—
—
—
—
—
—
—
1.64
1.47
1.65
1.49
.885
.886
.939
.904
.893
.887
.946
.905
.810
.806
.887
.843
.895
.890
.948
.908
—
—
3.38
1.90
456.3 195.7
Eq. 8
-112.2 122.1
— .877
— .895
Table 1. Summary of model fitting
for the mean data (results presented
in each cell above for EPT and below
for EMT). The asterisk denotes a free
parameter.
Eye Movement Time (EMT)
There was no significant learning effect in this
experiment (F6,114 = 0.56, p = .762). All the three
factors W (F3,57 = 107.36, p < .001), H (F3,57 = 114.45,
p < .001) and A (F2,38 = 381.65, p < .001) had a
significant main effect on EMT, and their interaction
effects W×A (F6,114 = 12.52, p < .001), H×A (F6,114 =
12.58, p < .001) and W×H (F9,171 = 2.72, p < .01) were
also significant. For each factor that significantly
affected EMT, post hoc pairwise comparison tests
indicated that all the differences between different
levels were significant. EMT was positively correlated to
A, but negatively to both W and H. As depicted in
Figure 3a, when W or H gradually increased, EMT would
decrease in general. The significant interaction effect
between W and H implied that both of them were not
independent of each other to affect EMT.
Eye Pointing Time (EPT)
EPT was also significantly affected by W (F3,57 = 72.83,
p < .001), H (F3,57 = 194.38, p < .001) and A (F2,38 =
234.51, p < .001), with a significant interaction effect
between each pair of them, especially that of W×H
(F9,171 = 2.40, p < .05). All the differences between
different levels of these factors were significant except
for that between the target width W levels of 90 and
160 pixels. As Figure 3b shows, the correlations of EPT
with W and H were similar to those of EMT.
Model fitting
Before fitting the models to the data, we calculated the
grand means of ∆d, ∆w and ∆h and found that the last
two were both close to 11.0 pixels. When viewing the
dwelling area of the cursor as a square instead of a
circle to determine the value of μ, we found that
Equation 10 had higher prediction accuracy (R2 > 0.98)
under the special case of acquiring square targets.
Thus, we set μ=11 when fitting the models to the mean
data of EPT and set μ=0 for the mean data of EMT since
the duration of dwelling had been excluded from EMT.
In addition, in order to make the values of IDeye and
IDaz distributed in a similar range, we magnified the
former 80 times on the base of Equation 10. Table 1
summarizes the results of model fitting. As can be seen,
IDeye resulted in higher R2 especially when ω was
treated as a free parameter. More importantly, the
estimates of the regression coefficient a were
unreasonable in the models of Equations 8. For EPT,
they were smaller than the specified dwell time (800
ms), and even negative for EMT. These situations could
lead to a negative prediction of EMT or a prediction of
EPT that was smaller than 800 ms when the task was
very easy. However, that is impossible in reality.
Therefore, IDeye was more suitable for 2D eye pointing.
With respect to the different parameterizations of ω and
r, it was better to set ω free than to preset ω=0.5. At
the same time, compared with the results in the
situation of setting r free, the Euclidean model
(Equation 10) was already good enough. According to
the estimates of λ in different situations, it seems that
we could fix it at the level of .001 to reach satisfactory
results. We plotted R2 as a function of both ω and r. As
can be seen in Figure 4, this function had a convex
curved surface. We found that the Euclidean model was
very close to the “optimal solution”, which the model’s
R2-ω curve contains the global maximum of R2.
Therefore, we finally give the most suitable model as
Equation 11 expresses.
1755
Work-in-Progress
CHI 2012, May 5–10, 2012, Austin, Texas, USA
T = a + b × 80e 0.001A
ω
(W − μ )
2
+
1−ω
(H − μ )2
(11)
Discussion and conclusions
We have verified that the Euclidean model is the most
suitable extension. As Table 1 shows, when r=1, IDeye
could mathematically fit to the data well enough, but
this situation would imply that W and H were
independent of each other to affect eye pointing time.
In fact, this did not agree with the findings that there
was significant interaction effect between W and H in
the experiment. Furthermore, if IDeye employed only
one of the target dimensions (i.e. ω = 1 or 0), the
correlation between IDeye and EPT as well as EMT would
be weak as Figure 4 demonstrates.
Our work also uncovered that the impacts of W and H
were asymmetrical. More importantly, unlike the
asymmetry in hand pointing [1, 10], it was target
height H that had greater impact on the human
performance of eye pointing because the estimates of ω
were less than 0.5. This point informs designers that it
will be more efficient to elongate target height to
facilitate dwell-based eye pointing.
Figure 4. IDeye model's fitness
(R2) as a function of ω and r. The
convexity of the curved surfaces
indicates that the estimates in Table
1 are reliable. This implies that
setting r = 2 is suitable for data
fitting because the summit of the
corresponding R2-ω curve is close to
the global maximum of R2
To summarize, the following redefined Euclidean form
of IDeye can accurately (R2 > 0.9) express the difficulty
of pointing at arbitrary 2D rectangular targets.
ID eye = e λ A
ω
(W − μ )
2
+
1−ω
(H − μ ) 2
The weighted parameter ω, varying in the range of
(0.25, 0.55), can properly capture the asymmetrical
impact of target width and height on eye pointing, with
the later exceeding the former in general.
Acknowledgements
This work was supported by the National Natural
Science Foundation of China (Grant No. 61070144).
References
[1] Accot, J. and Zhai, S. Refining Fitts' law models for
bivariate pointing. In Proc. CHI 2003, ACM Press (2003),
193-200.
[2] Fitts, P.M. The information capacity of the human
motor system in controlling the amplitude of movement.
Journal of Experimental Psychology, Vol.47 (1954),
381-391.
[3] Gajos, K.Z., Wobbrock, J.O. and Weld, D.S.
Automatically generating user interfaces adapted to
users' motor and vision capabilities. In Proc. UIST07,
ACM Press (2007), 231-240.
[4] MacKenzie, I.S. Fitts' law as a research and design
tool in human-computer interaction. Human-Computer
Interaction, Vol.7 (1992), 91-139.
[5] Shannon, C.E. A mathematical theory of
communication. The Bell System Technical Journal,
Vol.27 (1948), 379-423, 623-656.
[6] Sibert, L. and Jacob, J. Evaluation of eye gaze
interaction. In Proc. CHI 2000, ACM Press (2000), 281288.
[7] Zhang, X., Ren, X. and Zha, H. Improving eye
cursor's stability for eye pointing tasks. In Proc. CHI
2008, ACM Press (2008), 525-534.
[8] Zhang, X., Ren, X. and Zha, H. Modeling dwellbased eye pointing target acquisition. In Proc. CHI
2010, ACM Press (2010), 2083-2092.
[9] Zhang, X., Feng, W. and Zha, H. Effects of different
visual feedback forms on eye cursor's stabilities. In
LNCS Vol. 6775, HCII 2011, Springer (2011), 273-282.
[10] Zhang, X., Zha, H. and Feng, W. Extending Fitts'
law to account for the effects of movement direction on
2D pointing. To Appear in Proc. CHI 2012.
1756
Download