Match-Up & Conquer: A Two-Step Technique for Recognizing Unconstrained Bimanual and Multi-Finger Touch Input

Yosra Rekik (Inria Lille, University Lille 1, LIFL), yosra.rekik@inria.fr
Radu-Daniel Vatavu (University Stefan cel Mare of Suceava), vatavu@eed.usv.ro
Laurent Grisoni (University Lille 1, LIFL, Inria Lille), laurent.grisoni@lifl.fr
ABSTRACT
We present a simple, two-step technique for recognizing multi-touch gesture input independently of how users articulate gestures, i.e., using one or two hands, one or multiple fingers, and synchronous or asynchronous stroke input. To this end, and for the first time in the gesture literature, we introduce a preprocessing step specifically for multi-touch gestures (Match-Up) that clusters together similar strokes produced by different fingers, before running a gesture recognizer (Conquer). We report gains in recognition accuracy of up to 10% enabled by the new preprocessing step, which constructs a more adequate representation of multi-touch gestures in terms of key strokes. It is our hope that the Match-Up technique will add to the practitioners' toolkit of gesture preprocessing techniques, as a first step toward filling today's lack of algorithmic knowledge for processing multi-touch input and toward the design of more efficient and accurate recognizers for touch surfaces.
Categories and Subject Descriptors
H.5.2. [Information Interfaces and Presentation]: Information
Interfaces and Presentation; I.5.2. [Pattern Recognition]: Design
Methodology - Classifier design and evaluation
General Terms
Algorithms; Human Factors; Design; Experimentation
Keywords
Multi-touch gestures; structuring finger movements; gesture recognition; $P recognizer; unconstrained multi-touch input.

1. INTRODUCTION

Multi-touch interfaces are increasingly popular, with applications from personal devices [22] to interactive displays installed in public settings [10] and, lately, applications that span between these two interactive spaces [14]. The broad range of interaction styles offered by multi-touch, as well as the large number of degrees of freedom available to users to articulate gestures, have contributed toward the success and adoption of this technology. However, the versatility of multi-touch input makes prototyping multi-touch gesture recognizers a difficult task that requires dedicated effort because, in many cases, "the programming of these [multi-touch] gestures remains an art" [16] (p. 1). To date, there are no simple techniques in the style of the $-family [3,27,31] that designers can employ to recognize multi-touch gestures under the large variety of users' articulation behaviors [19] (see Figure 1 for some examples). Consequently, designers have resorted to multi-touch declarative formalisms [13] that require dealing with sometimes complex regular expressions; to toolkits that may be limited to specific platforms only [16]; and to adapting single-stroke recognizers for multi-touch [11], in which case the expressibility of multi-touch input might be affected.

Such a lack of simple techniques for multi-touch gesture recognition has many causes, such as the complexity of multi-touch input with its many degrees of freedom, our limited understanding today of how multi-touch gestures are articulated, and, we believe, a lack of algorithmic knowledge to preprocess multi-touch input by considering its specific characteristics. For example, there are many useful techniques available to preprocess raw gestures, such as scale normalization, motion resampling, translation to origin, and rotation to indicative angle, which make stroke gesture recognizers invariant to translation, scale, and rotation (see the pseudocode available in [3,4,15,27,31]). Although these techniques are general enough and may be applied to multi-touch gestures as well, they cannot address the specific variability that occurs during multi-touch articulation, such as the use of different numbers of fingers and strokes, and bimanual input [19], leaving multi-touch gesture recognizers non-invariant to these aspects of articulation.
Figure 1: Various multi-touch articulation patterns for the "square" symbol produced with different number of strokes (b-h), fingers (b, d-h), sequential (c, d), and parallel movements (e-h). NOTE: Numbers on strokes indicate stroke ordering. The same number on top of multiple strokes indicates that all the strokes were produced at the same time by different fingers.
To overcome this lack of algorithmic knowledge to assist in recognizing multi-touch input, we propose in this work a new preprocessing step that is specific to multi-touch gesture articulation. We
demonstrate the usefulness of our new preprocessing technique for
recognizing multi-touch gestures independently of how they were
articulated. To deal with the complex problem of handling users' variations in articulating multi-touch input [2,19], we take inspiration from the "Divide & Conquer" paradigm of algorithm design [7] (p. 28), in which a complex problem is successively broken down into two or more subproblems that are expected to be easier to solve. We
group together strokes produced by different fingers with similar articulation patterns and identify key strokes as a new representation
for the multi-touch gesture (the “Match-Up” step) that we found
to improve the accuracy of subsequent recognition procedures (the
“Conquer” step). Match-Up & Conquer (M&C) is thus a two-step
technique able to recognize multi-touch gestures independently of
how they are articulated, i.e., using one or two hands, one or multiple fingers, and synchronous and asynchronous stroke input.
The contributions of this work include: (1) a new preprocessing
step, Match-Up, specific to multi-touch gestures (a first in the gesture literature) that structures finger movements consistently into
clusters of similar strokes, which we add to the practitioners’ toolkit
of gesture processing techniques, next to scale normalization, resampling, and rotation to indicative angle [3,4,15,27,31]; (2) an application of Match-Up to recognize multi-touch input under unconstrained articulation (Match-Up & Conquer), for which we show
an improvement in recognition accuracy of up to 10% over an existing technique; and (3) pseudocode for assisting practitioners in
implementing Match-Up into their gesture interface prototypes.
2. RELATED WORK

We review previous works on gesture articulation and relate them to existing gesture recognition techniques. We also present recognition results for the $P gesture recognizer [27] applied to multi-touch gestures and point to reasons explaining its performance.
2.1 Variability of gesture articulation

Supporting users' wide range of ways of articulating multi-touch gestures has been previously noted as an important design criterion for delivering increased flexibility and high-quality, pleasurable interaction experiences [10]. This fact has led to a number of orthogonal design implications [9,18,30]. For example, principled
approaches attempt to provide basic building blocks from which
gesture commands can be derived, such as gesture relaxation and
reuse [32], fluidity of interaction techniques [20], and cooperative
gesture work [17]. Other researchers have advocated for assisting
users in the process of learning multi-touch gestures during actual
interaction and application usage by proposing gesture visualizations [9], dynamic guides [6], and multi-touch menus [5,12]. At
the same time, other user-centric approaches advocate for enrolling
users since the early stages of gesture set design [14,22,30]. Such
participatory studies have led to interesting findings on users’ behaviors in articulating gestures as well as on users’ conceptual models of the interaction. These works also recommend flexible design
of gesture commands to accommodate variations in how users articulate gestures. Oh et al. [18] go further and describe an investigation of the feasibility of user-customizable gesture commands, with
findings showing users focusing on familiar gestures and being influenced by misconceptions about the performance of gesture recognizers. To deal with users’ variations in multi-touch input, Rekik
et al. [19] introduced a taxonomy harnessing symbolic multi-touch
gestural variations at mental, physical, and movement levels, as a
result of a user-centric study.
2.2 Gesture recognition techniques

For symbolic gestures, the popular $-family of gesture recognizers [3,4,27,31] delivers robust recognition accuracy with a set of simple techniques, in contrast with more complex recognizers, such as Hidden Markov Models [23] and statistical classifiers [21]. However, the $-family was mainly designed and validated for single-touch gestures. Jiang et al. [11] extended $1 to recognize multi-finger gestures by aggregating the touch paths into a single stroke. However, the reduction developed in [11] to transform all touch points into a single stroke is incompatible with the fact that multiple strokes can interleave in time, such as drawing a square with two symmetric movements in parallel (Figure 1e).
Several tools were proposed to help designers create gestures more easily, such as specifying multi-touch gestures as regular expressions [13] or supporting gesture programming by demonstration [16]. However, despite skillful design, these tools do not deal well with input variability. For example, the number of fingers and strokes, as well as their space-time combination, need to be predefined by the designer, and the only possible variation allowed to users is to move fingers along the same direction in an infinite loop: "Concepture" [8] is such a framework, based on regular language grammars, for authoring and recognizing sketched gestures with infinitely varying and repetitive patterns.
2.3 Multi-touch gestures and $P
We employ in this work the $P recognizer [27] because previous
works found it accurate for classifying multi-stroke gesture input
under various conditions [1,27]. $P employs point clouds to represent gestures and, therefore, it manages to ignore users’ articulation variations in terms of number of strokes, stroke types, and
stroke ordering. $P has been validated so far on gestures articulated with single-touch strokes, which is typically the case for pen
and single-finger input [1,27]. However, multi-touch gestures exhibit considerably more degrees of freedom, with users articulating gestures with one or both hands, variable number of fingers,
and following synchronous and asynchronous gesture production
mechanisms. For example, users frequently employ multiple fingers to produce gestures, and even use different fingers to simultaneously articulate strokes with different shapes [19]. Figure 1 provides an illustration of various articulation patterns captured from
participants in our study when asked to produce a square. For such
articulations, extracting the key strokes (e.g., the four strokes that
make up the square) is a challenging task, and techniques like those
discussed in [11] for reusing the $1 recognizer [31] on multi-touch
gestures are not applicable when fingers move in parallel (e.g., each
finger drawing half the square as in Figure 1e).
As $P does not depend on the notion of stroke [27], a potential approach to recognize multi-touch gestures would be to directly apply the matching technique of $P to the point clouds resulting from sampling the multi-touch input. However, we argue that such an approach has limitations rooted in the very nature of multi-touch gesture articulation. For example, Figure 2 shows a specific situation in which two distinct gesture types, "spiral" and "circle", have similar point cloud representations because of additional points introduced into the cloud when employing multiple fingers. This limitation was confirmed by running the $P recognizer on a set of 22 multi-touch gesture types, for which the recognition accuracy, between 82% and 98%, was lower than in previous works that evaluated $P on similar, but single-touch, gestures [27] (we provide complete details on the evaluation procedure under the Results section).
Figure 2: Adopting a direct point cloud representation for gestures produced with multiple fingers can lead to situations in which different classes of gestures have similar point clouds, such as the "spiral" (a) and the two-finger "circle" (c), which was not a problem before for single-touch input (b). More fingers lead to point clouds that are even more problematic to discriminate (d vs. a).
The accuracy problem could be partially handled by sampling more points from the multi-touch input, which would provide more resolution for representing gestures and, potentially, more opportunity for $P to discriminate between different gesture types. However, previous works have shown that the performance of many gesture metrics¹ does not necessarily improve when more sampling resolution is available [25,27]. Also, increasing the size of the point cloud increases the time required to compute the $P cost function, which depends on the sampling rate as O(n^2.5) [27] (p. 278). Consequently, such an approach may prove problematic for low-resource devices that need particular attention in terms of dimensionality representation in order to be able to sense and recognize gestures with real-time performance [26].

¹ As $P was not covered by these works [25,26], we cannot estimate its behavior with increased sampling resolution. However, the peaking phenomenon has often been observed in the pattern recognition community, according to which adding more features beyond a certain point does not improve, but actually increases, classification error [24].
3. MATCH-UP: PREPROCESSING FOR MULTI-TOUCH INPUT
The first step of the M&C technique consists in running a clustering procedure to group touch points that belong to finger movements that are similar in direction and path shape. Two finger movements are considered similar if they are produced simultaneously and are relatively "close" to each other. The goal of this procedure is to identify key strokes, which are uniquely identifiable movements in the multi-touch gesture (see Figure 3). A key stroke may be composed of one stroke only (i.e., only one finger touches the surface), or it may consist of multiple strokes when multiple fingers touch the surface simultaneously and they all move in the same direction and follow the same path shape.

Figure 3: A multi-touch gesture articulated with two hands and six fingers (left) that has two key strokes (right).

We describe a touch point p by its 2-D coordinates and timestamp, p = (x_p, y_p, t_p) ∈ R³. A multi-touch gesture is represented by a set of points, P = { p_i = (x_p^i, y_p^i, t_p^i) | i = 1..n }. The displacement vector of point p between two consecutive timestamps t_{i-1} < t_i is defined as D_p^i = (x_p^i − x_p^{i-1}, y_p^i − y_p^{i-1}), and the angle between the vectors D_p^i and D_q^i of points p and q is given by:

    θ_{p,q} = arccos( (D_p^i · D_q^i) / (‖D_p^i‖ ‖D_q^i‖) )    (1)

We consider two points p and q as part of two similar finger movements (p ≈ q) if their displacement vectors are approximately collinear and p and q are sufficiently close together:

    p ≈ q  ⇔  θ_{p,q} ≤ ε_θ  and  ‖p − q‖ ≤ ε_d    (2)

where ‖p − q‖ represents the Euclidean distance between points p and q, and ε_θ and ε_d are two thresholds.
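To make the similarity test concrete, the short Python sketch below implements Equations (1) and (2). It is a minimal illustration rather than the authors' implementation: the helper names and the assumption that ε_d is expressed as a fraction of the input-area size are ours.

    import math

    def displacement(curr, prev):
        # Displacement vector D_p of a point between two consecutive timestamps.
        return (curr[0] - prev[0], curr[1] - prev[1])

    def angle_between(d1, d2):
        # Angle theta_{p,q} between two displacement vectors (Equation 1).
        dot = d1[0] * d2[0] + d1[1] * d2[1]
        n1, n2 = math.hypot(*d1), math.hypot(*d2)
        if n1 == 0 or n2 == 0:
            return math.pi  # a resting finger has no direction; treat as dissimilar
        return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

    def similar(p_curr, p_prev, q_curr, q_prev, eps_theta=math.radians(30), eps_d=0.125):
        # p ~ q iff displacements are roughly collinear and the points are close (Equation 2).
        theta = angle_between(displacement(p_curr, p_prev), displacement(q_curr, q_prev))
        dist = math.hypot(p_curr[0] - q_curr[0], p_curr[1] - q_curr[1])
        return theta <= eps_theta and dist <= eps_d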
With these considerations, we describe our clustering procedure
that groups together touch points of similar strokes at each timestamp of the articulation timeline. The procedure is based on the
agglomerative hierarchical classification algorithm [29] (p. 363):
1. Construct a cluster for each touch point p available at timestamp t, C_j = {p}, with t_p = t. Initially, all |C_j| = 1. If a touch is detected for the first time, delay its cluster assignment until the next timestamp.
2. For each pair of clusters (C_j, C_k), compute their minimum angle θ_{j,k} = min{θ_{p,q} | p ∈ C_j, q ∈ C_k} and minimum distance δ_{j,k} = min{‖p − q‖ | p ∈ C_j, q ∈ C_k}.
3. Find the pair of clusters (C_j, C_k) for which θ_{j,k} and δ_{j,k} satisfy equation (2) and θ_{j,k} is minimized.
4. If no such pair exists, stop. Otherwise, merge C_j and C_k.
5. If there is only one cluster left, stop. Otherwise, go to step 2.
The result of the clustering process clearly depends on εθ and εd ,
for which we provide a detailed analysis later in the paper.
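As an illustration of steps 1-5, the following Python sketch groups the touch points of one timestamp. It reuses the displacement and angle_between helpers from the previous sketch and treats a cluster as a list of point indices; it is a readable reference version under these assumptions, not the authors' optimized code.

    import math

    def cluster_timestamp(points, prev, eps_theta, eps_d):
        # points: list of (x, y) at time t; prev: dict mapping point index -> position at t-1.
        # Step 1: one singleton cluster per point that already has a previous position.
        clusters = [[i] for i in range(len(points)) if prev.get(i) is not None]
        while len(clusters) >= 2:
            best = None  # (theta, j, k) of the most similar mergeable pair
            for j in range(len(clusters)):
                for k in range(j + 1, len(clusters)):
                    # Step 2: minimum angle and minimum distance between the two clusters.
                    theta = min(angle_between(displacement(points[p], prev[p]),
                                              displacement(points[q], prev[q]))
                                for p in clusters[j] for q in clusters[k])
                    dist = min(math.hypot(points[p][0] - points[q][0],
                                          points[p][1] - points[q][1])
                               for p in clusters[j] for q in clusters[k])
                    # Step 3: keep the pair satisfying Equation (2) with the smallest angle.
                    if theta <= eps_theta and dist <= eps_d and (best is None or theta < best[0]):
                        best = (theta, j, k)
            if best is None:          # Step 4: no mergeable pair left
                break
            _, j, k = best            # Steps 4-5: merge and repeat
            clusters[j] += clusters.pop(k)
        return clusters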
Once clusters are computed for each timestamp t_i, they are analyzed to derive the key strokes of the multi-touch gesture. To consistently group clusters into key strokes, we track cluster evolution over time and assign clusters at time t_i the same stroke identifiers used for the previous timestamp t_{i-1} < t_i. If no clusters exist at t_{i-1}, all the clusters are assigned new stroke identifiers, which corresponds to the case in which users touch the surface for the first time or momentarily release the fingers from the surface. Otherwise, we examine the touch points of every cluster and compare their point structure between times t_{i-1} and t_i. A cluster C_i at moment t_i takes the identifier of a previous cluster C_{i-1} at time t_{i-1} < t_i if the following conditions are met: (i) there exists a subset of points from C_i that also appears in C_{i-1}, i.e., C_i ∩ C_{i-1} ≠ ∅, (ii) all the other points of C_i appeared for the first time at moment t_i, and (iii) all the other points from C_{i-1} were released from the surface. Otherwise, a new stroke identifier is assigned to C_i. The result of this process is a set of clusters {C_i^k | k = 1..K} that reflects the key strokes of the multi-touch gesture (e.g., K=2 for the example in Figure 3).
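A compact sketch of this identifier-matching rule is given below. It assumes each cluster is a dictionary holding the set of touch ids it contains ('ids') and the ids that appeared for the first time at the current timestamp ('new_ids'), and that still_down is the set of touch ids currently on the surface. These structures are illustrative, not the authors' data types (the Appendix gives their pseudocode).

    def match_ids(current, previous, next_stroke_id, still_down):
        for c in current:
            c['stroke'] = None
            for k in previous:
                shared = c['ids'] & k['ids']                                   # condition (i)
                others_in_c_are_new = (c['ids'] - shared) <= c['new_ids']      # condition (ii)
                others_in_k_released = not ((k['ids'] - shared) & still_down)  # condition (iii)
                if shared and others_in_c_are_new and others_in_k_released:
                    c['stroke'] = k['stroke']   # inherit the key-stroke identifier
                    break
            if c['stroke'] is None:             # otherwise open a new key stroke
                c['stroke'] = next_stroke_id
                next_stroke_id += 1
        return current, next_stroke_id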
4. CONQUER: RECOGNIZING MULTI-TOUCH GESTURE INPUT
By computing clusters of points that belong to similar strokes,
we extract a representative set of points for each key stroke. To do
that, we use the trail of centroids across all timestamps t for clusters
that were assigned the same stroke identifier k ∈ {1..K}:
    c_{k,t} = (1 / |C_k|) Σ_{p ∈ C_k} p    (3)

with key stroke k represented by the set {c_{k,t}}. Key strokes are normalized and resampled with standard gesture preprocessing procedures [3,27,31]. The resulting key stroke representation is then fed into a gesture recognizer, which is the $P recognizer [27] in this work. Figure 4 shows the result of the point resampling procedure before and after running the Match-Up technique for three multi-touch gestures from our set: "circle", "N", and "asterisk".

Figure 4: Gesture resampling with the Match-Up step (b) compared to direct resampling (c) for several symbols in our set (a).
5. EVALUATION
We conducted an experiment to collect multi-touch gestures in
unconstrained conditions in terms of number of strokes, fingers,
and single-handed and bimanual input. Our goal was to collect as
many variations as possible for articulating multi-touch gestures in
order to test the effectiveness of the Match-Up technique to make
recognizers invariant to such aspects of multi-touch articulation.
5.1 Participants

Sixteen participants (5 females) took part in the experiment (mean age 27.5 years, SD=4.1 years, all right-handed). Half of the participants were regular users of smart phones and tablet devices, and two were regular users of an interactive tabletop.
5.2 Apparatus

Gestures were collected on a 32-inch (81.3 cm) multi-touch display (3M C3266PW) supporting up to 40 simultaneous touches. The display was connected to a computer running Windows and our custom data collection software. The interface of the experiment application showed a gesture creation area covering approximately the entire screen and the name of the gesture symbol to articulate displayed at the top of the screen. Three buttons were available to control each trial: Start, Save, and Next. Before pressing Start, participants could experiment with multi-touch input with their fingers in the gesture creation area. Once Start was pressed, the Save button was activated, enabling participants to save their most recent gesture. The application logged touch coordinates with associated timestamps and identification numbers. Once a gesture was saved, participants could proceed to the next trial in the experiment.
5.3 Procedure

The experiment application asked participants to enter gestures from a dataset containing 22 different symbols (see Figure 5): letters, geometric shapes (triangle, square, horizontal line, circle), symbols (five-point star, spiral, heart, zig-zag), and algebra (step-down, asterisk, null, infinite). The gesture set is based on those found in other interactive systems [3,19,28,31]. Gestures were also selected to be general enough so that participants could reproduce them without a visual representation, thus encouraging unconstrained articulation behavior. The application asked participants to produce gestures with trials presented in a random order. Only the gesture name was presented to participants. For each symbol, participants were asked to create as many different articulation variations as they were able to, given the requirement that executions be realistic for practical scenarios, i.e., easy to produce and reproduce later. We asked for five repetitions of each proposed variation of each gesture type in order to have a sufficient amount of training samples to assess the recognition accuracy of M&C. To reduce bias [18], there was no recognition feedback. Also, to prevent any visual content from influencing how participants articulate gestures, no visual feedback was provided other than light red circles shown under each finger to acknowledge surface contact.

Figure 5: The set of 22 gestures used in our experiment (including triangle, square, heart, asterisk, horizontal line, zig-zag, circle, infinite, five-point star, step-down, spiral, and null).
6. RESULTS
To validate the effectiveness of Match-Up, we conducted a recognition experiment in which we compared the M&C technique, employing the $P recognizer in the second step (Conquer), with the unmodified $P [27]. We chose $P as previous works found it accurate for classifying multi-stroke gesture input under various conditions [1,27]. In this work we follow the same experiment methodology as in [27] (p. 275) by running both user-dependent and user-independent tests on our dataset of 22 distinct gesture types and 5,155 total samples. As the results of Match-Up depend on the values of ε_θ and ε_d (see equation (2)), we conducted a preliminary study to compute the optimal values of these parameters, which we found to be ε_θ = 30° and ε_d = 12.5%. Later in the paper we discuss in detail the impact of these two parameters on the performance of M&C.
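For readers who wish to reproduce the protocol, the sketch below outlines the user-dependent testing loop as we understand it from [27]: for each repetition, T random training samples per gesture type are selected for one participant, one additional sample per type is recognized against that training set, and accuracy is averaged over repetitions. The recognize callback and the default number of repetitions are placeholders, not the authors' code.

    import random

    def user_dependent_accuracy(samples_by_type, T, recognize, repetitions=100):
        # samples_by_type: {gesture label: [samples of one participant]}
        correct, total = 0, 0
        for _ in range(repetitions):
            training, tests = {}, []
            for label, samples in samples_by_type.items():
                chosen = random.sample(samples, T + 1)     # T for training, 1 for testing
                training[label] = chosen[:T]
                tests.append((label, chosen[T]))
            for label, candidate in tests:
                correct += int(recognize(candidate, training) == label)
                total += 1
        return correct / total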
6.1 User-dependent training
Recognition rates were computed individually for each participant with the same methodology as in [27] (p. 275). We report results from ≈2.5×10^6 recognition tests by controlling the following factors: (1) the recognition condition, M&C with $P versus unmodified $P, (2) the number of training samples per gesture type, T = 1..9, and (3) the size of the cloud, n = 8, 16, 32, and 64 points.

Results showed an average recognition accuracy of 90.7% for M&C, significantly larger than the 86.1% delivered by the $P recognizer running without the Match-Up step (Z=5.23, p<.001). For n=32 points (recommended in [27]), M&C delivered 95.5% accuracy with 4 training samples per gesture type and reached 98.4% with 9 samples, while $P delivered 93.5% and 97.4% under the same conditions. Figure 6 illustrates the influence of both the number of training samples T per gesture type and the size of the cloud n on recognition accuracy. M&C was significantly more accurate than $P for all T ∈ {1..9} and n ∈ {8, 16, 32, 64} (p < .05), except for T=9 and n=64.

Figure 6: Recognition rates for M&C versus unmodified $P.

The influence of the Match-Up step was larger for smaller point clouds (9.4% for n=8) and smaller for fine-grained representations of the point clouds (0.9% for n=64). To investigate this phenomenon further, we divided gestures into four categories according
to (1) the number of fingers employed by participants (i.e., one or
multiple) and (2) the relative order of stroke articulation (sequential or parallel), as per the taxonomy of Rekik et al. [19]. We took
this approach as we hypothesized that the large number of points resulting from multi-finger articulations causes $P to deliver decreased performance simply because the point clouds are not representative enough
(see the “circle” and “spiral” in Figure 2). We identified four categories: SS (Single-touch Sequential), SP (Single-touch Parallel),
MS (Multi-touch Sequential), and MP (Multi-touch Parallel). Sequential means that strokes are produced in a row. Parallel means
that some strokes (typically two) are drawn by different fingers at
the same time. Recognition rates were computed for each gesture
category under two conditions:
1. Gesture-category-dependent testing: training samples for each gesture type were selected independently of category, while testing samples were selected from one category at a time, e.g., we test with single- and multi-finger gestures separately.
2. Gesture-category-independent testing: training samples for
each gesture type were selected from one category at a time,
while testing samples were selected independently of the category, e.g., we train with single-finger articulations, but test
with both single and multi-finger articulations.
For these tests we report recognition rates for T in {1, 2, 3, 4}
because not all participants produced gestures in all categories for
all symbols. However, participants did produce at least five samples
for each variation they proposed. Also, some symbols were omitted if there were not enough samples to support this requirement
(e.g., gestures from the parallel categories were rarely produced for
symbols not symmetrical in shape, such as “spiral”, “S”, “P”, etc.).
Figures 7 and 8 show the recognition performance of M&C and
$P under both conditions. For category-dependent testing, results
showed M&C more accurate than $P for cases in which participants
articulated gestures with multiple fingers: 93.4% vs. 89.2% for the
multi-touch sequential category (Z=6.23, p<.001), and 89.9% vs.
84.9% for multi-touch parallel (Z=5.35, p<.001). Results were
not significant for single-finger articulations: 92.7% vs. 92.9% for single-finger sequential and 93.1% vs. 92.6% for single-finger parallel (n.s.). For the gesture-category-independent scenario, results showed M&C more accurate than the unmodified $P for three out of four categories: 87.4% vs. 84.5% for SS (Z=5.98, p<.001), 87.1% vs. 86.5% for SP (n.s.), 87.8% vs. 82.5% for MS (Z=6.07, p<.001), and 85.6% vs. 81.1% for MP (Z=2.94, p<.005).

Figure 7: Recognition rates for gesture-category-dependent testing. NOTE: Rates are averaged for all gesture types and participants; n=32 points; error bars show 95% CI.

Figure 8: Recognition rates for gesture-category-independent testing. NOTE: Rates are averaged for all gesture types and participants; n=32 points; error bars show 95% CI.
As we found that M&C scored higher for participants who articulated gestures with more fingers, we investigated this finding further. Figure 9 illustrates the percentage of gesture categories per participant, as well as the corresponding recognition rates computed for T=4 and n=32.
The figure reveals that recognition rates for M&C are higher than those delivered by the unmodified $P for those participants who employed more fingers during gesture articulation (i.e., a higher percentage of the MS and MP categories). A Spearman test confirmed this observation, showing
categories). A Spearman test confirmed this observation, showing
significant correlation between the difference in recognition rates
of M&C and $P (for all T ) and the percentage of the MS and MP
categories employed by participants (ρ=.630, p<.001).
These results confirm our hypothesis that multi-finger gestures
(with more points from more fingers) significantly influence the
point cloud resampling of $P, which affects recognition accuracy.
However, Match-Up manages to group similar strokes together, and
constructs a more representative cloud for multi-touch gestures.
6.2 User-independent training
In this scenario, we computed recognition rates for each gesture type using data from different participants, as per the methodology in [27] (p. 275). We report results from ≈2.6×10^7 recognition tests by controlling the following factors: (1) the recognition condition, M&C with $P versus unmodified $P, (2) the number of training participants, P = 1..15, and (3) the number of training samples per gesture type, T = 1..4. The size of the cloud was n=32 points.

Results show M&C significantly outperforming $P without the Match-Up step, with 88.9% versus 82.5% (Z=3.400, p<.001). For the maximum number of tested participants (P=15) and training samples (T=4), M&C achieved 94.2%, significantly higher than the 90.8% delivered by $P (Z=8.14, p<.001). The number of participants had a significant effect on accuracy (χ²(14)=55.82, p<.001), with M&C delivering higher recognition rates than $P (Figure 10, left). As expected, the performance of both M&C and $P improved as the number of training participants increased, from 71.9% and 65.5% with one participant to 93.1% and 89.1%, respectively, with 15 training participants. The number of training samples had a significant effect on recognition accuracy (χ²(3)=45, p<.001), with both M&C and $P delivering higher rates with more samples: from 85.0% and 78.3% with one sample to 90.8% and 85.2% with 4 samples per gesture type (Figure 10, right).

Figure 9: Distribution of gesture categories (left) and recognition rates per participant (right). Note how the recognition rates for M&C are higher for participants employing multiple fingers. NOTE: An asterisk (*) next to a participant number shows the difference is significant at p<.05; error bars show 95% CI.

Figure 10: Recognition rates for the user-independent scenario. NOTE: n=32 points; error bars show 95% CI.
7. DISCUSSION

7.1 Effect of Match-Up parameters
We mentioned before that the key stroke extraction results of the
Match-Up technique depend on the values of the εθ and εd parameters of equation 2. The recognition experiment reported in the
previous section employed optimal values for these parameters that
we found to maximize recognition accuracy for our set of gestures,
i.e., εθ =30◦ and εd =12.5%. In this section we present an in-depth
analysis of the effect of these parameters on the recognition performance of M&C, for which we ran the user-dependent recognition
experiment and controlled the values of εθ and εd (T =4 training
samples per gesture type and n=32 points were used this time to
keep the running time manageable). We varied the values of the
angle εθ in the range [0, 90] by a step of 5◦ . For εd , we normalized
the Euclidean distance to the input area and varied εd in the range
[0, 50] by 2.5%. Overall, we report results for the user-dependent
recognition accuracy of M&C under 19 × 21 = 399 different combinations for εθ and εd (see Figure 11).
As expected, accuracy is low when the angle parameter ε_θ is small (≤10°), which causes nearby points to be incorrectly assigned to distinct clusters. As ε_θ increases, points are clustered correctly even when they do not follow a perfectly collinear stroke path. For ε_θ ≥ 20°, recognition accuracy is impacted by the distance parameter ε_d. For example, when ε_d is low (≤5%), points are incorrectly considered as individual clusters. As ε_d increases, points that are both collinear and close together are more likely to be clustered correctly. For ε_d ≥ 15%, recognition rates start to decrease significantly (p<.001). This result is explained by the fact that distant points that move in approximately the same direction can be incorrectly clustered together for large ε_d values.

Figure 11: Effect of parameters ε_θ and ε_d on the recognition accuracy of M&C. Combined (top) and individual effects (bottom).
The highest recognition accuracy was obtained for εθ ≥30◦ and
εd ∈[10, 15]. For εθ =30◦ and εd =12.5%, we observed that stroke
extraction matched the structure of the articulated gesture. These
values were therefore selected to evaluate the M&C technique in
the previous section. Note, however, that the optimal values might need slight adjustment for other datasets; we estimate [10, 45] degrees for ε_θ and [10, 15]% for ε_d as safe intervals.
7.2 Confusable gestures
We found gesture type to have a significant effect on recognition accuracy (χ²(21)=113.81, p<.001), with some gestures exhibiting lower recognition rates. Remember that we did not constrain participants to mind the number of fingers, nor did participants follow any training procedure before entering gestures. As a result, our multi-touch dataset includes versatile gestures that expose the intrinsic variability of articulating multi-touch input and, consequently, some of the articulations are challenging to recognize. For example, Figure 12 illustrates several ways in which participants articulated circles. Table 1 shows the top-5 gesture pairs most frequently misclassified by M&C. As the stroke structure of these gestures is similar, small deviations in their articulations likely lead to recognition conflicts. However, maximum error rates were 0.56% for confused pairs and 0.65% for individual gestures.
7.3 Execution time

The execution time of M&C is composed of the time required to identify key strokes (Match-Up) and the time to run the recognizer (Conquer). In practice, the execution time of the Match-Up step, averaged over all participants and gesture types in our dataset, was 22.8 ms (measured on an Intel Xeon CPU at 2.67 GHz), which makes the Match-Up technique suitable for real-time processing.
Figure 12: Several articulations for the “circle” symbol captured in our dataset.
Confused pairs     Occurrence        Gestures    Occurrence
Circle × Square    0.56%             Square      0.65%
N × H              0.28%             Circle      0.58%
Circle × Heart     0.28%             N           0.32%
D × Circle         0.24%             D           0.31%
X × Asterisk       0.23%             X           0.28%

Table 1: Top-5 most confusable pairs (left) and gestures (right). NOTE: user-dependent testing, T=4 samples, and n=32 points.
7.4 On-line computation of Match-Up
According to the definition of the clustering procedure, the Match-Up technique may run on-line during the actual articulation. In fact, the procedure only needs to keep track of the points detected at the previous timestamp and their corresponding cluster IDs to compute identifiers for the current points. The on-line computability of Match-Up has interesting implications. First, key strokes may be computed in real time and thus used to determine the articulation type the user is performing (e.g., Match-Up can detect that two strokes are being performed at the same time when the user articulates a gesture with two parallel movements). This goes beyond gesture recognition toward discriminating between gesture variation categories. Second, computing key strokes at runtime opens new interaction possibilities. For instance, Match-Up makes it possible to deliver user feedback by displaying the key stroke movements made by fingers as they occur. Such feedback may serve to help users learn gestures in a flexible and consistent manner.
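A minimal sketch of this on-line use is shown below: the MATCH-UP procedure from the Appendix (called match_up here) is simply invoked once per touch frame with the clusters of the previous frame, so key strokes are available while the gesture is still unfolding. The names are illustrative only.

    previous_clusters = []

    def on_touch_frame(points):
        # points: the touch points sensed at the current timestamp
        global previous_clusters
        previous_clusters = match_up(points, previous_clusters)
        return previous_clusters   # current key strokes, usable for live feedback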
7.5 Any gesture recognizer

In this work we employed the $P recognizer [27] in the second step (Conquer) of M&C and showed how Match-Up improves its classification accuracy for multi-finger gestures. We note, however, that any other recognizer may be used in conjunction with Match-Up. As a result, we deliver Match-Up as a new add-on to the practitioners' toolkit of gesture processing techniques, leading toward new algorithmic knowledge for processing multi-touch gestures.
8. CONCLUSION

We introduced in this work, for the first time in the gesture literature, a preprocessing technique specific to multi-touch gestures that clusters similar finger strokes. We used Match-Up in the M&C technique and showed improved recognition accuracy for multi-touch gestures under unconstrained articulation. Following the practice of previous works in the literature, we make our gesture dataset freely available to other practitioners at http://anonymous. Finally, it is our hope that this work will advance our algorithmic knowledge for multi-touch gestures, leading to the design of more efficient and accurate recognizers for touch-sensitive surfaces.
9. REFERENCES
[1] L. Anthony, Q. Brown, B. Tate, J. Nias, R. Brewer, and
G. Irwin. Designing smarter touch-based interfaces for
educational contexts. Journal of Personal and Ubiquitous
Computing, November 2013.
[2] L. Anthony, R.-D. Vatavu, and J. O. Wobbrock. Understanding the consistency of users' pen and finger stroke gesture articulation. In Proc. of GI '13, pages 87–94.
[3] L. Anthony and J. O. Wobbrock. A lightweight multistroke
recognizer for user interface prototypes. In Proc. of GI ’10,
pages 245–252. Canadian Information Processing Society.
[4] L. Anthony and J. O. Wobbrock. $N-Protractor: a fast and
accurate multistroke recognizer. In Proc. of GI ’12, pages
117–120.
[5] G. Bailly, J. Müller, and E. Lecolinet. Design and evaluation of finger-count interaction: Combining multitouch gestures and menus. Int. J. Hum.-Comput. Stud., 70(10):673–689.
[6] O. Bau and W. E. Mackay. Octopocus: a dynamic guide for
learning gesture-based command sets. In Proc. of UIST ’08,
pages 37–46.
[7] T. Cormen, C. Leiserson, R. Rivest, and C. Stein.
Introduction to Algorithms, 2nd Ed. MIT Press.
[8] N. Donmez and K. Singh. Concepture: a regular language
based framework for recognizing gestures with varying and
repetitive patterns. In Proc. of SBIM ’12, pages 29–37.
[9] D. Freeman, H. Benko, M. R. Morris, and D. Wigdor.
Shadowguides: visualizations for in-situ learning of
multi-touch and whole-hand gestures. In Proc. of ITS ’09,
pages 165–172. ACM Press.
[10] U. Hinrichs and S. Carpendale. Gestures in the wild:
studying multi-touch gesture sequences on interactive
tabletop exhibits. In Proc. of CHI ’11, pages 3023–3032.
[11] Y. Jiang, F. Tian, X. Zhang, W. Liu, G. Dai, and H. Wang.
Unistroke gestures on multi-touch interaction: supporting
flexible touches with key stroke extraction. In Proc. of IUI
’12, pages 85–88. ACM Press.
[12] K. Kin, B. Hartmann, and M. Agrawala. Two-handed
marking menus for multitouch devices. ACM Trans.
Comput.-Hum. Interact., 18(3):16:1–16:23.
[13] K. Kin, B. Hartmann, T. DeRose, and M. Agrawala. Proton:
multitouch gestures as regular expressions. In Proc. of CHI
’12, pages 2885–2894. ACM Press.
[14] C. Kray, D. Nesbitt, J. Dawson, and M. Rohs. User-defined
gestures for connecting mobile phones, public displays, and
tabletops. In Proc. of MobileHCI ’10, pages 239–248.
[15] Y. Li. Protractor: A fast and accurate gesture recognizer. In
Proc. of CHI ’10, pages 2169–2172, New York, NY, USA,
2010. ACM Press.
[16] H. Lü and Y. Li. Gesture coder: a tool for programming multi-touch gestures by demonstration. In Proc. of CHI '12, pages 2875–2884.
[17] M. R. Morris, A. Huang, A. Paepcke, and T. Winograd. Cooperative gestures: Multi-user gestural interactions for co-located groupware. In Proc. of CHI '06, pages 1201–1210. ACM Press.
[18] U. Oh and L. Findlater. The challenges and potential of end-user gesture customization. In Proc. of CHI '13, pages 1129–1138.
[19] Y. Rekik, L. Grisoni, and N. Roussel. Towards many gestures to one command: A user study for tabletops. In Proc. of INTERACT '13. Springer-Verlag.
[20] M. Ringel, K. Ryall, C. Shen, C. Forlines, and F. Vernier. Release, relocate, reorient, resize: fluid techniques for document sharing on multi-user interactive tables. In CHI EA '04, pages 1441–1444.
[21] D. Rubine. Specifying gestures by example. In Proc. of SIGGRAPH '91, pages 329–337. ACM Press.
[22] J. Ruiz, Y. Li, and E. Lank. User-defined motion gestures for mobile interaction. In Proc. of CHI '11, pages 197–206. ACM Press.
[23] T. M. Sezgin and R. Davis. HMM-based efficient sketch recognition. In Proc. of IUI '05, pages 281–283. ACM Press.
[24] C. Sima and E. Dougherty. The peaking phenomenon in the presence of feature-selection. Pattern Recognition Letters, 29:1667–1674.
[25] R.-D. Vatavu. The effect of sampling rate on the performance of template-based gesture recognizers. In Proc. of ICMI '11, pages 271–278. ACM Press.
[26] R.-D. Vatavu. The impact of motion dimensionality and bit cardinality on the design of 3D gesture recognizers. Int. Journ. of Human-Computer Studies, 71(4):387–409.
[27] R.-D. Vatavu, L. Anthony, and J. O. Wobbrock. Gestures as point clouds: a $P recognizer for user interface prototypes. In Proc. of ICMI '12, pages 273–280. ACM Press.
[28] R.-D. Vatavu, D. Vogel, G. Casiez, and L. Grisoni. Estimating the perceived difficulty of pen gestures. In Proc. of INTERACT '11, pages 89–106. Springer-Verlag.
[29] A. Webb. Statistical Pattern Recognition. John Wiley & Sons, Ltd.
[30] J. O. Wobbrock, M. R. Morris, and A. D. Wilson. User-defined gestures for surface computing. In Proc. of CHI '09, pages 1083–1092. ACM Press.
[31] J. O. Wobbrock, A. D. Wilson, and Y. Li. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proc. of UIST '07, pages 159–168. ACM Press.
[32] M. Wu, C. Shen, K. Ryall, C. Forlines, and R. Balakrishnan. Gesture registration, relaxation, and reuse for multi-point direct-touch surfaces. In Proc. of TABLETOP '06, pages 185–192.
APPENDIX

We provide pseudocode for the Match-Up technique. It runs at every timestamp t during the entire duration of the articulation of the multi-touch gesture and delivers its representation as key strokes. A key stroke is represented by a cluster of points resulting from grouping together points that belong to similar strokes. POINT is a structure that defines a touch point with position coordinates (x, y), the two previous positions dp and ddp, and an identification id. POINTS is a list of points. CLUSTER is a structure that contains a list of points and an id. CLUSTERS is a list of clusters.

MATCH-UP(POINTS P, CLUSTERS previous)
 1  clusters ← new CLUSTERS
 2  for each p ∈ P such that dp ≠ null do
 3      INSERT(clusters, new CLUSTER(p))
 4  while |clusters| ≥ 2 do
 5      (A, B) ← MOST-SIMILAR-CLUSTERS(clusters)
 6      if (A, B) == null then break
 7      A.points ← A.points ∪ B.points
 8      REMOVE(clusters, B)
 9  MATCH-IDS(clusters, previous)
10  Return clusters

MOST-SIMILAR-CLUSTERS(CLUSTERS clusters)
 1  εθ ← 30, εd ← 0.125 · INPUT-SIZE
 2  θmin ← ∞, (Amin, Bmin) ← null
 3  for each A ∈ clusters do
 4      for each B ∈ clusters, B ≠ A do
 5          θ ← MINIMUM-ANGLE(A, B)
 6          δ ← MINIMUM-DISTANCE(A, B)
 7          if (θ ≤ εθ and δ ≤ εd) then
 8              if (θ < θmin) then
 9                  θmin ← θ, Amin ← A, Bmin ← B
10  Return (Amin, Bmin)

MINIMUM-ANGLE(CLUSTER A, CLUSTER B)
 1  θmin ← ∞
 2  for each p ∈ A.points do
 3      for each q ∈ B.points do
 4          a ← SCALAR-PRODUCT(p − dp, q − dq)
 5          b ← NORM(p − dp) · NORM(q − dq)
 6          θ ← ACOS(a / b)
 7          if (θ < θmin) then θmin ← θ
 8  Return θmin

MINIMUM-DISTANCE(CLUSTER A, CLUSTER B)
 1  δmin ← ∞
 2  for each p ∈ A.points do
 3      for each q ∈ B.points do
 4          δ ← EUCLIDEAN-DISTANCE(p, q)
 5          if (δ < δmin) then δmin ← δ
 6  Return δmin

MATCH-IDS(CLUSTERS current, CLUSTERS previous)
 1  for each C ∈ current do
 2      matched ← false
 3      for each K ∈ previous do
 4          copy ← COPY-POINTS(K.points)
 5          for each p ∈ C.points such that ddp ≠ null do
 6              matchedPt ← false
 7              for each q ∈ K.points do
 8                  if (p.id == q.id) then
 9                      REMOVE(copy, q)
10                      matchedPt ← true
11                      break
12              if (not matchedPt) then continue K   /* Go to Line 3 */
13          if (SIZE(copy) ≠ 0) then continue K
14          C.id ← K.id
15          matched ← true
16          break
17      if (not matched) then C.id ← new id
18  Return current