
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)
Sketch Recognition based on Manifold Learning
Heeyoul Choi and Tracy Hammond
Department of Computer Science, Texas A&M University
3112 TAMU, College Station, TX 77843-3112, USA
{hchoi, hammond}@cs.tamu.edu
Abstract
Current feature-based methods for sketch recognition systems rely on human-selected features. Certain machine learning techniques have been found to be good nonlinear feature extractors. In this paper, we apply a manifold learning method, kernel Isomap, with a new algorithm for multi-stroke sketch recognition, which significantly outperforms the standard feature-based techniques.
INTRODUCTION
Sketching is a natural form of input for many domains,
such as drawing mathematical symbols, graphs, binary trees,
finite state machines, electrical circuit diagrams, military
course of action diagrams and many more. As in other
recognition domains (such as image and speech), feature selection is crucial for efficient and accurate performance in sketch recognition. Previous research has provided several suggestions for good feature sets. Rubine suggested 13 features based on drawing-style characteristics and time (Rubine 1991), and Long suggested 22 features by modifying Rubine's (Long et al. 2000). Such hand-selected feature sets were well designed, but they require manually implementing methods to extract them, which becomes tiresome. Moreover, hand-designed features are sensitive to problems such as jitter. More interestingly, machine learning research in other domains has shown that computers are able to select their own features and often provide clear advantages in classification.
To our knowledge, however, manifold learning has not yet
been applied to sketch recognition despite evidence that has
shown manifold learning methods to be good nonlinear feature extractors. Manifold learning is an effective method for
representation that works by recovering meaningful low-dimensional structures hidden in high-dimensional data (Seung & Lee 2000). In this work, we apply the kernel Isomap manifold learning method (Choi & Choi 2007) to classify sketch data, because it has a projection property that provides the ability to map new test data onto the same feature space as the training data. Kernel Isomap requires a dissimilarity matrix to find a low-dimensional mapping, from which it produces a feature set. We introduce a new algorithm to measure the dissimilarity distance between shapes that is based on both spatial and temporal information. We also show how this algorithm can be modified to accommodate shapes drawn with multiple strokes. We compare our method with the widely used Rubine feature-based method for recognizing shapes (Rubine 1991).

IMPLEMENTATION
In order to use manifold learning for recognition, we need a projection property (or generalization property). A projection property provides the ability to project new test data into the same feature space as the training data. In contrast to Isomap or LLE, kernel Isomap provides this projection property itself; thus, we chose the kernel Isomap manifold learning technique. See (Choi & Choi 2007) for the algorithm and the projection property.
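For a concrete picture of this step, the following is a minimal, simplified sketch of an Isomap-style embedding computed from a dissimilarity matrix. It is illustrative only: the function name, the neighborhood size, and the clipping of negative eigenvalues are our own simplifications; the actual kernel Isomap additionally applies a constant-adding step that makes the kernel matrix positive semidefinite and yields the projection property (Choi & Choi 2007).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def isomap_style_features(D, n_neighbors=8, n_features=13):
    """Illustrative Isomap-style feature extraction from a dissimilarity
    matrix D; kernel Isomap (Choi & Choi 2007) additionally makes the
    kernel positive semidefinite, which gives the projection property.

    D : (n, n) symmetric array of pairwise sketch dissimilarities.
    Assumes the k-nearest-neighbor graph is connected.
    """
    n = D.shape[0]
    # Build a k-nearest-neighbor graph from the dissimilarities.
    rows, cols, vals = [], [], []
    for i in range(n):
        nbrs = np.argsort(D[i])[1:n_neighbors + 1]   # skip the point itself
        rows.extend([i] * n_neighbors)
        cols.extend(nbrs.tolist())
        vals.extend(D[i, nbrs].tolist())
    graph = csr_matrix((vals, (rows, cols)), shape=(n, n))
    # Geodesic distances: shortest paths along the neighborhood graph.
    G = shortest_path(graph, directed=False)
    # Double-center the squared geodesics (classical MDS / Isomap step).
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (G ** 2) @ H
    # Features: top eigenvectors scaled by the square roots of the
    # eigenvalues. Negative eigenvalues are clipped here; kernel Isomap
    # instead adds a constant so that K becomes a Mercer kernel.
    w, V = np.linalg.eigh(K)
    order = np.argsort(w)[::-1][:n_features]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```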
Dissimilarity from Sketch Data
To apply kernel Isomap to sketch classification, we need to use the time structure in the data as well as the generalization property. Kernel Isomap does not require the data points themselves, but only a dissimilarity matrix.
The first step in our algorithm is to scale each character
to be the same width and height. Then, we need a consistent (and large) number of points in each sketch. Thus, we
interpolate points between any two consecutive points that are adjacent in time but far apart in distance, ensuring that each gesture has the same number of points. Some characters, such as ‘4’ and ‘5’, are drawn using multiple strokes. We
simply connect and interpolate between the last point of the
previous stroke and the first point of the next stroke.
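As an illustration of this preprocessing, a minimal sketch is given below. It scales a (possibly multi-stroke) character into a unit box and resamples it by arc length to a fixed number of points; the function name, the arc-length resampling, and the handling of degenerate strokes are our own illustrative choices, not necessarily the exact routine used in our system.

```python
import numpy as np

def resample_sketch(strokes, n_points=25):
    """Scale a sketch into a unit box and resample it to n_points points.

    strokes : list of (m_i, 2) arrays of (x, y) points in drawing order;
              consecutive strokes are simply connected before resampling.
    Returns an (n_points, 2) array of resampled points.
    """
    pts = np.vstack([np.asarray(s, dtype=float) for s in strokes])
    # Scale each character to the same width and height (a unit box).
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    pts = (pts - lo) / np.maximum(hi - lo, 1e-9)
    # Cumulative arc length along the connected point sequence.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate(([0.0], np.cumsum(seg)))
    if t[-1] == 0.0:                      # degenerate sketch (a single dot)
        return np.repeat(pts[:1], n_points, axis=0)
    # Evenly spaced samples in arc length give every gesture the same
    # number of points, densifying segments that are far apart in space.
    targets = np.linspace(0.0, t[-1], n_points)
    x = np.interp(targets, t, pts[:, 0])
    y = np.interp(targets, t, pts[:, 1])
    return np.column_stack((x, y))
```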
To calculate the dissimilarity matrix, we sum the squared distances between corresponding points (points of the same order) in two sketches. The dissimilarity between the $i$-th sketch and the $j$-th sketch, $D_{ij}$, is then given by

$$D_{ij} = \sqrt{\sum_{k=1}^{S} (d_k)^2}, \qquad (1)$$
where $d_k$ is the Euclidean distance between the $k$-th points of the two sketches, and $S$ is the number of points in one stroke. In addition to normalizing each stroke by resampling, we use different weights for different points. When we connect the
two parts of ‘4’ or ‘5’, the connection causes confusion in
calculating the distance. If we give these fake points less weight than the real parts of the stroke, we can eliminate some of the confusion.
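Combining Eq. (1) with this weighting, the dissimilarity computation can be sketched as follows. The specific weight value for the connecting ("fake") points and the use of a single shared weight vector are illustrative assumptions; in practice the fake points depend on where each sketch's stroke connections fall.

```python
import numpy as np

def sketch_dissimilarity(a, b, weights=None):
    """Weighted form of Eq. (1) between two resampled sketches.

    a, b    : (S, 2) arrays of corresponding points (same order, same S).
    weights : optional (S,) array; points interpolated across a stroke gap
              ("fake" points) can get a smaller weight, e.g. 0.2 instead
              of 1.0 (the exact value is an illustrative choice).
    """
    d = np.linalg.norm(a - b, axis=1)            # d_k for k = 1, ..., S
    if weights is None:
        weights = np.ones(len(d))
    return np.sqrt(np.sum(weights * d ** 2))

def dissimilarity_matrix(sketches, weights=None):
    """Pairwise dissimilarity matrix used as the input to kernel Isomap."""
    n = len(sketches)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = sketch_dissimilarity(
                sketches[i], sketches[j], weights)
    return D
```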
EXPERIMENTS
We carried out numerical experiments with three different data sets: (a) 26 lowercase English letters written by one user in a single stroke, with 25 examples of each letter, a third of which were drawn two weeks later; (b) the 10 digits from 0 to 9, drawn by another user, with 25 examples per class; and (c) 8 mathematical symbols (‘+’, ‘-’, ‘x’, ‘/’, ‘=’, ‘sin’, ‘cos’ and ‘tan’), drawn by 10 subjects, each symbol drawn 5 times in a single stroke. To show the improved accuracy of our method, we compared it with Rubine's algorithm. Table 1 shows the classification hit rates on the three data sets for three approaches: (1) Rubine's method, (2) kernel Isomap (K-Isomap) and (3) weighted K-Isomap (WK-Isomap); we write (W)K-Isomap when referring to both. For a fair comparison, we extracted 13 features from (W)K-Isomap, the same number as Rubine's.
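The evaluation protocol can be sketched as below. The train/test split ratio and the value of k are assumptions, since the text only specifies averaging hit rates over 50 cross-validation runs with a kNN classifier; for simplicity the sketch also assumes the features of all examples are already computed, whereas the actual system projects test sketches through the kernel Isomap projection property.

```python
import numpy as np

def average_knn_hit_rate(features, labels, n_runs=50, k=3,
                         train_frac=0.8, seed=0):
    """Average kNN hit rate over repeated random train/test splits."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    n = len(labels)
    hit_rates = []
    for _ in range(n_runs):
        perm = rng.permutation(n)
        n_train = int(train_frac * n)
        train, test = perm[:n_train], perm[n_train:]
        correct = 0
        for i in test:
            # Euclidean distance in the 13-dimensional feature space.
            dist = np.linalg.norm(features[train] - features[i], axis=1)
            nearest = train[np.argsort(dist)[:k]]
            votes = list(labels[nearest])
            prediction = max(set(votes), key=votes.count)
            correct += int(prediction == labels[i])
        hit_rates.append(correct / len(test))
    return float(np.mean(hit_rates))
```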
Figure 1 shows how good the features are. We plotted the two best features from the English letters data set. We can see that with K-Isomap, data points in the same class are grouped together. (W)K-Isomap is better than Rubine's method for classification because it finds the features that best represent the data set in an efficient way, whereas Rubine's features are fixed in advance. We also calculated the correlation matrix of each feature set (see Figure 2). The Rubine features show a large amount of redundancy, whereas the (W)K-Isomap features are uncorrelated and more efficient.
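The redundancy comparison of Figure 2 can be reproduced in outline by computing the correlation matrix of each 13-dimensional feature set; the plotting details below are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_feature_correlations(rubine_feats, isomap_feats):
    """Plot the correlation matrices of two (n_samples, 13) feature sets;
    strong off-diagonal values indicate redundant features."""
    fig, axes = plt.subplots(1, 2, figsize=(9, 4))
    for ax, feats, title in zip(axes,
                                (rubine_feats, isomap_feats),
                                ("Rubine", "(W)K-Isomap")):
        corr = np.abs(np.corrcoef(feats, rowvar=False))   # 13 x 13 matrix
        im = ax.imshow(corr, vmin=0.0, vmax=1.0, cmap="jet")
        ax.set_title(title)
    fig.colorbar(im, ax=list(axes))
    plt.show()
```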
Table 1: Average hit rates over 50 cross-validation runs with a kNN classifier.

Methods      Alphabets   Digits    Math Symbols
Rubine's     77.31%      91.80%    60.46%
K-Isomap     87.19%      95.90%    92.77%
WK-Isomap    87.04%      97.80%    92.41%
Figure 1: The features of K-Isomap (top) and the Rubine method (bottom); each axis corresponds to one feature.
Figure 2: Correlation matrices of the 13 features (red indicates high correlation): (left) Rubine features; (right) (W)K-Isomap features.
CONCLUSION
In this paper, we developed a new technique for gesture classification using a manifold learning approach that combines
temporal and spatial information, while handling multiple
strokes with weighted distances. Experimental results confirmed that its performance is better than that of the Rubine feature set.
References
Choi, H., and Choi, S. 2007. Robust Kernel Isomap. Pattern
Recognition 40(3):853–862.
Long, A. C.; Landay, J. A.; Rowe, L. A.; and Michiels, J. 2000.
Visual similarity of pen gestures. In Human Factors in Computing
Systems.
Rubine, D. 1991. Specifying gestures by example. Computer
Graphics 25(4):329–337.
Seung, H. S., and Lee, D. D. 2000. The manifold ways of perception. Science 290:2268–2269.
ACKNOWLEDGEMENTS
This work was supported by KOSEF Grant-D00115. This work was funded in part by NSF IIS grant 0744150: Developing Perception-based Geometric Primitive-shape and Constraint Recognizers to Empower Instructors to Build Sketch Systems in the Classroom.