Unsupervised Order-Preserving Regression Kernel for Sequence Analysis
Young-In Shin
Department of Computer Sciences
1 University Station C0500
Austin, TX 78712-0233
codeguru@cs.utexas.edu
http://www.cs.utexas.edu/users/codeguru/
Abstract
In this work, a generalized method for learning from sequences of unlabelled data points based on unsupervised order-preserving regression is proposed. Sequence learning is a fundamental problem that covers a wide range of research topics including, e.g., handwritten character recognition and speech and natural language processing. For this, one may compute feature vectors from sequences and learn a function in feature space, or directly match sequences using methods like dynamic time warping. The former approach is not general in that it relies on sets of application-dependent features, while in the latter, matching is often inefficient or ineffective. Our method takes the latter approach, while providing very simple and robust matching. Results obtained from applying our method to a few different types of data show that the method is general, while accuracy is enhanced or comparable.
Introduction
We consider the problem of learning from sequences of unlabelled data points. Learning from sequences is a fundamental problem that covers a wide range of research topics including, e.g., hand-written character recognition (Tapia & Rojas 2003), speech and natural language processing, and object detection. When dealing with sequences of variable lengths, the difficulty lies in coming up with an effective similarity measure for sequences. For this, one may compute feature vectors from sequences and learn a function in feature space, implying the inner product as the similarity measure, or one may directly match sequences using methods like dynamic time warping (DTW). For example, in (Tapia & Rojas 2003), the features computed from hand-written figures and characters after preprocessing include the coordinates of points, turning angles and their changes, the length position of each point, the center of gravity of points, length, relative length, and accumulated angle. They work well, as the results show. But since they are heuristically chosen and application-dependent, one may have difficulty applying this method to data from other domains, not to mention that they are computationally expensive. Furthermore, such features may not be effective enough. Alternatively, one may directly match sequences. For example, in (Hassdonk & Burkhardt 2002), a Gaussian DTW kernel is used to compute the distance between two sequences. However, as noted by the authors, it is not a metric since it lacks some necessary properties, e.g. positive semidefiniteness, yielding sub-optimal solutions.

As past work shows, it is desirable that a learning method rely on application-dependent features as little as possible, and that the similarity measure be a metric with the properties of kernel functions. The method proposed in this work meets all these criteria: it takes the latter approach of direct sequence matching, using a kernel function that computes the similarity between sequences. Our approach is based on unsupervised order-preserving regression, where one finds a function that approximates a sequence of unlabelled data points under the constraint that the data points projected onto the approximating function are in the same order as they are in the given sequence. Order-preservation is a necessary property in sequence matching. However, we do not perform regression over the sequences. We rather focus on the order-preserving property of the projection indices, which are the input parameters needed for regression. Note that in unsupervised regression, projection indices are missing and must somehow be provided.

To introduce our kernel function, we first define projection indices as follows. Each data point $x_i^m$ in a sequence $x_i$, for $m = 1, \cdots, N_i$, is associated with a projection index,

$$t_i^m \equiv \begin{cases} \sum_{n=2}^{m} \lVert x_i^n - x_i^{n-1} \rVert & \text{for } m \ge 2, \\ 0 & \text{for } m = 1. \end{cases} \tag{1}$$
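In code, the projection index of Eq. (1) is simply the cumulative Euclidean arc length along the sequence. The following is a minimal sketch in Python with NumPy; the function name is ours, not from the paper:

    import numpy as np

    def projection_indices(x):
        # x: one sequence as an (N_i, D) array of data points.
        # Returns t_i^1, ..., t_i^{N_i} of Eq. (1): t_i^1 = 0, and for
        # m >= 2 the cumulative sum of ||x_i^n - x_i^{n-1}||, n = 2..m.
        x = np.asarray(x, dtype=float)
        steps = np.linalg.norm(np.diff(x, axis=0), axis=1)
        return np.concatenate(([0.0], np.cumsum(steps)))

Since each step length is non-negative, the resulting indices are non-decreasing, which is exactly the order-preserving property noted next.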
Note the order-preserving property that $t_i^{m_1} \le t_i^{m_2}$ for any $m_1 < m_2$. Then, we define our kernel function as

$$\kappa(x_i, x_j) \equiv \sum_{n=1}^{N_i} \sum_{m=1}^{N_j} k_x(x_i^n, x_j^m) \cdot k_t(t_i^n, t_j^m), \tag{2}$$
where $k_x$ and $k_t$ are any valid kernel functions. $\kappa$ is a kernel function, since products and sums of kernel functions are also kernel functions.
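A direct double-sum implementation of Eq. (2) is a short sketch; here both $k_x$ and $k_t$ are taken to be Gaussian RBF kernels, as in the experiments below, and the parameter names sigma_x and sigma_t are our own:

    def rbf(a, b, sigma):
        # Gaussian RBF kernel: exp(-||a - b||^2 / (2 sigma^2)).
        return np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2)
                      / (2.0 * sigma ** 2))

    def kappa(xi, xj, sigma_x, sigma_t):
        # Sequence kernel of Eq. (2): sum over all point pairs of
        # k_x on the data points times k_t on their projection indices.
        xi = np.asarray(xi, dtype=float)
        xj = np.asarray(xj, dtype=float)
        ti, tj = projection_indices(xi), projection_indices(xj)
        return sum(rbf(xi[n], xj[m], sigma_x) * rbf(ti[n], tj[m], sigma_t)
                   for n in range(len(xi)) for m in range(len(xj)))

Note that the two sequences may have different lengths; evaluating $\kappa$ on one pair costs $O(N_i N_j)$ elementary kernel evaluations.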
Suppose now, for example, that we are given a set of unlabelled training sequences, each drawn i.i.d. from a set $\mathcal{X}$, i.e. $S = \{x_i = [x_i^1, \cdots, x_i^{N_i}] \in \mathcal{X},\; i = 1, \cdots, \ell\}$, where $x_i^k \in \mathbb{R}^D$, $N_i$ is the number of data points in $x_i$, and $\ell$ is the
number of example sequences in $S$. It may be that $N_i \neq N_j$ for $i \neq j$. We may then wish to learn a function from $S$ that can determine whether an unseen test sequence $x$ is also from $\mathcal{X}$. Having defined our kernel, we can simply apply any kernel-based unsupervised learning algorithm, e.g. support vector novelty detection (SVND) (Scholkopf & Smola 2002), over $S$.
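Continuing the sketch above, one way to realize this step (ours, not the paper's implementation) is to precompute the Gram matrix of $\kappa$ and pass it to scikit-learn's OneClassSVM, which implements the novelty detector of (Scholkopf & Smola 2002); the variables S and test_seqs and the value of nu below are illustrative assumptions:

    from sklearn.svm import OneClassSVM

    def gram(seqs_a, seqs_b, sigma_x, sigma_t):
        # Gram matrix of kappa between two lists of sequences.
        return np.array([[kappa(a, b, sigma_x, sigma_t) for b in seqs_b]
                         for a in seqs_a])

    K_train = gram(S, S, sigma_x=0.1, sigma_t=0.1)
    svnd = OneClassSVM(kernel="precomputed", nu=0.1).fit(K_train)

    K_test = gram(test_seqs, S, sigma_x=0.1, sigma_t=0.1)
    is_novel = svnd.predict(K_test) == -1   # -1 marks novel sequences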
Experiments
Our method was applied to several different types of data. Among them, we show results for hand-written numbers and for sensor data captured from a Hokuyo URG-04LX laser rangefinder. Hand-written numbers are represented as sequences of 2D points (x, y) of pixels on the screen. Our objective is to learn, from a training set of sequences, a function that recognizes an unseen hand-written number. The training set is composed of 200 examples, i.e. 20 examples for each number from 0 to 9, where each of two writers created 10 of the 20 samples. Figure 1 shows some training samples for the numbers '1' and '5', with the number of data points in each of them shown below.
[Figure 1: Sample Training Data for Hand-Written Numbers. Eight sample digits plotted in screen coordinates, containing 19, 9, 14, 14, 15, 28, 17, and 21 data points respectively.]

We trained an SVND for each number, and RBF was chosen for $k_x$ and $k_t$ with $\sigma = 50$. The test data is composed of 50 samples for each character, 500 samples in total. If SVND says a test sample of its own class is not novel, or says one of a different class is novel, then we consider the classification correct. The classification results are shown in the following table. The classification error was less than 1% in most cases, while the number of support vectors was on average about 90% of the training examples. This is due to the relatively small number of training examples.

No.   # SVs   Error     No.   # SVs   Error
'1'   17      0.94 %    '6'   18      1.15 %
'2'   18      1.02 %    '7'   17      0.38 %
'3'   15      0.10 %    '8'   19      0.14 %
'4'   19      0.87 %    '9'   14      0.07 %
'5'   17      0.98 %    '0'   18      0.01 %

Figure 2: Hand-written Number Recognition
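The per-class evaluation rule above can be made concrete with a small sketch (our illustration; class_error, K_own, and K_other are hypothetical names): each class's model should accept held-out samples of its own class and reject samples of the other classes.

    def class_error(model, K_own, K_other):
        # model: a fitted per-class OneClassSVM.
        # K_own / K_other: precomputed kappa rows between test samples
        # of the model's own class / of other classes and its training set.
        wrong = np.sum(model.predict(K_own) == -1)     # own class called novel
        wrong += np.sum(model.predict(K_other) == +1)  # other class accepted
        return wrong / (len(K_own) + len(K_other))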
Sensor data are represented as sequences of 2D points $(\theta, d)$, where $\theta$ is the angle at which a laser ray is shot, ranging from $-120°$ to $120°$, and $d$ is the normalized distance to the obstacle in that direction, ranging from 0 to 1, where $d = 1$ corresponds to 4 m. Our objective here is to detect the blob of the scan which corresponds to the ball. A blob is a sequence of distance values only, $d_i = [d^1, \cdots, d^{N_i}]$; $\theta$ is not used here, for rotation invariance. A ball blob will resemble an arc distorted by small sensor noise. In Figure 3, the red blob in the sensor plot is a ball, and the other green and blue blobs are non-ball objects, e.g. a box, a table, etc. On its right, sample training data for ball and non-ball blobs are given. Our training data has 79 ball blobs, and the test data has 204 ball and 804 non-ball blobs. We used SVND, and RBF was chosen for $k_x$ and $k_t$ with $\sigma = 0.1$.

[Figure 3: Sample Training Data for Laser RangeFinder. Panels: Sensor Plot (polar scan), Ball Blobs, Non Ball Blobs.]
The following table shows that less than 1% classification error was obtained.

# SVs   Ball Error   Non Ball Error
54      0.49 %       0.62 %

Figure 4: Ball Blob Recognition
Conclusion
The proposed kernel showed enhanced or comparable results, while being simple to compute. The intuition is that $k_x$ scores high (low) when data points are close (far), while $k_t$ scores high (low) when their positions in the sequence are close (far). Unlike the DTW kernel, (2) does not suffer from yielding sub-optimal solutions, while relying on heuristics and application-dependent features as little as possible. Further challenges for future research include making (2) transformation-invariant, real-time learning and prediction, and finding optimal projection indices. We also wish to point out that this method could be used for signature authentication.
References
Hassdonk, C. B. B., and Burkhardt, H. 2002. On-line handwriting recognition with support vector machines - a kernel approach. In Proc. of the 8th IWFHR, 49-54.
Scholkopf, B., and Smola, A. J. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
Tapia, E., and Rojas, R. 2003. Recognition of on-line handwritten mathematical formulas in the E-Chalk system. In Proc. of ICDAR.