Linear Support Vector Approach for Pattern Recognition
Short Review
Oleg S. Seredin, September 1999
This short text is a summary of the linear support vector (SV) method for pattern recognition.
SV is a fairly new and popular branch of pattern recognition theory; the first publications appeared in
1992. The author of this theory, the famous Russian scientist Vladimir Vapnik, who now works
at AT&T Research Labs, combined several of his earlier ideas and invented this approach, which is
so popular today.
We shall discuss the elementary basis of linear SV, which is implemented in the program
Space5.2.
We are solving the classical pattern recognition task: training a two-class classifier from a set of
samples [1,2]. The training set contains N objects of two classes. Each object is represented by a
vector of real features $x_j \in \mathbb{R}^n$ and by a class membership index $g_j \in \{1, -1\}$. I denote the
class indices as 1 and -1 because it will be convenient in the computations that follow. We solve the
so-called geometrical task, or, more correctly, the deterministic task. It means that our goal is to find a
hypersurface (a hyperplane) which separates the objects of the two classes.
The idea Vapnik suggested [3,4] is very simple: find a direction vector $a$ of the
separating hyperplane that maximizes the following goal function:
$$J(a) = \min_{j:\, g_j = 1} a^T x_j - \max_{j:\, g_j = -1} a^T x_j \to \max, \quad \text{under the constraint } a^T a = 1. \quad (1)$$
This criterion reflects the reasonable desire to find a direction along which, in the separable case, the
gap between the classes is maximal and, in the non-separable case, the overlap is minimal. If the
objects of the two classes are not separable, criterion (1) is still applicable, but the optimal value of the
objective function will be negative. Such a goal function is not convenient for numerical solution.
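To make criterion (1) concrete, here is a minimal sketch (in Python with NumPy, on a made-up toy sample) of evaluating $J(a)$ for a candidate unit direction vector: a positive value means the classes are separated along that direction, a negative value measures the overlap.

```python
import numpy as np

def criterion_J(a, X, g):
    """J(a) = min over class +1 of a^T x_j  -  max over class -1 of a^T x_j."""
    a = a / np.linalg.norm(a)      # enforce the constraint a^T a = 1
    proj = X @ a                   # projections of all objects onto the direction a
    return proj[g == 1].min() - proj[g == -1].max()

# made-up training sample: four objects, two features, labels +1 / -1
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, 0.5]])
g = np.array([1, 1, -1, -1])
print(criterion_J(np.array([1.0, 1.0]), X, g))   # > 0: separable along this direction
```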
Let us consider another formulation. At first we suppose that the classes are linearly separable. In
such a case there exists a hyperplane $a^T x + b = 0$ such that
$$a^T x_j + b \geq \varepsilon \ \text{ for } g_j = 1 \quad \text{and} \quad a^T x_j + b \leq -\varepsilon \ \text{ for } g_j = -1, \quad j = 1, \ldots, N, \quad (2)$$
where $\varepsilon > 0$. In other words, there exists a margin between the two classes; it is equal to $2\varepsilon$.
Vapnik calls optimal the hyperplane for which $\varepsilon \to \max$ under constraints (2) and $a^T a = 1$.
Let us divide both constraints in (2) by $\varepsilon$. The result is
$$\frac{1}{\varepsilon} a^T x_j + \frac{1}{\varepsilon} b \geq 1, \qquad \frac{1}{\varepsilon} a^T x_j + \frac{1}{\varepsilon} b \leq -1.$$
Now we denote $\frac{1}{\varepsilon} a$ as $a$ and $\frac{1}{\varepsilon} b$ as $b$:
$$a^T x_j + b \geq 1 \ \text{ for } g_j = 1 \quad \text{and} \quad a^T x_j + b \leq -1 \ \text{ for } g_j = -1.$$
Or, in more compact form,
$$g_j (a^T x_j + b) \geq 1, \quad j = 1, \ldots, N. \quad (3)$$
Our goal is to maximize $\varepsilon$, which after the rescaling means we should minimize $\|a\|$. So we have the objective function
$$a^T a \to \min, \quad \text{under the constraints } g_j (a^T x_j + b) \geq 1, \quad j = 1, \ldots, N. \quad (4)$$
Criterion (4) is a standard quadratic programming problem.
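As a rough illustration, the following sketch solves the primal problem (4) as a generic constrained minimization over the stacked variables z = (a, b); the toy sample is made up, and SciPy's SLSQP merely stands in for a dedicated quadratic programming solver.

```python
import numpy as np
from scipy.optimize import minimize

def solve_primal(X, g):
    """Minimize a^T a subject to g_j (a^T x_j + b) >= 1, variables z = (a, b)."""
    n = X.shape[1]
    objective = lambda z: z[:n] @ z[:n]
    constraints = [{'type': 'ineq',
                    'fun': lambda z, x=x, gj=gj: gj * (z[:n] @ x + z[n]) - 1.0}
                   for x, gj in zip(X, g)]
    res = minimize(objective, np.ones(n + 1), method='SLSQP', constraints=constraints)
    return res.x[:n], res.x[n]    # direction vector a and threshold b

# made-up linearly separable sample
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, 0.5]])
g = np.array([1, 1, -1, -1])
a, b = solve_primal(X, g)
print(a, b)
```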
Now let us suppose that the classes are not linearly separable. Reasoning as before, we must minimize the
overlap between the patterns: the gap of the separable case becomes an overlap, the inequalities reverse,
and minimizing the overlap width means maximizing $\|a\|$. In such a case we obtain the objective function
$$a^T a \to \max, \quad \text{under the constraints } g_j (a^T x_j + b) \leq 1, \quad j = 1, \ldots, N. \quad (5)$$
It is known from numerical optimization theory that a constrained optimization problem has two
settings: the primal and the dual. In cases (4) and (5) we have the primal task. Vapnik noticed
some interesting features of the dual task. It is not difficult to show how to obtain the dual task
from the primal one: it is necessary to find the saddle point of the Lagrangian function, with due
regard for the Kuhn-Tucker conditions. In the primal task we have one (vector) variable and N constraints; in the
dual task we will have N variables in the objective function and one constraint. So for the primal task
(4) we have the corresponding dual task:
$$W(\lambda_1, \ldots, \lambda_N) = \sum_{j=1}^{N} \lambda_j - \frac{1}{2} \sum_{j=1}^{N} \sum_{k=1}^{N} (g_j g_k x_j^T x_k)\, \lambda_j \lambda_k \to \max, \qquad \sum_{j=1}^{N} \lambda_j g_j = 0, \quad \lambda_j \geq 0, \quad j = 1, \ldots, N. \quad (6)$$
Here $\lambda_j$ are the so-called Lagrange multipliers, the working variables of the dual task (6). The
formulas for converting between the variables are
$$a = \sum_{j=1}^{N} \lambda_j g_j x_j, \qquad b = -\frac{\sum_{j=1}^{N} \lambda_j a^T x_j}{\sum_{j=1}^{N} \lambda_j}.$$
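A corresponding sketch of the dual task (6): we maximize W by minimizing -W under the equality constraint and non-negativity bounds, and then recover a and b through the conversion formulas above. Again the sample is made up and SLSQP only stands in for a proper QP solver.

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, g):
    """Maximize W(lambda) of (6), then convert lambda back to (a, b)."""
    N = len(g)
    K = (g[:, None] * X) @ (g[:, None] * X).T                    # K_jk = g_j g_k x_j^T x_k
    neg_W = lambda lam: -(lam.sum() - 0.5 * lam @ K @ lam)       # minimize -W(lambda)
    constraints = [{'type': 'eq', 'fun': lambda lam: lam @ g}]   # sum_j lambda_j g_j = 0
    bounds = [(0.0, None)] * N                                   # lambda_j >= 0
    lam = minimize(neg_W, np.ones(N) / N, method='SLSQP',
                   bounds=bounds, constraints=constraints).x
    a = (lam * g) @ X                                            # a = sum_j lambda_j g_j x_j
    b = -(lam @ (X @ a)) / lam.sum()                             # second conversion formula
    return lam, a, b

X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, 0.5]])
g = np.array([1, 1, -1, -1])
lam, a, b = solve_dual(X, g)
print(np.round(lam, 3), a, b)
```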
Now let us return to formula (5) and note its drawback. The problem is that the objective function
in (5) is convex, while it has to be maximized; to reliably find a global extremum we must either
minimize a convex function or maximize a concave one.
That is why Vapnik decided not to use criterion (5) for the non-separable case. He used a special
trick: he suggested introducing additional slack variables $\delta_j$ into the criterion:
$$a^T a + C \sum_{j=1}^{N} \delta_j \to \min, \quad (5a)$$
where $C$ is a positive constant, under the constraints
$$g_j (a^T x_j + b) \geq 1 - \delta_j, \qquad \delta_j \geq 0, \quad j = 1, \ldots, N.$$
The idea of the trick is that we allow some objects to be "shifted" to the side of their correct class.
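A sketch of the soft-margin primal (5a) in the same style: the variables are stacked as z = (a, b, delta_1, ..., delta_N), the value of C and the slightly overlapping toy sample are made up, and SLSQP again stands in for a real QP solver.

```python
import numpy as np
from scipy.optimize import minimize

def solve_soft_primal(X, g, C=10.0):
    """Minimize a^T a + C*sum(delta) s.t. g_j (a^T x_j + b) >= 1 - delta_j, delta_j >= 0."""
    N, n = X.shape
    objective = lambda z: z[:n] @ z[:n] + C * z[n + 1:].sum()
    constraints = [{'type': 'ineq',
                    'fun': lambda z, j=j: g[j] * (z[:n] @ X[j] + z[n]) - 1.0 + z[n + 1 + j]}
                   for j in range(N)]
    bounds = [(None, None)] * (n + 1) + [(0.0, None)] * N        # only the slacks are bounded below
    z = minimize(objective, np.ones(n + 1 + N), method='SLSQP',
                 bounds=bounds, constraints=constraints).x
    return z[:n], z[n], z[n + 1:]                                # a, b and the slacks delta

# made-up non-separable sample: the classes overlap along every direction
X = np.array([[2.0, 2.0], [-0.5, -0.5], [-2.0, -2.0], [0.5, 0.5]])
g = np.array([1, 1, -1, -1])
a, b, delta = solve_soft_primal(X, g)
print(a, b, np.round(delta, 3))
```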
The dual quadratic task for the primal criterion (5a) is
$$W(\lambda_1, \ldots, \lambda_N) = \sum_{j=1}^{N} \lambda_j - \frac{1}{2} \sum_{j=1}^{N} \sum_{k=1}^{N} (g_j g_k x_j^T x_k)\, \lambda_j \lambda_k \to \max, \qquad \sum_{j=1}^{N} \lambda_j g_j = 0, \quad 0 \leq \lambda_j \leq C, \quad j = 1, \ldots, N. \quad (7)$$
The formulas for converting between the variables are exactly the same as in the separable case.
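The same non-separable toy example can be treated through the dual task (7); the only change compared with the sketch of (6) is the box constraint 0 <= lambda_j <= C, and a and b are recovered with the same conversion formulas. C is a made-up value and SLSQP again replaces a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

def solve_soft_dual(X, g, C=10.0):
    """Maximize W(lambda) of (7) with 0 <= lambda_j <= C, then recover (a, b)."""
    N = len(g)
    K = (g[:, None] * X) @ (g[:, None] * X).T                    # K_jk = g_j g_k x_j^T x_k
    neg_W = lambda lam: -(lam.sum() - 0.5 * lam @ K @ lam)
    constraints = [{'type': 'eq', 'fun': lambda lam: lam @ g}]   # sum_j lambda_j g_j = 0
    bounds = [(0.0, C)] * N                                      # the only difference from (6)
    lam = minimize(neg_W, np.full(N, C / 2.0), method='SLSQP',
                   bounds=bounds, constraints=constraints).x
    a = (lam * g) @ X                                            # same conversion formulas
    b = -(lam @ (X @ a)) / lam.sum()
    return lam, a, b

X = np.array([[2.0, 2.0], [-0.5, -0.5], [-2.0, -2.0], [0.5, 0.5]])
g = np.array([1, 1, -1, -1])
lam, a, b = solve_soft_dual(X, g)
print(np.round(lam, 3), a, b)
```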
It should be added that several variants of the objective function (5a) exist; they differ in how the
$\delta_j$ are included. For instance, it is possible to use $\sum_{j=1}^{N} \delta_j^2$ or $\bigl(\sum_{j=1}^{N} \delta_j\bigr)^2$ instead of $\sum_{j=1}^{N} \delta_j$.
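Just to make the difference between these variants explicit, here is a tiny illustration with made-up slack values:

```python
import numpy as np

delta = np.array([0.0, 0.5, 2.0])    # hypothetical slack values delta_j
print(delta.sum())                   # sum_j delta_j, as used in (5a)
print((delta ** 2).sum())            # variant: sum_j delta_j^2
print(delta.sum() ** 2)              # variant: (sum_j delta_j)^2
```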
It is interesting that only the non-zero optimal variables of the dual task, $\lambda_j > 0$, take part in
constructing the direction vector of the optimal hyperplane. The objects of the training sample for which
this holds are the support vectors (they are the objects with the minimal projection onto the direction
vector in the first class and the maximal projection in the second one, cf. (1)).
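As a small sketch of this property (with made-up multipliers), the decision rule $a^T x + b$ can be evaluated from the support vectors alone:

```python
import numpy as np

def decision(x, lam, g, X, b, tol=1e-6):
    """Sign of a^T x + b computed from the support vectors only (lambda_j > 0)."""
    sv = lam > tol
    return np.sign(((lam[sv] * g[sv]) * (X[sv] @ x)).sum() + b)

# made-up multipliers: the second object has lambda = 0 and is not a support vector
lam = np.array([0.8, 0.0, 0.5, 0.3])
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, 0.5]])
g = np.array([1, 1, -1, -1])
b = 0.0
print(decision(np.array([1.0, 1.0]), lam, g, X, b))
```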
There is a lot of philosophy about how to interpret the support vectors and why they are so
useful. Some points: (1) the pattern recognition task is formulated as an optimization criterion;
(2) the number of support vectors gives the relative accuracy of the decision rule; (3) you do not
have to store all the other objects from the sample.
Literature
1. Duda R. and Hart P. Pattern Classification and Scene Analysis. New York: Wiley, 1973.
2. Fukunaga K. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
3. Cortes C. and Vapnik V. Support-Vector Networks. Machine Learning, Vol. 20, No. 3, 1995.
4. Vapnik V. Statistical Learning Theory. John Wiley & Sons, Inc., 1998.