CS267-FINAL REPORT

CS267 TOPICS IN DATABASE SYSTEMS
PROF. T.Y. LIN
SUPPORT VECTOR MACHINE
(SVM)
PARIN SHAH
007332832
SUPPORT VECTOR MACHINE
INTRODUCTION ::
The number of documents on the World Wide Web is growing at an enormous rate. Classifying each
document by hand is neither possible nor feasible, and managing the structure of such a huge
collection manually is equally impractical. We shall therefore discuss a few methods of organizing the
data into a proper structure, and we shall look into the details of classifying new data into an
already present category.
Support vector machines (SVM) are a set of related supervised learning methods that can be used for:
 Text classification.
 Analyzing data.
 Recognizing patterns.
 Regression analysis.
 Bio-informatics.
 Signature/handwriting recognition.
 E-mail spam categorization.
Supervised learning is the machine learning task of deducing a category from labeled training data.
The training data consist of a set of training examples, each of which is a pair consisting of an
input object and a desired output value. A supervised learning algorithm analyzes the training data
and then predicts the correct output category for a given input. For example, a teacher teaches a
student to identify apples and oranges by describing some features of each. The next time the
student sees an apple or an orange, he can easily classify the object based on what he learned from
the teacher; this is supervised learning. Note that the student can identify the object only if it is
an apple or an orange; if the given object is a bunch of grapes, the student cannot identify it.
In my project I have utilized the Support Vector Machine (SVM) for text classification, in which a
new input data set is classified into one of the given categories. SVM is not used for clustering
data into new categories; it classifies data into already present categories.
UNDERSTANDING SUPPORT VECTOR MACHINE (SVM) FOR
LINEARLY SEPARABLE DATA
Consider each document to be a single dot in the figure, where dots of different colors belong to
different categories. Here we have documents of two categories, and we have to find the boundary
separating them.
The margin of a linear classifier is the width by which the boundary can be widened before hitting a
data point of either category. The safest line to pick is the one with the largest margin between the
two data sets. The data points that lie on the margin are known as support vectors.
The next step is to find the hyperplane that best separates the two categories. SVM does this by
taking the set of points and separating them using mathematical formulas, from which we can find the
positive and negative hyperplanes. The formulas for the hyperplanes are:
(w · x) + b = +1 (positive labels)
(w · x) + b = -1 (negative labels)
(w · x) + b = 0 (hyperplane)
From the equations above and using linear algebra we can find the values of w and b. Thus we have a
model that contains the solution for w and b, with the margin calculated as follows:
Margin = 2/√(w · w)
In SVM, this model is used to classify new data: with the above solution and the calculated margin
value, new data can be classified into a category. The following figure demonstrates the margin and
support vectors for linearly separable data.
The figure shows the maximum margin and the support vectors for the given data sets.
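To make the classification rule concrete, here is a minimal sketch (not part of the original program;
the weights w, the bias b and the test point are made-up values) of assigning a new point to a
category by the sign of (w · x) + b:

#include <stdio.h>

/* Classify a 2-dimensional point by the sign of (w . x) + b:
   +1 on the positive side of the hyperplane, -1 on the negative side. */
int classify(const double w[2], double b, const double x[2])
{
    double f = w[0]*x[0] + w[1]*x[1] + b;
    return (f >= 0.0) ? +1 : -1;
}

int main(void)
{
    double w[2] = { 1.0, -1.0 };   /* assumed trained weights   */
    double b    = 0.5;             /* assumed trained bias      */
    double x[2] = { 2.0, 1.0 };    /* a new document's features */
    printf("predicted label: %+d\n", classify(w, b, x));
    return 0;
}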
UNDERSTANDING SUPPORT VECTOR MACHINE (SVM) FOR
NON-LINEARLY SEPARABLE DATA
In the non-linearly separable case, the data live in an input space where they cannot be separated
with a linear hyperplane. To separate the data linearly, we have to map the points into a feature
space using a kernel method. After the data are separated in the feature space, we can map the points
back to the input space, where the boundary appears as a curved hyperplane. The following figure
demonstrates the data flow of SVM.
In reality, you will find that most data sets are not so simple and well behaved. There will be some
points on the wrong side of the class boundary, points that are far off from their classes, or points
mixed together in a spiral or checkered pattern. Researchers have looked into these problems; to
tolerate the few points that fall in the wrong class, SVM minimizes the following objective (the
standard soft-margin formulation) to create what is called a soft-margin hyperplane:
minimize (1/2)(w · w) + C * Σ ξi subject to yi((w · xi) + b) ≥ 1 - ξi and ξi ≥ 0
A higher value of C penalizes the slack variables ξi more heavily, which narrows the margin, whereas
a lower value of C tolerates more violations and widens the margin.
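As a rough illustration of the role of C (illustrative only; the helper name soft_margin_objective
and the sample points are assumptions, not part of the report), the following sketch evaluates the
soft-margin objective for a fixed w and b at two values of C:

#include <stdio.h>

/* Evaluate (1/2)(w . w) + C * sum(max(0, 1 - yi((w . xi) + b))).
   The hinge term max(0, ...) is the slack variable xi for point i. */
double soft_margin_objective(double x[][2], int y[], int n,
                             double w[2], double b, double C)
{
    double obj = 0.5 * (w[0]*w[0] + w[1]*w[1]);
    int i;
    for (i = 0; i < n; i++) {
        double slack = 1.0 - y[i] * (w[0]*x[i][0] + w[1]*x[i][1] + b);
        if (slack > 0.0)           /* point violates the margin */
            obj += C * slack;
    }
    return obj;
}

int main(void)
{
    double x[4][2] = { {1,1}, {2,2}, {-1,-1}, {0.5,0.5} };
    int    y[4]    = { +1, +1, -1, -1 };  /* last point is on the wrong side */
    double w[2]    = { 0.5, 0.5 };
    double b       = 0.0;

    /* A larger C weights the same margin violation more heavily. */
    printf("objective with C=0.1: %f\n", soft_margin_objective(x, y, 4, w, b, 0.1));
    printf("objective with C=10 : %f\n", soft_margin_objective(x, y, 4, w, b, 10.0));
    return 0;
}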
TYPES OF KERNEL ::
Computing with points in the feature space can be very costly because the feature space is typically
very high dimensional, even infinite-dimensional. The kernel function is used to reduce this cost.
The reason is that the data points appear only inside dot products, and the kernel function can
compute the inner products of these points without mapping them explicitly into the feature space. By
using the kernel function we can therefore compute with the data points directly through inner
products and find the equivalent points relative to the hyperplane.
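This kernel trick can be checked numerically. The sketch below (not from the report; the function
names are illustrative) shows that the degree-2 polynomial kernel (x · y + 1)^2 equals an ordinary
dot product in a 6-dimensional feature space, without ever constructing that space:

#include <stdio.h>
#include <math.h>

/* Degree-2 polynomial kernel evaluated directly in the input space. */
double poly2_kernel(const double x[2], const double y[2])
{
    double dot = x[0]*y[0] + x[1]*y[1];
    return (dot + 1.0) * (dot + 1.0);
}

/* The same value computed the expensive way: map both points into the
   6-dimensional feature space and take an ordinary dot product there. */
double poly2_feature_dot(const double x[2], const double y[2])
{
    double s = sqrt(2.0);
    double fx[6] = { x[0]*x[0], x[1]*x[1], s*x[0]*x[1], s*x[0], s*x[1], 1.0 };
    double fy[6] = { y[0]*y[0], y[1]*y[1], s*y[0]*y[1], s*y[0], s*y[1], 1.0 };
    double dot = 0.0;
    int i;
    for (i = 0; i < 6; i++)
        dot += fx[i] * fy[i];
    return dot;
}

int main(void)
{
    double x[2] = { 1.0, 2.0 }, y[2] = { 3.0, -1.0 };
    /* Both prints show the same number: the kernel computes the
       feature-space inner product without the explicit mapping. */
    printf("kernel trick : %f\n", poly2_kernel(x, y));
    printf("explicit map : %f\n", poly2_feature_dot(x, y));
    return 0;
}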
The kernel functions being developed for SVM are still a research topic. No kernel has been found
that is universal for all kinds of data, so anybody can develop their own kernel depending upon their
requirements.
The following are some basic types of kernel (a sketch of these kernels as code follows the list):
1.) Polynomial kernel with degree d:
K(x,y) = (x' * y + 1)^d
2.) Radial basis function kernel with width s:
K(x,y) = exp(-||x - y||^2 / (2*s^2))
Closely related to the radial basis functions of neural networks.
3.) Sigmoid kernel with parameters k and q:
K(x,y) = tanh(k * (x' * y) + q)
4.) Linear kernel:
K(x,y) = x' * y
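Assuming 2-dimensional inputs and the parameter names used above (an illustrative sketch, not a
library API), the four kernels can be written as plain C functions:

#include <math.h>

/* x' * y for 2-dimensional points. */
static double dot2(const double x[2], const double y[2])
{
    return x[0]*y[0] + x[1]*y[1];
}

/* 1.) Polynomial kernel with degree d. */
double kernel_poly(const double x[2], const double y[2], int d)
{
    return pow(dot2(x, y) + 1.0, d);
}

/* 2.) Radial basis function kernel with width s. */
double kernel_rbf(const double x[2], const double y[2], double s)
{
    double dx = x[0] - y[0], dy = x[1] - y[1];
    return exp(-(dx*dx + dy*dy) / (2.0*s*s));
}

/* 3.) Sigmoid kernel with parameters k and q. */
double kernel_sigmoid(const double x[2], const double y[2], double k, double q)
{
    return tanh(k * dot2(x, y) + q);
}

/* 4.) Linear kernel: K(x,y) = x' * y. */
double kernel_linear(const double x[2], const double y[2])
{
    return dot2(x, y);
}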
STRENGTH OF KERNELS ::
 Kernels are the trickiest and most important part of using SVM, because the kernel creates the
kernel matrix, which summarizes all the data.
 In practice, a low-degree polynomial kernel or a radial basis function kernel with a
reasonable width is a good initial try for most applications.
 The linear kernel is considered the most important choice for text classification, because
the feature dimension is already high enough.
 There is much ongoing research on estimating the kernel matrix.
SPARSE MATRIX AND SPARSE DATA ::
A sparse matrix is a matrix containing many values that are 0. Computing over the many 0 entries in
the matrix is time consuming and uses lots of resources without improving the output. So the matrix
is compressed into sparse data, which contains only the non-zero values of the sparse matrix. It is
usually a 2-dimensional array which holds each non-zero value together with its position in the
original matrix. With sparse data the matrix is easily compressed, and this compression almost always
results in significantly less computer data storage usage.
In SVM the speed of computation suffers because training involves many vector computations over a
training set in which most term-frequency values are zero, so a lot of time is wasted computing with
these values.
SVM algorithms speed up tremendously if the data is stored sparsely, i.e. exploiting the fact that it
contains many values that are 0. The reason is that SVM computes lots of dot products, and over
sparse data these iterate only over the non-zero values. So SVM can use only the sparse data during
its computation, so that memory and data storage are less utilized and the cost is also reduced.
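A minimal sketch of this idea (the names entry and sparse_dot are illustrative assumptions, not from
the report): storing each vector as sorted (index, value) pairs lets the dot product iterate only
over the non-zero entries.

#include <stdio.h>

/* A sparse vector: non-zero entries only, sorted by index. */
typedef struct { int index; double value; } entry;

/* Dot product of two sparse vectors by merging their sorted index
   lists; zero entries are never visited. */
double sparse_dot(const entry *a, int na, const entry *b, int nb)
{
    double dot = 0.0;
    int i = 0, j = 0;
    while (i < na && j < nb) {
        if (a[i].index == b[j].index) {
            dot += a[i].value * b[j].value;
            i++; j++;
        } else if (a[i].index < b[j].index) {
            i++;
        } else {
            j++;
        }
    }
    return dot;
}

int main(void)
{
    /* Two mostly-zero document vectors holding term frequencies. */
    entry doc1[] = { {0, 2.0}, {7, 1.0}, {42, 3.0} };
    entry doc2[] = { {7, 4.0}, {9, 5.0}, {42, 1.0} };
    printf("dot = %f\n", sparse_dot(doc1, 3, doc2, 3));  /* 1*4 + 3*1 = 7 */
    return 0;
}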
Storing a sparse matrix
The simplest data structure for a matrix is a two-dimensional array. Each entry in the array
represents an element a(i,j) of the matrix and can be accessed by the two indices i and j. For an m×n
matrix, enough memory to store at least m×n entries is needed to represent the matrix.
Substantial reductions in memory requirements can be realized by storing only the non-zero entries.
This can yield huge savings in memory compared to the simple approach. Different data structures can
be utilized depending on the number and distribution of the non-zero entries.
Formats can be divided into two groups:
 Those supporting efficient modification.
 Those supporting efficient matrix operations.
The efficient modification group includes DOK, LIL, and COO and is typically used to construct
the matrix. After the matrix is constructed, it is typically converted to a format, such as CSR or
CSC, which is more efficient for matrix operations.
Dictionary of keys (DOK)
DOK represents non-zero values as a dictionary mapping (row, column) tuples to values. It is a good
method for constructing a sparse array, but poor for iterating over the non-zero values in sorted order.
List of lists (LIL)
LIL stores one list per row, where each entry stores a column index and value. Typically, these entries
are kept sorted by column index for faster lookup.
Coordinate list (COO)
COO stores a list of (row, column, value) tuples. The entries are sorted first by row index and then
by column index to improve random access times.
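For instance (an illustrative sketch, not from the report), the COO form of the matrix used in the
Yale example below is just an array of (row, column, value) triples kept in row-then-column order:

#include <stdio.h>

/* One non-zero entry of the matrix. */
typedef struct { int row, col; double value; } triplet;

int main(void)
{
    /* COO form of [1 2 0 0; 0 3 9 0; 0 1 4 0], sorted by
       row index and then by column index. */
    triplet coo[] = {
        {0, 0, 1.0}, {0, 1, 2.0},
        {1, 1, 3.0}, {1, 2, 9.0},
        {2, 1, 1.0}, {2, 2, 4.0},
    };
    int nnz = sizeof coo / sizeof coo[0];
    int k;
    for (k = 0; k < nnz; k++)
        printf("(%d, %d) = %.0f\n", coo[k].row, coo[k].col, coo[k].value);
    return 0;
}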
Yale format
The Yale sparse matrix format stores an initial sparse m×n matrix M in three one-dimensional arrays.
Let NNZ be the number of non-zero entries of M.
Array A has length NNZ and holds all the non-zero entries of M in row-major order (left to right
within a row, top to bottom across rows).
Array IA has length m + 1. IA(i) contains the index in A of the first non-zero element of row i;
row i of the original matrix extends from A(IA(i)) to A(IA(i+1)-1), i.e. from the start
of one row to the last index before the start of the next.
Array JA has length NNZ and contains the column index of each element of A.
Taking the following matrix as an example:
[ 1 2 0 0 ]
[ 0 3 9 0 ]
[ 0 1 4 0 ]
computing the three arrays gives
A = [ 1 2 3 9 1 4 ], IA = [ 0 2 4 6 ]
and JA = [ 0 1 1 2 1 2 ].
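The arrays above can be checked with a short program. This is a sketch under the assumption of the
fixed 3×4 example matrix; the scan is exactly the row-major order described earlier:

#include <stdio.h>

int main(void)
{
    /* The example matrix from the text. */
    int M[3][4] = { {1,2,0,0}, {0,3,9,0}, {0,1,4,0} };
    int A[12], JA[12], IA[4];   /* worst case: all 12 entries non-zero */
    int nnz = 0;
    int i, j, k;

    /* Scan in row-major order, recording each non-zero entry. */
    for (i = 0; i < 3; i++) {
        IA[i] = nnz;                 /* index in A where row i starts */
        for (j = 0; j < 4; j++) {
            if (M[i][j] != 0) {
                A[nnz]  = M[i][j];   /* the non-zero value */
                JA[nnz] = j;         /* its column index   */
                nnz++;
            }
        }
    }
    IA[3] = nnz;                     /* one past the last row */

    printf("A  = "); for (k = 0; k < nnz; k++) printf("%d ", A[k]);
    printf("\nIA = "); for (k = 0; k < 4;  k++) printf("%d ", IA[k]);
    printf("\nJA = "); for (k = 0; k < nnz; k++) printf("%d ", JA[k]);
    printf("\n");  /* expected: A = 1 2 3 9 1 4, IA = 0 2 4 6, JA = 0 1 1 2 1 2 */
    return 0;
}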
ADVANTAGES AND DISADVANTAGES OF
SUPPORT VECTOR MACHINE (SVM)
ADVANTAGES:
 Support Vector Machines are very effective in high-dimensional spaces.
 They are also found to be very effective in cases where the number of dimensions is
greater than the number of samples.
 They are memory efficient, because they use only a subset of the training points (the
support vectors) as the decisive factors for classification.
 Versatile: for different decision functions we can define different kernels, as long as
they provide correct results; depending upon our requirements we can define our own
kernel.
DISADVANTAGES:
 If the number of features is much greater than the number of samples, the method is
likely to give poor performance, so care is needed when the training sample is small.
 SVMs do not directly provide probability estimates, so these must be calculated using
indirect techniques.
 Non-traditional data like strings and trees can be given as input to SVM instead of
feature vectors, but doing so requires a suitable kernel.
 Users should select an appropriate kernel for their project according to its requirements.
DESCRIPTION OF THE EXAMPLE ::
As shown in the figure, the points lie on a 1-dimensional line and cannot be separated by a linear
hyperplane. The following steps are performed:
1.) Map the points into a feature space.
2.) Use the polynomial kernel map Φ(X1) = (X1, X1^2) to place the points on a two-dimensional plane.
3.) Compute the positive, negative and zero hyperplanes.
4.) Obtain the support vectors and the margin value from them.
From the value of the margin we can classify new input data into the different classes depending
upon their values.
SOURCE CODE FOR 1-DIMENSIONAL LINEAR CLASSIFIER OF DATA IN
SVM USING POLYNOMIAL KERNEL
#include <stdio.h>
#include <math.h>

int main(void)
{
    int data_set[4][2] = { {1,0}, {-1,1}, {-1,2}, {1,3} };
    int data_set_after_kernel[4][3];
    int i, j, k, l;
    float d1, d2, D, w11, w12, w1, w21, w22, w2, b1, b2, b;

    /* Calculate the data set after applying the polynomial kernel,
       giving the new data set as: class, value(x), value(pow(x,2)). */
    for (i = 0; i < 4; i++)
    {
        for (j = 0; j < 3; j++)
        {
            if (j == 2)
                data_set_after_kernel[i][j] = data_set[i][j-1] * data_set[i][j-1];
            else
                data_set_after_kernel[i][j] = data_set[i][j];
        }
    }

    printf("\n");
    for (k = 0; k < 4; k++)
    {
        for (l = 0; l < 3; l++)
        {
printf("%d \t",data_set_after_kernel[k][l]);
}
printf("\n");
}
//plot this points on the feature space and now finding the
//hyperplane we will use the equation as (w.x)+b=labels
//here we have labels +1,0,-1.
//w1x1+w2x2+b=+1
//w1x1+w2x2+b=+1
//w1x1+w2x2+b=-1
//compute the value of D to cuompute the value of 3 variable w1,w1 and b.
d1=((data_set_after_kernel[0][1]*data_set_after_kernel[1][2]*1)+(data_set_after_kernel[0][2]*data_s
et_after_kernel[3][1]*1)+(1*data_set_after_kernel[1][1]*data_set_after_kernel[3][2]));
d2=((data_set_after_kernel[0][1]*1*data_set_after_kernel[3][2])+(data_set_after_kernel[0][2]*data_s
et_after_kernel[1][1]*1)+(1*data_set_after_kernel[1][2]*data_set_after_kernel[3][1]));
D=d1-d2;
//calculate the value of variable w1
w11=((data_set_after_kernel[0][2]*1*(1*data_set_after_kernel[1][0]))+(1*data_set_after_kernel[1][2]*(-1*data_set_after_kernel[3][0]))+((1*data_set_after_kernel[0][0])*1*data_set_after_kernel[3][2]));
w12=((data_set_after_kernel[0][2]*1*(1*data_set_after_kernel[3][0]))+(1*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[1][0]))+((1*data_set_after_kernel[0][0])*1*data_set_after_kernel[1][2]));
w1=(w11-w12)/D;
//calculate the value of variable w2
w21=((data_set_after_kernel[0][1]*1*(1*data_set_after_kernel[3][0]))+(1*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[1][0]))+((1*data_set_after_kernel[0][0])*1*data_set_after_kernel[1][1]));
w22=((data_set_after_kernel[0][1]*1*(1*data_set_after_kernel[1][0]))+(1*data_set_after_kernel[1][1]*(-1*data_set_after_kernel[3][0]))+((1*data_set_after_kernel[0][0])*1*data_set_after_kernel[3][1]));
w2=(w21-w22)/D;
//calculate the variable b in the following steps
b1=(data_set_after_kernel[0][1]*data_set_after_kernel[3][2]*(1*data_set_after_kernel[1][0]))+(data_set_after_kernel[0][2]*data_set_after_kernel[1][1]*(1*data_set_after_kernel[3][0]))+(data_set_after_kernel[1][2]*data_set_after_kernel[3][1]*(1*data_set_after_kernel[0][0]));
    b2 = (data_set_after_kernel[0][1]*data_set_after_kernel[1][2]*(1*data_set_after_kernel[3][0]))
       + (data_set_after_kernel[0][2]*data_set_after_kernel[3][1]*(1*data_set_after_kernel[1][0]))
       + (data_set_after_kernel[1][1]*data_set_after_kernel[3][2]*(1*data_set_after_kernel[0][0]));
    b = (b1 - b2) / D;

    printf("The value of w1 is: %f \n", w1);
    printf("The value of w2 is: %f \n", w2);
    printf("The value of b is: %f \n", b);

    /* Points of the positive plane: w1*x1 + w2*x2 + b = +1 */
    float data_set_positive[4][2];
    for (int x = 0; x < 4; x++)
    {
        for (int y = 0; y < 2; y++)
        {
            if (y == 0)
                data_set_positive[x][y] = data_set_after_kernel[x][1];
            else
                data_set_positive[x][y] = (1 - b - (w1 * data_set_after_kernel[x][1])) / w2;
        }
    }

    /* Points of the negative plane: w1*x1 + w2*x2 + b = -1 */
    float data_set_negative[4][2];
    for (int r = 0; r < 4; r++)
    {
        for (int t = 0; t < 2; t++)
        {
            if (t == 0)
                data_set_negative[r][t] = data_set_after_kernel[r][1];
            else
                data_set_negative[r][t] = (-1 - b - (w1 * data_set_after_kernel[r][1])) / w2;
        }
    }

    /* Points of the zero plane: w1*x1 + w2*x2 + b = 0 */
    float data_set_zero[4][2];
    for (int e = 0; e < 4; e++)
    {
        for (int f = 0; f < 2; f++)
        {
            if (f == 0)
                data_set_zero[e][f] = data_set_after_kernel[e][1];
            else
                data_set_zero[e][f] = (-b - (w1 * data_set_after_kernel[e][1])) / w2;
        }
    }

    /* Print the hyperplane points as follows. */
    printf("\n");
    for (k = 0; k < 4; k++)
    {
        for (l = 0; l < 2; l++)
        {
            printf("%f \t", data_set_positive[k][l]);
        }
        printf("\n");
    }
    printf("\n");
    for (k = 0; k < 4; k++)
    {
        for (l = 0; l < 2; l++)
        {
            printf("%f \t", data_set_negative[k][l]);
        }
        printf("\n");
    }
    printf("\n");
    for (k = 0; k < 4; k++)
    {
        for (l = 0; l < 2; l++)
        {
            printf("%f \t", data_set_zero[k][l]);
        }
        printf("\n");
    }

    /* Calculate the margin for this data set using the formula 2/sqrt(w.w). */
    float margin;
    margin = 2.0 / sqrt(pow(w1, 2) + pow(w2, 2));
    printf("\n The margin for the given dataset is : %f\n", margin);
    return 0;
}
REFERENCES ::
1.) http://xanadu.cs.sjsu.edu/~drtylin/classes/cs267/project/tam_ngo/
2.) http://www.wikipedia.com/
3.) http://www.support-vector.net/icml-tutorial.pdf
4.) http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
5.) http://en.wikipedia.org/wiki/Support_vector_machine
6.) http://en.wikipedia.org/wiki/Sparse_data