International Journal on Advanced Computer Theory and Engineering (IJACTE)
Increasing Efficiency of Support Vector Machine using the Novel Kernel Function: Combination of Polynomial and Radial Basis Function

1Hetal Bhavsar, 2Amit Ganatra
1Assistant Professor, Department of Computer Science and Engineering, The M. S. University of Baroda, Vadodara, Gujarat, India
2Dean, Faculty of Technology and Engineering, CHARUSAT, Changa, Gujarat, India
E-mail: 1het_bhavsar@yahoo.co.in, 2amitganatra.ce@ecchanga.ac.in
Abstract - Support Vector Machine (SVM) is one of the most robust and accurate methods among all supervised machine learning techniques. Still, the performance of SVM is greatly influenced by the selection of the kernel function. This research analyses the characteristics of two well-known existing kernel functions, the local Gaussian Radial Basis Function and the global Polynomial kernel function. Based on this analysis, a new kernel function is proposed, which we call the "Radial Basis Polynomial Kernel (RBPK)". The RBPK improves both the learning and the generalization capability of SVM. The performance of the proposed kernel function is illustrated on several datasets in comparison with the single existing kernels. The results on datasets from various domains show better learning and prediction ability of the Support Vector Machine with the RBPK.
Index Terms - Support vector machine, kernel function, sequential minimal optimization, feature space, polynomial kernel, radial basis function
I. INTRODUCTION
Support Vector Machine (SVM) is a supervised machine learning method based on statistical learning theory and the VC dimension concept. It follows the structural risk minimization principle, which minimizes an upper bound on the expected risk, as opposed to the empirical risk minimization principle, which minimizes the error on the training data. It exploits a margin-based criterion that is attractive for many classification applications such as handwritten digit recognition, object recognition, speaker identification, face detection in images, text categorization, image classification and biosequence analysis [4], [21], [22], [23]. It uses generalization and regularization theory, which gives a principled way to choose a hypothesis [5], [7]. Training a Support Vector Machine involves solving a large quadratic programming (QP) problem, which requires O(m²) space and O(m³) time, where m is the number of training samples [4], [10]. To address this, many algorithms and implementation techniques have been developed to train SVM on massive datasets. This research uses Sequential Minimal Optimization (SMO), a special case of the decomposition method in which two coefficients are optimized per iteration in each sub-problem. SMO maintains a kernel matrix whose size equals the total number of samples in the dataset, which allows it to handle very large training sets [17].
In the real world, not all datasets are linearly separable. Kernel functions, the key technology of SVM, are used to map data from the input space to a higher-dimensional feature space, which makes the classification problem linear in that feature space [4]. Cover's theorem guarantees that any dataset becomes arbitrarily separable as the data dimension grows [3].
The QP problem for training an SVM with a kernel function is

W(α) = ∑_{i=1}^{l} α_i − (1/2) ∑_{i=1}^{l} ∑_{j=1}^{l} α_i α_j y_i y_j K(x⃗_i, x⃗_j)        (1)

subject to: ∑_{i=1}^{l} α_i y_i = 0 and 0 ≤ α_i ≤ C,

where C is the regularization parameter and K(x⃗_i, x⃗_j) is the kernel function, both supplied by the user, and the variables α_i are Lagrange multipliers.
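For illustration only, the following minimal NumPy sketch (not from the paper) evaluates the dual objective W(α) of equation (1) and checks its constraints for a given multiplier vector; the RBF kernel and the toy data used here are placeholder choices:

    import numpy as np

    def rbf_kernel(xi, xj, sigma=1.0):
        # Gaussian RBF kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))
        return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

    def dual_objective(alpha, X, y, kernel, C=1.0):
        """Evaluate W(alpha) from eq. (1) and report constraint status."""
        l = len(y)
        K = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
        w = alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K)
        equality_gap = float(np.dot(alpha, y))               # should be 0
        box_ok = bool(np.all((alpha >= 0) & (alpha <= C)))   # 0 <= alpha_i <= C
        return w, equality_gap, box_ok

    # toy usage
    X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    alpha = np.array([0.5, 0.5, 0.5, 0.5])
    print(dual_objective(alpha, X, y, rbf_kernel))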
The linear kernel, polynomial kernel, Radial Basis Function (RBF) kernel and sigmoid kernel are the common, well-known basic kernel functions. The feature space of every kernel is different, so the representation of the data in the new feature space differs from kernel to kernel, and the selection of the kernel function has a direct impact on the performance of SVM. The values of the kernel parameters (such as d in the polynomial kernel, σ in the RBF kernel and p in the sigmoid kernel) and of the regularization parameter C also have a great impact on the complexity and generalization error of the classifier. Choosing optimal values for these parameters is as important as the selection of the kernel function itself [18], [20].
Many researchers have already worked on this problem. A clinical kernel function that takes into account the type and range of each variable is proposed in [8]; it requires the specification of the type of each variable, as well as the minimal and maximal possible values for continuous and ordinal variables, based on the training data or on a priori knowledge. A method of modifying a kernel to improve the performance of an SVM classifier, based on information-geometric consideration of the structure of the Riemannian geometry induced by the kernel, is proposed in [1]. A new kernel built from a convex combination of the good characteristics of the polynomial and RBF kernels is proposed in [19]; to guarantee that the mixed kernel is admissible, an optimal minimal coefficient has to be determined. The advantages of the linear and Gaussian RBF kernel functions are combined into a new kernel function in [20]. This results in better generalization and prediction capability, but the method used to choose the best set of parameters (C, σ, λ) is time consuming, requiring O(N³) time where N is the number of training samples. A compound kernel combining the polynomial kernel, the RBF kernel and the Fourier kernel is given in [2]. The Minimax Probability Machine (MPM), whose performance depends on its kernel function, is evaluated in [15] by replacing the Euclidean distance in the Gaussian kernel with the more general Minkowski distance, which results in better prediction accuracy than the Euclidean distance. [9] proposed a dynamic SVM with a distributed kernel function, showing that the recognition of a target feature is determined by a few samples in the local space around it, while the influence of the other samples can be neglected. [24] showed that there may be a risk of losing information when multiple kernel learning methods average out the kernel matrices in one way or another; to avoid learning any weights and to accommodate more kernels, a new kernel matrix is proposed that is composed of the original, different kernel matrices by constructing a larger matrix in which the original ones are still present. The compositional kernel matrix is s times larger than the base kernels. A mechanism to optimize the parameters of a combined kernel function using large margin learning theory and a genetic algorithm, which searches for the optimal parameters of the combined kernel function, is proposed in [13]; however, the training speed is slow when the dataset becomes large. The influence of the model parameters of SVMs using the RBF and scaling kernel functions on the performance of SVM is studied by simulation in [12]: the penalty factor mainly controls the complexity of the model, while the kernel parameter mainly influences the generalization of SVM. They showed that when the two types of parameters act jointly, the optimum in the parameter space can be obtained.
However, the choice of the SVM kernel function is still a relatively complex and difficult issue. This research analyzes the key characteristics of two very well known kernel functions, the RBF kernel and the polynomial kernel, and proposes a new kernel function combining the advantages of the two, with better learning and better prediction ability.
II. SMO
Decomposition techniques speed up SVM training by dividing the original QP problem into smaller pieces, thereby reducing the size of each QP problem. The chunking algorithm and Osuna's decomposition algorithm are well-known decomposition algorithms [17]. Since these techniques require many passes over the dataset, they need a long training time to reach a reasonable level of convergence.

SMO is a special case of the decomposition method in which each sub-problem optimizes two coefficients per iteration and is solved analytically. The advantages of SMO are that it is simple, easy to implement, generally faster, and has better scaling properties for difficult SVM problems than standard SVM training algorithms [14], [17]. It maintains a kernel matrix whose size equals the total number of samples in the dataset and thus scales between linear and cubic in the sample set size.
To find an optimal point of (1), the SMO algorithm uses the Karush-Kuhn-Tucker (KKT) conditions. The KKT conditions are necessary and sufficient for an optimal point of a positive definite QP problem. The QP problem is solved when, for all i, the following KKT conditions are satisfied:

α_i = 0 ⟺ y_i u_i ≥ 1
0 < α_i < C ⟺ y_i u_i = 1        (2)
α_i = C ⟺ y_i u_i ≤ 1

where u_i is the output of the SVM for the i-th training sample. The KKT conditions can be evaluated on one example at a time, which forms the basis of the SMO algorithm. When they are satisfied by every multiplier, the algorithm terminates. The KKT conditions are verified to within a tolerance ε, which typically ranges from 10⁻² to 10⁻³.
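As an illustration (not part of the original paper), a small sketch of how the KKT conditions (2) can be checked to within a tolerance ε for every multiplier; u is assumed to hold the SVM outputs u_i on the training samples:

    import numpy as np

    def kkt_satisfied(alpha, y, u, C=1.0, eps=1e-3):
        """Return True if every multiplier satisfies the KKT conditions (2) within eps."""
        margins = y * u  # y_i * u_i for each training sample
        ok = np.ones_like(alpha, dtype=bool)
        # alpha_i = 0        ->  y_i u_i >= 1
        ok &= ~(alpha <= eps) | (margins >= 1 - eps)
        # 0 < alpha_i < C    ->  y_i u_i = 1
        ok &= ~((alpha > eps) & (alpha < C - eps)) | (np.abs(margins - 1) <= eps)
        # alpha_i = C        ->  y_i u_i <= 1
        ok &= ~(alpha >= C - eps) | (margins <= 1 + eps)
        return bool(np.all(ok))

    # An SMO-style outer loop would keep optimizing pairs of multipliers
    # until kkt_satisfied(...) returns True.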
III. KERNEL FUNCTIONS
Kernels are used in Support Vector Machines to map nonlinear, inseparable data into a higher-dimensional feature space where the computational power of the linear learning machine is increased [10], [16]. Using the kernel function, the classification function in the high-dimensional space can be given as:

f(x⃗) = sgn( ∑_{i=1}^{Ns} α_i y_i K(x⃗_i, x⃗) + b )        (3)

Here, K(x⃗_i, x⃗_j) is called a kernel function in SVM. It measures the similarity or distance between two vectors. A kernel function K: χ × χ → R is valid if there is some feature mapping Φ such that

K(x⃗_i, x⃗_j) = Φ(x⃗_i) · Φ(x⃗_j)        (4)
Thus, we can calculate the dot product Φ(x⃗_i) · Φ(x⃗_j) without explicitly applying the function Φ to the input vectors. The kernel function can transform dot product operations in the high-dimensional space into kernel function operations in the input space as long as it satisfies the Mercer condition [5], [11], [18]; it thereby avoids computing directly in the high-dimensional space and sidesteps the dimensionality problem.
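To make the kernel trick concrete, the following sketch (an illustration, not taken from the paper) compares an explicit degree-2 polynomial feature map with the equivalent kernel evaluation (x⃗_i · x⃗_j + 1)²; both give the same value, but the kernel never constructs the feature vector:

    import numpy as np

    def phi_poly2(x):
        # explicit feature map of the degree-2 polynomial kernel (xi.xj + 1)^2 for 2-D inputs
        x1, x2 = x
        return np.array([1.0,
                         np.sqrt(2) * x1, np.sqrt(2) * x2,
                         x1 ** 2, x2 ** 2,
                         np.sqrt(2) * x1 * x2])

    def k_poly2(xi, xj):
        return (np.dot(xi, xj) + 1.0) ** 2

    xi, xj = np.array([0.3, -1.2]), np.array([0.7, 0.5])
    print(np.dot(phi_poly2(xi), phi_poly2(xj)))  # explicit mapping into feature space
    print(k_poly2(xi, xj))                       # kernel trick: the same value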
The performance of SVM largely depends on the kernel function. Every kernel function has its own advantages and disadvantages. Many possible kernels exist, and it is difficult to characterize each of them individually. A single kernel function may not have both good learning and good generalization capability. As a solution, the good characteristics of two or more kernels should be combined.
There are mainly two types of kernel functions for the support vector machine: local kernel functions and global kernel functions. In a global kernel function, samples far away from each other still influence the kernel value; in a local kernel function, only samples close to each other influence the kernel value. The polynomial kernel is an example of a global kernel function and the RBF kernel is an example of a local kernel function.
A. RBF kernel

The RBF kernel function is the most widely used kernel function because of its good learning ability among all the single kernel functions. It is defined as

k(x⃗_i, x⃗_j) = exp( −‖x⃗_i − x⃗_j‖² / 2σ² )        (5)

where σ = mean(‖x⃗_i − x⃗_j‖²).

The RBF can be well adapted under many conditions: low dimension, high dimension, small sample, large sample, etc. The RBF has the advantage of having few parameters. A large number of numerical experiments have shown that the learning ability of the RBF is inversely proportional to the parameter σ, which determines the area of influence over the data space. Fig. 1 shows the local effect of the RBF kernel for a chosen test input 0.2 over the data space [-1, 1], for different values of the width σ. A larger value of σ gives a smoother decision surface and a more regular decision boundary, because an RBF with large σ allows a support vector to have a strong influence over a larger area. If σ is very small, Fig. 1 shows that only samples whose distance from the test point is of the order of σ are affected. Since it acts only on the data points in the neighbourhood of the test point, it is called a local kernel.

Fig. 1. A local RBF kernel function with different values of σ

B. Polynomial kernel

The polynomial kernel function is defined as

k(x⃗_i, x⃗_j) = (x⃗_i · x⃗_j + 1)^d        (6)

where d is the degree of the kernel. Fig. 2 shows the global effect of the polynomial kernel of various degrees over the data space [-1, 1] with test input 0.2: every data point from the data space has an influence on the kernel value of the test point, irrespective of its actual distance from the test point.

Fig. 2. A global polynomial kernel function with different values of d

The polynomial kernel is a global kernel function with good generalization ability, in which distant points still affect the kernel value, yet it lacks the strong learning ability of a local kernel function such as the RBF.
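The contrast between the local RBF kernel and the global polynomial kernel can be reproduced numerically. The following sketch (illustrative only; the test point 0.2 and data space [-1, 1] follow the description of Fig. 1 and Fig. 2) evaluates both kernels between the test input and points across the data space:

    import numpy as np

    def rbf_1d(x, z, sigma):
        return np.exp(-(x - z) ** 2 / (2.0 * sigma ** 2))

    def poly_1d(x, z, d):
        return (x * z + 1.0) ** d

    test_point = 0.2
    xs = np.linspace(-1.0, 1.0, 9)

    # Local behaviour: only points near 0.2 get a noticeably large RBF value
    print([round(rbf_1d(x, test_point, sigma=0.1), 3) for x in xs])
    # Global behaviour: every point contributes a non-negligible polynomial value
    print([round(poly_1d(x, test_point, d=3), 3) for x in xs])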
IV. PROPOSED KERNEL FUNCTION
For an SVM classifier, choosing a specific kernel function means choosing a way of mapping the input space into a feature space. The learning model, which is judged by its learning ability and prediction ability, is built by choosing a specific kernel function. Thus, to build a model with good learning as well as good prediction ability, this research combines the advantages of both the local RBF kernel function and the global polynomial kernel function.
A novel kernel function called Radial Basis Polynomial
Kernel (RBPK) is now defined as:
k(x⃗_i, x⃗_j) = exp( (x⃗_i · x⃗_j + c)^d / σ² )        (7)
where c > 0 and d > 0.

The RBPK kernel function takes the good prediction ability of the polynomial kernel and the good learning ability of the RBF kernel function.

Mercer's theorem provides the necessary and sufficient condition for a valid kernel function: a kernel is permissible if the corresponding kernel matrix is symmetric and positive semi-definite [7], [11], [18]. Since the RBPK kernel satisfies Mercer's theorem, it is a permissible kernel.
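A minimal sketch of the RBPK of equation (7), together with an empirical check of the Mercer condition (symmetry and positive semi-definiteness of the kernel matrix on a random sample), is given below. This is an illustration under the stated definition, not the authors' implementation; the parameter values and the random data are arbitrary:

    import numpy as np

    def rbpk(xi, xj, d=3, c=1.0, sigma=1.0):
        # Radial Basis Polynomial Kernel: exp((xi . xj + c)^d / sigma^2), eq. (7)
        return np.exp((np.dot(xi, xj) + c) ** d / sigma ** 2)

    def kernel_matrix(X, kernel):
        n = len(X)
        return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

    rng = np.random.default_rng(0)
    X = rng.normal(scale=0.3, size=(20, 4))           # small values keep exp() well behaved
    K = kernel_matrix(X, rbpk)

    print(np.allclose(K, K.T))                        # symmetric
    print(np.min(np.linalg.eigvalsh(K)) >= -1e-8)     # eigenvalues non-negative (PSD)

In practice the same function can be passed to any SVM implementation that accepts a callable or precomputed kernel.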
V. EXPERIMENT
A. Dataset Description

In order to validate the classification ability of SVM using the RBPK kernel, this research conducted several experiments on standard scaled datasets, comparing RBPK with the other existing kernel functions: Linear, Polynomial and RBF. The datasets considered for the simulation are the iris, heart, glass, a1a (adult) and letter datasets from LIBSVM [6], the usps dataset¹, the dna dataset² and the web8 dataset³. The details of each dataset are shown in Table 1. The datasets come from multiple fields and vary in the number of instances, attributes and classes; all are of multivariate type.

¹ http://wwwstat.stanford.edu/~tibs/ElemStatLearn/data.html
² https://www.sgi.com/tech/mlc/db/
³ http://users.cecs.anu.u.au/~xzhang/data/

Table 1: Dataset Description

Dataset | No. of classes | No. of features | No. of training instances | No. of testing instances | Validation method
Iris    | 3  | 4   | 150   | -     | Cross-validation
Heart   | 2  | 13  | 270   | -     | Cross-validation
Glass   | 7  | 10  | 214   | -     | Cross-validation
a1a     | 2  | 123 | 1605  | 30956 | Holdout
Dna     | 3  | 180 | 2000  | 1186  | Holdout
Letter  | 26 | 16  | 15000 | 5000  | Holdout
Usps    | 10 | 256 | 2007  | 7291  | Holdout
web8    | 2  | 300 | 45546 | 13699 | Holdout

B. Experiment Setup and Preliminaries

All tests were carried out as follows. To evaluate the classification accuracy of the RBPK kernel function compared to the other kernel functions, two test methods are used for the different datasets: cross-validation and holdout (Han et al., 2006). The cross-validation method is used for the Iris, Heart and Glass datasets with k = 10 folds, and the holdout method is used for the a1a, Dna, Letter, Usps and Web8 datasets, for which separate training and testing sets are available. The other measures used to compare the efficiency of the RBPK kernel with the other kernel functions are the number of Correctly Classified Instances (CCI), the number of Incorrectly Classified Instances (ICI), the number of support vectors (SV), precision, True Positive Rate (TPR) and False Positive Rate (FPR). The Relative Operating Characteristic (ROC) curve is also shown to visually depict the performance of the classification model.

Depending on the kernel type, different kernel parameters have to be set. The regularization parameter C, which controls the trade-off between maximizing the margin and minimizing the training error, is set to 1 for all experiments. The linear kernel needs no parameter. The polynomial kernel is executed for degree values 1, 3 and 5; the gamma value for RBF is set to 0.01, 0.05, 0.08 and 0.1; and the same values of degree and gamma are used for the RBPK kernel. The result table for each dataset shows only those parameter values for which the maximum accuracy is obtained for that kernel after simulation.

The simulation results were obtained by running the SMO algorithm using the LIBSVM framework in Eclipse on an Intel Core i5-2430M CPU @ 2.4 GHz machine with 4 GB of RAM.
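As an illustration of this protocol (not the authors' actual LIBSVM/Eclipse setup), a sketch of the 10-fold cross-validation runs using scikit-learn's SVC with a callable kernel is shown below; rbpk_gram is a Gram-matrix form of eq. (7) in which gamma is assumed to play the role of 1/σ², and the inputs are normalized so that the exponential stays in a safe numeric range:

    import numpy as np
    from functools import partial
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_predict
    from sklearn.preprocessing import normalize
    from sklearn.svm import SVC

    def rbpk_gram(X, Z, d=3, c=1.0, gamma=0.05):
        # Gram-matrix form of eq. (7); gamma is assumed to stand in for 1/sigma^2
        return np.exp(gamma * (X @ Z.T + c) ** d)

    X, y = load_iris(return_X_y=True)
    X = normalize(X)  # scaled inputs, as the experiments use standard scaled datasets

    for d in (1, 3, 5):
        for gamma in (0.01, 0.05, 0.08, 0.1):
            clf = SVC(C=1.0, kernel=partial(rbpk_gram, d=d, gamma=gamma))
            pred = cross_val_predict(clf, X, y, cv=10)   # 10-fold cross-validation
            cci = int(np.sum(pred == y))                 # correctly classified instances
            ici = len(y) - cci                           # incorrectly classified instances
            print(f"d={d} gamma={gamma}: accuracy={cci / len(y):.4f} CCI={cci} ICI={ici}")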
VI. EVALUATION

A. Results with Cross-Validation Method

The results obtained after running the Iris, Heart and Glass datasets with the different kernel functions and parameter tuning are shown in Table 2, Table 3 and Table 4 respectively.

Table 2: Result for Iris Dataset

Kernel Function | Parameters  | Accuracy (%) | Tr_Time (Sec) | CCI | ICI | SV
Linear          | -           | 97.33        | 0.02          | 146 | 4   | 40
Polynomial      | d=3         | 73.33        | 0.026         | 110 | 40  | 113
RBF             | ϒ=0.08      | 96.66        | 0.027         | 145 | 5   | 80
RBPK            | d=5, ϒ=0.08 | 98           | 0.022         | 147 | 3   | 32
Table 3: Result for Heart Dataset
Kernel Function | Parameters  | Accuracy (%) | Tr_Time (Sec) | CCI | ICI | SV
Linear          | -           | 84.07        | 0.067         | 227 | 43  | 91
Polynomial      | d=3         | 82.22        | 0.071         | 222 | 48  | 158
RBF             | ϒ=0.05      | 82.96        | 0.076         | 224 | 46  | 114
RBPK            | d=3, ϒ=0.01 | 84.44        | 0.078         | 228 | 42  | 108
Table 4: Result for Glass Dataset

Kernel Function | Parameters  | Accuracy (%) | Tr_Time (Sec) | CCI | ICI | SV
Linear          | -           | 64.48        | 0.078         | 138 | 76  | 162
Polynomial      | d=3         | 47.66        | 0.094         | 102 | 112 | 189
RBF             | ϒ=0.1       | 58.88        | 0.093         | 126 | 88  | 177
RBPK            | d=5, ϒ=0.08 | 71.49        | 0.094         | 153 | 61  | 134
The above results show that the classification accuracy of the RBPK kernel is the highest among the kernel functions considered, for all three datasets. The testing time of SVM depends on the number of SVs; the RBPK kernel produces fewer SVs, which reduces the overall complexity of the model. Table 4 shows that for the glass dataset there is a drastic increase in accuracy of around 7%. Since the datasets are small, the training time of SVM is almost the same for all kernels.
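The holdout tables below additionally report precision, TPR and FPR. For reference (an illustrative sketch, not the authors' evaluation code), these can be derived from the confusion matrix as follows; for the multiclass datasets an average over the classes is assumed:

    import numpy as np

    def rates(y_true, y_pred, positive=1):
        """Precision, TPR and FPR for one class treated as 'positive'."""
        tp = np.sum((y_pred == positive) & (y_true == positive))
        fp = np.sum((y_pred == positive) & (y_true != positive))
        fn = np.sum((y_pred != positive) & (y_true == positive))
        tn = np.sum((y_pred != positive) & (y_true != positive))
        precision = tp / (tp + fp) if tp + fp else 0.0
        tpr = tp / (tp + fn) if tp + fn else 0.0        # recall / true positive rate
        fpr = fp / (fp + tn) if fp + tn else 0.0        # false positive rate
        return precision, tpr, fpr

    y_true = np.array([1, 1, -1, -1, 1, -1])
    y_pred = np.array([1, -1, -1, -1, 1, 1])
    print(rates(y_true, y_pred))   # (0.667, 0.667, 0.333)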
B. Results with Holdout Method

Experimental results for the a1a dataset are shown in Table 5 and Fig. 3. The RBPK kernel (with d=1, ϒ=0.05) gives the highest accuracy among the kernel functions compared. The increase in accuracy is only 0.17%, but it corresponds to 55 more correctly classified instances. The polynomial kernel gives the worst performance of all the kernel functions. The linear kernel and the RBF kernel give nearly similar performance, but the number of support vectors is smaller for the linear kernel, so its testing time is lower than that of the RBF kernel. Although the number of SVs with RBPK is smaller than with the polynomial and RBF kernels, it takes more testing time. The higher values of precision and TPR show the better efficiency of the RBPK kernel.
Table 5: Result for a1a Dataset
Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=1        | ϒ=0.05 | d=1, ϒ=0.05
Accuracy (%) | 83.82  | 82.13      | 84.23  | 84.4
Tr_Time (s)  | 0.244  | 0.203      | 0.246  | 0.265
Ts_Time (s)  | 2.568  | 3.494      | 4.144  | 4.165
CCI          | 25947  | 25423      | 26073  | 26128
ICI          | 5009   | 5533       | 4883   | 4828
SV           | 588    | 790        | 691    | 650
Precision    | 0.833  | 0.822      | 0.834  | 0.838
TPR          | 0.838  | 0.821      | 0.842  | 0.844
FPR          | 0.32   | 0.518      | 0.359  | 0.328

Fig. 3. For the a1a dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve
Table 6: Result for DNA Dataset
Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=3        | ϒ=0.01 | d=3, ϒ=0.01
Accuracy (%) | 93.08  | 50.84      | 94.86  | 95.36
Tr_Time (s)  | 0.749  | 2.575      | 1.374  | 2.06
Ts_Time (s)  | 0.312  | 1.263      | 0.811  | 1.03
CCI          | 1104   | 603        | 1125   | 1131
ICI          | 82     | 583        | 61     | 55
SV           | 396    | 1734       | 1026   | 1274
Precision    | 0.94   | 0.259      | 0.958  | 0.963
TPR          | 0.94   | 0.51       | 0.959  | 0.963
FPR          | 0.048  | 0.51       | 0.027  | 0.026

Fig. 4. For the dna dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve
Table 6 and Fig. 4 show the results for the DNA dataset after tuning the parameters of the different kernel functions. With the RBPK kernel, accuracy increases by around 0.5%, with the highest TPR and the lowest FPR. Compared to the RBF kernel, it takes almost the same testing time while using around 200 more SVs.
Table 7 and Fig. 5 show the results for the Letter dataset after tuning the parameters of the different kernel functions.
Table 7: Results for Letter Dataset
Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=3        | ϒ=0.1  | d=5, ϒ=0.1
Accuracy (%) | 84.3   | 37.58      | 84.6   | 95.88
Tr_Time (s)  | 7.064  | 38.454     | 12.391 | 8.03
Ts_Time (s)  | 5.07   | 9.918      | 9.259  | 5.696
CCI          | 4215   | 1879       | 4230   | 4794
ICI          | 785    | 3121       | 770    | 206
SV           | 8770   | 14462      | 10882  | 5382
Precision    | 0.872  | 0.598      | 0.882  | 0.988
TPR          | 0.87   | 0.387      | 0.872  | 0.987
FPR          | 0.002  | 0.023      | 0.002  | 0.0005
Fig. 5. For the letter dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve
Compared to the existing kernels, the RBPK kernel with parameters (d=5, ϒ=0.1) drastically increases the classification accuracy, by about 11%. The number of SVs with RBPK is almost half the number of SVs with RBF, the best of the existing kernels here, which correspondingly reduces the testing time of the RBPK kernel. The FPR is nearly zero and the TPR is the highest for the RBPK kernel, as shown in Fig. 5(b).
The results for the Usps dataset are shown in Table 8 and Fig. 6. The highest accuracy obtained after parameter tuning with the existing kernels is 94.97%, with the RBF kernel; with the RBPK kernel it increases by about 0.7%. Although the number of SVs with the RBPK kernel is much higher than with the RBF kernel, the testing time is almost the same.
Table 8: Results for Usps Dataset
Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=3        | ϒ=0.01 | d=1, ϒ=0.05
Accuracy (%) | 93.02  | 93.77      | 94.97  | 95.62
Tr_Time (s)  | 5.906  | 17.29      | 11.32  | 15.569
Ts_Time (s)  | 2.847  | 6.481      | 5.02   | 5.42
CCI          | 1867   | 1882       | 1833   | 1919
ICI          | 140    | 125        | 174    | 88
SV           | 992    | 2692       | 41     | 2029
Precision    | 0.923  | 0.926      | 0.942  | 0.948
TPR          | 0.92   | 0.927      | 0.941  | 0.948
FPR          | 0.008  | 0.006      | 0.005  | 0.003

Fig. 6. For the usps dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve

The results for the Web8 dataset are shown in Table 9 and Fig. 7. The highest accuracy obtained after parameter tuning is 99.51% with RBPK, followed by the RBF kernel with 99.21%. Although the difference in accuracy is only 0.3%, there is a drastic reduction in the number of SVs and, ultimately, in testing time. The polynomial kernel gives an accuracy of 96.99%, but its FPR is the highest, at 0.97; this is because of the imbalance in the distribution of the data. With the RBPK kernel the TPR is the highest and the FPR the lowest, which indicates that it works better for imbalanced data.

Table 9: Results for Web8 Dataset

Statistics   | Linear | Polynomial | RBF     | RBPK
Parameters   | -      | d=3        | ϒ=0.08  | d=3, ϒ=0.05
Accuracy (%) | 98.8   | 96.99      | 99.21   | 99.51
Tr_Time (s)  | 27.704 | 18.505     | 108.159 | 5306.575
Ts_Time (s)  | 2.2    | 4.82       | 23.387  | 5.79
CCI          | 13547  | 13288      | 13592   | 13632
ICI          | 152    | 411        | 107     | 67
SV           | 1356   | 2678       | 3515    | 2124
Precision    | 0.989  | 0.941      | 0.992   | 0.995
TPR          | 0.989  | 0.97       | 0.992   | 0.995
FPR          | 0.326  | 0.97       | 0.234   | 0.134

Fig. 7. For the web8 dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve
In summary, the datasets used in the analysis vary in the number of instances, features and classes and come from different domains: training instances range from 150 to 45,000, testing instances from 1,100 to 30,000, features from 4 to 300, and classes from 2 to 26. To get a good classification result, choosing the parameter values is very important. It has been observed that as the degree of the polynomial kernel increases, its accuracy decreases; the polynomial kernel gives better accuracy for degree 3. The RBF kernel works better for γ = 0.05 and above and, similarly, for most of the datasets the RBPK kernel works better for degree 3 and γ = 0.05 and above. As shown in Fig. 8 and Fig. 9, using the RBPK kernel the accuracy of the SVM classifier in correctly classifying instances increases by around 0.2% to 11%. From the results on the iris, glass, DNA, letter and Usps datasets it can be observed that the RBPK kernel function also gives a better feature space representation for multiclass classification. Mapping the data into the new feature space using the RBPK kernel function mostly reduces the number of support vectors, as shown in Fig. 10 and Fig. 11, which may reduce both the model complexity and the testing time. The results show that RBPK obtains much better TPR and precision and the lowest FPR for all datasets compared to the other existing kernel functions. The RBPK kernel matrix requires storage proportional to the number of training samples.

Fig. 8. Overall performance of the different kernel functions with the cross-validation method

Fig. 9. Overall performance of the different kernel functions with the holdout method

Fig. 10. Effect on the number of support vectors of the different kernel functions with the cross-validation method

Fig. 11. Effect on the number of support vectors of the different kernel functions with the holdout method

VII. CONCLUSION

The proposed RBPK kernel function combines the advantages of two very well known kernel functions, the RBF and the polynomial kernel. By choosing appropriate kernel parameters, it gives better generalization, learning and prediction capability. Better prediction results can be obtained for binary as well as multiclass datasets. By using the RBPK function, the accuracy rate is higher than with a single existing kernel function. It can be applied to different kinds of fields and is not sensitive to the domain of the data.
REFERENCES
[1] Amari, Shun-ichi, and S. Wu, "Improving support vector machine classifiers by modifying kernel functions," Neural Networks 12(6): 783-789, 1999.

[2] An-na, Wang, Z. Yue, H. Yun-tao, and Li Y., "A novel construction of SVM compound kernel function," International Conference on Logistics Systems and Intelligent Management, vol. 3, pp. 1462-1465, IEEE, 2010.

[3] Burbidge, Robert, and B. Buxton, "An introduction to support vector machines for data mining," Keynote papers, Young OR12: 3-15, 2001.

[4] Burges, Christopher J. C., "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2: 121-167, 1998.

[5] Campbell, Colin, and Y. Ying, "Learning with support vector machines," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 5, no. 1: 1-95, 2011.

[6] Chang, Chih-Chung, and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3: 27, 2011.

[7] Cortes, Corinna, and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3: 273-297, 1995.

[8] Daemen, Anneleen, and B. De Moor, "Development of a kernel function for clinical data," in Engineering in Medicine and Biology Society, 2009 (EMBC 2009), Annual International Conference of the IEEE, pp. 5913-5917, IEEE, 2009.

[9] Guangzhi, Shi, D. Lianglong, H. Junchuan, and Z. Yanxia, "Dynamic support vector machine by distributing kernel function," in Advanced Computer Control (ICACC), 2010 2nd International Conference on, vol. 2, pp. 362-365, IEEE, 2010.

[10] Han, Jiawei, and M. Kamber, "Data Mining, Southeast Asia Edition: Concepts and Techniques," Morgan Kaufmann, 2006.

[11] Herbrich, Ralf, "Learning Kernel Classifiers: Theory and Algorithms (Adaptive Computation and Machine Learning)," MIT Press, 2001.

[12] Jin, Yan, J. Huang, and J. Zhang, "Study on influences of model parameters on the performance of SVM," in International Conference on Electrical and Control Engineering (ICECE), pp. 3667-3670, IEEE, 2011.

[13] Lu, Mingzhu, C. P. Chen, J. Huo, and X. Wang, "Optimization of combined kernel function for SVM based on large margin learning theory," in IEEE International Conference on Systems, Man and Cybernetics, 2008 (SMC 2008), pp. 353-358, IEEE, 2008.

[14] G. Mak, "The implementation of support vector machines using the sequential minimal optimization algorithm," PhD diss., McGill University, 2000.

[15] Mu, Xiangyang, and Y. Zhou, "A Novel Gaussian Kernel Function for Minimax Probability Machine," in Intelligent Systems, 2009 (GCIS '09), WRI Global Congress on, vol. 3, pp. 491-494, IEEE, 2009.

[16] Muller, K., S. Mika, G. Ratsch, Koji Tsuda, and Bernhard Scholkopf, "An introduction to kernel-based learning algorithms," IEEE Transactions on Neural Networks, 12, no. 2: 181-201, 2001.

[17] J. C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines," Microsoft Research, 1998.

[18] Schölkopf, Bernhard, and A. J. Smola, "Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond," MIT Press, 2002.

[19] Smits, Guido F., and E. M. Jordaan, "Improved SVM regression using mixtures of kernels," in Neural Networks, 2002 (IJCNN '02), Proceedings of the 2002 International Joint Conference on, vol. 3, pp. 2785-2790, IEEE, 2002.

[20] Song, Huazhu, Zichun Ding, C. Guo, Z. Li, and H. Xia, "Research on combination kernel function of support vector machine," in International Conference on Computer Science and Software Engineering, 2008, vol. 1, pp. 838-841, IEEE, 2008.

[21] Vapnik, Vladimir N., "An overview of statistical learning theory," IEEE Transactions on Neural Networks, 10, no. 5: 988-999, 1999.

[22] Von Luxburg, Ulrike, and B. Schölkopf, "Statistical learning theory: Models, concepts, and results," arXiv preprint arXiv:0810.4752, 2008.

[23] Yu, Hwanjo, and S. Kim, "SVM Tutorial - Classification, Regression and Ranking," in Handbook of Natural Computing, pp. 479-506, Springer Berlin Heidelberg, 2012.

[24] Zhang, Rui, and X. Duan, "A new compositional kernel method for multiple kernels," in Computer Design and Applications (ICCDA), 2010 International Conference on, vol. 1, pp. V1-27, IEEE, 2010.