Research Journal of Applied Sciences, Engineering and Technology 4(14): 2067-2071, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: December 27, 2011
Accepted: January 16, 2012
Published: July 15, 2012
A Direct-Construction Based Fuzzy Support Vector Classifier

1,2Yong Zhang, 3Jianying Wang, 1Min Ji and 1Dan Huang
1College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China
2College of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China
3College of History Culture and Tourism, Liaoning Normal University, Dalian 116081, China
Corresponding Author: Yong Zhang, College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China
Abstract: This study presents a novel direct-construction based fuzzy multiclass support vector classifier, building on the multi-class classification method of Crammer and Singer (2001). In the proposed method, the membership degree is computed by fuzzy c-means clustering, the optimization problem of multiclass classification and its constraints are reconstructed and the corresponding Lagrangian formulation is re-derived. Experimental comparison with previous studies indicates that our method obtains a better classification rate.

Key words: Classifier, fuzzy c-means, multi-class classification, support vector machine
INTRODUCTION
Support Vector Machine (SVM) was first introduced by Vapnik and his colleagues (Vapnik, 1995). It is an approximate implementation of the Structural Risk Minimization (SRM) principle in statistical learning theory, rather than of the Empirical Risk Minimization (ERM) method. The SRM principle is based on the fact that the generalization error is bounded by the sum of the empirical error and a confidence-interval term depending on the VC dimension. In general, SVMs seek to minimize an upper bound of the generalization error rather than to minimize the training error. This gives SVMs good generalization ability and hence provides a mechanism for avoiding overfitting of the data. SVM was initially designed to solve pattern recognition problems (Chiu and Chen, 2009; Min and Cheng, 2009). Recently, with the introduction of Vapnik's ε-insensitive loss function, SVM has been extended to function approximation and regression estimation problems (Wu, 2009, 2010).
SVM originally addresses the 2-class classification problem. However, when SVMs are extended to multi-class problems, unclassifiable regions can exist. Several methods have been proposed to solve this problem. One is to construct several 2-class classifiers and then assemble a multi-class classifier from them, as in one-against-one, one-against-all and DAGSVMs (Platt et al., 2002; Hsu and Lin, 2002). The second approach is to construct the multi-class classifier directly, as in the k-SVM proposed by Weston and Watkins (1998). Zhang et al. (2007) also proposed a fuzzy compensation multi-classification SVM based on Weston and Watkins' idea. Crammer and Singer (2001) implemented a multiclass kernel-based vector machine algorithm which adopted a direct method for training multiclass predictors.
In many real applications, the observed input data cannot be measured precisely and are usually described at linguistic levels or with ambiguous metrics. Moreover, the obtained training data are often contaminated by noise, and the standard SVM is very sensitive to outliers and noise. Some techniques have been proposed in the data classification literature to tackle this problem for SVM. One method is to use an averaging algorithm to relax the effect of outliers (Song et al., 2002; Hu and Song, 2004). However, in this kind of method the additional parameter is difficult to tune. The second method, proposed in (Inoue and Abe, 2001; Lin and Wang, 2002; Lin and Wang, 2003) and named the Fuzzy SVM (FSVM), is to apply fuzzy memberships to the training data to relax the effect of the outliers. Ji et al. (2010) proposed a support vector machine for classification based on fuzzy training data, in which the training examples are fuzzy inputs, and gave a solution procedure for the support vector machine with fuzzy training data.
In this study, we obtain the fuzzy membership degree by using the fuzzy c-means clustering algorithm. The Fuzzy c-means Clustering Method (FCM) (Dunn, 1974; Bezdek, 1980) is the most common clustering method based on the minimization of a distance-based objective function. Aiming at multiclass classification and noisy data
problems, Zhang et al. (2007) also proposed a fuzzy compensation multi-classification SVM based on fuzzy theories. However, this method is difficult to optimize because of its many parameters.

This study implements a new directly constructed SVM based on the method proposed by Crammer and Singer (2001). Experimental results show that the proposed method obtains a better classification rate.
LITERATURE REVIEW
Direct constructed multiclass classifier: Multiclass classification learning is a critical problem in machine learning. We first give a description of multiclass classification. Let $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \subset X \times Y$ be a set of $l$ training examples, where $x_i \in X = R^N$, $y_i \in Y = \{1, 2, \ldots, k\}$, $i = 1, 2, \ldots, l$. The solution to the multiclass problem is to find a decision function $f(x): X \to Y$ such that the misclassification ratio is likely to be small when classifying unknown samples. In other words, to solve the multiclass classification problem is to find a rule that divides the pattern space into $k$ sections.
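As a minimal illustration of this setup (a Python sketch; the toy data and the stand-in decision rule are assumptions for exposition only, not part of the method):

```python
import numpy as np

# Training set T = {(x_i, y_i)}: l samples x_i in R^N with labels y_i in {1, ..., k}
X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [2.5, 1.5]])  # l = 4, N = 2
y = np.array([1, 2, 3, 3])                                      # k = 3 classes

# A decision function f: X -> Y assigns each point of the pattern space to one
# of the k sections; here a trivial 1-nearest-neighbour rule serves as a stand-in.
def f(x):
    return int(y[np.argmin(((X - x) ** 2).sum(axis=1))])

print(f(np.array([2.2, 1.8])))  # -> 3
```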
Crammer and Singer gave a quadratic optimization problem for multiclass classification in Crammer and Singer (2001) as follows:

$$\min_M \; \frac{1}{2}\beta\|M\|_2^2 + \sum_{i=1}^{l}\xi_i \quad (1)$$

$$\text{s.t.} \;\; M_{y_i}\cdot x_i + \delta_{y_i,r} - M_r\cdot x_i \ge 1 - \xi_i, \;\; \forall i, r: i = 1,\ldots,l, \; r = 1,\ldots,k \quad (2)$$

where $\beta > 0$ is a regularization constant, $\xi_i \ge 0$ are the slack variables, $M$ is a $k \times N$ matrix and $\|M\|_2^2$ is the $l_2$-norm defined as:

$$\|M\|_2^2 = \|(M_1,\ldots,M_k)\|_2^2 = \sum_{i,j} M_{ij}^2$$

$M_r$ is the $r$th row vector of matrix $M$. $\delta_{p,q}$ is equal to 1 if $p = q$ and 0 otherwise.

The corresponding decision-making function is as follows:

$$H_M(x) = \arg\max_{r=1,\ldots,k} \{M_r \cdot x\} \quad (3)$$

Therefore, a classification rule can be defined as follows. For a new sample $x$, it will be classified to the $r$th class for which the inner product $M_r \cdot x$ between the $r$th row of matrix $M$ and the sample $x$ is maximum.

Fuzzy c-means clustering: Clustering methods seek to organize a set of items into clusters such that items within a given cluster have a high degree of similarity, whereas items belonging to different clusters have a high degree of dissimilarity. However, conventional hard clustering methods restrict each point of the data set to exactly one cluster. Fuzzy clustering generates a fuzzy partition based on the idea of partial membership, expressed by the degree of membership of each pattern in a given cluster. Dunn (1974) presented the first fuzzy clustering method based on an adequacy criterion defined by the Euclidean distance. Bezdek (1980) further generalized this method. These fuzzy c-means (FCM) algorithms are very popular and have been applied successfully in many areas such as taxonomy, image processing, information retrieval and data mining (Tan and MatIsa, 2011; Izakian and Abraham, 2011; Tang et al., 2010).

The FCM algorithm starts with an initial guess for the cluster centers, which is intended to mark the mean location of each cluster. The initial guess for these cluster centers is most likely incorrect. Additionally, FCM assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades for each data point, FCM moves the cluster centers to the "right" location within the data set. This iteration is based on minimizing an objective function that represents the distance from any given data point to a cluster center, weighted by that data point's membership grade. Namely:

$$\min J(U,V) = \sum_{i=1}^{c}\sum_{k=1}^{l} u_{ik}^{m}\, d(x_k, v_i) \quad (4)$$

$$\text{s.t.} \;\; \sum_{i=1}^{c} u_{ik} = 1, \;\; 0 \le u_{ik} \le 1$$

where $c$ is the number of clusters (selected as a specified value in this study), $l$ is the number of data points, $u_{ik} \in [0,1]$ denotes the degree to which the sample $x_k$ belongs to the $i$th cluster, $m$ is the fuzzy parameter controlling the speed and achievement of clustering, $d(x_k, v_i) = \|x_k - v_i\|^2$ denotes the squared distance between point $x_k$ and the cluster center $v_i$ and $V$ is the set of cluster centers or prototypes ($v_i \in R^p$).

The objective function $J(U,V)$ is minimized via an iterative process in which the cluster centers $v_i$ and the degrees of membership $u_{ij}$ are updated:

$$v_i = \frac{\sum_{j=1}^{l}(u_{ij})^m x_j}{\sum_{j=1}^{l}(u_{ij})^m} \quad (5)$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\frac{\|x_j - v_i\|}{\|x_j - v_k\|}\right)^{2/(m-1)}} \quad (6)$$
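To make the update loop of Eqs. (5) and (6) concrete, a minimal sketch might look as follows (Python with NumPy; the random initialization, tolerance and the squared-distance form of the update are our assumptions, not code from the study):

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means per Eqs. (4)-(6): returns memberships U (c x l), centers V (c x p)."""
    rng = np.random.default_rng(seed)
    l = X.shape[0]
    # Random initial memberships, each column summing to 1 (constraint of Eq. 4)
    U = rng.random((c, l))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Eq. (5): cluster centers as membership-weighted means of the samples
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances d(x_k, v_i) = ||x_k - v_i||^2
        d = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero at a center
        # Eq. (6): exponent 1/(m-1) on squared distances equals 2/(m-1) on norms
        inv = d ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return U_new, V
        U = U_new
    return U, V
```

The fuzzy membership degree $t_i$ of a training sample can then be derived from the resulting membership grades; the study does not spell out this mapping in detail, so it remains a modeling choice in this sketch.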
FUZZY DIRECT CONSTRUCTING MULTICLASS CLASSIFIER

Inspired by these works, this study applies a fuzzy process to the direct multiclass classification of Crammer and Singer (2001). The proposed algorithm introduces a fuzzy membership degree and assigns a weight to each sample according to its importance, thereby reducing the effect of noise and outliers on classification and enhancing SVM classification performance. Introducing fuzzy theories, the quadratic optimization problem of Eq. (1) becomes:

$$\min_M \; \frac{1}{2}\beta\|M\|_2^2 + \sum_{i=1}^{l} t_i\xi_i \quad (7)$$

subject to:

$$M_{y_i}\cdot x_i + \delta_{y_i,r} - M_r\cdot x_i \ge 1 - \xi_i, \;\; \forall i, r: i = 1,\ldots,l, \; r = 1,\ldots,k \quad (8)$$

where $\beta > 0$ is a regularization constant, $\xi_i \ge 0$ are the slack variables, $M$ is a $k \times N$ matrix, $\|M\|_2^2$ is the $l_2$-norm defined as $\|M\|_2^2 = \|(M_1,\ldots,M_k)\|_2^2 = \sum_{i,j} M_{ij}^2$ and $t_i$ is the fuzzy membership degree obtained from fuzzy c-means clustering. $M_r$ is the $r$th row vector of matrix $M$. $\delta_{p,q}$ is equal to 1 if $p = q$ and 0 otherwise.

Note that inequality (8) places no explicit constraints on the slack variables $\xi_i$. For $r = y_i$, inequality (8) reads $M_r\cdot x_i + \delta_{r,r} - M_r\cdot x_i \ge 1 - \xi_i$. Obviously, $\delta_{r,r} = 1$, so we get $\xi_i \ge 0$. In other words, the constraints on the slack variables $\xi_i$ are hidden in inequality (8).

Using the Lagrangian multiplier method, the dual formulation of Eq. (7) is as follows:

$$L(M, \xi, \alpha) = \frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|_2^2 + \sum_{i=1}^{l} t_i\xi_i + \sum_{i=1}^{l}\sum_{r=1}^{k}\alpha_{i,r}\left[M_r\cdot x_i - M_{y_i}\cdot x_i - \delta_{y_i,r} + 1 - \xi_i\right] \quad (9)$$

subject to:

$$\alpha_{i,r} \ge 0, \;\; \forall i = 1,\ldots,l, \; r = 1,\ldots,k \quad (10)$$

Differentiating Eq. (9) with respect to $\xi_i$ and $M_r$ and setting the derivatives to zero, respectively, we obtain:

$$\frac{\partial L}{\partial \xi_i} = t_i - \sum_{r=1}^{k}\alpha_{i,r} = 0 \;\Rightarrow\; \sum_{r=1}^{k}\alpha_{i,r} = t_i \quad (11)$$

$$\frac{\partial L}{\partial M_r} = \beta M_r + \sum_{i=1}^{l}\alpha_{i,r}x_i - \sum_{i=1}^{l} t_i\delta_{y_i,r}x_i = 0 \quad (12)$$

According to Eq. (12), we obtain:

$$M_r = \beta^{-1}\left[\sum_{i=1}^{l}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)x_i\right] \quad (13)$$

From Eq. (13) we can see that each row vector $M_r$ of matrix $M$ is a linear combination of the samples $x_1,\ldots,x_l$ and that the coefficient of a sample $x_i$ in row vector $M_r$ is $t_i\delta_{y_i,r} - \alpha_{i,r}$. We call a sample $x_i$ a support vector if there is a row $M_r$ for which this coefficient is not zero. Obviously, for each row vector $M_r$ of the matrix $M$ we can divide the support vectors into two subsets as follows:

$$M_r = \beta^{-1}\left[\sum_{i: y_i = r}\left(t_i - \alpha_{i,r}\right)x_i + \sum_{i: y_i \ne r}\left(-\alpha_{i,r}\right)x_i\right] \quad (14)$$

In the above equation, the first term $\sum_{i: y_i = r}(t_i - \alpha_{i,r})x_i$ runs over all vectors that belong to the $r$th class. The second term $\sum_{i: y_i \ne r}(-\alpha_{i,r})x_i$ runs over the rest of the vectors, whose labels are different from $r$.

Substituting Eq. (11) into Eq. (9), we obtain:

$$Q(\alpha) = \frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|_2^2 + \sum_{i} t_i\xi_i + \sum_{i,r}\alpha_{i,r}x_i\cdot M_r - \sum_{i,r}\alpha_{i,r}x_i\cdot M_{y_i} - \sum_{i}\xi_i\underbrace{\sum_{r=1}^{k}\alpha_{i,r}}_{=\,t_i} + \sum_{i,r}\alpha_{i,r}\left(1 - \delta_{y_i,r}\right)$$
$$= \frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|_2^2 + \sum_{i,r}\alpha_{i,r}x_i\cdot M_r - \sum_{i,r}\alpha_{i,r}x_i\cdot M_{y_i} + \sum_{i,r}\alpha_{i,r}\left(1 - \delta_{y_i,r}\right) \quad (15)$$

We substitute $M_r$ in the above equation using Eq. (13) and get:

$$\frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|_2^2 = \frac{1}{2}\beta\sum_{r} M_r\cdot M_r = \frac{1}{2}\beta\sum_{r}\left[\beta^{-1}\sum_{i}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)x_i\right]\cdot\left[\beta^{-1}\sum_{j}\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right)x_j\right]$$
$$= \frac{1}{2}\beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) \quad (16)$$

$$\sum_{i,r}\alpha_{i,r}x_i\cdot M_r = \sum_{i,r}\alpha_{i,r}x_i\cdot\beta^{-1}\sum_{j}\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right)x_j = \beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\alpha_{i,r}\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) \quad (17)$$

$$\sum_{i,r}\alpha_{i,r}x_i\cdot M_{y_i} = \sum_{i,r}\alpha_{i,r}x_i\cdot\beta^{-1}\sum_{j}\left(t_j\delta_{y_j,y_i} - \alpha_{j,y_i}\right)x_j = \beta^{-1}\sum_{i,j} x_i\cdot x_j\left(t_j\delta_{y_j,y_i} - \alpha_{j,y_i}\right)\sum_{r}\alpha_{i,r}$$
$$= \beta^{-1}\sum_{i,j} t_i\left(t_j\delta_{y_j,y_i} - \alpha_{j,y_i}\right)x_i\cdot x_j = \beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r} t_i\delta_{y_i,r}\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) \quad (18)$$

Note that $\sum_{r}\alpha_{i,r}$ is equal to $t_i$ in Eq. (18). From Eq. (17) and Eq. (18), we get:

$$\sum_{i,r}\alpha_{i,r}x_i\cdot M_r - \sum_{i,r}\alpha_{i,r}x_i\cdot M_{y_i} = \beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\alpha_{i,r}\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) - \beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r} t_i\delta_{y_i,r}\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right)$$
$$= -\beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) \quad (19)$$

Finally, we substitute Eqs. (16-19) into Eq. (15) and get the objective function:

$$\max_{\alpha} Q(\alpha) = \frac{1}{2}\beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) - \beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) + \sum_{i,r}\alpha_{i,r}\left(1 - \delta_{y_i,r}\right)$$
$$= -\frac{1}{2}\beta^{-1}\sum_{i,j} x_i\cdot x_j\sum_{r}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)\left(t_j\delta_{y_j,r} - \alpha_{j,r}\right) + \sum_{i,r}\alpha_{i,r}\left(1 - \delta_{y_i,r}\right) \quad (20)$$

subject to $\alpha_{i,r} \ge 0, \; \forall i = 1,\ldots,l, \; r = 1,\ldots,k$, together with $\sum_{r}\alpha_{i,r} = t_i$ from Eq. (11).

In Eq. (20), we usually replace the inner product $x_i\cdot x_j$ with a kernel function $K(x_i, x_j) = \phi(x_i)^T\phi(x_j)$. Solving this optimization problem, we get the following classification rule:

$$H_M(x) = \arg\max_{r}\{M_r\cdot\phi(x)\} = \arg\max_{r}\left\{\phi(x)\cdot\beta^{-1}\sum_{i}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)\phi(x_i)\right\} = \arg\max_{r}\left\{\beta^{-1}\sum_{i}\left(t_i\delta_{y_i,r} - \alpha_{i,r}\right)K(x_i, x)\right\} \quad (21)$$

Hsu and Lin (2002) implemented the direct multiclass classification method of Crammer and Singer (2001). On the basis of Hsu and Lin (2002), we likewise implement the fuzzy direct multiclass classification method proposed in this study.
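For illustration, given dual variables α from the problem in Eq. (20) and memberships t from FCM, the classification rule of Eq. (21) can be evaluated as in the following sketch (Python; the RBF kernel matches the experiments below, while the array layout, 0-based labels and names are our own assumptions):

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2), the kernel used in the experiments."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def predict(X_train, y_train, t, alpha, beta, X_new, gamma):
    """Eq. (21): H(x) = argmax_r beta^{-1} sum_i (t_i * delta(y_i, r) - alpha_{i,r}) K(x_i, x).

    y_train: integer labels in {0, ..., k-1}; t: (l,) fuzzy memberships from FCM;
    alpha: (l, k) dual variables of the QP in Eq. (20).
    """
    l, k = alpha.shape
    # Coefficient of sample i for class r: t_i * delta(y_i, r) - alpha_{i, r}
    coef = -alpha.copy()
    coef[np.arange(l), y_train] += t
    K = rbf_kernel(X_train, X_new, gamma)   # (l, n_new)
    scores = (coef.T @ K) / beta            # (k, n_new)
    return scores.argmax(axis=0)
```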
EXPERIMENTAL ANALYSIS

We evaluated our method on a collection of five benchmark data sets from the UCI machine learning repository (http://www.ics.uci.edu/~mlearn/MLRepository.html): iris, wine, glass, vehicle and vowel. In order to validate the applicability of the proposed algorithm to noisy data, we also constructed a data set with noise using a data builder, named noise-data in Table 1. The related parameters are listed in Table 1, where #pts denotes the number of data points, #att denotes the number of attributes and #class denotes the number of classes.

Table 1: Data sets
Data set     #pts    #att    #class
Iris         150     4       3
Wine         178     13      3
Glass        214     9       6
Vehicle      846     18      4
Vowel        528     10      11
Noise-data   1000    10      4

In this experiment, we compared our proposed method with the CS-SVM proposed by Crammer and Singer (2001), the 1-against-1 SVM and the k-SVM (Weston and Watkins, 1998). In order to evaluate the experimental results fairly, we use the RBF kernel function $K(x_i, x_j) = e^{-\gamma\|x_i - x_j\|^2}$ and the 10-fold cross-validation method to estimate the generalization errors of the classifiers. For parameter optimization, we adopt the grid search method so as to obtain the best classification precision of each method, in which $\gamma$ is limited to $[2^4, 2^3, \ldots, 2^{-10}]$ and $C$ is limited to $[2^{12}, 2^{11}, \ldots, 2^{-2}]$. Note that $C = \beta^{-1}$ in the CS-SVM method and in our proposed method.

In the experiments, we use LIBSVM (Chang and Lin, 2010) as the SVM prototype in the 1-against-1 and k-SVM methods and choose BSVM (http://www.csie.ntu.edu.tw/~cjlin/bsvm) as the SVM prototype in CS-SVM and our proposed method. The experimental results are summarized in Table 2; the values denote the misclassification rate.

Table 2: Experimental results (misclassification rate, %)
Data set     1-a-1    k-SVM    CS-SVM    Our proposed method
Iris         2.67     2.67     2.67      2.67
Wine         0.56     1.13     1.13      1.13
Glass        28.51    28.97    28.04     25.23
Vehicle      13.36    13.00    13.24     8.98
Vowel        1.08     1.52     1.30      0.65
Noise-data   24.40    28.70    22.60     18.80

From the experimental results in Table 2, we can see that our proposed method obtains a better classification rate than CS-SVM, while the accuracy rates of k-SVM and CS-SVM are lower. The accuracy can be improved when the training data are fuzzified. In addition, the proposed method also obtains a better classification rate on our constructed noise-data set, which shows that the proposed method is effective on noisy data.
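A minimal sketch of the grid-search and 10-fold cross-validation protocol described earlier in this section (Python; `train_fn` is a hypothetical stand-in for training any of the compared classifiers and is not an interface of LIBSVM or BSVM):

```python
import numpy as np
from itertools import product
from sklearn.model_selection import StratifiedKFold

def cv_error(X, y, C, gamma, train_fn, n_splits=10, seed=0):
    """10-fold cross-validated misclassification rate for one (C, gamma) pair.

    train_fn(X_tr, y_tr, C, gamma) -> predict callable; a placeholder for
    whichever classifier is being evaluated (1-a-1, k-SVM, CS-SVM, ours).
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errs = []
    for tr, te in skf.split(X, y):
        predict = train_fn(X[tr], y[tr], C, gamma)
        errs.append(np.mean(predict(X[te]) != y[te]))
    return float(np.mean(errs))

def grid_search(X, y, train_fn):
    """Grid over gamma in [2^4, ..., 2^-10] and C in [2^12, ..., 2^-2]; C = 1/beta."""
    gammas = [2.0 ** e for e in range(4, -11, -1)]
    Cs = [2.0 ** e for e in range(12, -3, -1)]
    best = min(product(Cs, gammas),
               key=lambda p: cv_error(X, y, p[0], p[1], train_fn))
    return best  # (C, gamma) with the lowest 10-fold CV error
```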
CONCLUSION

In this study, a directly constructed fuzzy support vector machine is proposed, which can be used to deal with the outlier sensitivity problem in traditional
multiclass classification problems. In our proposed method, we reconstruct the optimization problem and its constraints, reconstruct the Lagrangian formulation and present the theoretical derivation. The fuzzy membership degree is computed by fuzzy c-means clustering, which generates different weights for training data and outliers according to their relative importance in the training set. Experimental results show that the proposed method can reduce the effect of outliers and yields a higher classification rate than other existing methods.
ACKNOWLEDGMENT
This study is supported by China Postdoctoral
Science Foundation (No. 20110491530), Science
Research Plan of Liaoning Education Bureau (No.
L2011186) and Dalian Science and Technology Planning
Project of China (No. 2010J21DW019).
REFERENCES
Bezdek, J.C., 1980. A convergence theorem for the fuzzy
ISODATA clustering algorithms. IEEE Trans.
Pattern Anal. Machine Intell., 2: 1-8.
Chang, C.C. and C.J. Lin, 2010. LIBSVM: A Library for Support Vector Machines. Retrieved from: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
Chiu, D.Y. and P.J. Chen, 2009. Dynamically exploring
internal mechanism of stock market by fuzzy-based
support vector machines with high dimension input
space and genetic algorithm. Expert Syst. Appl.,
36(2): 1240-1248.
Crammer, K. and Y. Singer, 2001. On the algorithmic
implementation of multiclass kernel-based vector
machines. J. Machine Learning Res., 2: 265-292.
Dunn, J.C., 1974. A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. J. Cybern., 3: 32-57.
Hsu, C.W. and C.J. Lin, 2002. A comparison of methods
for multi-class support vector machines. IEEE T.
Neural Networ., 13(2): 415-425.
Hu, W.J. and Q. Song, 2004. An accelerated
decomposition algorithm for robust support vector
machines. IEEE Trans. Circuits Syst. II: Express
Briefs, 51(5): 234-240.
Inoue, T. and S. Abe, 2001. Fuzzy support vector
machines for pattern classification. In: Proceedings
of International Joint Conference on Neural
Networks, 2: 1449-1454.
Izakian, H. and A. Abraham, 2011. Fuzzy C-means and
fuzzy swarm for fuzzy clustering problem. Expert
Syst. Appl., 38(3): 1835-1838.
Ji, A.B., J.H. Pang and H.J. Qiu, 2010. Support vector
machine for classification based on fuzzy training
data. Expert Syst. Appl., 37: 3495-3498.
Lin, C.F. and S.D. Wang, 2002. Fuzzy support vector
machines. IEEE T. Neural Networ., 13(2): 464-471.
Lin, C.F. and S.D. Wang, 2003. Training algorithms for
fuzzy support vector machines with noisy data. In:
Proceedings of the IEEE 8th Workshop on Neural
Networks for Signal Processing, pp: 517-526.
Min, R. and H.D. Cheng, 2009. Effective image retrieval
using dominant color descriptor and fuzzy support
vector machine. Pattern Rec., 42(1): 147-157.
Platt, J.C., N. Cristianini and J. Shawe-Taylor, 2002. Large Margin DAGs for Multiclass Classification. In: Solla, S.A., T.K. Leen and K.R. Müller, (Eds.), Advances in Neural Information Processing Systems 12: 547-553.
Song, Q., W.J. Hu and W.F. Xie, 2002. Robust support
vector machine with bullet hole image classification.
IEEE T. Syst. Man Cy., 32(4): 440-448.
Tang, C., S. Wang and W. Xu, 2010. New fuzzy c-means
clustering model based on the data weighted
approach. Data Knowl. Eng., 69(9): 881-900.
Tan, K.S. and N.A. MatIsa, 2011. Color image segmentation using histogram thresholding-fuzzy c-means hybrid approach. Pattern Rec., 44(1): 1-15.
Vapnik, V., 1995. The Nature of Statistical Learning
Theory. Springer-Verlag, New York.
Weston, J. and C. Watkins, 1998. Multi-Class Support Vector Machines. Department of Computer Science, Royal Holloway University of London, Technical Report CSD-TR-98-04.
Wu, Q., 2009. The forecasting model based on wavelet ν-support vector machine. Expert Syst. Appl., 36(4): 7604-7610.
Wu, Q., 2010. Regression application based on fuzzy ν-support vector machine in symmetric triangular fuzzy space. Expert Syst. Appl., 36(4): 2808-2814.
Zhang, Y., Z.X. Chi, X.D. Liu and X.H. Wang, 2007. A
novel fuzzy compensation multi-class support vector
machines. Appl. Intell., 27(1): 21-28.