Research Journal of Applied Sciences, Engineering and Technology 4(14): 2067-2071, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: December 27, 2011    Accepted: January 16, 2012    Published: July 15, 2012

A Direct-Construction Based Fuzzy Support Vector Classifier

1,2Yong Zhang, 3Jianying Wang, 1Min Ji and 1Dan Huang
1College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China
2College of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China
3College of History Culture and Tourism, Liaoning Normal University, Dalian 116081, China
Corresponding Author: Yong Zhang, College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China

Abstract: This study presents a novel direct-construction based fuzzy multiclass support vector classifier built on the multi-class classification method of Crammer and Singer (2001). In the proposed method, the membership degree is computed by fuzzy c-means clustering, the optimization problem and constraints of multiclass classification are reconstructed and the corresponding Lagrangian formulation is re-derived. Experimental comparison with previous studies indicates that our method obtains a better classification ratio.

Key words: Classifier, fuzzy c-means, multi-class classification, support vector machine

INTRODUCTION

The Support Vector Machine (SVM) was first introduced by Vapnik and his colleagues (Vapnik, 1995). It is an approximate implementation of the Structural Risk Minimization (SRM) principle in statistical learning theory, rather than the Empirical Risk Minimization (ERM) method. The SRM principle is based on the fact that the generalization error is bounded by the sum of the empirical error and a confidence-interval term depending on the VC dimension. In general, SVMs seek to minimize an upper bound of the generalization error rather than the training error alone. This gives SVMs good generalization ability and hence provides a mechanism for avoiding overfitting of data. SVM was initially designed to solve pattern recognition problems (Chiu and Chen, 2009; Min and Cheng, 2009). More recently, with the introduction of Vapnik's ε-insensitive loss function, SVM has been extended to function approximation and regression estimation problems (Wu, 2009, 2010).

SVM originally solved the 2-class classification problem. When SVMs are extended to multi-class problems, however, unclassifiable regions can arise. Several methods have been proposed to solve this problem. One is to construct several 2-class classifiers and then assemble them into a multi-class classifier, as in one-against-one, one-against-all and DAGSVM (Platt et al., 2002; Hsu and Lin, 2002). The second is to construct a multi-class classifier directly, such as the k-SVM proposed by Weston and Watkins (1998). Zhang et al. (2007) also proposed a fuzzy compensation multi-classification SVM based on Weston and Watkins' idea. Crammer and Singer (2001) implemented a multiclass kernel-based support vector machine algorithm that adopts a direct method for training multiclass predictors.

In many real applications, the observed input data cannot be measured precisely and are usually described at linguistic levels or with ambiguous metrics. Moreover, the obtained training data are often contaminated by noise. The standard SVM is very sensitive to outliers and noise, and several techniques have been proposed in the data classification literature to tackle this problem. One method is to use an averaging algorithm to relax the effect of outliers (Song et al., 2002; Hu and Song, 2004).
However, in this kind of method the additional parameter is difficult to tune. The second method, proposed in (Inoue and Abe, 2001; Lin and Wang, 2002; Lin and Wang, 2003) and named Fuzzy SVM (FSVM), applies fuzzy memberships to the training data to relax the effect of outliers. Ji et al. (2010) proposed a support vector machine for classification based on fuzzy training data; their method takes fuzzy examples as inputs and gives a solving procedure for the SVM with fuzzy training data.

In this study, we obtain the fuzzy membership degree using the fuzzy c-means clustering algorithm. The Fuzzy c-means Clustering Method (FCM) (Dunn, 1974; Bezdek, 1980) is the most common clustering method based on minimizing a distance-based objective function. Aiming at multiclass classification and noisy-data problems, Zhang et al. (2007) proposed a fuzzy compensation multi-classification SVM based on fuzzy theories. However, that method is inconvenient to optimize because it involves many parameters. This study implements a new directly constructed SVM based on the method proposed by Crammer and Singer (2001). Experimental results show that the proposed method obtains a better classification ratio.

LITERATURE REVIEW

Direct constructed multiclass classifier: Multiclass classification is a central problem of machine learning. We first give a description of multiclass classification. Let T = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ⊂ X × Y be a set of l training examples, where x_i ∈ X = R^N, y_i ∈ Y = {1, 2, ..., k}, i = 1, 2, ..., l. Solving the multiclass problem means finding a decision function f(x): X → Y such that the misclassification ratio on unknown samples is likely to be small. In other words, solving the multiclass classification problem amounts to finding a rule that divides the pattern space into k sections.

Crammer and Singer (2001) formulated multiclass classification as the following quadratic optimization problem:

\min_M \frac{1}{2}\beta\|M\|_2^2 + \sum_{i=1}^{l}\xi_i    (1)

s.t. M_{y_i}\cdot x_i + \delta_{y_i,r} - M_r\cdot x_i \ge 1 - \xi_i, \quad \forall i, r: i = 1,\dots,l,\ r = 1,\dots,k    (2)

where β > 0 is a regularization constant, ξ_i ≥ 0 are the slack variables, M is a k×N matrix and \|M\|_2^2 is the l2-norm defined as:

\|M\|_2^2 = \|(M_1,\dots,M_k)\|_2^2 = \sum_{i,j} M_{ij}^2

M_r is the rth row vector of the matrix M and δ_{p,q} equals 1 if p = q and 0 otherwise. The corresponding decision function is:

H_M(x) = \arg\max_{r=1,\dots,k}\{M_r \cdot x\}    (3)

Therefore, a classification rule can be defined as follows: a new sample x is assigned to the rth class for which the inner product M_r · x between the rth row of M and the sample x is maximal, as the sketch below illustrates.
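As a small illustration, the Python sketch below evaluates the decision rule of Eq. (3); the matrix M and the sample x are hypothetical values, not taken from the paper, and classes are indexed from 0 rather than 1.

```python
import numpy as np

def predict(M: np.ndarray, x: np.ndarray) -> int:
    """Eq. (3): pick the class r whose row M_r maximizes the score M_r . x."""
    scores = M @ x                 # one inner product per class row
    return int(np.argmax(scores))  # 0-based class index

# Toy usage with k = 3 classes and N = 4 features (hypothetical numbers).
M = np.array([[ 0.5, -0.2,  0.1,  0.0],
              [ 0.1,  0.4, -0.3,  0.2],
              [-0.1,  0.0,  0.6, -0.5]])
x = np.array([1.0, 0.5, -0.2, 0.3])
print(predict(M, x))
```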
Fuzzy c-means clustering: Clustering methods seek to organize a set of items into clusters such that items within a given cluster have a high degree of similarity, whereas items belonging to different clusters have a high degree of dissimilarity. Conventional hard clustering methods, however, restrict each point of the data set to exactly one cluster. Fuzzy clustering generates a fuzzy partition based on the idea of partial membership, expressed by the degree of membership of each pattern in a given cluster. Dunn (1974) presented the first fuzzy clustering method based on an adequacy criterion defined by the Euclidean distance and Bezdek (1980) further generalized it. These fuzzy c-means (FCM) algorithms are very popular and have been applied successfully in many areas such as taxonomy, image processing, information retrieval and data mining (Tan and Mat Isa, 2011; Izakian and Abraham, 2011; Tang et al., 2010).

The FCM algorithm starts with an initial guess for the cluster centers, which is intended to mark the mean location of each cluster and is most likely incorrect. Additionally, FCM assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades, FCM moves the cluster centers to the "right" location within the data set. This iteration minimizes an objective function that represents the distance from any given data point to a cluster center, weighted by that data point's membership grade. Namely:

\min J(U, V) = \sum_{i=1}^{c}\sum_{k=1}^{l} u_{ik}^{m}\, d(x_k, v_i)    (4)

\text{s.t.}\quad \sum_{i=1}^{c} u_{ik} = 1, \quad 0 \le u_{ik} \le 1

where c is the number of clusters (set to a specified value in this study), l is the number of data points, u_{ik} ∈ [0,1] denotes the degree to which the sample x_k belongs to the ith cluster, m is the fuzzy parameter controlling the speed and quality of clustering, d(x_k, v_i) = ‖x_k − v_i‖² denotes the distance between the point x_k and the cluster center v_i and V is the set of cluster centers or prototypes (v_i ∈ R^N). The objective function J(U, V) is minimized via an iterative process in which the cluster centers v_i and the degrees of membership u_{ij} are updated as:

v_i = \frac{\sum_{j=1}^{l}(u_{ij})^m x_j}{\sum_{j=1}^{l}(u_{ij})^m}    (5)

u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{kj}\right)^{2/(m-1)}}    (6)

where d_{ij} denotes the distance between the cluster center v_i and the point x_j. A compact implementation of this iteration is sketched below.
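As a hedged sketch of how these updates can be coded, the function below alternates Eq. (5) and (6) with NumPy; the function name fcm, the fuzzifier default m = 2, the fixed iteration count and the random initialization are our own choices, since the paper does not specify an implementation.

```python
import numpy as np

def fcm(X: np.ndarray, c: int, m: float = 2.0, n_iter: int = 100,
        seed: int = 0):
    """Fuzzy c-means on data X of shape (l, N): alternate the center
    update of Eq. (5) and the membership update of Eq. (6)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                        # each column sums to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)           # Eq. (5)
        # d[i, j] = Euclidean distance between center v_i and point x_j
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)              # guard against zero division
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)                             # Eq. (6)
    return U, V
```

How the memberships U are turned into the per-sample weights t_i of the next section is not spelled out in the paper; one natural reading is to take t_i as the membership grade of x_i in the cluster associated with its own class.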
FUZZY DIRECT CONSTRUCTING MULTICLASS CLASSIFIER

Inspired by these works, this study applies a fuzzy treatment to direct multiclass classification based on Crammer and Singer (2001). The proposed algorithm introduces fuzzy membership degrees and assigns a weight to each sample according to its importance, thereby reducing the effect of noise and outliers on classification and enhancing SVM classification performance. Introducing fuzzy theory, the quadratic optimization problem of Eq. (1) becomes:

\min_M \frac{1}{2}\beta\|M\|_2^2 + \sum_{i=1}^{l} t_i\,\xi_i    (7)

subject to:

M_{y_i}\cdot x_i + \delta_{y_i,r} - M_r\cdot x_i \ge 1 - \xi_i, \quad \forall i, r: i = 1,\dots,l,\ r = 1,\dots,k    (8)

where β > 0 is a regularization constant, ξ_i ≥ 0 are the slack variables, M is a k×N matrix, \|M\|_2^2 = \|(M_1,\dots,M_k)\|_2^2 = \sum_{i,j} M_{ij}^2 is the l2-norm, M_r is the rth row vector of M, δ_{p,q} equals 1 if p = q and 0 otherwise and t_i is the fuzzy membership degree computed by fuzzy c-means clustering.

Note that inequality (8) places no explicit constraints on the slack variables ξ_i. For r = y_i, inequality (8) reads M_{y_i}·x_i + δ_{y_i,y_i} − M_{y_i}·x_i ≥ 1 − ξ_i. Since δ_{y_i,y_i} = 1, we get ξ_i ≥ 0. In other words, the constraints on the slack variables are hidden in inequality (8).

Using the Lagrangian multiplier method, the dual formulation of Eq. (7) starts from the Lagrangian:

L(M,\xi,\alpha) = \frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|^2 + \sum_{i=1}^{l} t_i\xi_i + \sum_{i=1}^{l}\sum_{r=1}^{k}\alpha_{i,r}\left[M_r\cdot x_i - M_{y_i}\cdot x_i - \delta_{y_i,r} + 1 - \xi_i\right]    (9)

subject to:

\alpha_{i,r} \ge 0, \quad \forall i = 1,\dots,l,\ r = 1,\dots,k    (10)

Differentiating Eq. (9) with respect to ξ_i and M_r and setting the derivatives to zero, we obtain:

\frac{\partial L}{\partial \xi_i} = t_i - \sum_{r=1}^{k}\alpha_{i,r} = 0 \;\Rightarrow\; \sum_{r=1}^{k}\alpha_{i,r} = t_i    (11)

\frac{\partial L}{\partial M_r} = \beta M_r + \sum_{i=1}^{l}\alpha_{i,r}x_i - \sum_{i:\,y_i=r}\Big(\sum_{q}\alpha_{i,q}\Big)x_i = \beta M_r + \sum_{i=1}^{l}\alpha_{i,r}x_i - \sum_{i=1}^{l} t_i\delta_{y_i,r}\,x_i = 0    (12)

where the second equality uses \sum_q \alpha_{i,q} = t_i from Eq. (11). According to Eq. (12), we obtain:

M_r = \beta^{-1}\Big[\sum_{i=1}^{l}\big(t_i\delta_{y_i,r} - \alpha_{i,r}\big)x_i\Big]    (13)

From Eq. (13) we see that each row vector M_r of M is a linear combination of the samples x_1,...,x_l and that the coefficient of a sample x_i in row vector M_r is t_iδ_{y_i,r} − α_{i,r}. We call a sample x_i a support vector if there is a row M_r for which this coefficient is non-zero. Obviously, for each row vector M_r we can divide the support vectors into two subsets:

M_r = \beta^{-1}\Big[\sum_{i:\,y_i=r}(t_i - \alpha_{i,r})\,x_i + \sum_{i:\,y_i\ne r}(-\alpha_{i,r})\,x_i\Big]    (14)

In the above equation, the first sum \sum_{i:\,y_i=r}(t_i - \alpha_{i,r})x_i runs over all vectors that belong to the rth class and the second sum \sum_{i:\,y_i\ne r}(-\alpha_{i,r})x_i runs over the remaining vectors, whose labels differ from r.

Substituting Eq. (11) into Eq. (9), the slack terms \sum_i t_i\xi_i and \sum_i \xi_i\sum_r\alpha_{i,r} cancel and we obtain:

Q(\alpha) = \frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|^2 + \sum_{i,r}\alpha_{i,r}\,x_i\cdot M_r - \sum_{i,r}\alpha_{i,r}\,x_i\cdot M_{y_i} + \sum_{i,r}\alpha_{i,r}\big(1-\delta_{y_i,r}\big)    (15)

We substitute M_r in the above equation using Eq. (13) and get:

\frac{1}{2}\beta\sum_{r=1}^{k}\|M_r\|^2 = \frac{1}{2}\beta\sum_{r}M_r\cdot M_r = \frac{1}{2}\beta^{-1}\sum_{i,j}x_i\cdot x_j\sum_{r}\big(t_i\delta_{y_i,r}-\alpha_{i,r}\big)\big(t_j\delta_{y_j,r}-\alpha_{j,r}\big)    (16)

\sum_{i,r}\alpha_{i,r}\,x_i\cdot M_r = \beta^{-1}\sum_{i,j}x_i\cdot x_j\sum_{r}\alpha_{i,r}\big(t_j\delta_{y_j,r}-\alpha_{j,r}\big)    (17)

\sum_{i,r}\alpha_{i,r}\,x_i\cdot M_{y_i} = \beta^{-1}\sum_{i,j}x_i\cdot x_j\big(t_j\delta_{y_j,y_i}-\alpha_{j,y_i}\big)\sum_{r}\alpha_{i,r} = \beta^{-1}\sum_{i,j}x_i\cdot x_j\sum_{r}t_i\delta_{y_i,r}\big(t_j\delta_{y_j,r}-\alpha_{j,r}\big)    (18)

Note that \sum_r\alpha_{i,r} equals t_i in Eq. (18). From Eq. (17) and (18), we get:

\sum_{i,r}\alpha_{i,r}\,x_i\cdot M_r - \sum_{i,r}\alpha_{i,r}\,x_i\cdot M_{y_i} = -\beta^{-1}\sum_{i,j}x_i\cdot x_j\sum_{r}\big(t_i\delta_{y_i,r}-\alpha_{i,r}\big)\big(t_j\delta_{y_j,r}-\alpha_{j,r}\big)    (19)

Finally, we substitute Eq. (16-19) into Eq. (15) and obtain the objective function:

\max_\alpha Q(\alpha) = -\frac{1}{2}\beta^{-1}\sum_{i,j}x_i\cdot x_j\sum_{r}\big(t_i\delta_{y_i,r}-\alpha_{i,r}\big)\big(t_j\delta_{y_j,r}-\alpha_{j,r}\big) + \sum_{i,r}\alpha_{i,r}\big(1-\delta_{y_i,r}\big)    (20)

subject to α_{i,r} ≥ 0, ∀i = 1,...,l, r = 1,...,k (together with \sum_r\alpha_{i,r} = t_i from Eq. (11)). In Eq. (20), we usually replace the inner product x_i·x_j with a kernel function K(x_i, x_j) = φ(x_i)^T φ(x_j). Solving this optimization problem, we get the following classification rule:

H_M(x) = \arg\max_r\{M_r\cdot\phi(x)\} = \arg\max_r\Big\{\phi(x)\cdot\Big(\beta^{-1}\sum_i\big(t_i\delta_{y_i,r}-\alpha_{i,r}\big)\phi(x_i)\Big)\Big\} = \arg\max_r\Big\{\beta^{-1}\sum_i\big(t_i\delta_{y_i,r}-\alpha_{i,r}\big)K(x_i,x)\Big\}    (21)

Hsu and Lin (2002) implemented the direct multiclass classification method of Crammer and Singer (2001). On the basis of Hsu and Lin (2002), we implement the fuzzy direct multiclass classification method proposed in this study.
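Once the dual problem of Eq. (20) has been solved, prediction needs only kernel evaluations. The sketch below computes the rule of Eq. (21) for the paper's RBF kernel; the argument names (X_train, y_train, alpha, t, beta, gamma), the 0-based class labels and the assumption that a trained alpha is already available are our own conventions.

```python
import numpy as np

def rbf_kernel(A: np.ndarray, x: np.ndarray, gamma: float) -> np.ndarray:
    """K(x_i, x) = exp(-gamma * ||x_i - x||^2) for every row x_i of A."""
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=1))

def predict(x, X_train, y_train, alpha, t, beta, gamma):
    """Eq. (21): argmax_r beta^{-1} sum_i (t_i delta_{y_i,r} - alpha_{i,r}) K(x_i, x)."""
    l, k = alpha.shape
    K = rbf_kernel(X_train, x, gamma)      # kernel values K(x_i, x), shape (l,)
    delta = np.zeros((l, k))
    delta[np.arange(l), y_train] = 1.0     # delta_{y_i, r} as a 0/1 matrix
    coef = t[:, None] * delta - alpha      # t_i * delta_{y_i,r} - alpha_{i,r}
    scores = K @ coef / beta               # one score per class r
    return int(np.argmax(scores))
```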
EXPERIMENTAL ANALYSIS

We evaluated our method on a collection of five benchmark data sets from the UCI machine learning repository (http://www.ics.uci.edu/~mlearn/MLRepository.html): iris, wine, glass, vehicle and vowel. To validate the applicability of the proposed algorithm to noisy data, we also constructed a data set with noise using a data builder, named noise-data in Table 1. The related parameters are listed in Table 1, where #training data denotes the number of data points, #attributes the number of attributes and #class the number of classes.

Table 1: Data sets
Data set     #training data    #attributes    #class
Iris         150               4              3
Wine         178               13             3
Glass        214               9              6
Vehicle      846               18             4
Vowel        528               10             11
Noise-data   1000              10             4

In this experiment, we compared our proposed method with the CS-SVM proposed by Crammer and Singer (2001), 1-against-1 SVM and k-SVM (Weston and Watkins, 1998). To evaluate the experimental results fairly, we use the RBF kernel K(x_i, x_j) = e^{-\gamma\|x_i - x_j\|^2} and estimate the generalization errors of the classifiers by 10-fold cross-validation. For parameter optimization, we adopt grid search to obtain the best classification precision of each method, where γ ranges over {2^4, 2^3, …, 2^{-10}} and C ranges over {2^{12}, 2^{11}, …, 2^{-2}}. Note that C = β^{-1} in the CS-SVM method and in our proposed method. In the experiments, we use LIBSVM (Chang and Lin, 2010) as the SVM prototype for 1-against-1 and k-SVM and choose BSVM (http://www.csie.ntu.edu.tw/~cjlin/bsvm) as the SVM prototype for CS-SVM and our proposed method.

The experimental results are summarized in Table 2; the values denote misclassification rates.

Table 2: Experimental results
Data set     1-against-1    k-SVM    CS-SVM    Our proposed method
Iris         2.67           2.67     2.67      2.67
Wine         0.56           1.13     1.13      1.13
Glass        28.51          28.97    28.04     25.23
Vehicle      13.36          13.00    13.24     8.98
Vowel        1.08           1.52     1.30      0.65
Noise-data   24.40          28.70    22.60     18.80

From the experimental results in Table 2, we can see that our proposed method obtains a better classification ratio than CS-SVM. The misclassification rates of k-SVM and CS-SVM are higher; fuzzifying the training data improves the accuracy rate. In addition, the proposed method also obtains a better classification ratio on the constructed noise-data set, which shows that the proposed method is effective for noisy data.

CONCLUSION

In this study, a directly constructed fuzzy support vector machine is proposed, which can be used to deal with the outlier sensitivity problem in traditional multiclass classification problems. In the proposed method, we reconstruct the optimization problem and its constraints, reconstruct the Lagrangian formulation and present the theoretical derivation. The fuzzy membership degree is computed by fuzzy c-means clustering, which generates different weights for training data and outliers according to their relative importance in the training set. Experimental results show that the proposed method can reduce the effect of outliers and yields a higher classification rate than other existing methods.

ACKNOWLEDGMENT

This study is supported by the China Postdoctoral Science Foundation (No. 20110491530), the Science Research Plan of Liaoning Education Bureau (No. L2011186) and the Dalian Science and Technology Planning Project of China (No. 2010J21DW019).
REFERENCES

Bezdek, J.C., 1980. A convergence theorem for the fuzzy ISODATA clustering algorithms. IEEE Trans. Pattern Anal. Machine Intell., 2: 1-8.
Chang, C.C. and C.J. Lin, 2010. LIBSVM: A Library for Support Vector Machines. Retrieved from: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
Chiu, D.Y. and P.J. Chen, 2009. Dynamically exploring internal mechanism of stock market by fuzzy-based support vector machines with high dimension input space and genetic algorithm. Expert Syst. Appl., 36(2): 1240-1248.
Crammer, K. and Y. Singer, 2001. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res., 2: 265-292.
Dunn, J.C., 1974. A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. J. Cybern., 3: 32-57.
Hsu, C.W. and C.J. Lin, 2002. A comparison of methods for multi-class support vector machines. IEEE Trans. Neural Netw., 13(2): 415-425.
Hu, W.J. and Q. Song, 2004. An accelerated decomposition algorithm for robust support vector machines. IEEE Trans. Circuits Syst. II: Express Briefs, 51(5): 234-240.
Inoue, T. and S. Abe, 2001. Fuzzy support vector machines for pattern classification. In: Proceedings of the International Joint Conference on Neural Networks, 2: 1449-1454.
Izakian, H. and A. Abraham, 2011. Fuzzy C-means and fuzzy swarm for fuzzy clustering problem. Expert Syst. Appl., 38(3): 1835-1838.
Ji, A.B., J.H. Pang and H.J. Qiu, 2010. Support vector machine for classification based on fuzzy training data. Expert Syst. Appl., 37: 3495-3498.
Lin, C.F. and S.D. Wang, 2002. Fuzzy support vector machines. IEEE Trans. Neural Netw., 13(2): 464-471.
Lin, C.F. and S.D. Wang, 2003. Training algorithms for fuzzy support vector machines with noisy data. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pp: 517-526.
Min, R. and H.D. Cheng, 2009. Effective image retrieval using dominant color descriptor and fuzzy support vector machine. Pattern Recogn., 42(1): 147-157.
Platt, J.C., N. Cristianini and J. Shawe-Taylor, 2002. Large margin DAGs for multiclass classification. In: Solla, S.A., T.K. Leen and K.R. Müller (Eds.), Advances in Neural Information Processing Systems 12: 547-553.
Song, Q., W.J. Hu and W.F. Xie, 2002. Robust support vector machine with bullet hole image classification. IEEE Trans. Syst. Man Cybern., 32(4): 440-448.
Tan, K.S. and N.A. Mat Isa, 2011. Color image segmentation using histogram thresholding-fuzzy c-means hybrid approach. Pattern Recogn., 44(1): 1-15.
Tang, C., S. Wang and W. Xu, 2010. New fuzzy c-means clustering model based on the data weighted approach. Data Knowl. Eng., 69(9): 881-900.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Weston, J. and C. Watkins, 1998. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London.
Wu, Q., 2009. The forecasting model based on wavelet ν-support vector machine. Expert Syst. Appl., 36(4): 7604-7610.
Wu, Q., 2010. Regression application based on fuzzy ν-support vector machine in symmetric triangular fuzzy space. Expert Syst. Appl., 36(4): 2808-2814.
Zhang, Y., Z.X. Chi, X.D. Liu and X.H. Wang, 2007. A novel fuzzy compensation multi-class support vector machines. Appl. Intell., 27(1): 21-28.