International Journal on Advanced Computer Theory and Engineering (IJACTE), ISSN (Print): 2319-2526, Volume 3, Issue 5, 2014

Increasing Efficiency of Support Vector Machine using the Novel Kernel Function: Combination of Polynomial and Radial Basis Function

1Hetal Bhavsar, 2Amit Ganatra
1Assistant Professor, Department of Computer Science and Engineering, The M. S. University of Baroda, Vadodara, Gujarat, India
2Dean, Faculty of Technology and Engineering, CHARUSAT, Changa, Gujarat, India
E-mail: 1het_bhavsar@yahoo.co.in, 2amitganatra.ce@ecchanga.ac.in

Abstract - Support Vector Machine (SVM) is one of the most robust and accurate methods among all supervised machine learning techniques. Even so, the performance of SVM is greatly influenced by the selection of the kernel function. This research analyses the characteristics of two well known kernel functions, the local Gaussian Radial Basis Function and the global Polynomial kernel. Based on this analysis, a new kernel function, which we call the "Radial Basis Polynomial Kernel (RBPK)", is proposed. RBPK improves both the learning and the generalization capability of SVM. The performance of the proposed kernel is illustrated on several datasets in comparison with the existing single kernels. The results on datasets from various domains show better learning and prediction ability of the SVM with RBPK.

Index Terms - Support vector machine, kernel function, sequential minimal optimization, feature space, polynomial kernel, radial basis function

I. INTRODUCTION

Support Vector Machine (SVM) is a supervised machine learning method based on statistical learning theory and the VC-dimension concept. It follows the structural risk minimization principle, which minimizes an upper bound on the expected risk, as opposed to the empirical risk minimization principle, which minimizes the error on the training data. It exploits a margin-based criterion that is attractive for many classification applications such as handwritten digit recognition, object recognition, speaker identification, face detection in images, text categorization, image classification and biosequence analysis [4], [21], [22], [23]. It uses generalization and regularization theory, which gives a principled way to choose a hypothesis [5], [7].

Training a Support Vector Machine requires solving a large quadratic programming (QP) problem, with O(m²) space complexity and O(m³) time complexity, where m is the number of training samples [4], [10]. To address this, many algorithms and implementation techniques have been developed to train SVMs on massive datasets. The proposed research uses Sequential Minimal Optimization (SMO), a special case of the decomposition method in which each subproblem optimizes two coefficients per iteration. SMO maintains a kernel matrix of size equal to the total number of samples in the dataset, which allows it to handle very large training sets [17].

In the real world, not all datasets are linearly separable. Kernel functions, the key technology of SVM, map data from the input space to a higher dimensional feature space, which makes the classification problem linear in that feature space [4]. Cover's theorem guarantees that any dataset becomes arbitrarily separable as the data dimension grows [3].
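To make the idea of an implicit feature-space mapping concrete, the short sketch below (not from the paper; a minimal NumPy illustration with a hypothetical example) shows that a homogeneous polynomial kernel of degree 2 returns exactly the dot product of explicitly mapped feature vectors, without the learner ever constructing that feature space:

```python
import numpy as np

def poly2_kernel(x, z):
    # Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2
    return np.dot(x, z) ** 2

def phi(x):
    # Explicit degree-2 feature map: all pairwise products x_i * x_j
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# The kernel value equals the dot product of the explicitly mapped vectors.
print(poly2_kernel(x, z))          # 20.25
print(np.dot(phi(x), phi(z)))      # 20.25
```

The kernel evaluation costs one dot product in the input space, whereas the explicit mapping grows quadratically with the input dimension; this is the saving the kernel trick provides.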
The QP problem for training an SVM with a kernel function is

W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(\vec{x}_i, \vec{x}_j)    (1)

subject to \sum_{i=1}^{l} \alpha_i y_i = 0 and 0 \le \alpha_i \le C,

where C is the regularization parameter and K(\vec{x}_i, \vec{x}_j) is the kernel function, both supplied by the user, and the variables α_i are Lagrange multipliers.

The linear kernel, polynomial kernel, Radial Basis Function (RBF) kernel and sigmoid kernel are the common and well known prime kernel functions. The feature space of every kernel is different, so the representation in the new feature space differs from kernel to kernel. The selection of the kernel function therefore has a direct impact on the performance of SVM. The values of the kernel parameters (such as d in the polynomial kernel, σ in the RBF kernel and p in the sigmoid kernel) and of the regularization parameter C have a great impact on the complexity and the generalization error of the classifier. Choosing optimal values of these parameters is as important as the selection of the kernel function itself [18], [20].

Many researchers have already worked on this problem. The clinical kernel function, which takes into account the type and range of each variable, is proposed in [8]. It requires the specification of the type of each variable, as well as the minimal and maximal possible values of continuous and ordinal variables, based on the training data or on a priori knowledge. A method of modifying a kernel to improve the performance of an SVM classifier, based on information-geometric consideration of the Riemannian geometry induced by the kernel, is proposed in [1]. A new kernel built as a convex combination of the good characteristics of the polynomial and RBF kernels is proposed in [19]; to guarantee that the mixed kernel is admissible, an optimal minimal coefficient has to be determined. The advantages of the linear and Gaussian RBF kernels are combined into a new kernel function in [20]. This results in better generalization and prediction capability, but the method used to choose the best set of parameters (C, σ, λ) is time consuming, requiring O(N³) time, where N is the number of training samples. A compound kernel combining the polynomial kernel, the RBF kernel and the Fourier kernel is given in [2]. The Minimax Probability Machine (MPM), whose performance depends on its kernel function, is evaluated in [15] by replacing the Euclidean distance in the Gaussian kernel with the more general Minkowski distance, which results in better prediction accuracy than the Euclidean distance. [9] proposed a dynamic SVM with a distributed kernel function, showing that the recognition of a target feature is determined by a few samples in the local space around it, while the influence of other samples can be neglected. [24] showed that there may be a risk of losing information when multiple kernel learning methods average out the kernel matrices in one way or another. In order to avoid learning any weights and to suit more kernels, they propose a new kernel matrix composed of the original, different kernel matrices, constructed as a larger matrix in which the original ones are still present.
The compositional kernel matrix is s times larger than the base kernels. A mechanism to optimize the parameters of a combined kernel function using large margin learning theory and a genetic algorithm, which searches for the optimal parameters of the combined kernel, is proposed in [13]; however, training becomes slow when the dataset is large. The influence of the model parameters of SVMs using the RBF and scaling kernel functions on performance is studied by simulation in [12]. The penalty factor mainly controls the complexity of the model, while the kernel parameter mainly influences the generalization of the SVM. They showed that when the two types of parameters act jointly, the optimum in the parameter space can be obtained. However, the choice of the SVM kernel function remains a relatively complex and difficult issue.

This research analyzes the key characteristics of two very well known kernel functions, the RBF kernel and the polynomial kernel, and proposes a new kernel function that combines the advantages of both and has better learning as well as better prediction ability.

II. SMO

Decomposition techniques speed up SVM training by dividing the original QP problem into smaller pieces, thereby reducing the size of each QP problem. The chunking algorithm and Osuna's decomposition algorithm are well known decomposition algorithms [17]. Since these techniques require many passes over the dataset, they need a long training time to reach a reasonable level of convergence. SMO is a special case of the decomposition method in which each subproblem optimizes two coefficients per iteration and is solved analytically. The advantages of SMO are that it is simple, easy to implement, generally faster, and has better scaling properties for difficult SVM problems than the standard SVM training algorithm [14], [17]. It maintains a kernel matrix whose size equals the total number of samples in the dataset and thus scales between linear and cubic in the sample set size.

To find an optimal point of (1), the SMO algorithm uses the Karush-Kuhn-Tucker (KKT) conditions. The KKT conditions are necessary and sufficient for an optimal point of a positive definite QP problem. The QP problem is solved when, for all i, the following KKT conditions are satisfied:

\alpha_i = 0 \iff y_i u_i \ge 1
0 < \alpha_i < C \iff y_i u_i = 1    (2)
\alpha_i = C \iff y_i u_i \le 1

where u_i is the output of the SVM for the i-th training sample. The KKT conditions can be evaluated one example at a time, which forms the basis of the SMO algorithm. When the conditions are satisfied by every multiplier, the algorithm terminates. The KKT conditions are verified to within a tolerance ε, which typically ranges from 10^-2 to 10^-3.
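As an illustration of how the stopping criterion in (2) could be checked in practice, the following sketch (not part of the paper; a minimal NumPy version that assumes the decision values u_i have already been computed) flags multipliers that violate the KKT conditions beyond the tolerance ε:

```python
import numpy as np

def kkt_violations(alpha, y, u, C, eps=1e-3):
    """Return a boolean mask of multipliers violating the KKT conditions (2).

    alpha : Lagrange multipliers, shape (m,)
    y     : labels in {-1, +1}, shape (m,)
    u     : SVM outputs for the training samples, shape (m,)
    """
    yu = y * u
    viol = np.zeros_like(alpha, dtype=bool)
    # alpha_i = 0      requires y_i * u_i >= 1
    viol |= (alpha <= 0) & (yu < 1 - eps)
    # 0 < alpha_i < C  requires y_i * u_i == 1
    viol |= (alpha > 0) & (alpha < C) & (np.abs(yu - 1) > eps)
    # alpha_i = C      requires y_i * u_i <= 1
    viol |= (alpha >= C) & (yu > 1 + eps)
    return viol

# SMO keeps selecting and re-optimizing pairs of multipliers until no
# violations remain within the chosen tolerance.
```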
III. KERNEL FUNCTIONS

Kernels are used in Support Vector Machines to map nonlinearly inseparable data into a higher dimensional feature space, where the computational power of the linear learning machine is increased [10], [16]. Using the kernel function, the classification function in the high dimensional space can be written as

f(\vec{x}) = \mathrm{sgn}\left( \sum_{i=1}^{N_s} \alpha_i y_i K(\vec{x}_i, \vec{x}) + b \right)    (3)

where N_s is the number of support vectors. Here, K(\vec{x}_i, \vec{x}_j) is called the kernel function of the SVM. It measures the similarity or distance between two vectors. A kernel function K: \chi \times \chi \to \mathbb{R} is valid if there is some feature mapping Φ such that

K(\vec{x}_i, \vec{x}_j) = \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)    (4)

Thus, we can calculate the dot product \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) without explicitly applying the mapping Φ to the input vectors. The kernel function can transform dot product operations in the high dimensional space into kernel evaluations in the input space as long as it satisfies the Mercer condition [5], [11], [18]; it thereby avoids computing directly in the high dimensional space and overcomes the dimensionality problem.

The performance of SVM largely depends on the kernel function. Every kernel function has its own advantages and disadvantages. Many kernels are possible, and it is difficult to characterize each of them individually. A single kernel function may not have both good learning and good generalization capability. As a solution, the good characteristics of two or more kernels should be combined.

There are mainly two types of kernel functions for support vector machines: local kernel functions and global kernel functions. In a global kernel function, samples far away from each other still influence the kernel value; in a local kernel function, only samples close to each other influence the kernel value. The polynomial kernel is an example of a global kernel function and the RBF kernel is an example of a local kernel function.

A. RBF kernel

The RBF kernel is the most widely used kernel function because of its good learning ability among all the single kernel functions. It is defined as

k(\vec{x}_i, \vec{x}_j) = e^{-\|\vec{x}_i - \vec{x}_j\|^2 / 2\sigma^2}    (5)

where σ is taken as the mean of \|\vec{x}_i - \vec{x}_j\|^2.

The RBF kernel adapts well to many conditions: low dimension, high dimension, small sample, large sample, etc. It also has the advantage of having few parameters. A large number of numerical experiments have shown that the learning ability of the RBF kernel is inversely proportional to the parameter σ; σ determines the area of influence over the data space. Fig. 1 shows the local effect of the RBF kernel for a chosen test input 0.2 over the data space [-1, 1], for different values of the width σ. A larger value of σ gives a smoother decision surface and a more regular decision boundary, because an RBF with large σ allows a support vector to exert a strong influence over a larger area. If σ is very small, Fig. 1 shows that only samples whose distance from the test point is close to σ are affected. Since the kernel acts only on data points in the neighbourhood of the test point, it is called a local kernel.

Fig. 1. A local RBF kernel function with different values of σ

B. Polynomial kernel

The polynomial kernel function is defined as

k(\vec{x}_i, \vec{x}_j) = (\vec{x}_i \cdot \vec{x}_j + 1)^d    (6)

where d is the degree of the kernel. Fig. 2 shows the global effect of the polynomial kernel of various degrees over the data space [-1, 1] with test input 0.2: every data point in the data space influences the kernel value at the test point, irrespective of its actual distance from the test point.

Fig. 2. A global polynomial kernel function with different values of d

The polynomial kernel is a global kernel function with good generalization ability, since distant points still affect its value, yet it lacks the strong learning ability of a local kernel function such as the RBF.
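The local versus global behaviour illustrated in Figs. 1 and 2 can be reproduced numerically. The sketch below (not from the paper; a NumPy approximation over a 1-D data space [-1, 1] with the same test input 0.2) evaluates both kernels against the whole data space, so that the RBF values vanish for distant points while the polynomial values stay non-zero everywhere:

```python
import numpy as np

def rbf_kernel(x, z, sigma):
    # Local kernel (Eq. 5): value decays with the squared distance to z.
    return np.exp(-(x - z) ** 2 / (2.0 * sigma ** 2))

def poly_kernel(x, z, d):
    # Global kernel (Eq. 6): every point contributes, whatever its distance.
    return (x * z + 1.0) ** d

data = np.linspace(-1, 1, 11)   # 1-D data space [-1, 1]
test = 0.2                      # test input used in Figs. 1 and 2

for sigma in (0.1, 0.5, 1.0):
    print("RBF  sigma=%.1f:" % sigma, np.round(rbf_kernel(data, test, sigma), 3))
for d in (1, 3, 5):
    print("Poly d=%d:      " % d, np.round(poly_kernel(data, test, d), 3))
```

With σ = 0.1 the RBF response is essentially zero outside a narrow neighbourhood of 0.2, whereas the polynomial kernel assigns a non-zero value even to the endpoints of the interval.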
IV. PROPOSED KERNEL FUNCTION

For an SVM classifier, choosing a specific kernel function means choosing a way of mapping the input space into a feature space. A learning model, which is judged by its learning ability and its prediction ability, is built by choosing a specific kernel function. Thus, to build a model that has good learning as well as good prediction ability, this research combines the advantages of the local RBF kernel function and the global polynomial kernel function. The novel kernel function, called the Radial Basis Polynomial Kernel (RBPK), is defined as

k(\vec{x}_i, \vec{x}_j) = \exp\left( \frac{(\vec{x}_i \cdot \vec{x}_j + c)^d}{\sigma^2} \right)    (7)

where c > 0 and d > 0. The RBPK kernel takes its good prediction ability from the polynomial kernel and its good learning ability from the RBF kernel.

Mercer's theorem provides the necessary and sufficient condition for a valid kernel function: a kernel is permissible if the corresponding kernel matrix is symmetric and positive semi-definite [7], [11], [18]. Since the RBPK kernel satisfies Mercer's theorem, it is a permissible kernel.

V. EXPERIMENT

A. Dataset Description

In order to validate the classification ability of SVM using the RBPK kernel, several experiments were conducted on standard scaled datasets, and the results were compared with those of the existing kernel functions: linear, polynomial and RBF. The datasets considered for the simulation are the iris, heart, glass, a1a (adult) and letter datasets from LIBSVM [6], the usps dataset1, the dna dataset2 and the web8 dataset3.

1 http://wwwstat.stanford.edu/~tibs/ElemStatLearn/data.html
2 https://www.sgi.com/tech/mlc/db/
3 http://users.cecs.anu.u.au/~xzhang/data/

The details of each dataset are given in Table 1. The datasets come from multiple fields and vary in the number of instances, number of attributes and number of classes; all are of multivariate type.

Table 1: Dataset Description

Dataset | No. of classes | No. of features | No. of training instances | No. of testing instances | Validation method
Iris    | 3  | 4   | 150   | -     | Cross validation
Heart   | 2  | 13  | 270   | -     | Cross validation
Glass   | 7  | 10  | 214   | -     | Cross validation
a1a     | 2  | 123 | 1605  | 30956 | Holdout
Dna     | 3  | 180 | 2000  | 1186  | Holdout
Letter  | 26 | 16  | 15000 | 5000  | Holdout
Usps    | 10 | 256 | 2007  | 7291  | Holdout
web8    | 2  | 300 | 45546 | 13699 | Holdout

B. Experiment Setup and Preliminaries

All tests were performed as follows. To evaluate the classification accuracy of the RBPK kernel against the other kernel functions, two test methods are used for the different datasets: cross-validation and holdout (Han et al., 2006). The cross-validation method with k = 10 folds is used for the Iris, Heart and Glass datasets, and the holdout method is used for the a1a, Dna, Letter, Usps and Web8 datasets, for which separate training and testing sets are available. Further measures, namely the number of Correctly Classified Instances (CCI), the number of Incorrectly Classified Instances (ICI), the number of support vectors (SV), precision, True Positive Rate (TPR) and False Positive Rate (FPR), are also used to compare the efficiency of the RBPK kernel with the other kernel functions. Receiver Operating Characteristic (ROC) curves are shown to visually depict the performance of the classification models.

Depending on the kernel type, different kernel parameters have to be set. The regularization parameter C, which controls the trade-off between maximizing the margin and minimizing the training error, is set to 1 for all experiments. The linear kernel requires no additional parameter. The polynomial kernel is run with degree values 1, 3 and 5; the gamma value of the RBF kernel is set to 0.01, 0.05, 0.08 and 0.1; and the same degree and gamma values are used for the RBPK kernel. The result tables for the different datasets report only the parameter values for which the maximum accuracy was obtained for each kernel. The simulations were run with the SMO algorithm using the LIBSVM framework in Eclipse on an Intel Core i5-2430M CPU @ 2.4 GHz with 4 GB of RAM.
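Before turning to the results, the following sketch (not from the paper; it assumes the reconstructed form of Eq. (7) and uses Python/NumPy rather than the authors' LIBSVM/SMO Java setup, with illustrative parameter values) shows one way the RBPK Gram matrix could be computed and its Mercer conditions, symmetry and positive semi-definiteness, checked numerically on a scaled data sample:

```python
import numpy as np

def rbpk_gram(X, Z, d=3, c=1.0, sigma=2.0):
    # Radial Basis Polynomial Kernel, Eq. (7): exp(((x . z) + c)^d / sigma^2)
    return np.exp((X @ Z.T + c) ** d / sigma ** 2)

# Mercer check on a random sample scaled to [0, 1]: the Gram matrix should be
# symmetric and positive semi-definite (eigenvalues >= 0 up to round-off).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 4))
K = rbpk_gram(X, X, d=3, c=1.0, sigma=2.0)

evals = np.linalg.eigvalsh(K)
print("symmetric:", np.allclose(K, K.T))
print("PSD (up to numerical error):", evals.min() >= -1e-8 * evals.max())
```

This is only a numerical sanity check on one sample; the formal argument in the paper rests on Mercer's theorem rather than on an eigenvalue test.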
VI. EVALUATION

A. Results with the Cross-Validation Method

The results obtained after running the Iris, Heart and Glass datasets with the different kernel functions and parameter tuning are shown in Table 2, Table 3 and Table 4 respectively.

Table 2: Result for Iris Dataset

Kernel function | Parameters  | Accuracy (%) | Tr_Time (sec) | CCI | ICI | SV
Linear          | -           | 97.33        | 0.02          | 146 | 4   | 40
Polynomial      | d=3         | 73.33        | 0.026         | 110 | 40  | 113
RBF             | ϒ=0.08      | 96.66        | 0.027         | 145 | 5   | 80
RBPK            | d=5, ϒ=0.08 | 98           | 0.022         | 147 | 3   | 32

Table 3: Result for Heart Dataset

Kernel function | Parameters  | Accuracy (%) | Tr_Time (sec) | CCI | ICI | SV
Linear          | -           | 84.07        | 0.067         | 227 | 43  | 91
Polynomial      | d=3         | 82.22        | 0.071         | 222 | 48  | 158
RBF             | ϒ=0.05      | 82.96        | 0.076         | 224 | 46  | 114
RBPK            | d=3, ϒ=0.01 | 84.44        | 0.078         | 228 | 42  | 108

Table 4: Result for Glass Dataset

Kernel function | Parameters  | Accuracy (%) | Tr_Time (sec) | CCI | ICI | SV
Linear          | -           | 64.48        | 0.078         | 138 | 76  | 162
Polynomial      | d=3         | 47.66        | 0.094         | 102 | 112 | 189
RBF             | ϒ=0.1       | 58.88        | 0.093         | 126 | 88  | 177
RBPK            | d=5, ϒ=0.08 | 71.49        | 0.094         | 153 | 61  | 134

The above results show that the classification accuracy of the RBPK kernel is the highest among the kernel functions considered, for all three datasets. The testing time of an SVM depends on the number of SVs; the RBPK kernel yields fewer SVs, which reduces the overall complexity of the model. Table 4 shows that for the glass dataset there is a drastic increase in accuracy of around 7%. Since the datasets are small, the training time of the SVM is almost the same for all kernels.
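As a rough idea of how such a cross-validated comparison could be reproduced, the sketch below is not the authors' LIBSVM/SMO setup in Eclipse but a scikit-learn approximation: it assumes the reconstructed RBPK of Eq. (7), takes gamma to play the role of 1/σ² (an assumption), min-max scales the iris data to [0, 1], and uses d=3, ϒ=0.08 for numerical stability rather than the paper's best iris setting. It runs 10-fold cross-validation with C = 1 and reports accuracy together with CCI/ICI counts:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def rbpk(X, Z, d=3, c=1.0, gamma=0.08):
    # RBPK of Eq. (7), with gamma standing in for 1/sigma^2 (an assumption).
    return np.exp(gamma * (X @ Z.T + c) ** d)

X, y = load_iris(return_X_y=True)
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # scale to [0, 1]

# SVC accepts a callable kernel that returns the Gram matrix between two sets.
clf = SVC(C=1.0, kernel=lambda A, B: rbpk(A, B, d=3, c=1.0, gamma=0.08))
pred = cross_val_predict(clf, X, y, cv=10)

cci = int((pred == y).sum())
ici = len(y) - cci
print("accuracy: %.2f%%  CCI: %d  ICI: %d" % (100.0 * cci / len(y), cci, ici))
```

The exact numbers will not match Table 2, since the solver, tolerance and parameter conventions differ from the paper's setup; the sketch only mirrors the evaluation protocol.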
B. Results with the Holdout Method

Experimental results for the a1a dataset are shown in Table 5 and Fig. 3. The RBPK kernel (with d=1, ϒ=0.05) gives the highest accuracy among the kernel functions. The increase in accuracy is only 0.17%, but it corresponds to 55 more correct classifications. The polynomial kernel gives the worst performance of all the kernel functions. The linear and RBF kernels give nearly similar performance, but the number of support vectors is smaller for the linear kernel, so its testing time is lower than that of the RBF kernel. Though the number of SVs with RBPK is smaller than with the polynomial and RBF kernels, it takes more testing time. The higher precision and TPR values show the better efficiency of the RBPK kernel.

Table 5: Result for a1a Dataset

Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=1        | ϒ=0.05 | d=1, ϒ=0.05
Accuracy (%) | 83.82  | 82.13      | 84.23  | 84.4
Tr_Time (s)  | 0.244  | 0.203      | 0.246  | 0.265
Ts_Time (s)  | 2.568  | 3.494      | 4.144  | 4.165
CCI          | 25947  | 25423      | 26073  | 26128
ICI          | 5009   | 5533       | 4883   | 4828
SV           | 588    | 790        | 691    | 650
Precision    | 0.833  | 0.822      | 0.834  | 0.838
TPR          | 0.838  | 0.821      | 0.842  | 0.844
FPR          | 0.32   | 0.518      | 0.359  | 0.328

Fig. 3. For the a1a dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve

Table 6 and Fig. 4 show the results for the DNA dataset after tuning the parameters of the different kernel functions. Using the RBPK kernel, accuracy increases by around 0.5%, with the highest TPR and the lowest FPR. Compared to the RBF kernel it takes almost the same testing time while having around 200 more SVs.

Table 6: Result for DNA Dataset

Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=3        | ϒ=0.01 | d=3, ϒ=0.01
Accuracy (%) | 93.08  | 50.84      | 94.86  | 95.36
Tr_Time (s)  | 0.749  | 2.575      | 1.374  | 2.06
Ts_Time (s)  | 0.312  | 1.263      | 0.811  | 1.03
CCI          | 1104   | 603        | 1125   | 1131
ICI          | 82     | 583        | 61     | 55
SV           | 396    | 1734       | 1026   | 1274
Precision    | 0.94   | 0.259      | 0.958  | 0.963
TPR          | 0.94   | 0.51       | 0.959  | 0.963
FPR          | 0.048  | 0.51       | 0.027  | 0.026

Fig. 4. For the dna dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve

Table 7 and Fig. 5 show the results for the Letter dataset after tuning the parameters of the different kernel functions. Compared to the existing kernels, the RBPK kernel with parameters (d=5, ϒ=0.1) drastically increases the classification accuracy, by about 11%. The number of SVs with RBPK is almost half the number of SVs with RBF, the best existing kernel function here, which benefits the testing time of the RBPK kernel. The FPR is very nearly zero and the TPR is the highest for the RBPK kernel, as shown in Fig. 5(b).

Table 7: Results for Letter Dataset

Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=3        | ϒ=0.1  | d=5, ϒ=0.1
Accuracy (%) | 84.3   | 37.58      | 84.6   | 95.88
Tr_Time (s)  | 7.064  | 38.454     | 12.391 | 8.03
Ts_Time (s)  | 5.07   | 9.918      | 9.259  | 5.696
CCI          | 4215   | 1879       | 4230   | 4794
ICI          | 785    | 3121       | 770    | 206
SV           | 8770   | 14462      | 10882  | 5382
Precision    | 0.872  | 0.598      | 0.882  | 0.988
TPR          | 0.87   | 0.387      | 0.872  | 0.987
FPR          | 0.002  | 0.023      | 0.002  | 0.0005

Fig. 5. For the letter dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve
The results for the Usps dataset are shown in Table 8 and Fig. 6. The highest accuracy obtained after parameter tuning with the existing kernels is 94.97%, with the RBF kernel; with the RBPK kernel it increases by 0.7%. Though the number of SVs with the RBPK kernel is much higher than with the RBF kernel, its testing time is almost the same as that of the RBF kernel.

Table 8: Results for Usps Dataset

Statistics   | Linear | Polynomial | RBF    | RBPK
Parameters   | -      | d=3        | ϒ=0.01 | d=1, ϒ=0.05
Accuracy (%) | 93.02  | 93.77      | 94.97  | 95.62
Tr_Time (s)  | 5.906  | 17.29      | 11.32  | 15.569
Ts_Time (s)  | 2.847  | 6.481      | 5.02   | 5.42
CCI          | 1867   | 1882       | 1833   | 1919
ICI          | 140    | 125        | 174    | 88
SV           | 992    | 2692       | 41     | 2029
Precision    | 0.923  | 0.926      | 0.942  | 0.948
TPR          | 0.92   | 0.927      | 0.941  | 0.948
FPR          | 0.008  | 0.006      | 0.005  | 0.003

Fig. 6. For the usps dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve

The results for the Web8 dataset are shown in Table 9 and Fig. 7. The highest accuracy obtained after parameter tuning is 99.51% with RBPK, followed by the RBF kernel with 99.21%. Though the difference in accuracy is only 0.3%, there is a drastic reduction in the number of SVs and ultimately in the testing time. The polynomial kernel gives an accuracy of 96.99%, but its FPR is the highest, at 0.97, because of the imbalance in the class distribution of the data. With the RBPK kernel the TPR is the highest and the FPR the lowest, which indicates that it also works better for imbalanced data.

Table 9: Results for Web8 Dataset

Statistics   | Linear | Polynomial | RBF     | RBPK
Parameters   | -      | d=3        | ϒ=0.08  | d=3, ϒ=0.05
Accuracy (%) | 98.8   | 96.99      | 99.21   | 99.51
Tr_Time (s)  | 27.704 | 18.505     | 108.159 | 5306.575
Ts_Time (s)  | 2.2    | 4.82       | 23.387  | 5.79
CCI          | 13547  | 13288      | 13592   | 13632
ICI          | 152    | 411        | 107     | 67
SV           | 1356   | 2678       | 3515    | 2124
Precision    | 0.989  | 0.941      | 0.992   | 0.995
TPR          | 0.989  | 0.97       | 0.992   | 0.995
FPR          | 0.326  | 0.97       | 0.234   | 0.134

Fig. 7. For the web8 dataset: (a) a comparison of CCI vs. ICI and (b) the ROC curve

In summary, the datasets used in the analysis vary in the number of instances, features and classes and come from different domains. The training instances range from 150 to 45000, the testing instances from 1100 to 30000, the features from 4 to 300, and the classes from 2 to 26. To obtain a good classification effect, choosing the parameter values is very important. It has been observed that, as the degree of the polynomial kernel increases, its accuracy decreases; the polynomial kernel gives better accuracy for degree 3. The RBF kernel works better for γ = 0.05 and above. Similarly, for most of the datasets the RBPK kernel works better for degree 3 and γ = 0.05 and above. As shown in Fig. 8 and Fig. 9, using the RBPK kernel the accuracy of the SVM classifier in correctly classifying instances increases by around 0.2% to 11%. From the results on the iris, glass, DNA, letter and Usps datasets it can be observed that the RBPK kernel function also gives a better feature space representation for multiclass classification. Mapping the data into the new feature space using the RBPK kernel function mostly reduces the number of support vectors, as shown in Fig. 10 and Fig. 11, which may reduce the overall model complexity as well as the testing time. The results show that RBPK obtained much better TPR and precision and the lowest FPR on all datasets compared to the other existing kernel functions. The RBPK kernel matrix requires storage proportional to the number of training samples.
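For reference, the evaluation measures reported in the holdout tables (accuracy, CCI, ICI, precision, TPR and FPR) can all be derived from a confusion matrix on the test set. The sketch below is not from the paper; it is a small scikit-learn illustration for a binary task with dummy labels, showing how each measure is computed:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def holdout_report(y_true, y_pred):
    # Binary confusion matrix: rows = actual class, columns = predicted class.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    cci = tp + tn                      # correctly classified instances
    ici = fp + fn                      # incorrectly classified instances
    return {
        "accuracy": cci / (cci + ici),
        "CCI": int(cci),
        "ICI": int(ici),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "TPR": tp / (tp + fn) if (tp + fn) else 0.0,   # true positive rate
        "FPR": fp / (fp + tn) if (fp + tn) else 0.0,   # false positive rate
    }

# Example with dummy labels and predictions.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(holdout_report(y_true, y_pred))
```

Each (FPR, TPR) pair obtained this way is one operating point of the ROC curves plotted in Figs. 3 to 7.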
Fig. 8. Overall performance (accuracy) of the different kernel functions with the cross-validation method
Fig. 9. Overall performance (accuracy) of the different kernel functions with the holdout method
Fig. 10. Effect on the number of support vectors of the different kernel functions with the cross-validation method
Fig. 11. Effect on the number of support vectors of the different kernel functions with the holdout method

VII. CONCLUSION

The proposed RBPK kernel function combines the advantages of two very well known kernel functions, the RBF and polynomial kernels. With appropriately chosen kernel parameters, it results in better generalization, learning and prediction capability. Better prediction results are obtained for binary as well as multiclass datasets. By using the RBPK function, the accuracy rate is higher than with the existing single kernel functions. It can be applied to different kinds of fields and is not sensitive to the domain of the data.
REFERENCES

[1] Amari, Shun-ichi, and Si Wu, "Improving support vector machine classifiers by modifying kernel functions," Neural Networks, 12(6): 783-789, 1999.
[2] An-na, Wang, Z. Yue, H. Yun-tao, and Li Y., "A novel construction of SVM compound kernel function," in International Conference on Logistics Systems and Intelligent Management, vol. 3, pp. 1462-1465, IEEE, 2010.
[3] Burbidge, Robert, and B. Buxton, "An introduction to support vector machines for data mining," Keynote papers, Young OR12: 3-15, 2001.
[4] Burges, Christopher J. C., "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2: 121-167, 1998.
[5] Campbell, Colin, and Y. Ying, "Learning with support vector machines," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 5, no. 1: 1-95, 2011.
[6] Chang, Chih-Chung, and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3: 27, 2011.
[7] Cortes, Corinna, and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3: 273-297, 1995.
[8] Daemen, Anneleen, and B. De Moor, "Development of a kernel function for clinical data," in Engineering in Medicine and Biology Society (EMBC 2009), Annual International Conference of the IEEE, pp. 5913-5917, IEEE, 2009.
[9] Guangzhi, Shi, D. Lianglong, H. Junchuan, and Z. Yanxia, "Dynamic support vector machine by distributing kernel function," in Advanced Computer Control (ICACC), 2010 2nd International Conference on, vol. 2, pp. 362-365, IEEE, 2010.
[10] Han, Jiawei, and M. Kamber, "Data Mining, Southeast Asia Edition: Concepts and Techniques," Morgan Kaufmann, 2006.
[11] Herbrich, Ralf, "Learning Kernel Classifiers: Theory and Algorithms (Adaptive Computation and Machine Learning)," MIT Press, 2001.
[12] Jin, Yan, J. Huang, and J. Zhang, "Study on influences of model parameters on the performance of SVM," in International Conference on Electrical and Control Engineering (ICECE), pp. 3667-3670, IEEE, 2011.
[13] Lu, Mingzhu, C. P. Chen, J. Huo, and X. Wang, "Optimization of combined kernel function for SVM based on large margin learning theory," in IEEE International Conference on Systems, Man and Cybernetics (SMC 2008), pp. 353-358, IEEE, 2008.
[14] G. Mak, "The implementation of support vector machines using the sequential minimal optimization algorithm," PhD diss., McGill University, 2000.
[15] Mu, Xiangyang, and Y. Zhou, "A Novel Gaussian Kernel Function for Minimax Probability Machine," in Intelligent Systems (GCIS'09), WRI Global Congress on, vol. 3, pp. 491-494, IEEE, 2009.
[16] Muller, K., S. Mika, G. Ratsch, Koji Tsuda, and Bernhard Scholkopf, "An introduction to kernel-based learning algorithms," IEEE Transactions on Neural Networks, 12, no. 2: 181-201, 2001.
[17] J. C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines," Microsoft Research, 1998.
[18] Schölkopf, Bernhard, and A. J. Smola, "Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond," MIT Press, 2002.
[19] Smits, Guido F., and E. M. Jordaan, "Improved SVM regression using mixtures of kernels," in Neural Networks (IJCNN'02), Proceedings of the 2002 International Joint Conference on, vol. 3, pp. 2785-2790, IEEE, 2002.
[20] Song, Huazhu, Zichun Ding, C. Guo, Z. Li, and H. Xia, "Research on combination kernel function of support vector machine," in International Conference on Computer Science and Software Engineering, 2008, vol. 1, pp. 838-841, IEEE, 2008.
[21] Vapnik, Vladimir N., "An overview of statistical learning theory," IEEE Transactions on Neural Networks, 10, no. 5: 988-999, 1999.
[22] Von Luxburg, Ulrike, and B. Schölkopf, "Statistical learning theory: Models, concepts, and results," arXiv preprint arXiv:0810.4752, 2008.
[23] Yu, Hwanjo, and S. Kim, "SVM Tutorial - Classification, Regression and Ranking," in Handbook of Natural Computing, pp. 479-506, Springer Berlin Heidelberg, 2012.
[24] Zhang, Rui, and X. Duan, "A new compositional kernel method for multiple kernels," in Computer Design and Applications (ICCDA), 2010 International Conference on, vol. 1, pp. V1-27, IEEE, 2010.