The 10th International PSU Engineering Conference, May 14-15, 2012

Comparison of Support Vector Machine Kernel Functions for Unsmoked Sheet Rubber Price Forecasting

Abstract

Developing a natural rubber price forecasting model is an interesting and important topic for all stakeholders in the automobile tire industry. Natural rubber, especially unsmoked sheet rubber, accounts for about 6-36 percent of an automobile tire. However, the factors influencing the natural rubber price form a fluctuating, non-stationary dataset, which is itself a studied problem, so the choice of a forecasting technique appropriate for this type of data needs careful consideration. The support vector machine (SVM) technique has the potential to serve as a powerful tool for building forecasting models on non-stationary unsmoked sheet rubber price data. In addition, an SVM-based forecasting model requires the selection of an appropriate kernel function and constraint. The work presented in this paper examines the effect of different kernel function types, namely linear, polynomial, sigmoid, and radial basis function. The unsmoked sheet rubber price is forecast against monitored time series of forward contract rubber prices, crude oil prices, and foreign currency exchange rates. The effect of the constraint parameter is also studied. The experiments show that the radial basis kernel function yields the highest recognition rate for unsmoked sheet rubber price forecasting.

Keywords: Price forecasting model, Unsmoked sheet rubber, Support vector machine, Kernel function, Radial basis function

1. Introduction

In the automobile tire industry, the price of unsmoked sheet rubber (USS rubber) is the initial raw material cost of the supply chain. Accurate multiple-period-ahead price forecasts are therefore key supporting information for strategic decisions. Such strategic planning gives a business a wide range of advantages in its operating activities [1], for instance in pricing management, inventory management, treasury management, and investment budgeting. As a result, USS rubber price forecasting has become one of the challenging applications of modern time series forecasting.

Statistical forecasting models have long been widely used for time series forecasting of natural rubber prices, e.g. Box and Jenkins's Auto-Regressive Integrated Moving Average (ARIMA), Holt-Winters smoothing with no seasonality, Simple Exponential Smoothing (SES), and Ordinary Least Squares (OLS). They are, however, general univariate models developed under the assumption that the forecast time series is linear and stationary [2-5]; moreover, they are applicable only to short-term forecasting. In recent years, Artificial Neural Networks (ANNs), previously dominant in pattern classification and pattern recognition, were introduced to the time series forecasting field [6-8]. ANNs outperform statistical models on natural rubber price forecasting, specifically for more irregular series, for multivariate data, and for multiple-period-ahead forecasting [9-12]. Although ANNs show better performance, they have several disadvantages: (i) dependency on a large number of parameters, e.g. network size, learning parameters, and the choice of initial weights; (ii) the possibility of being trapped in local minima, resulting in very slow convergence; and (iii) over-fitting on the training data, resulting in poor generalization ability.
Later, the Support Vector Machine (SVM) emerged as a new and powerful technique for learning from data in many fields, solving classification and regression problems with notably good performance [13-16]. The main advantage of SVM is that it minimizes structural risk, as opposed to the empirical risk minimization employed by ANN techniques. The SVM technique also copes well with non-stationary data having no obvious trend and no seasonality, which matches the data pattern of the USS rubber price and its influential factors.

The aim of this paper is to investigate the effect of various kernel function types on predicting the USS rubber price at Hatyai's central rubber market. Forecasting performance is evaluated as a recognition rate on the upward or downward trend of the USS rubber market price. The effect of the choice of the SVM regularization parameter on the prediction error is also investigated.

2. Support Vector Machine

The Support Vector Machine is a kernel-based classifier and represents a major development in machine learning algorithms. It is a nonlinear extension of the generalized portrait algorithm developed by Vladimir Vapnik [17]. Support vector machines (SVMs) are a group of supervised learning methods that can be applied to classification (pattern recognition) and regression (function approximation). The goal of SVM modeling is to find a maximum-margin hyperplane that clearly separates the classes: the distance from the hyperplane to the nearest training examples is maximized, so that cases with one category of the target variable lie on one side of the plane and cases with the other category lie on the other side. The hyperplane obtained is called the optimal separating hyperplane (OSH), and the training examples closest to it are called the support vectors.

A unique feature of SVMs is their resistance to over-fitting. This is because SVM implements the structural risk minimization principle, whereas ANN implements the empirical risk minimization principle: the latter seeks to minimize the misclassification error, i.e. the deviation from the correct solution on the training data, while the former minimizes an upper bound on the generalization error. Furthermore, SVMs possess the well-known ability to act as universal approximators of any multivariate function to any desired degree of accuracy. For this reason, they are of particular interest for modeling unknown, partially known, highly nonlinear, or complex systems, plants, and processes [18].

2.1 Support Vector Classification Model

If the data are linearly separable, a hyperplane separating the binary decision classes in the two-attribute case can be represented by the following equation:

$y = w_0 + w_1 x_1 + w_2 x_2$  (1)

where $y$ is the outcome, $x_i$ are the attribute values, and the three weights $w_i$ are learned by the learning algorithm. The maximum-margin hyperplane can be represented in terms of the support vectors as:

$y = b + \sum_i \alpha_i y_i \, \mathbf{x}(i) \cdot \mathbf{x}$  (2)

where $y_i$ is the class value of training example $\mathbf{x}(i)$, the vector $\mathbf{x}$ represents a test example, the vectors $\mathbf{x}(i)$ are the support vectors, and $\cdot$ denotes the dot product. In this equation, $b$ and $\alpha_i$ are parameters that determine the hyperplane. Finding the support vectors and determining the parameters $b$ and $\alpha_i$ is equivalent to solving a linearly constrained quadratic programming problem.
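To make equation (2) concrete, the sketch below fits a linear SVM on a small made-up two-attribute dataset and recomputes the decision value by hand from the support vectors. The use of scikit-learn and all data values here are illustrative assumptions, not the paper's toolkit.

```python
# A minimal illustration of equation (2): y = b + sum_i alpha_i*y_i*<x(i), x>.
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in two attributes (x1, x2); toy data.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# scikit-learn stores alpha_i * y_i in dual_coef_ and b in intercept_.
x_test = np.array([3.0, 3.0])
manual = clf.intercept_[0] + np.sum(
    clf.dual_coef_[0] * (clf.support_vectors_ @ x_test))

print("support vectors:\n", clf.support_vectors_)
print("manual decision value: ", manual)
print("library decision value:", clf.decision_function([x_test])[0])
```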
If the data are not linearly separable, as in this case, the SVM transforms the inputs into a high-dimensional feature space. This is done with a kernel function, as follows:

$y = b + \sum_i \alpha_i y_i K(\mathbf{x}(i), \mathbf{x})$  (3)

2.2 Parameter and Kernel Function

SVM is a kernel-based algorithm. A kernel is a function that transforms the input data into a high-dimensional space where the problem is solved, and kernel functions can be linear or nonlinear. In the simplest pattern recognition tasks, SVMs use a linear separating hyperplane to create a classifier with a maximal margin. To do so, the learning problem for the SVM is cast as a constrained nonlinear optimization problem in which the cost function is quadratic and the constraints are linear, i.e. a classic quadratic programming problem.

SVM models have a cost parameter, C, that controls the trade-off between allowing training errors and forcing rigid margins, providing some flexibility in separating the categories. It creates a soft margin that permits some misclassifications, as shown in Figure 1. Increasing the value of C increases the cost of misclassifying points and forces the creation of a more accurate model that may not generalize well.

Figure 1. Trading off error with the parameter C.

When the given classes cannot be linearly separated in the original input space, SVMs transform the original input space into a higher-dimensional feature space. This transformation can be achieved with various nonlinear mappings such as the polynomial, sigmoid tanh, and radial basis functions, as shown in Figure 2. After a suitable nonlinear transformation, problems that are nonlinearly separable in the input space can become linearly separable in the feature space [19]. The kernel functions used in this study are the following:

Linear: $K(x_i, x_j) = \langle x_i, x_j \rangle$
Polynomial: $K(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$
Gaussian RBF: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
Sigmoid tanh: $K(x_i, x_j) = \tanh(\alpha \langle x_i, x_j \rangle + \beta)$

Figure 2. Mapping data from a low to a high dimension with a kernel function.
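For reference, the sketch below is a direct NumPy transcription of the four kernels above. All parameter values ($\gamma$, $\alpha$, $\beta$, and the degree $d$) are illustrative placeholders, since the paper does not report its kernel settings; the RBF width $\gamma$ and the sigmoid offset $\beta$ were lost in typesetting and are reconstructed from the standard definitions.

```python
# Minimal NumPy versions of the four kernels in Section 2.2.
# All parameter defaults are illustrative, not the paper's settings.
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)                            # <xi, xj>

def polynomial_kernel(xi, xj, d=3):
    return (np.dot(xi, xj) + 1) ** d                 # (<xi, xj> + 1)^d

def rbf_kernel(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))   # exp(-gamma ||xi - xj||^2)

def sigmoid_kernel(xi, xj, alpha=0.1, beta=0.0):
    return np.tanh(alpha * np.dot(xi, xj) + beta)    # tanh(alpha <xi, xj> + beta)

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.5])
for k in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(k.__name__, "=", k(xi, xj))
```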
3. Research Methodology

In this study, we experimented with four influential factors against the USS rubber price at Hatyai's central rubber market. The factors are daily price-differential time series of the forward contract rubber (RSS#3) price on the Tokyo market (TOCOM) and the crude oil price on the New York market (NYMEX); the remaining factors are the foreign exchange rates of the Thai Baht against the US Dollar and against the Japanese Yen. Only data from working days of Hatyai's central rubber market were used. In total, 1,333 daily records covering the period from January 2005 to December 2010 were used to classify the upward/downward trend of the daily rubber market price. The available dataset was divided into training and testing sets. We experimented with four different kernel function types, namely linear, sigmoid tanh, radial basis function, and polynomial; the objective is to investigate the effect of the kernel type and determine the most suitable kernel function for this particular dataset.

In the first set of experiments, we collected and rearranged all daily price changes from the previous day for the input and output data. These data served as training and testing datasets prepared under two different classification assumptions, of 5 classes and 8 classes, which respectively ignore and take into account how the behavior of the input data affects the range of USS rubber price changes. The purpose is to compare the effectiveness of each generated SVM model.

The 1,333 daily records were allocated to training and testing data under two assumptions. The first assumption tests the efficiency of a model on its own training data and is called the "known test set": all daily records served as training data, and 15% of the training data, selected at random, served as the testing data. The second assumption evaluates the generated model on data kept separate from the training data and is called the "unknown test set", with an 80:20 split between training and testing data.

In the experimental stage, the four kernel functions, namely linear, sigmoid tanh, radial basis, and polynomial, were used to generate models, and the constraint parameter C was also varied. The training datasets arranged under the 5-class and 8-class assumptions were each trained with the four kernel functions over a range of C values (C = 1, 10, 100, and 1,000), generating 64 forecasting models in total. The accuracy of each generated model was then evaluated with the Mean Square Error (MSE) method under both testing assumptions.
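The sketch below outlines this experimental grid, assuming a feature matrix X of daily price differentials for the four factors and trend-class labels y have already been prepared (data loading is not shown); the helper name and the use of scikit-learn are illustrative assumptions, not the authors' code. Running the grid once with 5-class labels and once with 8-class labels, with each kernel/C combination trained separately under the two testing assumptions, would account for the 64 models described above.

```python
# Sketch of the kernel/C grid from Section 3 (hypothetical helper).
# X: daily price differentials of the four factors; y: trend-class labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def run_grid(X, y):
    results = []
    for kernel in ("linear", "sigmoid", "rbf", "poly"):
        for C in (1, 10, 100, 1000):
            # "Known test set": train on all records, then score on a
            # random 15% of those same records.
            known_model = SVC(kernel=kernel, C=C).fit(X, y)
            idx = np.random.choice(len(X), int(0.15 * len(X)), replace=False)
            acc_known = known_model.score(X[idx], y[idx])

            # "Unknown test set": an 80:20 split keeps the test records
            # disjoint from the training records.
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20)
            acc_unknown = SVC(kernel=kernel, C=C).fit(X_tr, y_tr).score(X_te, y_te)

            results.append((kernel, C, acc_known, acc_unknown))
            print(f"{kernel:8s} C={C:<5d} known={acc_known:.2%} "
                  f"unknown={acc_unknown:.2%}")
    return results
```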
4. Result and Discussion

In the first set of experiments, all arranged vectors of the 1,333 records were trained to generate the forecasting models, over the four SVM kernel mapping functions with varying regularization parameter C. From this part, 64 forecasting models were generated on the training data under the 5-class and 8-class assumptions. The prediction performance, measured as the percentage of accuracy on the test set, is shown in Figures 3 and 4. We found that the polynomial kernel took a long time to train at every C value, and that the sigmoid kernel performed equally at C = 100 and C = 1,000. The linear kernel, the RBF kernel at all C values, and the sigmoid kernel at C = 1 and C = 10 are suitable for classifying the upward or downward trend of the USS rubber price on this type of data.

Figure 3. % Accuracy on the known and unknown test sets (5 classes).

Figure 4. % Accuracy on the known and unknown test sets (8 classes).

Figure 5 presents the results of Figure 3 as a bar chart, illustrating the efficiency of the models generated under the 5-class assumption when all daily records served as training data and 15% of the training data, selected at random, served as the testing data (the "known test set"). The experiment showed that the radial basis function (RBF) at C = 1,000 obtained the best result (55.03%); the linear kernel at C = 10 and the sigmoid kernel at C = 1 reached lower accuracies of 31.50% and 17.47%, respectively. In the complementary experiment, the same training data were used, but the testing data were kept separate from the training data with an 80:20 split (the "unknown test set"). Here the RBF at C = 1 obtained the best result (33.22%), as shown in Figure 6, while the linear kernel at C = 100 and the sigmoid kernel at C = 10 reached lower accuracies of 28.26% and 18.66%, respectively.

Figure 5. % Accuracy on the known test set (5 classes).

Figure 6. % Accuracy on the unknown test set (5 classes).

Figure 7 presents the results of Figure 4 as a bar chart for the training set under the 8-class assumption. The RBF with regularization parameter C = 1,000 performed better than the others on the known test set (65.16%), and its accuracy was also better than that of the model generated under the 5-class assumption (55.03%). The linear kernel at C = 10 and the sigmoid kernel at C = 10 reached lower accuracies of 39.59% and 38.77%, respectively. On the unknown test set, the RBF at C = 100 tied the model trained with the RBF at C = 1,000 for the best result (48.24%), as shown in Figure 8; this accuracy is again better than that of the 5-class model (33.22%). The linear kernel at C = 1,000 and the sigmoid kernel at C = 10 reached lower accuracies of 35.45% and 37.80%, respectively.

Figure 7. % Accuracy on the known test set (8 classes).

Figure 8. % Accuracy on the unknown test set (8 classes).

5. Conclusion

This paper investigates the performance of support vector machines for predicting the upward/downward trend of the USS rubber price at Hatyai's central rubber market, examined in terms of the kernel function type and the choice of the regularization parameter. The experiments demonstrated that the radial basis function gives the best forecasting results: it provided the best performance in decreasing the prediction error under both the behavior-concerned and the behavior-unconcerned assumptions about the factors. Furthermore, the SVM model generated under the behavior-concerned (8-class) assumption was more accurate, with C = 100 and C = 1,000 producing equally good results. However, the prediction accuracy is still not satisfactory. This outcome encourages us to look for a more suitable way of classifying the data: further improvement in prediction performance may be achieved by reclassifying the input features, concentrating more on how the behavior of each historical input feature affects the range of USS rubber price changes. The optimization of the constraint value (regularization parameter) is also of concern and is currently under investigation.

References

[1] Gusuma, P. 2010. Financial Management in Industrial Estate, Bangkok Press Center, Bangkok, Thailand.

[2] Suppanunta, R. 2009. Forecasting Model of RSS3 Price in Future Market. Kasetsart University Journal of Economics, 16(1): 54-74.

[3] Suwimon, T. 2008. Short-term Price Forecasting of RSS3 in Agricultural Future Exchange of Thailand, Chiang Mai University, Chiang Mai, Thailand.

[4] Pagagrong, T. 2002. The Price Forecasting of Agricultural Products in Agricultural Future Exchange of Thailand, Kasetsart University, Bangkok, Thailand.

[5] Aat, P. 2007. Direction and Alteration of Thai Natural Rubber Industries in the Next Five Years. Chamber of Commerce University Journal of Academy, 27(3): 91-119.

[6] Wu, S. Jr., Han, J., Annambhotla, S. and Bryant, S. 2005. Artificial Neural Networks for Forecasting Watershed Runoff and Stream Flows. Journal of Hydrologic Engineering, 5: 216-222.

[7] Haoffi, Z., Guoping, X., Fagting, Y. and Han, Y. 2007. A Neural Network Model Based on the Multi-stage Optimization Approach for Short-term Food Price Forecasting in China. Expert Systems with Applications, 33: 347-356.

[8] Zou, H.F., Xia, G.P., Yang, F.T. and Wang, H.Y. 2007. An Investigation and Comparison of Artificial Neural Network and Time Series Models for Chinese Food Grain Price Forecasting. Neurocomputing, 70: 2913-2923.
[9] Porntip, C. 2005. The Comparison of Rubber Price Forecasting Using Box and Jenkins, Transfer Function, and Artificial Neural Networks. King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand.

[10] Panida, S. and Jitian, X. 2008. Developing Forecasts for Thai Rubber Latex Prices: Non-Neural Network Training and Neural Network Training Approaches, School of Computer and Security Science, Edith Cowan University, Perth, Western Australia.

[11] Pongsiri, S., Pranee, N. and Suda, T. 2007. Time Series Forecasting Using a Combined ARIMA and Artificial Neural Network Model. Proceedings of the National Academic Research Conference, Bangkok, Thailand, 2007: 1-7.

[12] Jarumon, N., Payong, M. and Srimaj, W. 2009. The Comparison Study of Techniques on Time Series Prediction of Rubber Price Using Artificial Neural Networks, Polynomial Regression and Support Vector Regression. The 14th National Graduate Research Conference, King Mongkut's University of Technology North Bangkok, Thailand, Sep. 10-11, 2009.

[13] Samsudin, R., Shabri, A. and Saad, P. 2010. A Comparison of Time Series Forecasting Using Support Vector Machine and Artificial Neural Network Models. Journal of Applied Sciences, 10(11): 950-958.

[14] Wei, Z.L. and Wen, J.W. 2005. Potential Assessment of the Support Vector Machine Method in Forecasting Ambient Air Pollutant Trends. Chemosphere, 2005: 693-701.

[15] Wei, H., Yoshiteru, N. and Shou, Y.W. 2005. Forecasting Stock Market Movement Direction with Support Vector Machine. Computers & Operations Research, 2005: 2513-2522.

[16] Rohit, C. and Kumkum, G. 2008. A Hybrid Machine Learning System for Stock Market Forecasting. Proceedings of the World Academy of Science, Engineering and Technology, 2008: 315-318.

[17] Vapnik, V. 1995. The Nature of Statistical Learning Theory, Springer-Verlag, New York.

[18] Kecman, V. 2005. Support Vector Machines – An Introduction, StudFuzz, Springer-Verlag, Berlin Heidelberg.

[19] Wang, L. 2005. Support Vector Machines: Theory and Applications (Studies in Fuzziness and Soft Computing), Springer, New York.