Supplementary II: The setup of the SVM model

Description: Supplementary II describes the theoretical derivation of the SVM model.

Suppose that the regression function can be written as in Equation A.1:

f(x) = w \cdot x + b    Eq. (A.1)

For the training sample set (x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l), the ε-insensitive function is adopted as the loss function, giving Equation A.2:

L_\varepsilon(f(x_i) - y_i) =
\begin{cases}
0, & |f(x_i) - y_i| < \varepsilon \\
|f(x_i) - y_i| - \varepsilon, & |f(x_i) - y_i| \ge \varepsilon
\end{cases}    Eq. (A.2)

Thus the hyperplane w \cdot x + b = 0 can be constructed and the samples can be separated. The linear regression problem is then transformed into the optimization problem of Equation A.3:

\min \ \frac{1}{2}\|w\|^2
\text{s.t.}\ y_i - (w \cdot x_i) - b \le \varepsilon, \quad i = 1, \ldots, l    Eq. (A.3)
\qquad (w \cdot x_i) + b - y_i \le \varepsilon, \quad i = 1, \ldots, l

To allow for errors, two slack variables \xi_i, \xi_i^* \ge 0, i = 1, \ldots, l, are introduced, and the optimization problem becomes Equation A.4:

\min \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*)
\text{s.t.}\ y_i - (w \cdot x_i) - b \le \varepsilon + \xi_i, \quad i = 1, \ldots, l
\qquad (w \cdot x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad i = 1, \ldots, l    Eq. (A.4)
\qquad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, l

where the constant C > 0 is the penalty factor, which controls the trade-off between the complexity (flatness) of the decision function and the amount by which deviations larger than ε are tolerated.

The Lagrangian is then constructed from Equation A.4:

L(w, b, \xi, \xi^*, \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l}\alpha_i\left(\varepsilon + \xi_i - y_i + (w \cdot x_i) + b\right) - \sum_{i=1}^{l}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - (w \cdot x_i) - b\right) - \sum_{i=1}^{l}(\eta_i\xi_i + \eta_i^*\xi_i^*)    Eq. (A.5)

where α_i, α_i*, η_i and η_i* ≥ 0 are the Lagrange multipliers. For Equation A.5 to attain its minimum, the partial derivatives with respect to w, b, ξ_i and ξ_i* must equal zero, as shown in Equation A.6:

\partial L/\partial w = 0 \ \rightarrow\ w - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)x_i = 0
\partial L/\partial b = 0 \ \rightarrow\ \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0
\partial L/\partial \xi_i = 0 \ \rightarrow\ C - \alpha_i - \eta_i = 0    Eq. (A.6)
\partial L/\partial \xi_i^* = 0 \ \rightarrow\ C - \alpha_i^* - \eta_i^* = 0

Substituting Equation A.6 into Equation A.5 transforms the problem into the quadratic programming problem of Equation A.7:

\min \ \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j) + \sum_{i=1}^{l}\alpha_i(\varepsilon - y_i) + \sum_{i=1}^{l}\alpha_i^*(\varepsilon + y_i)
\text{s.t.}\ \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0
\qquad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l    Eq. (A.7)
\qquad 0 \le \alpha_i^* \le C, \quad i = 1, \ldots, l

Thus the support vector regression problem can be treated as a quadratic programming problem and the Lagrange multipliers α_i and α_i* can be calculated. The weight vector w is then obtained from the training samples and the multipliers:

w = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)x_i    Eq. (A.8)

So the linear regression function is obtained as follows:

f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)(x \cdot x_i) + b    Eq. (A.9)

For non-linear regression problems, x in the original input space X is mapped into a higher-dimensional Hilbert space F through a non-linear transform x → z = φ(x), so that the task becomes a linear regression problem in F. Equation A.5 can then be written as:

L(w, b, \xi, \xi^*, \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l}\alpha_i\left(\varepsilon + \xi_i - y_i + (w \cdot z_i) + b\right) - \sum_{i=1}^{l}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - (w \cdot z_i) - b\right) - \sum_{i=1}^{l}(\eta_i\xi_i + \eta_i^*\xi_i^*)    Eq. (A.10)

where z_i^T \cdot z_j = \phi(x_i)^T \cdot \phi(x_j) = K(x_i, x_j), and K(x_i, x_j) is called the kernel function. Therefore, the decision function for the non-linear regression problem can be defined as:

f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)K(x, x_i) + b    Eq. (A.11)

where, for any sample x_j with 0 < α_j* < C,

b = y_j - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)K(x_i, x_j) + \varepsilon    Eq. (A.12)
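To make this result concrete, the short Python sketch below evaluates the kernel expansion of Eq. (A.11) on a toy data set. It is only an illustration: the training points, the multipliers α_i and α_i* (chosen merely to satisfy the constraint Σ(α_i − α_i*) = 0 from Eq. (A.6)), the bias b, and the use of an RBF-type kernel are assumptions of this sketch, not values from the study.

# Illustrative sketch of the decision function in Eq. (A.11):
#   f(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b
# All numbers below are hypothetical toy values, not fitted results.
import numpy as np

def rbf_kernel(u, v, gamma=0.5):
    # An RBF-type kernel K(u, v); assumed here purely for illustration.
    return np.exp(-gamma * np.sum((u - v) ** 2))

def svr_decision(x, X_train, alpha, alpha_star, b, kernel=rbf_kernel):
    # Eq. (A.11): weighted sum of kernel evaluations against the training points.
    coeffs = alpha - alpha_star                      # (alpha_i - alpha_i*)
    return sum(c * kernel(x, xi) for c, xi in zip(coeffs, X_train)) + b

X_train    = np.array([[0.0], [1.0], [2.0], [3.0]])  # toy training inputs
alpha      = np.array([0.0, 0.5, 0.0, 1.0])          # hypothetical alpha_i
alpha_star = np.array([0.8, 0.0, 0.7, 0.0])          # hypothetical alpha_i*
b = 0.05                                             # hypothetical bias

print(svr_decision(np.array([1.5]), X_train, alpha, alpha_star, b))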
The complete algorithm of the SVM for dealing with non-linear regression problems is as follows.

(1) Select parameters ε > 0 and C > 0 and an appropriate kernel function K(x_i, x_j), and construct the following optimization problem (Equation A.13):

\min \ \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(\alpha_i^* + \alpha_i) - \sum_{i=1}^{l}y_i(\alpha_i^* - \alpha_i)
\text{s.t.}\ \sum_{i=1}^{l}(\alpha_i^* - \alpha_i) = 0    Eq. (A.13)
\qquad 0 \le \alpha_i, \alpha_i^* \le \frac{C}{l}, \quad i = 1, 2, \ldots, l

Obtain the optimal solution \bar{\alpha}^{(*)} = (\bar{\alpha}_1, \bar{\alpha}_1^*, \ldots, \bar{\alpha}_l, \bar{\alpha}_l^*)^T.

(2) Construct the decision function:

f(x) = \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x) + b^*    Eq. (A.14)

where b^* = y_j - \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x_j) \pm \varepsilon; the positive sign is selected when 0 < \bar{\alpha}_j < C/l, and the negative sign is selected when 0 < \bar{\alpha}_j^* < C/l.

The radial basis function (RBF) kernel is a reasonable first choice in the practical application of SVM (Yan et al., 2014; Zhu et al., 2014). Two parameters then have to be determined: the kernel parameter γ and the penalty parameter C for errors. The RBF kernel is defined as:

K(x, x_i) = \exp(-\gamma\|x - x_i\|^2), \quad \gamma > 0    Eq. (A.15)

The selection of γ and C has a great influence on the performance of the SVM. To obtain the best parameters, a grid search was carried out over the range 2^{-4} to 2^{4} for both parameters. The mean squared error (MSE) between the true and predicted values is used to evaluate model performance, and the model with the lowest MSE is treated as the best one:

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2    Eq. (A.16)

where y_i and \hat{y}_i are the true and predicted values, respectively, and n is the number of samples.
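As a sketch of how this parameter selection can be carried out in practice, the following Python example uses scikit-learn's SVR and GridSearchCV (a tooling assumption of this note, together with the synthetic data; the study's own software and data set are not specified here) to scan C and γ over 2^{-4} to 2^{4} and keep the model with the lowest cross-validated MSE.

# Illustrative grid search for the RBF-kernel SVR parameters C and gamma
# over 2^-4 ... 2^4, scored by mean squared error (Eq. (A.16)).
# Data and tooling (scikit-learn) are assumptions made for this sketch.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))        # synthetic inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(200)     # synthetic noisy targets

param_grid = {
    "C": 2.0 ** np.arange(-4, 5),                # 2^-4 ... 2^4
    "gamma": 2.0 ** np.arange(-4, 5),            # 2^-4 ... 2^4
}

search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),              # RBF kernel of Eq. (A.15)
    param_grid,
    scoring="neg_mean_squared_error",            # MSE criterion of Eq. (A.16)
    cv=5,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("lowest cross-validated MSE:", -search.best_score_)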