Supplementary II: The setup of the SVM model

Description: Supplementary II describes the theoretical derivation of the SVM model.
Suppose that the regression function can be written as Equation A.1:
𝑓(π‘₯) = w βˆ™ x + b
Eq.(A. 1)
For the training sample set $(x_1, y_1), (x_2, y_2), \dots, (x_l, y_l)$, Equation A.2 is obtained when the ε-insensitive function is adopted as the loss function:
Lπœ€ (f(x𝑖 ) − 𝑦𝑖 ) = {
0,
|𝑓(π‘₯𝑖 ) − 𝑦𝑖 | < πœ€
|𝑓(π‘₯𝑖 ) − 𝑦𝑖 | − πœ€, |𝑓(π‘₯𝑖 − 𝑦𝑖 | ≥ πœ€
Eq.(A.2)
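As a quick illustration, the ε-insensitive loss of Eq. (A.2) can be evaluated in a few lines of NumPy; the function name and the test residuals below are assumptions of this sketch, not part of the original derivation.

```python
import numpy as np

def epsilon_insensitive_loss(residual, eps):
    """Eq. (A.2): zero inside the eps-tube, linear growth outside it."""
    return np.maximum(np.abs(residual) - eps, 0.0)

# Hypothetical residuals f(x_i) - y_i and eps = 0.5
residuals = np.array([-1.2, -0.3, 0.0, 0.4, 0.9])
print(epsilon_insensitive_loss(residuals, eps=0.5))  # -> [0.7 0.  0.  0.  0.4]
```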
Thus the hyperplane ($w \cdot x + b = 0$) can be constructed and the samples can be separated. The linear regression problem can then be transformed into the optimization problem described in Equation A.3:
$$\begin{aligned}
\min\ & \frac{1}{2}\|w\|^2 \\
\text{s.t.}\ & (w \cdot x_i) + b - y_i \le \varepsilon, \quad i = 1, \dots, l \\
& y_i - (w \cdot x_i) - b \le \varepsilon, \quad i = 1, \dots, l
\end{aligned} \qquad \text{Eq. (A.3)}$$
To allow for potential errors, two slack variables $\xi_i, \xi_i^* \ge 0$, $i = 1, \dots, l$, are introduced. The optimization problem can then be written as Equation A.4:
$$\begin{aligned}
\min\ & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{s.t.}\ & y_i - (w \cdot x_i) - b \le \varepsilon + \xi_i, \quad i = 1, \dots, l \\
& (w \cdot x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad i = 1, \dots, l \\
& \xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, l
\end{aligned} \qquad \text{Eq. (A.4)}$$
Here the constant C is the penalty factor, which controls the trade-off between the complexity of the decision function and the number of training examples whose deviations exceed ε.
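To make the roles of C and ε concrete, the sketch below evaluates the primal objective of Eq. (A.4) for a given (w, b): each slack variable is simply the amount by which a residual leaves the ε-tube. The variable names and toy values are illustrative assumptions.

```python
import numpy as np

def primal_objective(w, b, X, y, eps, C):
    """Eq. (A.4): 0.5 * ||w||^2 + C * (sum of slacks outside the eps-tube)."""
    residual = X @ w + b - y                     # f(x_i) - y_i
    xi       = np.maximum(-residual - eps, 0.0)  # slack for y_i - f(x_i) <= eps + xi_i
    xi_star  = np.maximum(residual - eps, 0.0)   # slack for f(x_i) - y_i <= eps + xi_i*
    return 0.5 * np.dot(w, w) + C * np.sum(xi + xi_star)

# Hypothetical toy data: a larger C penalises the same slacks more heavily
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.3, 1.8, 3.2])
w, b = np.array([1.0]), 0.0
for C in (0.1, 10.0):
    print(C, primal_objective(w, b, X, y, eps=0.2, C=C))
```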
The Lagrangian function is then constructed from Equation A.4:
$$\begin{aligned}
L(w, b, \xi, \alpha) = {}& \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} \alpha_i \left(\varepsilon + \xi_i - y_i + (w \cdot x_i) + b\right) \\
& - \sum_{i=1}^{l} \alpha_i^* \left(\varepsilon + \xi_i^* + y_i - (w \cdot x_i) - b\right) - \sum_{i=1}^{l} (\lambda_i \xi_i + \lambda_i^* \xi_i^*)
\end{aligned} \qquad \text{Eq. (A.5)}$$
where $\alpha_i$, $\alpha_i^*$, $\lambda_i$ and $\lambda_i^*$ are Lagrange multipliers.
To minimize Equation A.5, its partial derivatives with respect to $w$, $b$, $\xi_i$ and $\xi_i^*$ must equal zero, as shown in Equation A.6:
$$\begin{aligned}
\frac{\partial L}{\partial w} = 0 \;&\rightarrow\; w - \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i = 0 \\
\frac{\partial L}{\partial b} = 0 \;&\rightarrow\; \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \\
\frac{\partial L}{\partial \xi_i} = 0 \;&\rightarrow\; C - \alpha_i - \lambda_i = 0 \\
\frac{\partial L}{\partial \xi_i^*} = 0 \;&\rightarrow\; C - \alpha_i^* - \lambda_i^* = 0
\end{aligned} \qquad \text{Eq. (A.6)}$$
Substituting Equation A.6 into Equation A.5 transforms the problem into the quadratic programming problem described in Equation A.7:
$$\begin{aligned}
\min\ & \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j) + \sum_{i=1}^{l} \alpha_i (\varepsilon - y_i) + \sum_{i=1}^{l} \alpha_i^* (\varepsilon + y_i) \\
\text{s.t.}\ & \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \\
& 0 \le \alpha_i \le C, \quad i = 1, \dots, l \\
& 0 \le \alpha_i^* \le C, \quad i = 1, \dots, l
\end{aligned} \qquad \text{Eq. (A.7)}$$
Thus, the support vector regression problem can be treated as a quadratic programming problem, and the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ can be calculated. The weight vector $w$ can then be obtained from the training samples and the Lagrange multipliers:
$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i \qquad \text{Eq. (A.8)}$$
The linear regression function can then be obtained as follows:
𝑓(π‘₯) = ∑𝑙𝑖=1(𝛼𝑖 − 𝛼𝑖∗ )(π‘₯, π‘₯𝑖 ) + 𝑏
Eq.(A.9)
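For illustration, the dual problem of Eq. (A.7) can be handed to a generic quadratic-programming solver. The sketch below stacks α and α* into one vector and uses cvxopt; the function name, the toy data, and the small ridge added for numerical stability are assumptions of this example rather than part of the derivation.

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def svr_dual_linear(X, y, eps=0.1, C=1.0):
    """Solve Eq. (A.7) for a linear kernel; returns (alpha, alpha_star, w)."""
    l = len(y)
    K = X @ X.T                                    # Gram matrix (x_i . x_j)
    # Quadratic term over beta = [alpha; alpha_star]; a tiny ridge keeps P PSD
    P = np.block([[K, -K], [-K, K]]) + 1e-8 * np.eye(2 * l)
    q = np.concatenate([eps - y, eps + y])         # linear term of Eq. (A.7)
    # Box constraints 0 <= alpha_i, alpha_i* <= C
    G = np.vstack([-np.eye(2 * l), np.eye(2 * l)])
    h = np.concatenate([np.zeros(2 * l), C * np.ones(2 * l)])
    # Equality constraint sum_i (alpha_i - alpha_i*) = 0
    A = np.concatenate([np.ones(l), -np.ones(l)]).reshape(1, -1)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h),
                     matrix(A), matrix(0.0))
    beta = np.array(sol['x']).ravel()
    alpha, alpha_star = beta[:l], beta[l:]
    w = (alpha - alpha_star) @ X                   # Eq. (A.8)
    return alpha, alpha_star, w

# Hypothetical one-dimensional toy data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.8, 2.1, 2.9])
alpha, alpha_star, w = svr_dual_linear(X, y, eps=0.1, C=10.0)
```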
For non-linear regression problems, $x$ in the original input space $X$ can be mapped into a higher-dimensional Hilbert space $F$ by a non-linear transformation $x \rightarrow z = \phi(x)$, which converts the task into a new linear regression problem in $z$. Equation A.5 can then be written as:
$$\begin{aligned}
L(w, b, \xi, \alpha) = {}& \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} \alpha_i \left(\varepsilon + \xi_i - y_i + (w \cdot z_i) + b\right) \\
& - \sum_{i=1}^{l} \alpha_i^* \left(\varepsilon + \xi_i^* + y_i - (w \cdot z_i) - b\right) - \sum_{i=1}^{l} (\lambda_i \xi_i + \lambda_i^* \xi_i^*)
\end{aligned}$$
where
$$z_i^T \cdot z_j = \phi(x_i)^T \cdot \phi(x_j) = K(x_i, x_j) \qquad \text{Eq. (A.10)}$$
and $K(x_i, x_j)$ is named the kernel function. Therefore, the decision function for the non-linear regression problem can be defined as follows:
$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x, x_i) + b \qquad \text{Eq. (A.11)}$$
where, for a sample $x_j$ with $0 < \alpha_j^* < C$,
$$b = y_j - \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x_j) + \varepsilon \qquad \text{Eq. (A.12)}$$
The complete algorithm of the SVM for dealing with nonlinear regression problems is
as follows.
(1) Select parameters ε > 0 and C > 0 and an appropriate kernel $K(x_i, x_j)$, and construct the following optimization problem (Equation A.13):
$$\begin{aligned}
\min\ & \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j) + \varepsilon \sum_{i=1}^{l} (\alpha_i^* + \alpha_i) - \sum_{i=1}^{l} y_i (\alpha_i^* - \alpha_i) \\
\text{s.t.}\ & \sum_{i=1}^{l} (\alpha_i^* - \alpha_i) = 0 \\
& 0 \le \alpha_i, \alpha_i^* \le \frac{C}{l}, \quad i = 1, 2, \dots, l
\end{aligned} \qquad \text{Eq. (A.13)}$$

Obtain the optimal solution $\bar{\alpha}^{(*)} = (\bar{\alpha}_1, \bar{\alpha}_1^*, \dots, \bar{\alpha}_l, \bar{\alpha}_l^*)^T$.
(2) Construct the decision function:
$$f(x) = \sum_{i=1}^{l} (\bar{\alpha}_i^* - \bar{\alpha}_i) K(x_i, x) + b^* \qquad \text{Eq. (A.14)}$$

where $b^* = y_j - \sum_{i=1}^{l} (\bar{\alpha}_i^* - \bar{\alpha}_i) K(x_i, x_j) \pm \varepsilon$ for a chosen sample $x_j$; the positive sign is selected when $0 < \bar{\alpha}_j < \frac{C}{l}$, and the negative sign is selected when $0 < \bar{\alpha}_j^* < \frac{C}{l}$.
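As a sketch of step (2), the decision function of Eq. (A.14) can be evaluated directly once the multipliers are known. The kernel callable, the dual coefficients, and the bias value below are placeholders standing in for whatever the quadratic program in Eq. (A.13) returns.

```python
import numpy as np

def decision_function(x, X_train, alpha, alpha_star, b_star, kernel):
    """Eq. (A.14): f(x) = sum_i (alpha*_i - alpha_i) * K(x_i, x) + b*."""
    coeff = alpha_star - alpha                            # (alpha*_i - alpha_i)
    k = np.array([kernel(x_i, x) for x_i in X_train])     # K(x_i, x)
    return float(coeff @ k + b_star)

# Hypothetical multipliers from Eq. (A.13) and a simple dot-product kernel
X_train = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0]])
alpha      = np.array([0.0, 0.2, 0.0])
alpha_star = np.array([0.3, 0.0, 0.1])
dot_kernel = lambda a, b: float(a @ b)
print(decision_function(np.array([1.0, 1.0]), X_train,
                        alpha, alpha_star, b_star=0.05, kernel=dot_kernel))
```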
The radial basis function (RBF) kernel is a reasonable first choice in the practical application of SVM (Yan et al., 2014; Zhu et al., 2014). Two parameters must then be chosen: the kernel parameter γ and the penalty parameter C for classification errors.
The RBF kernel is defined as:
$$K(x, x_i) = \exp(-\gamma \|x - x_i\|^2), \quad \gamma > 0 \qquad \text{Eq. (A.15)}$$
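A minimal NumPy version of Eq. (A.15), vectorised over a set of training points; the function name and the toy arrays are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(x, X_train, gamma):
    """Eq. (A.15): K(x, x_i) = exp(-gamma * ||x - x_i||^2) for each row x_i."""
    sq_dist = np.sum((X_train - x) ** 2, axis=1)   # ||x - x_i||^2 per training point
    return np.exp(-gamma * sq_dist)

# Hypothetical example
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(rbf_kernel(np.array([1.0, 0.0]), X_train, gamma=0.5))
```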
The selection of γ and the penalty parameter C has a great influence on the performance of the SVM. To obtain the best parameters, a grid search was adopted over the range $2^{-4}$ to $2^{4}$ for both parameters. The mean squared error (MSE) between the true and predicted values is used to evaluate the model's performance, and the model with the lowest MSE is treated as the best one:
$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} (Y_{m,i} - Y_{p,i})^2}{N} \qquad \text{Eq. (A.16)}$$

where $Y_m$ is the measured (true) value, $Y_p$ is the predicted value, and $N$ is the number of samples.
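A possible implementation of this grid search, using scikit-learn's SVR with an RBF kernel and negative-MSE scoring; the $2^{-4}$ to $2^{4}$ grid follows the text, while the data arrays, the ε value, and the fold count are placeholders.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Placeholder training data; replace with the real feature matrix X and target y
X = np.random.rand(100, 5)
y = np.random.rand(100)

# Grid of C and gamma over 2^-4 ... 2^4, as described in the text
param_grid = {"C": 2.0 ** np.arange(-4, 5),
              "gamma": 2.0 ** np.arange(-4, 5)}

search = GridSearchCV(SVR(kernel="rbf", epsilon=0.1),
                      param_grid,
                      scoring="neg_mean_squared_error",
                      cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best MSE:", -search.best_score_)   # flip sign back to an MSE value
```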