Supplementary II: The setup of the SVM model

Description: Supplementary II describes the theoretical derivation of the SVM model.

Suppose that the regression function can be written as in Equation A.1:

f(x) = w \cdot x + b    Eq. (A.1)

For the training sample set (x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l), the ε-insensitive function is adopted as the loss function, giving Equation A.2:

L_\varepsilon(f(x_i) - y_i) =
\begin{cases}
0, & |f(x_i) - y_i| < \varepsilon \\
|f(x_i) - y_i| - \varepsilon, & |f(x_i) - y_i| \ge \varepsilon
\end{cases}    Eq. (A.2)

Thus the hyperplane w \cdot x + b = 0 can be constructed and the samples can be separated. The linear regression problem is then transformed into the optimization problem of Equation A.3:

\min \ \frac{1}{2}\|w\|^2
\text{s.t.}\ y_i - (w \cdot x_i) - b \le \varepsilon, \quad i = 1, \ldots, l    Eq. (A.3)
\qquad (w \cdot x_i) + b - y_i \le \varepsilon, \quad i = 1, \ldots, l

To allow for errors, two slack variables \xi_i, \xi_i^* \ge 0, i = 1, \ldots, l, are introduced, and the optimization problem becomes Equation A.4:

\min \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*)
\text{s.t.}\ y_i - (w \cdot x_i) - b \le \varepsilon + \xi_i, \quad i = 1, \ldots, l
\qquad (w \cdot x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad i = 1, \ldots, l    Eq. (A.4)
\qquad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, l

where the constant C > 0 is the penalty factor, which controls the trade-off between the complexity (flatness) of the decision function and the amount by which deviations larger than ε are tolerated.

The Lagrangian is then constructed from Equation A.4:

L(w, b, \xi, \xi^*, \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l}\alpha_i\left(\varepsilon + \xi_i - y_i + (w \cdot x_i) + b\right) - \sum_{i=1}^{l}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - (w \cdot x_i) - b\right) - \sum_{i=1}^{l}(\eta_i\xi_i + \eta_i^*\xi_i^*)    Eq. (A.5)

where α_i, α_i*, η_i and η_i* ≥ 0 are the Lagrange multipliers. For Equation A.5 to attain its minimum, the partial derivatives with respect to w, b, ξ_i and ξ_i* must equal zero, as shown in Equation A.6:

\partial L/\partial w = 0 \ \rightarrow\ w - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)x_i = 0
\partial L/\partial b = 0 \ \rightarrow\ \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0
\partial L/\partial \xi_i = 0 \ \rightarrow\ C - \alpha_i - \eta_i = 0    Eq. (A.6)
\partial L/\partial \xi_i^* = 0 \ \rightarrow\ C - \alpha_i^* - \eta_i^* = 0

Substituting Equation A.6 into Equation A.5 transforms the problem into the quadratic programming problem of Equation A.7:

\min \ \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j) + \sum_{i=1}^{l}\alpha_i(\varepsilon - y_i) + \sum_{i=1}^{l}\alpha_i^*(\varepsilon + y_i)
\text{s.t.}\ \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0
\qquad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l    Eq. (A.7)
\qquad 0 \le \alpha_i^* \le C, \quad i = 1, \ldots, l

Thus the support vector regression problem can be treated as a quadratic programming problem and the Lagrange multipliers α_i and α_i* can be calculated. The weight vector w is then obtained from the training samples and the multipliers:

w = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)x_i    Eq. (A.8)

So the linear regression function is obtained as follows:

f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)(x \cdot x_i) + b    Eq. (A.9)

For non-linear regression problems, x in the original input space X is mapped into a higher-dimensional Hilbert space F through a non-linear transform x → z = φ(x), so that the task becomes a linear regression problem in F. Equation A.5 can then be written as:

L(w, b, \xi, \xi^*, \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l}\alpha_i\left(\varepsilon + \xi_i - y_i + (w \cdot z_i) + b\right) - \sum_{i=1}^{l}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - (w \cdot z_i) - b\right) - \sum_{i=1}^{l}(\eta_i\xi_i + \eta_i^*\xi_i^*)    Eq. (A.10)

where z_i^T \cdot z_j = \phi(x_i)^T \cdot \phi(x_j) = K(x_i, x_j), and K(x_i, x_j) is called the kernel function. Therefore, the decision function for the non-linear regression problem can be defined as:

f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)K(x, x_i) + b    Eq. (A.11)

where, for any sample x_j with 0 < α_j* < C,

b = y_j - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)K(x_i, x_j) + \varepsilon    Eq. (A.12)
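To make this result concrete, the short Python sketch below evaluates the kernel expansion of Eq. (A.11) on a toy data set. It is only an illustration: the training points, the multipliers α_i and α_i* (chosen merely to satisfy the constraint Σ(α_i − α_i*) = 0 from Eq. (A.6)), the bias b, and the use of an RBF-type kernel are assumptions of this sketch, not values from the study.

# Illustrative sketch of the decision function in Eq. (A.11):
#   f(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b
# All numbers below are hypothetical toy values, not fitted results.
import numpy as np

def rbf_kernel(u, v, gamma=0.5):
    # An RBF-type kernel K(u, v); assumed here purely for illustration.
    return np.exp(-gamma * np.sum((u - v) ** 2))

def svr_decision(x, X_train, alpha, alpha_star, b, kernel=rbf_kernel):
    # Eq. (A.11): weighted sum of kernel evaluations against the training points.
    coeffs = alpha - alpha_star                      # (alpha_i - alpha_i*)
    return sum(c * kernel(x, xi) for c, xi in zip(coeffs, X_train)) + b

X_train    = np.array([[0.0], [1.0], [2.0], [3.0]])  # toy training inputs
alpha      = np.array([0.0, 0.5, 0.0, 1.0])          # hypothetical alpha_i
alpha_star = np.array([0.8, 0.0, 0.7, 0.0])          # hypothetical alpha_i*
b = 0.05                                             # hypothetical bias

print(svr_decision(np.array([1.5]), X_train, alpha, alpha_star, b))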
The complete algorithm of the SVM for dealing with non-linear regression problems is as follows.

(1) Select parameters ε > 0 and C > 0 and an appropriate kernel function K(x_i, x_j), and construct the following optimization problem (Equation A.13):

\min \ \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(\alpha_i^* + \alpha_i) - \sum_{i=1}^{l}y_i(\alpha_i^* - \alpha_i)
\text{s.t.}\ \sum_{i=1}^{l}(\alpha_i^* - \alpha_i) = 0    Eq. (A.13)
\qquad 0 \le \alpha_i, \alpha_i^* \le \frac{C}{l}, \quad i = 1, 2, \ldots, l

Obtain the optimal solution \bar{\alpha}^{(*)} = (\bar{\alpha}_1, \bar{\alpha}_1^*, \ldots, \bar{\alpha}_l, \bar{\alpha}_l^*)^T.

(2) Construct the decision function:

f(x) = \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x) + b^*    Eq. (A.14)

where b^* = y_j - \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x_j) \pm \varepsilon; the positive sign is selected when 0 < \bar{\alpha}_j < C/l, and the negative sign is selected when 0 < \bar{\alpha}_j^* < C/l.

The radial basis function (RBF) kernel is a reasonable first choice in the practical application of SVM (Yan et al., 2014; Zhu et al., 2014). Two parameters then have to be determined: the kernel parameter γ and the penalty parameter C for errors. The RBF kernel is defined as:

K(x, x_i) = \exp(-\gamma\|x - x_i\|^2), \quad \gamma > 0    Eq. (A.15)

The selection of γ and C has a great influence on the performance of the SVM. To obtain the best parameters, a grid search was carried out over the range 2^{-4} to 2^{4} for both parameters. The mean squared error (MSE) between the true and predicted values is used to evaluate model performance, and the model with the lowest MSE is treated as the best one:

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2    Eq. (A.16)

where y_i and \hat{y}_i are the true and predicted values, respectively, and n is the number of samples.
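As a sketch of how this parameter selection can be carried out in practice, the following Python example uses scikit-learn's SVR and GridSearchCV (a tooling assumption of this note, together with the synthetic data; the study's own software and data set are not specified here) to scan C and γ over 2^{-4} to 2^{4} and keep the model with the lowest cross-validated MSE.

# Illustrative grid search for the RBF-kernel SVR parameters C and gamma
# over 2^-4 ... 2^4, scored by mean squared error (Eq. (A.16)).
# Data and tooling (scikit-learn) are assumptions made for this sketch.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))        # synthetic inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(200)     # synthetic noisy targets

param_grid = {
    "C": 2.0 ** np.arange(-4, 5),                # 2^-4 ... 2^4
    "gamma": 2.0 ** np.arange(-4, 5),            # 2^-4 ... 2^4
}

search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),              # RBF kernel of Eq. (A.15)
    param_grid,
    scoring="neg_mean_squared_error",            # MSE criterion of Eq. (A.16)
    cv=5,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("lowest cross-validated MSE:", -search.best_score_)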