Machine Learning Based Models for Time Series Prediction
2014/3

Outline
- Support Vector Regression
- Neural Network
- Adaptive Neuro-Fuzzy Inference System
- Comparison

Support Vector Regression

Basic Idea
- Given a dataset D = {(x_i, y_i)}, 1 ≤ i ≤ N, with x_i ∈ R^n and y_i ∈ R, our goal is to find a function f(x) that deviates by at most ε from the actual target y_i for all training data.
- In the linear case, f takes the form f(x_i) = ⟨w, x_i⟩ + b, where ⟨·,·⟩ denotes the dot product in R^n.
- "Flatness" in this case means a small w (less sensitive to perturbations in the features).
- Therefore we can write the problem as

    \min_{w,b} \; \frac{1}{2}\|w\|^2
    \text{subject to } \; y_i - \langle w, x_i\rangle - b \le \varepsilon, \quad \langle w, x_i\rangle + b - y_i \le \varepsilon

[Figure: the ε-tube around f2(x) = ⟨w, x⟩ + b, bounded by f1(x) = ⟨w, x⟩ + b + ε and f3(x) = ⟨w, x⟩ + b − ε. Note that +ε and −ε are not an actual geometric interpretation.]

Soft Margin and Slack Variables
- f approximates all pairs (x_i, y_i) with ε precision; however, we may also allow some errors. The soft-margin loss function and slack variables were introduced to the SVR:

    \min_{w,b,\xi^+,\xi^-} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)
    \text{subject to } \; y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i^+, \quad \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^-, \quad \xi_i^+, \xi_i^- \ge 0

- C is the regularization parameter, which determines the trade-off between flatness and the tolerance of errors.
- ξ_i^+ and ξ_i^- are slack variables that measure how far a sample lies outside the ε-insensitive tube.
- The ε-insensitive loss function:

    \xi_\varepsilon = \begin{cases} 0, & \text{if } |y - f(w, x)| \le \varepsilon \\ |y - f(w, x)| - \varepsilon, & \text{otherwise} \end{cases}

[Figure: slack variables ξ^+ and ξ^- for points above and below the ε-tube, together with the ε-insensitive loss function.]

Dual Problem and Quadratic Programs
- The key idea is to construct a Lagrange function from the objective (primal) and the corresponding constraints by introducing a dual set of variables. The dual function has a saddle point with respect to the primal and dual variables at the solution.
- Lagrange function:

    L_{primal}(w, b, \xi^+, \xi^-) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)
      - \sum_{i=1}^{N} \alpha_i^+ \left(\varepsilon + \xi_i^+ - y_i + \langle w, x_i\rangle + b\right)
      - \sum_{i=1}^{N} \alpha_i^- \left(\varepsilon + \xi_i^- + y_i - \langle w, x_i\rangle - b\right)
      - \sum_{i=1}^{N} \left(\mu_i^+ \xi_i^+ + \mu_i^- \xi_i^-\right)
    \text{subject to } \; \alpha_i^{+(-)} \ge 0 \text{ and } \mu_i^{+(-)} \ge 0

- Taking the partial derivatives (saddle point condition), we get

    \partial L/\partial w = 0 \;\rightarrow\; w = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\, x_i
    \partial L/\partial b = 0 \;\rightarrow\; \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) = 0
    \partial L/\partial \xi_i^+ = 0 \;\rightarrow\; C - \alpha_i^+ - \mu_i^+ = 0
    \partial L/\partial \xi_i^- = 0 \;\rightarrow\; C - \alpha_i^- - \mu_i^- = 0

- The conditions for optimality yield the following dual problem (to be maximized):

    L_{dual} = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-)\langle x_i, x_j\rangle
               - \varepsilon \sum_{i=1}^{N} (\alpha_i^+ + \alpha_i^-) + \sum_{i=1}^{N} y_i (\alpha_i^+ - \alpha_i^-)
    \text{subject to } \; \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) = 0, \quad 0 \le \alpha_i^+ \le C, \quad 0 \le \alpha_i^- \le C

- Finally, substituting the expression for w from the saddle-point conditions, we get

    f(x) = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\langle x, x_i\rangle + b, \qquad w = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\, x_i

- This is called the "support vector expansion": w can be completely described as a linear combination of the training patterns x_i.
- The function is represented by its SVs, so it is independent of the dimensionality of the input space R^n and depends only on the number of SVs. We will define the meaning of "support vector" later.
- Computing α_i^+ and α_i^- is a quadratic programming problem; popular methods include (see the sketch after "Computing b" below):
  - Interior point algorithm
  - Simplex algorithm

Computing b
- The parameter b can be computed from the KKT conditions (complementary slackness), which state that at the optimal solution the product between the dual variables and the constraints has to vanish:

    \alpha_i^+ \left(\varepsilon + \xi_i^+ - y_i + \langle w, x_i\rangle + b\right) = 0
    \alpha_i^- \left(\varepsilon + \xi_i^- + y_i - \langle w, x_i\rangle - b\right) = 0
    \mu_i^+ \xi_i^+ = (C - \alpha_i^+)\, \xi_i^+ = 0
    \mu_i^- \xi_i^- = (C - \alpha_i^-)\, \xi_i^- = 0
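To illustrate how the dual QP and the complementary-slackness conditions above can be turned into code, the following is a minimal sketch of linear ε-SVR using MATLAB's quadprog (Optimization Toolbox). It is not the solver used later in these slides (the examples there use LibSVM); the toy data, the tolerance 1e-6, and all variable names (K, beta, eps_tube, ...) are our own.

    % Minimal linear epsilon-SVR via the dual QP (illustrative sketch).
    X = (0:0.5:10)'; y = 3*X + 0.7;            % toy linear data (our own choice)
    N = size(X,1); C = 2.2; eps_tube = 0.01;   % C and epsilon of the primal problem

    K   = X*X';                                % linear kernel (Gram) matrix <x_i, x_j>
    H   = [K -K; -K K];                        % quadratic term over z = [alpha+; alpha-]
    f   = [eps_tube - y; eps_tube + y];        % linear term of the negated dual
    Aeq = [ones(1,N) -ones(1,N)]; beq = 0;     % sum_i (alpha_i^+ - alpha_i^-) = 0
    lb  = zeros(2*N,1); ub = C*ones(2*N,1);    % box constraints 0 <= alpha <= C

    z  = quadprog(H, f, [], [], Aeq, beq, lb, ub);
    ap = z(1:N); am = z(N+1:end);
    beta = ap - am;                            % support vector expansion coefficients
    w = X'*beta;                               % w = sum_i (alpha_i^+ - alpha_i^-) x_i

    % b from complementary slackness: pick a sample with 0 < alpha_i^+ < C ...
    i = find(ap > 1e-6 & ap < C - 1e-6, 1);
    if ~isempty(i)
        b = y(i) - X(i,:)*w - eps_tube;
    else                                       % ... otherwise one with 0 < alpha_i^- < C
        i = find(am > 1e-6 & am < C - 1e-6, 1);
        b = y(i) - X(i,:)*w + eps_tube;
    end

For the linear kernel the recovered w and b give f(x) = ⟨w, x⟩ + b directly; with a nonlinear kernel the same beta coefficients would be used in the kernel expansion instead.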
KKT (Karush–Kuhn–Tucker) Conditions
- KKT conditions extend the idea of Lagrange multipliers to handle inequality constraints.
- Consider the following nonlinear optimization problem:

    \min F(x) \quad \text{subject to } G_i(x) \le 0, \; H_j(x) = 0, \quad 1 \le i \le m, \; 1 \le j \le l

- To handle the inequalities, we treat the constraints as equalities where critical points exist. The following necessary conditions hold: if x^* is a local minimum, there exist constants μ_i and λ_j, called KKT multipliers, such that
  - Stationarity: \nabla F(x^*) + \sum_{i=1}^{m} \mu_i \nabla G_i(x^*) + \sum_{j=1}^{l} \lambda_j \nabla H_j(x^*) = 0 (this is the saddle-point condition in the dual problem)
  - Primal feasibility: G_i(x^*) ≤ 0 and H_j(x^*) = 0
  - Dual feasibility: μ_i ≥ 0
  - Complementary slackness: μ_i G_i(x^*) = 0 (this condition enforces either μ_i = 0 or G_i(x^*) = 0)

- Original problem:

    \min \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)
    \text{subject to } \; y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i^+, \quad \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^-, \quad \xi_i^+, \xi_i^- \ge 0

- Standard form for KKT:
  - Objective: \min \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)
  - Constraints:

    \varepsilon + \xi_i^+ - y_i + \langle w, x_i\rangle + b \ge 0 \;\rightarrow\; -(\varepsilon + \xi_i^+ - y_i + \langle w, x_i\rangle + b) \le 0
    \varepsilon + \xi_i^- + y_i - \langle w, x_i\rangle - b \ge 0 \;\rightarrow\; -(\varepsilon + \xi_i^- + y_i - \langle w, x_i\rangle - b) \le 0
    \xi_i^+, \xi_i^- \ge 0 \;\rightarrow\; -\xi_i^+, -\xi_i^- \le 0

- Complementary slackness condition: there exist KKT multipliers α_i^{+(−)} and μ_i^{+(−)} (the Lagrange multipliers in L_primal) that satisfy

    \alpha_i^+ \left(\varepsilon + \xi_i^+ - y_i + \langle w, x_i\rangle + b\right) = 0 \quad \text{(1)}
    \alpha_i^- \left(\varepsilon + \xi_i^- + y_i - \langle w, x_i\rangle - b\right) = 0 \quad \text{(2)}
    \mu_i^+ \xi_i^+ = (C - \alpha_i^+)\, \xi_i^+ = 0 \quad \text{(3)}
    \mu_i^- \xi_i^- = (C - \alpha_i^-)\, \xi_i^- = 0 \quad \text{(4)}

- From (1) and (2) we get α_i^+ α_i^- = 0. From (3) and (4) we see that the slack variables can be nonzero only when α_i^{+(−)} = C.
- Conclusion:
  - Only samples (x_i, y_i) with α_i^{+(−)} = C lie outside the ε-insensitive tube.
  - α_i^+ α_i^- = 0, i.e. there can never be a pair of dual variables α_i^+ and α_i^- that are both simultaneously nonzero.
- From the previous page we can conclude:

    \varepsilon - y_i + \langle w, x_i\rangle + b \ge 0 \text{ and } \xi_i^+ = 0, \quad \text{if } \alpha_i^+ < C
    \varepsilon - y_i + \langle w, x_i\rangle + b \le 0, \quad \text{if } \alpha_i^+ > 0
    \varepsilon + y_i - \langle w, x_i\rangle - b \ge 0 \text{ and } \xi_i^- = 0, \quad \text{if } \alpha_i^- < C
    \varepsilon + y_i - \langle w, x_i\rangle - b \le 0, \quad \text{if } \alpha_i^- > 0

- Combining the two sets of inequalities above gives bounds on b:

    \max\{\, y_i - \langle w, x_i\rangle - \varepsilon \mid \alpha_i^+ < C \text{ or } \alpha_i^- > 0 \,\} \;\le\; b \;\le\; \min\{\, y_i - \langle w, x_i\rangle + \varepsilon \mid \alpha_i^+ > 0 \text{ or } \alpha_i^- < C \,\}

- If some α_i^{+(−)} ∈ (0, C), the inequalities become equalities.

Sparseness of the Support Vectors
- The previous conclusion shows that the Lagrange multipliers can be nonzero only for samples with |f(x_i) − y_i| ≥ ε. In other words, for all samples inside the ε-insensitive tube, α_i^+ and α_i^- vanish.
- Therefore we have a sparse expansion of w in terms of the x_i:

    w = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\, x_i

  The samples with non-vanishing coefficients are called "support vectors".

Kernel Trick
- The next step is to make the SV algorithm nonlinear. This can be achieved by simply preprocessing the training patterns x_i with a map φ:

    \varphi(\cdot): R^n \rightarrow R^{n_h}, \qquad w \in R^{n_h}

  The dimensionality n_h of this space is implicitly defined.
- Example: \varphi: R^2 \rightarrow R^3, \; \varphi(x_1, x_2) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)
- An explicit mapping can easily become computationally infeasible: the number of different monomial features (polynomial mapping) of degree p is \binom{n+p-1}{p}.
- The computationally cheaper way is to evaluate the kernel directly: K(x, x_i) = \langle \varphi(x), \varphi(x_i)\rangle. The kernel should satisfy Mercer's condition.
- In the end, the nonlinear function takes the form

    f(x) = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\langle \varphi(x), \varphi(x_i)\rangle + b, \qquad w = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\, \varphi(x_i)

- Possible kernel functions:
  - Linear kernel: K(x, x_i) = \langle x, x_i\rangle
  - Polynomial kernel: K(x, x_i) = (\langle x, x_i\rangle + p)^d
  - Multi-layer perceptron kernel: K(x, x_i) = \tanh(\varphi\langle x, x_i\rangle + \theta)
  - Gaussian radial basis function kernel: K(x, x_i) = \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)

Neural Network
- The most common approach is a feed-forward network that employs a sliding window over the input sequence, as in the sketch below.
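As a concrete illustration of the sliding-window formulation (not part of the original slides), the sketch below turns a univariate series into lagged input/target pairs that any of the three models can be trained on; the window length d and the variable names are our own choice.

    % Build a sliding-window regression dataset from a univariate series:
    % inputs are the d previous values, the target is the next value.
    t = (0:0.1:10)';
    s = sin(2*t) ./ exp(t/5);        % example series (same shape as Example (2) later)
    d = 4;                           % window length (number of lags), our assumption
    N = length(s) - d;               % number of training pairs

    X = zeros(N, d);                 % each row: [s(k), s(k+1), ..., s(k+d-1)]
    y = zeros(N, 1);                 % target:    s(k+d)
    for k = 1:N
        X(k,:) = s(k:k+d-1)';
        y(k)   = s(k+d);
    end
    % X and y can now be fed to SVR (svmtrain(y, X, ...)), a feed-forward
    % network (train(net, X', y')), or ANFIS (anfis([X y], ...)).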
- Each neuron consists of three parts: inputs, weights and the (activated) output:

    f(x_{1,i}, \ldots, x_{n,i}) = f(w_0 + w_1 x_{1,i} + \cdots + w_n x_{n,i})

- Common activation functions:
  - Sigmoid: f(z) = \frac{1}{1 + e^{-z}}
  - Hyperbolic tangent: f(z) = \frac{e^{2z} - 1}{e^{2z} + 1}

[Figure: a single neuron with bias input 1, inputs x_{1,i}, ..., x_{n,i}, weights w_0, w_1, ..., w_n and activation f(z).]

- Example: 2-layer feed-forward neural network

    \hat{y}_i = f\big(W_2, f(W_1, x_i)\big)

[Figure: a 2-layer feed-forward network with inputs 1, x_{1,i}, ..., x_{n,i}, hidden neurons h_1, ..., h_m, weight matrices W_1 and W_2, and output \hat{y}_i.]

- Training a neural network: gradient-descent related methods, evolutionary methods.
- Their simple implementation and the mostly local dependencies exhibited in the structure allow for fast, parallel implementations in hardware.

Learning (Optimization) Algorithm
- Error function: E = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
- Chain rule:

    \Delta w_{2(j)} = -\eta \frac{\partial E}{\partial w_{2(j)}} = -\eta \frac{\partial E}{\partial \hat{y}_i}\frac{\partial \hat{y}_i}{\partial w_{2(j)}}
    \Delta w_{1(k,j)} = -\eta \frac{\partial E}{\partial w_{1(k,j)}} = -\eta \frac{\partial E}{\partial \hat{y}_i}\frac{\partial \hat{y}_i}{\partial h_j}\frac{\partial h_j}{\partial w_{1(k,j)}}

  where 1 ≤ k ≤ n and 1 ≤ j ≤ m (k-th input, j-th hidden neuron).
- Both batch learning and online learning can be used with NNs.
- The universal approximation theorem states that a feed-forward network with a single hidden layer can approximate continuous functions under mild assumptions on the activation function.

Adaptive Neuro-Fuzzy Inference System
- Combines the advantages of fuzzy logic and neural networks.
- Fuzzy rules are generated by input space partitioning or by fuzzy C-means clustering.
- Gaussian membership function: \mu(x) = \exp\!\left(-\left(\frac{x - m}{\sigma}\right)^2\right)
- TSK-type fuzzy IF-THEN rules:

    C_j: \text{IF } x_1 \text{ IS } \mu_{1j} \text{ AND } x_2 \text{ IS } \mu_{2j} \text{ AND } \ldots \text{ AND } x_n \text{ IS } \mu_{nj} \text{ THEN } f_j = b_{0j} + b_{1j} x_1 + \cdots + b_{nj} x_n

Input Space Partitioning
- For a 2-dimensional input (x_1, x_2):

[Figure: a 3×3 grid partition of the (x_1, x_2) plane, with membership functions μ_{11}, μ_{12}, μ_{13} along x_1 and μ_{21}, μ_{22}, μ_{23} along x_2, yielding rules 1–9.]

Fuzzy C-means Clustering
- For c clusters, the degrees of belonging satisfy

    \sum_{i=1}^{c} \mu_{ij} = 1, \quad \forall j = 1, \ldots, n \quad \text{(5)}
    \mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{kj}\right)^{2/(m-1)}}, \quad \text{where } d_{ij} = \|x_j - c_i\| \quad \text{(6)}

- Objective function:

    J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^m \, \mathrm{dist}(c_i, x_j)^2 \quad \text{(7)}
    J_L(U, c_1, \ldots, c_c) = J(U, c_1, \ldots, c_c) + \sum_{j=1}^{n} \lambda_j \left(\sum_{i=1}^{c} \mu_{ij} - 1\right) \quad \text{(8)}

- To minimize J, we take the derivatives of J_L and obtain the cluster means:

    c_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^m x_j}{\sum_{j=1}^{n} (\mu_{ij})^m} \quad \text{(9)}

- Fuzzy C-means algorithm:
  1. Randomly initialize U so that (5) is satisfied.
  2. Calculate the mean of each cluster using (9).
  3. Update the μ_{ij} with (6) and calculate the new J_now.
  4. Stop when J_now or (J_pre − J_now) is small enough.

[Figure: ANFIS network architecture — inputs x_1, ..., x_n are fuzzified by membership nodes μ_{ij}, combined in rule nodes R_1, ..., R_J (Π), normalized (N), multiplied by the TSK consequents C_1, ..., C_J, and summed (Σ) to produce the output y.]

- 1st layer, fuzzification layer:

    o_{ij}^{(1)} = \exp\!\left(-\left(\frac{x_i - m_{ij}}{\sigma_{ij}}\right)^2\right), \quad 1 \le i \le n, \; 1 \le j \le J

- 2nd layer, conjunction layer:

    o_j^{(2)} = \prod_{i=1}^{n} o_{ij}^{(1)}

- 3rd layer, normalization layer:

    o_j^{(3)} = \frac{o_j^{(2)}}{\sum_{j=1}^{J} o_j^{(2)}}

- 4th layer, inference layer:

    o_j^{(4)} = o_j^{(3)} \times (b_{0j} + b_{1j} x_1 + \cdots + b_{nj} x_n)

- 5th layer, output layer:

    y = o^{(5)} = \sum_{j=1}^{J} o_j^{(4)}
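To make the five layers above concrete, here is a small hand-rolled forward pass for one input vector x. This is a sketch of the layer equations only, not the Fuzzy Logic Toolbox implementation; the function name and the assumption that the parameter arrays m, sigma and b are already given are ours.

    function y = anfis_forward(x, m, sigma, b)
    % x        : n-by-1 input vector
    % m, sigma : n-by-J centers and widths of the Gaussian memberships
    % b        : (n+1)-by-J TSK consequent parameters [b0j; b1j; ...; bnj]
    [n, J] = size(m);

    o1 = exp(-((repmat(x,1,J) - m) ./ sigma).^2);   % layer 1: fuzzification
    o2 = prod(o1, 1);                               % layer 2: conjunction (rule firing strengths)
    o3 = o2 ./ sum(o2);                             % layer 3: normalization
    f  = b(1,:) + x' * b(2:end,:);                  % TSK consequents f_j = b0j + b1j*x1 + ... + bnj*xn
    o4 = o3 .* f;                                   % layer 4: inference
    y  = sum(o4);                                   % layer 5: output
    end

For example, with n = 2 inputs and J rules one would call y = anfis_forward([0.5; 1.2], m, sigma, b); in ANFIS training the premise parameters (m, sigma) and consequent parameters (b) of this forward pass are the quantities being tuned.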
Comparison
- Neural network vs. SVR:
  - Local minimum vs. global minimum
  - Choice of kernel / activation function
  - Computational complexity
  - Parallel computation of neural networks
  - Online learning vs. batch learning
- ANFIS vs. neural network:
  - Convergence speed
  - Number of fuzzy rules

                        SVR                    NN                        ANFIS
    Parameters          ε, C,                  #. of hidden neurons,     #. of rules,
                        kernel function        activation function       membership function
    Solution            Global minimum         Local minimum             Local minimum
    Complexity          High                   Low                       Medium
    Convergence speed   Slow                   Slow                      Fast
    Parallelism         Infeasible             Feasible                  Feasible
    Online learning     Infeasible             Feasible                  Feasible

Example: Function Approximation (1)

ANFIS:

    x = (0:0.5:10)';
    w = 5*rand;
    b = 4*rand;
    y = w*x + b;
    trnData = [x y];
    tic;
    numMFs = 5;
    mfType = 'gbellmf';
    epoch_n = 20;
    in_fis = genfis1(trnData,numMFs,mfType);
    out_fis = anfis(trnData,in_fis,20);
    time = toc;
    h = evalfis(x,out_fis);
    plot(x,y,x,h);
    legend('Training Data','ANFIS Output');
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Plot: Training Data vs. ANFIS Output]  Time = 0.015707, RMSE = 5.8766e-06

NN:

    x = (0:0.5:10)';
    w = 5*rand;
    b = 4*rand;
    y = w*x + b;
    trnData = [x y];
    tic;
    net = feedforwardnet(5,'trainlm');
    model = train(net,trnData(:,1)',trnData(:,2)');
    time = toc;
    h = model(x')';
    plot(x,y,x,h);
    legend('Training Data','NN Output');
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Plot: Training Data vs. NN Output]  Time = 4.3306, RMSE = 0.00010074

SVR:

    clear; clc;
    addpath './LibSVM'
    addpath './LibSVM/matlab'
    x = (0:0.5:10)';
    w = 5*rand;
    b = 4*rand;
    y = w*x + b;
    trnData = [x y];
    tic;
    model = svmtrain(y,x,['-s 3 -t 0 -c 2.2 -p 1e-7']);
    time = toc;
    h = svmpredict(y,x,model);
    plot(x,y,x,h);
    legend('Training Data','LS-SVR Output');
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Plot: Training Data vs. SVR Output]  Time = 0.00083499, RMSE = 6.0553e-08

Given function: w = 3.277389450887783 and b = 0.684746751246247

    % model struct
    %   SVs       : sparse matrix of SVs
    %   sv_coef   : SV coefficients
    %   model.rho : -b of f(x) = wx + b
    % for lin_kernel:
    h_2 = full(model.SVs)'*model.sv_coef*x - model.rho;

Recovered: full(model.SVs)'*model.sv_coef = 3.277389430887783, -model.rho = 0.684746851246246

Example: Function Approximation (2)

ANFIS:

    x = (0:0.1:10)';
    y = sin(2*x)./exp(x/5);
    trnData = [x y];
    tic;
    numMFs = 5;
    mfType = 'gbellmf';
    epoch_n = 20;
    in_fis = genfis1(trnData,numMFs,mfType);
    out_fis = anfis(trnData,in_fis,20);
    time = toc;
    h = evalfis(x,out_fis);
    plot(x,y,x,h);
    legend('Training Data','ANFIS Output');
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Plot: Training Data vs. ANFIS Output]  Time = 0.049087, RMSE = 0.042318

NN:

    x = (0:0.1:10)';
    y = sin(2*x)./exp(x/5);
    trnData = [x y];
    tic;
    net = feedforwardnet(5,'trainlm');
    model = train(net,trnData(:,1)',trnData(:,2)');
    time = toc;
    h = model(x')';
    plot(x,y,x,h);
    legend('Training Data','NN Output');
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Plot: Training Data vs. NN Output]  Time = 0.77625, RMSE = 0.012563

SVR:

    clear; clc;
    addpath './LibSVM'
    addpath './LibSVM/matlab'
    x = (0:0.1:10)';
    y = sin(2*x)./exp(x/5);
    trnData = [x y];
    tic;
    model = svmtrain(y,x,['-s 3 -t 0 -c 2.2 -p 1e-7']);
    time = toc;
    h = svmpredict(y,x,model);
    plot(x,y,x,h);
    legend('Training Data','LS-SVR Output');
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Plot: Training Data vs. SVR Output]  Time = 0.0039602, RMSE = 0.0036972
[Plot: Training Data vs. SVR Output]  Time = 20.9686, RMSE = 0.34124
[Plot: Training Data vs. SVR Output]  Time = 0.0038785, RMSE = 0.33304
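The SVR listings above use LibSVM's linear kernel (-t 0). Since the choice of kernel is one of the comparison points, a variant one might try on the nonlinear target of Example (2) is the Gaussian RBF kernel (-t 2, with width set via -g). This follow-up sketch is ours, not part of the original slides, and the parameter values are only indicative.

    % Same data as Example (2), but with an RBF kernel instead of the linear kernel.
    x = (0:0.1:10)';
    y = sin(2*x)./exp(x/5);
    tic;
    model = svmtrain(y, x, '-s 3 -t 2 -c 2.2 -g 1 -p 1e-3');  % epsilon-SVR, RBF kernel (gamma, epsilon: our guesses)
    time = toc;
    h = svmpredict(y, x, model);
    RMSE = sqrt(sum((h-y).^2)/length(h));
    disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])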