Time Series Prediction with Machine Learning Models

Machine Learning Based Models for Time Series Prediction
2014/3
Outline

Support Vector Regression

Neural Network

Adaptive Neuro-Fuzzy Inference System

Comparison
Support Vector Regression

Basic Idea

Given a dataset $D = \{(\boldsymbol{x}_i, y_i)\}$, $1 \le i \le N$, $\boldsymbol{x}_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$.

Our goal is to find a function $f(\boldsymbol{x})$ which deviates by at most $\varepsilon$ from the actual target $y_i$ for all training data.

In the linear case, $f$ takes the form $f(\boldsymbol{x}_i) = \langle\boldsymbol{w}, \boldsymbol{x}_i\rangle + b$, where $\langle\cdot,\cdot\rangle$ denotes the dot product in $\mathbb{R}^n$.

"Flatness" in this case means a small $\boldsymbol{w}$ (the function is less sensitive to perturbations in the features).

Therefore, we can write the problem as follows:
$$\min \; \tfrac{1}{2}\|\boldsymbol{w}\|^2$$
subject to
$$y_i - \langle\boldsymbol{w}, \boldsymbol{x}_i\rangle - b \le \varepsilon, \qquad \langle\boldsymbol{w}, \boldsymbol{x}_i\rangle + b - y_i \le \varepsilon$$

[Figure: the $\varepsilon$-tube around $f_2(x) = \langle\boldsymbol{w},\boldsymbol{x}\rangle + b$, bounded by $f_1(x) = f_2(x) + \varepsilon$ and $f_3(x) = f_2(x) - \varepsilon$. Note that $+\varepsilon$ and $-\varepsilon$ are not an exact geometric interpretation.]

Soft Margin and Slack Variables

$f$ approximates all pairs $(\boldsymbol{x}_i, y_i)$ with $\varepsilon$ precision; however, we may also want to allow some errors.

The soft-margin loss function and slack variables are therefore introduced into the SVR.

$$\min \; \tfrac{1}{2}\|\boldsymbol{w}\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)$$
subject to
$$y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b \le \varepsilon + \xi_i^+, \qquad \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b - y_i \le \varepsilon + \xi_i^-, \qquad \xi_i^+, \xi_i^- \ge 0$$

𝐶 is the regularization parameter which determines the trade-off between flatness and the
tolerance of errors.

$\xi_i^+$ and $\xi_i^-$ are slack variables that measure how far a sample lies outside the $\varepsilon$-insensitive tube.

The $\varepsilon$-insensitive loss function:
$$\xi_\varepsilon = \begin{cases} 0, & \text{if } |y - f(\boldsymbol{w},\boldsymbol{x})| \le \varepsilon \\ |y - f(\boldsymbol{w},\boldsymbol{x})| - \varepsilon, & \text{otherwise} \end{cases}$$

[Figure: samples outside the $\varepsilon$-tube incur slacks $\xi^+$ above and $\xi^-$ below the tube; the $\varepsilon$-insensitive loss is zero inside $[-\varepsilon, +\varepsilon]$ and grows linearly outside.]
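To make this loss concrete, here is a minimal MATLAB sketch of the $\varepsilon$-insensitive loss; the function-handle name epsLoss and the test values are illustrative assumptions only:

epsLoss = @(y, f, epsTube) max(abs(y - f) - epsTube, 0);   % zero inside the tube, linear outside

% example: residuals 0.05 and 0.30 with epsTube = 0.1 (made-up numbers)
epsLoss(1.05, 1.00, 0.1)   % inside the tube  -> 0
epsLoss(1.30, 1.00, 0.1)   % outside the tube -> 0.2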

Dual Problem and Quadratic Programs

The key idea is to construct a Lagrange function from the objective (primal) and
the corresponding constraints, by introducing a dual set of variables.

The Lagrange function has a saddle point with respect to the primal and dual variables at the solution.

Lagrange Function:
$$L_{primal}(\boldsymbol{w}, b, \xi_i^+, \xi_i^-) = \tfrac{1}{2}\|\boldsymbol{w}\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-) - \sum_{i=1}^{N}\alpha_i^+\big(\varepsilon + \xi_i^+ - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b\big) - \sum_{i=1}^{N}\alpha_i^-\big(\varepsilon + \xi_i^- + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b\big) - \sum_{i=1}^{N}\big(\mu_i^+\xi_i^+ + \mu_i^-\xi_i^-\big)$$
subject to $\alpha_i^{+(-)} \ge 0$ and $\mu_i^{+(-)} \ge 0$.

Taking the partial derivatives (saddle point condition), we get





$$\frac{\partial L}{\partial \boldsymbol{w}} = 0 \;\Rightarrow\; \boldsymbol{w} = \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-)\boldsymbol{x}_i$$
$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-) = 0$$
$$\frac{\partial L}{\partial \xi_i^+} = 0 \;\Rightarrow\; C - \alpha_i^+ - \mu_i^+ = 0$$
$$\frac{\partial L}{\partial \xi_i^-} = 0 \;\Rightarrow\; C - \alpha_i^- - \mu_i^- = 0$$
The conditions for optimality yield the following dual problem:


$$L_{dual} = -\tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-)\langle\boldsymbol{x}_i,\boldsymbol{x}_j\rangle - \varepsilon\sum_{i=1}^{N}(\alpha_i^+ + \alpha_i^-) - \sum_{i=1}^{N} y_i(\alpha_i^- - \alpha_i^+)$$
subject to
$$\sum_{i=1}^{N}(\alpha_i^- - \alpha_i^+) = 0, \qquad 0 \le \alpha_i^+ \le C, \qquad 0 \le \alpha_i^- \le C$$
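For small data sets, this dual can be handed directly to a generic QP solver. A minimal MATLAB sketch, assuming the Optimization Toolbox's quadprog is available; the toy data and the variable names (alphaP, alphaM, eps_) are illustrative assumptions:

% toy 1-D training set (assumption)
x = (0:0.5:5)';  y = 2*x + 1;
N = numel(x);  C = 10;  eps_ = 0.1;       % regularization and tube width (assumptions)

K = x * x';                               % linear kernel (Gram matrix)
H = [K -K; -K K];                         % quadratic term for z = [alphaP; alphaM]
f = [eps_*ones(N,1) - y; eps_*ones(N,1) + y];   % linear term of the negated dual
Aeq = [ones(1,N) -ones(1,N)];  beq = 0;   % sum(alphaP - alphaM) = 0
lb = zeros(2*N,1);  ub = C*ones(2*N,1);   % box constraints 0 <= alpha <= C

z = quadprog((H+H')/2, f, [], [], Aeq, beq, lb, ub);   % minimize -L_dual
alphaP = z(1:N);  alphaM = z(N+1:end);
w = sum((alphaP - alphaM) .* x);          % support vector expansion (1-D case)

The bias b can be recovered afterwards from the bounds derived later in this section.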

Finally, we eliminate the remaining variables by substituting the partial-derivative conditions, and we get

$$f(\boldsymbol{x}) = \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-)\langle\boldsymbol{x},\boldsymbol{x}_i\rangle + b, \qquad \boldsymbol{w} = \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-)\boldsymbol{x}_i$$

This is called the "Support Vector Expansion": $\boldsymbol{w}$ can be completely described as a linear combination of the training patterns $\boldsymbol{x}_i$.

The function is represented in terms of the SVs; it is therefore independent of the dimensionality of the input space $\mathbb{R}^n$ and depends only on the number of SVs.

We will define the meaning of “Support Vector” later.

Computing $\alpha_i^+$ and $\alpha_i^-$ is a quadratic programming problem; two popular methods are:

Interior point algorithm

Simplex algorithm

Computing 𝑏

The parameter $b$ can be computed from the KKT (complementary slackness) conditions, which state that at the optimal solution the product between dual variables and constraints has to vanish.
$$\alpha_i^+\big(\varepsilon + \xi_i^+ - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b\big) = 0$$
$$\alpha_i^-\big(\varepsilon + \xi_i^- + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b\big) = 0$$
$$\mu_i^+\xi_i^+ = (C - \alpha_i^+)\,\xi_i^+ = 0$$
$$\mu_i^-\xi_i^- = (C - \alpha_i^-)\,\xi_i^- = 0$$

KKT (Karush–Kuhn–Tucker) conditions:

KKT conditions extend the idea of Lagrange multipliers to handle inequality
constraints.

Consider the following nonlinear optimization problem:

Minimize $F(x)$

subject to $G_i(x) \le 0$ and $H_j(x) = 0$, where $1 \le i \le m$ and $1 \le j \le l$.

To solve the problem with inequalities, we treat the active inequality constraints as equalities at the critical points.

The following necessary conditions hold if $x^*$ is a local minimum: there exist constants $\mu_i$ and $\lambda_j$, called KKT multipliers, such that

Stationary condition: $\nabla F(x^*) + \sum_{i=1}^{m}\mu_i\nabla G_i(x^*) + \sum_{j=1}^{l}\lambda_j\nabla H_j(x^*) = 0$
(This is the saddle point condition in the dual problem.)

Primal Feasibility: $G_i(x^*) \le 0$ and $H_j(x^*) = 0$

Dual Feasibility: $\mu_i \ge 0$

Complementary slackness: $\mu_i\,G_i(x^*) = 0$
(This condition enforces that either $\mu_i$ or $G_i(x^*)$ is zero.)


Original Problem:
$$\min \; \tfrac{1}{2}\|\boldsymbol{w}\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)$$
subject to
$$y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b \le \varepsilon + \xi_i^+, \qquad \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b - y_i \le \varepsilon + \xi_i^-, \qquad \xi_i^+, \xi_i^- \ge 0$$

Standard Form for KKT

Objective:
$$\min \; \tfrac{1}{2}\|\boldsymbol{w}\|^2 + C\sum_{i=1}^{N}(\xi_i^+ + \xi_i^-)$$

Constraints:
$$\varepsilon + \xi_i^+ - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b \ge 0 \;\Rightarrow\; -(\varepsilon + \xi_i^+ - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b) \le 0$$
$$\varepsilon + \xi_i^- + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b \ge 0 \;\Rightarrow\; -(\varepsilon + \xi_i^- + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b) \le 0$$
$$\xi_i^+, \xi_i^- \ge 0 \;\Rightarrow\; -\xi_i^+, -\xi_i^- \le 0$$

Complementary slackness condition:

There exist KKT multipliers $\alpha_i^{+(-)}$ and $\mu_i^{+(-)}$ (the Lagrange multipliers in $L_{primal}$) that meet this condition:
$$\alpha_i^+\big(\varepsilon + \xi_i^+ - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b\big) = 0 \quad \dots (1)$$
$$\alpha_i^-\big(\varepsilon + \xi_i^- + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b\big) = 0 \quad \dots (2)$$
$$\mu_i^+\xi_i^+ = (C - \alpha_i^+)\,\xi_i^+ = 0 \quad \dots (3)$$
$$\mu_i^-\xi_i^- = (C - \alpha_i^-)\,\xi_i^- = 0 \quad \dots (4)$$

From (1) and (2), we get $\alpha_i^+\alpha_i^- = 0$.

From (3) and (4), we see that only for $\alpha_i^{+(-)} = C$ can the slack variables be nonzero.

Conclusion:

Only samples $(\boldsymbol{x}_i, y_i)$ with corresponding $\alpha_i^{+(-)} = C$ lie outside the $\varepsilon$-insensitive tube.

$\alpha_i^+\alpha_i^- = 0$, i.e. there can never be a pair of dual variables $\alpha_i^+$ and $\alpha_i^-$ which are both simultaneously nonzero.


From the previous page, we can conclude:
$$\varepsilon - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b \ge 0 \;\text{ and }\; \xi_i^+ = 0, \quad \text{if } \alpha_i^+ < C$$
$$\varepsilon - y_i + \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + b \le 0, \quad \text{if } \alpha_i^+ > 0$$
$$\varepsilon + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b \ge 0 \;\text{ and }\; \xi_i^- = 0, \quad \text{if } \alpha_i^- < C$$
$$\varepsilon + y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - b \le 0, \quad \text{if } \alpha_i^- > 0$$

Combining the two sets of inequalities above bounds $b$:
$$\max\{\,y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle - \varepsilon \mid \alpha_i^+ < C \text{ or } \alpha_i^- > 0\,\} \le b \le \min\{\,y_i - \langle\boldsymbol{w},\boldsymbol{x}_i\rangle + \varepsilon \mid \alpha_i^+ > 0 \text{ or } \alpha_i^- < C\,\}$$

If some $\alpha_i^{+(-)} \in (0, C)$, the inequalities become equalities.
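Numerically, $b$ can be recovered from these bounds once the $\alpha$'s are known. A minimal sketch continuing the quadprog example above; the tolerance tol and the midpoint choice are assumptions:

% f0 = <w, x_i> for every training sample (1-D linear case: w is a scalar)
f0 = w * x;
tol = 1e-6;                                      % numerical tolerance (assumption)

lowSet = (alphaP < C - tol) | (alphaM > tol);    % samples giving lower bounds on b
upSet  = (alphaP > tol)     | (alphaM < C - tol);% samples giving upper bounds on b

bLow = max(y(lowSet) - f0(lowSet) - eps_);
bUp  = min(y(upSet)  - f0(upSet)  + eps_);
b = (bLow + bUp) / 2;                            % any value in [bLow, bUp] is valid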

Sparseness of the Support Vector

The previous conclusions show that the Lagrange multipliers can be nonzero only for samples with $|f(\boldsymbol{x}_i) - y_i| \ge \varepsilon$.

In other words, for all samples inside the 𝜀-insensitive tube the 𝛼𝑖+ and 𝛼𝑖− vanish.

Therefore, we have a sparse expansion of $\boldsymbol{w}$ in terms of $\boldsymbol{x}_i$:
$$\boldsymbol{w} = \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-)\boldsymbol{x}_i$$
The samples that come with non-vanishing coefficients are called "Support Vectors".

Kernel Trick

The next step is to make the SV algorithm nonlinear. This could be achieved by simply
preprocessing the training patterns 𝒙𝑖 by a map 𝜑.

$\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^{n_h}$, $\boldsymbol{w} \in \mathbb{R}^{n_h}$

The dimensionality $n_h$ of this space is implicitly defined.

Example: $\varphi: \mathbb{R}^2 \to \mathbb{R}^3$, $\varphi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$
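Assuming the usual $\sqrt{2}$ scaling on the cross term, the feature-space dot product for this map collapses to a function of the input-space dot product, which is exactly what the kernel trick exploits:
$$\langle\varphi(\boldsymbol{x}),\varphi(\boldsymbol{z})\rangle = x_1^2z_1^2 + 2x_1x_2z_1z_2 + x_2^2z_2^2 = (x_1z_1 + x_2z_2)^2 = \langle\boldsymbol{x},\boldsymbol{z}\rangle^2,$$
so the 3-dimensional feature space never has to be constructed explicitly.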

Explicitly computing $\varphi$ can easily become computationally infeasible: the number of different monomial features (polynomial mapping) of degree $p$ is $\binom{n+p-1}{p}$.

The computationally cheaper way is to evaluate the kernel directly:
$$K(\boldsymbol{x}, \boldsymbol{x}_i) = \langle\varphi(\boldsymbol{x}), \varphi(\boldsymbol{x}_i)\rangle$$

The kernel should satisfy Mercer's condition.

In the end, the nonlinear function takes the form:
$$f(\boldsymbol{x}) = \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-)\langle\varphi(\boldsymbol{x}),\varphi(\boldsymbol{x}_i)\rangle + b, \qquad \boldsymbol{w} = \sum_{i=1}^{N}(\alpha_i^+ - \alpha_i^-)\varphi(\boldsymbol{x}_i)$$

Possible kernel functions:

Linear kernel: $K(\boldsymbol{x},\boldsymbol{x}_i) = \langle\boldsymbol{x},\boldsymbol{x}_i\rangle$

Polynomial kernel: $K(\boldsymbol{x},\boldsymbol{x}_i) = (\langle\boldsymbol{x},\boldsymbol{x}_i\rangle + p)^d$

Multi-layer Perceptron kernel: $K(\boldsymbol{x},\boldsymbol{x}_i) = \tanh(\varphi\langle\boldsymbol{x},\boldsymbol{x}_i\rangle + \theta)$

Gaussian Radial Basis Function kernel: $K(\boldsymbol{x},\boldsymbol{x}_i) = \exp\!\left(\dfrac{-\|\boldsymbol{x}-\boldsymbol{x}_i\|^2}{2\sigma^2}\right)$
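As an illustration, a minimal MATLAB sketch of the Gaussian RBF kernel, the resulting Gram matrix (which replaces the dot products $\langle\boldsymbol{x}_i,\boldsymbol{x}_j\rangle$ in the dual), and the kernelized prediction; the placeholder data, dual coefficients, and kernel width are all assumptions:

% placeholder training set and dual solution (assumptions)
X = rand(20, 2);                  % N-by-n training inputs
alphaP = rand(20, 1);  alphaM = rand(20, 1);  b = 0;
xq = rand(1, 2);                  % query point
sigma = 1.0;                      % kernel width

% Gaussian RBF kernel between two row vectors
rbf = @(u, v) exp(-sum((u - v).^2) / (2*sigma^2));

% kernel (Gram) matrix K(i,j) = K(x_i, x_j), used in place of <x_i, x_j> in the dual
N = size(X, 1);
K = zeros(N);
for i = 1:N
    for j = 1:N
        K(i, j) = rbf(X(i, :), X(j, :));
    end
end

% kernelized SV expansion: f(xq) = sum_i (alphaP_i - alphaM_i) K(xq, x_i) + b
kq = zeros(N, 1);
for i = 1:N
    kq(i) = rbf(xq, X(i, :));
end
fq = (alphaP - alphaM)' * kq + b;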
Neural Network

The most common approach is a feed-forward network that employs a sliding window over the input sequence (a sketch of this construction follows below).
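A minimal MATLAB sketch of that sliding-window construction for a univariate series; the example series s and the window length n are assumptions:

s = sin(0.1*(1:200))';      % example univariate time series (assumption)
n = 5;                      % window length = number of network inputs (assumption)
N = length(s) - n;          % number of training pairs

X = zeros(N, n);            % each row: n consecutive past values
y = zeros(N, 1);            % target: the value right after the window
for i = 1:N
    X(i, :) = s(i:i+n-1)';
    y(i)    = s(i+n);
end
% X and y can now be fed to any of the models in this talk,
% e.g. train(feedforwardnet(5), X', y') for the NN case.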

Each neuron consists of three parts: inputs, weights, and an (activated) output:
$$f(x_{1,i}, \dots, x_{n,i}) = f(w_0 + w_1 x_{1,i} + \dots + w_n x_{n,i})$$


Typical activation functions:

Sigmoid: $f(z) = \dfrac{1}{1+e^{-z}}$

Hyperbolic tangent: $f(z) = \dfrac{e^{2z}-1}{e^{2z}+1}$

[Figure: a single neuron with inputs $1, x_{1,i}, x_{2,i}, \dots, x_{n,i}$, weights $w_0, w_1, w_2, \dots, w_n$, and output $f(z)$.]
Example: 2-layer feed-forward Neural Network
$$\hat{y}_i = f\big(\boldsymbol{W_2}, f(\boldsymbol{W_1}, \boldsymbol{x}_i)\big)$$

[Figure: inputs $1, x_{1,i}, \dots, x_{n,i}$ feed hidden units $\boldsymbol{h_1}, \dots, \boldsymbol{h_m}$ through weights $\boldsymbol{W_1}$; the hidden layer feeds the output $\hat{y}_i$ through weights $\boldsymbol{W_2}$.]

Neural network training methods:

Gradient-descent related methods

Evolutionary methods

Their simple implementation and the existence of mostly local dependencies in the structure allow for fast, parallel implementations in hardware.

Learning (Optimization) Algorithm

Error Function: $E = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2$

Chain Rule:
$$\Delta w_{2(j)} = -\eta\,\frac{\partial E}{\partial w_{2(j)}} = -\eta\,\frac{\partial E}{\partial \hat{y}_i}\frac{\partial \hat{y}_i}{\partial w_{2(j)}}$$
$$\Delta w_{1(k,j)} = -\eta\,\frac{\partial E}{\partial w_{1(k,j)}} = -\eta\,\frac{\partial E}{\partial \hat{y}_i}\frac{\partial \hat{y}_i}{\partial h_j}\frac{\partial h_j}{\partial w_{1(k,j)}}$$
where $1 \le k \le n$ and $1 \le j \le m$ ($k$-th input, $j$-th hidden neuron).
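A minimal MATLAB sketch of one batch gradient-descent step for the 2-layer network above, with a sigmoid hidden layer and a linear output; the placeholder data, hidden size m, learning rate eta, and bias handling are assumptions:

% placeholder data (assumption): N-by-n inputs X and N-by-1 targets y,
% e.g. as built by the sliding-window sketch earlier
X = randn(100, 4);  y = randn(100, 1);

n = size(X, 2);  m = 5;  eta = 1e-3;          % hidden size and step size (assumptions)
W1 = 0.1*randn(m, n+1);                       % hidden weights (column 1 = bias w0)
W2 = 0.1*randn(1, m+1);                       % output weights (column 1 = bias)

A1   = [ones(size(X,1),1) X];                 % inputs with bias term
H    = 1 ./ (1 + exp(-A1*W1'));               % sigmoid hidden activations
A2   = [ones(size(X,1),1) H];
yhat = A2*W2';                                % linear output layer

delta2 = -2*(y - yhat);                       % dE/dyhat for E = sum (y - yhat)^2
gradW2 = delta2' * A2;                        % chain rule, output layer
deltaH = (delta2 * W2(:, 2:end)) .* H .* (1-H);   % chain rule through the sigmoid
gradW1 = deltaH' * A1;                        % chain rule, hidden layer

W2 = W2 - eta*gradW2;                         % gradient-descent updates
W1 = W1 - eta*gradW1;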

Batch Learning and Online Learning using NN

The universal approximation theorem states that a feed-forward network with a single hidden layer can approximate continuous functions under mild assumptions on the activation function.
Adaptive Neuro-Fuzzy Inference System

Combines the advantages of fuzzy logic and neural networks.

Fuzzy rules are generated by input space partitioning or fuzzy C-means clustering.

Gaussian membership function:
$$\mu(x) = \exp\!\left(-\left(\frac{x-m}{\sigma}\right)^2\right)$$

TSK-type fuzzy IF-THEN rules:
$$C_j: \text{IF } x_1 \text{ IS } \mu_{1j} \text{ AND } x_2 \text{ IS } \mu_{2j} \text{ AND } \dots \text{ AND } x_n \text{ IS } \mu_{nj} \text{ THEN } f_j = b_{0j} + b_{1j}x_1 + \dots + b_{nj}x_n$$
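A small MATLAB sketch evaluating one Gaussian membership value and one TSK rule consequent; the centres, widths, and coefficients are made-up numbers:

gaussMF = @(x, m, sigma) exp(-((x - m)./sigma).^2);   % Gaussian membership function

x  = [0.3 0.7];                  % a 2-dimensional input (assumption)
m  = [0.0 1.0];  sg = [0.5 0.5]; % centre and width of rule j's MFs (assumptions)
bj = [0.1 2.0 -1.0];             % TSK consequent coefficients b0, b1, b2 (assumptions)

wj = prod(gaussMF(x, m, sg));           % firing strength of rule j (AND = product)
fj = bj(1) + bj(2)*x(1) + bj(3)*x(2);   % rule output f_j = b0 + b1*x1 + b2*x2
ruleContribution = wj * fj;             % contribution before normalization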

Input space partitioning

For 2-dimensional input data $(x_1, x_2)$, each input is covered by membership functions ($\mu_{11}, \mu_{12}, \mu_{13}$ for $x_1$; $\mu_{21}, \mu_{22}, \mu_{23}$ for $x_2$), and the grid of their combinations defines 9 rules.

[Figure: 3-by-3 grid partition of the $(x_1, x_2)$ plane; the cells are numbered 1-9.]

Fuzzy C-means clustering

For $c$ clusters, the degree of belonging satisfies:
$$\sum_{i=1}^{c}\mu_{ij} = 1, \quad \forall j = 1,\dots,n \quad \dots (5)$$
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\dfrac{d_{ij}}{d_{kj}}\right)^{\frac{2}{m-1}}}, \quad \text{where } d_{ij} = \|x_j - \boldsymbol{c}_i\| \quad \dots (6)$$

Objective function $J$:
$$J(U, \boldsymbol{c}_1, \dots, \boldsymbol{c}_c) = \sum_{i=1}^{c}J_i = \sum_{i=1}^{c}\sum_{j=1}^{n}(\mu_{ij})^m \, dist(\boldsymbol{c}_i, x_j)^2 \quad \dots (7)$$
$$J_L(U, \boldsymbol{c}_1, \dots, \boldsymbol{c}_c) = J(U, \boldsymbol{c}_1, \dots, \boldsymbol{c}_c) + \sum_{j=1}^{n}\lambda_j\left(\sum_{i=1}^{c}\mu_{ij} - 1\right) \quad \dots (8)$$
To minimize $J$, we take the derivatives of $J_L$ and obtain the mean of cluster $i$:
$$\boldsymbol{c}_i = \frac{\sum_{j=1}^{n}(\mu_{ij})^m x_j}{\sum_{j=1}^{n}(\mu_{ij})^m} \quad \dots (9)$$

Fuzzy C-means algorithm:

Randomly initialize $U$ such that it satisfies (5).

Calculate the mean of each cluster using (9).

Update $\mu_{ij}$ according to (6) and calculate $J_{now}$.

Stop when $J_{now}$ or $(J_{pre} - J_{now})$ is small enough.
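A minimal MATLAB sketch of this fuzzy C-means loop following (5)-(9); the toy data, number of clusters c, fuzzifier m, and stopping tolerance are assumptions (the Fuzzy Logic Toolbox also provides an fcm function):

X = [randn(50,2); randn(50,2)+3];                  % toy 2-D data with two groups (assumption)
c = 3;  m = 2;  maxIter = 100;  tol = 1e-5;        % assumptions
[N, d] = size(X);

U = rand(c, N);  U = U ./ sum(U, 1);               % random partition satisfying (5)
Jprev = inf;
for it = 1:maxIter
    Um = U.^m;
    C  = (Um * X) ./ sum(Um, 2);                   % cluster means, equation (9)
    D  = zeros(c, N);                              % distances d_ij = ||x_j - c_i||
    for i = 1:c
        D(i, :) = sqrt(sum((X - C(i, :)).^2, 2))';
    end
    D = max(D, eps);                               % avoid division by zero
    U = 1 ./ ( D.^(2/(m-1)) .* sum(D.^(-2/(m-1)), 1) );   % membership update (6)
    J = sum(sum((U.^m) .* D.^2));                  % objective (7) with updated U
    if abs(Jprev - J) < tol, break; end            % stop when the decrease is small
    Jprev = J;
end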
[Figure: ANFIS network structure. Inputs $x_1, \dots, x_n$ pass through membership functions $\mu_{ij}$, product ($\Pi$) nodes, normalization ($N$) nodes, and rule nodes $R_1, \dots, R_J$ with consequent parameters $C_1, \dots, C_J$; a summation ($\Sigma$) node produces the output $y$.]

1st layer: fuzzification layer
$$o_{ij}^{(1)} = \exp\!\left(-\left(\frac{x_{ij} - m_{ij}}{\sigma_{ij}}\right)^2\right), \quad 1 \le i \le n, \; 1 \le j \le J$$

2nd layer: conjunction layer
$$o_j^{(2)} = \prod_{i=1}^{n} o_{ij}^{(1)}$$

3rd layer: normalization layer
$$o_j^{(3)} = \frac{o_j^{(2)}}{\sum_{j=1}^{J} o_j^{(2)}}$$

4th layer: inference layer
$$o_j^{(4)} = o_j^{(3)} \times (b_{0j} + b_{1j}x_1 + \dots + b_{nj}x_n)$$

5th layer: output layer
$$y = o^{(5)} = \sum_{j=1}^{J} o_j^{(4)}$$
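A minimal MATLAB sketch of this five-layer forward pass for a single input vector; the premise parameters and consequent coefficients are random placeholders:

n = 2;  J = 4;                        % number of inputs and rules (assumptions)
x = [0.3; 0.7];                       % one input vector (assumption)
mIJ = randn(n, J);  sIJ = ones(n, J); % premise parameters m_ij, sigma_ij (assumptions)
B   = randn(J, n+1);                  % consequent coefficients [b0j b1j ... bnj] per rule

o1 = exp(-((repmat(x,1,J) - mIJ)./sIJ).^2);   % layer 1: fuzzification
o2 = prod(o1, 1)';                            % layer 2: conjunction (J-by-1)
o3 = o2 / sum(o2);                            % layer 3: normalization
fj = B * [1; x];                              % rule consequents f_j = b0j + b1j*x1 + ...
o4 = o3 .* fj;                                % layer 4: inference
y  = sum(o4);                                 % layer 5: output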
Comparison


Neural Network vs. SVR

Local minimum vs. global minimum

Choice of kernel/activation function

Computational complexity

Parallel computation of neural network

Online learning vs. batch learning
ANFIS vs. Neural Network

Convergence speed

Number of fuzzy rules
                   SVR                  NN                      ANFIS
Parameters         ε, C,                # of hidden neurons,    # of rules,
                   kernel function      activation function     membership function
Solution           Global minimum       Local minimum           Local minimum
Complexity         High                 Low                     Medium
Convergence speed  Slow                 Slow                    Fast
Parallelism        Infeasible           Feasible                Feasible
Online learning    Infeasible           Feasible                Feasible
Example: Function Approximation (1)
ANFIS

x = (0:0.5:10)';
w=5*rand;
b=4*rand;
y = w*x+b;
trnData = [x y];
tic;
numMFs = 5;
mfType = 'gbellmf';
epoch_n = 20;
in_fis = genfis1(trnData,numMFs,mfType);
out_fis = anfis(trnData,in_fis,epoch_n);
time=toc;
h=evalfis(x,out_fis);
plot(x,y,x,h);
legend('Training Data','ANFIS Output');
RMSE=sqrt(sum((h-y).^2)/length(h));
disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

Result: Time = 0.015707 RMSE = 5.8766e-06
[Figure: training data and ANFIS output on the fitted line, 0 ≤ x ≤ 10.]
NN

x = (0:0.5:10)';
w=5*rand;
b=4*rand;
y = w*x+b;
trnData = [x y];
tic;
net = feedforwardnet(5,'trainlm');
model = train(net,trnData(:,1)', trnData(:,2)');
time=toc;
h = model(x')';
plot(x,y,x,h);
legend('Training Data','NN Output');
RMSE=sqrt(sum((h-y).^2)/length(h));
disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

Result: Time = 4.3306 RMSE = 0.00010074
[Figure: training data and NN output on the fitted line, 0 ≤ x ≤ 10.]
SVR

clear;
clc;
addpath './LibSVM'
addpath './LibSVM/matlab'
x = (0:0.5:10)';
w=5*rand;
b=4*rand;
y = w*x+b;
trnData = [x y];
tic;
model = svmtrain(y,x,'-s 3 -t 0 -c 2.2 -p 1e-7');
time=toc;
h=svmpredict(y,x,model);
plot(x,y,x,h);
legend('Training Data','LS-SVR Output');
RMSE=sqrt(sum((h-y).^2)/length(h));
disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

Result: Time = 0.00083499 RMSE = 6.0553e-08
[Figure: training data and SVR output on the fitted line, 0 ≤ x ≤ 10.]

Given function: w=3.277389450887783 and b=0.684746751246247

% model struct
% SVs : sparse matrix of SVs
% sv_coef : SV coefficients
% model.rho : -b of f(x)=wx+b
% for lin_kernel : h_2 = full(model.SVs)'*model.sv_coef*x-model.rho;

full(model.SVs)'*model.sv_coef = 3.277389430887783
-model.rho = 0.684746851246246
Example: Function Approximation (2)
ANFIS

x = (0:0.1:10)';
y = sin(2*x)./exp(x/5);
trnData = [x y];
tic;
numMFs = 5;
mfType = 'gbellmf';
epoch_n = 20;
in_fis = genfis1(trnData,numMFs,mfType);
out_fis = anfis(trnData,in_fis,epoch_n);
time=toc;
h=evalfis(x,out_fis);
plot(x,y,x,h);
legend('Training Data','ANFIS Output');
RMSE=sqrt(sum((h-y).^2)/length(h));
disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

Result: Time = 0.049087 RMSE = 0.042318
[Figure: training data and ANFIS output for the damped sinusoid, 0 ≤ x ≤ 10.]
NN

x = (0:0.1:10)';
y = sin(2*x)./exp(x/5);
trnData = [x y];
tic;
net = feedforwardnet(5,'trainlm');
model = train(net,trnData(:,1)', trnData(:,2)');
time=toc;
h = model(x')';
plot(x,y,x,h);
legend('Training Data','NN Output');
RMSE=sqrt(sum((h-y).^2)/length(h));
disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

Result: Time = 0.77625 RMSE = 0.012563
[Figure: training data and NN output for the damped sinusoid, 0 ≤ x ≤ 10.]
SVR

clear;
clc;
addpath './LibSVM'
addpath './LibSVM/matlab'
x = (0:0.1:10)';
y = sin(2*x)./exp(x/5);
trnData = [x y];
tic;
model = svmtrain(y,x,'-s 3 -t 0 -c 2.2 -p 1e-7');
time=toc;
h=svmpredict(y,x,model);
plot(x,y,x,h);
legend('Training Data','LS-SVR Output');
RMSE=sqrt(sum((h-y).^2)/length(h));
disp(['Time = ',num2str(time),' RMSE = ',num2str(RMSE)])

[Figures: three training-data vs. SVR-output plots for the damped sinusoid, 0 ≤ x ≤ 10, with the reported results
Time = 0.0039602 RMSE = 0.0036972,
Time = 20.9686 RMSE = 0.34124,
Time = 0.0038785 RMSE = 0.33304.]