Artificial Neural Networks
System Identification and Control
Prof. Dr. Serhat Şeker
Faculty of Electrical and Electronics
Istanbul Technical University, Ayazağa Campus
Maslak-34469, Istanbul, Turkey
sekers@itu.edu.tr
July 2014
Table of Contents
1. Introduction
2. Artificial Neuron and its Mathematical Model
3. Information Flow or Information Content of Neural Network Structure
4. Back Propagation Algorithm (Multi Layered Neural Network)
5. Recurrent Neural Networks
6. System Identification and Use of the Neural Network
7. Neural Network for Applications of Control Theory
7.1. Self-Tuning Neural Network Controller
1. Introduction
System Concept: The class of systems studied in this text is assumed to have input terminals and output terminals. We assume that an excitation is applied at the input terminals and the response is measured at the output terminals. A system with only one input and one output terminal is generally known as a Single-Input Single-Output (SISO) system. On the other hand, a system with two or more input terminals and/or two or more output terminals is called a Multi-Input Multi-Output (MIMO) system.
Fig 1.1: SISO system (x[n] → Black Box (System) → y[n]) and MIMO system ([x1, x2, …, xp] → Black Box (System) → [y1, y2, …, ym]).
A system can be a continuous-time or a discrete-time system. In this text we are more interested in discrete-time systems, considering their computer applications. A system is called a discrete-time system if it accepts discrete-time signals as input and generates discrete-time signals as output. For the SISO system shown in the figure above, x[n] is the input and y[n] is the output, where 'n' is the discrete time.
In the time domain, the system output can be represented in terms of convolution as below:
y[n] = h[n] * x[n]    (1.1)
or
y[n] = ∑_k h[k] x[n − k]    (1.2)
By applying the Z-transform to equation 1.2 we can convert it into the frequency domain, where the convolution becomes a simple multiplication:
Z{y[n]} = Z{h[n] * x[n]}    (1.3)
or
Y(z) = H(z) X(z)    (1.4)
In this text, a neural network (NN) is treated as a MIMO system, and we are interested in analyzing the non-linear input–output relationship of such a system.
Important Differences:
In classical system studies we know the system function, e.g. the impulse response h[n], whereas in the NN approach there is no need for a system function. In terms of hardware implementation, a conventional system uses summation, multiplication and delay operators, whereas an NN is built as an algorithm implemented in software.
Generally, all systems are non-linear in nature. In the classical approach, non-linear systems are linearized and analyzed through well-developed analysis techniques, e.g. in state-space form. There are also direct methods for analyzing non-linear systems, e.g. the Lyapunov approach. There are several ways in which non-linear systems can be linearized, but the simplest linearization is obtained by Taylor expansion. Consider a non-linear function f(x); its Taylor expansion about a point x = x0 is given as
f(x) = f(x0) + f'(x0)(x − x0) + (1/2!) f''(x0)(x − x0)² + …    (1.5)
Consider also a general linear equation
y = a + b·x    (1.6)
where a is a constant and b is the slope of the line.
If we take only the first two terms of the Taylor expansion above, ignoring the higher-order terms, we get
f(x) ≈ f(x0) + f'(x0)(x − x0)    (1.7)
Comparing this with the general linear equation 1.6, it can be considered as the approximate linear form of the non-linear function.
By ignoring the higher-order terms we have introduced a considerable error into the model, but the approach is still used because of its easy computation and well-developed analysis tools. In the NN approach, however, we deal with non-linear systems in their direct form.
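As a small numerical illustration of equation 1.7 (a sketch only — the function f(x) = sin(x) and the expansion point x0 = 0.5 are arbitrary choices, not taken from the text), the linear approximation can be compared with the original non-linear function in MATLAB:

% Compare a non-linear function with its first-order Taylor approximation (eq. 1.7).
% f(x) = sin(x) and x0 = 0.5 are illustrative assumptions.
f     = @(x) sin(x);                       % non-linear function
df    = @(x) cos(x);                       % its derivative
x0    = 0.5;                               % expansion point
f_lin = @(x) f(x0) + df(x0).*(x - x0);     % linear approximation (eq. 1.7)

x   = x0 + (-0.5:0.1:0.5);                 % points around x0
err = abs(f(x) - f_lin(x));                % error grows as x moves away from x0
disp([x.' f(x).' f_lin(x).' err.']);

The printed error column shows that the approximation is good only near the expansion point, which is exactly the limitation discussed above.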
Basic Neural Network (NN) Configuration:
In this text we will consider MIMO Systems as shown in figure below:
Fig 1.2: MIMO neural network — the multi-input vector [x1, x2, …, xn] is mapped to the multi-output vector [y1, y2, …, ym] by a combination of neurons.
Here [x1, x2, …, xn] and [y1, y2, …, ym] are the discrete (n × 1) input and (m × 1) output vectors, respectively. The combination of neurons has a parallel distributed structure, e.g.:
Fig 1.3: Parallel distributed structure of interconnected neurons.
Advantages of the Parallel Distributed Structure:
Reliability: It is a highly reliable structure, because even if some connections (neurons) are removed the system will still work.
Optimization: The degree of information carried by each connection is related to its weight factor. Optimization can be achieved by implementing a learning algorithm based on the sensitivity of the weight factors.
Speed: Fast processing can be achieved due to the parallel distributed nature of the network. Well-known algorithms that exploit this are Levenberg–Marquardt and Back Propagation.
Considering the above facts, the basic NN configuration, also known as a multi-node feed-forward NN (information flows from input to output), with 'n' input nodes and 'm' output nodes is given in the following figure:
Fig 1.4: Multi-Layer Feed-Forward Neural Network — input layer with nodes x1, x2, …, xn, 1st and 2nd hidden layers with their nodes, and output layer with nodes y1, …, ym.
Problem: Determination of Hidden Layers & Hidden Nodes
The number of hidden layers and nodes is directly linked with the processing time of the NN: the higher the number of hidden layers and nodes, the slower the NN, because of the larger number of computations. On the other hand, if there are fewer hidden nodes, fast processing can be achieved but the learning process will not be optimal. So we want to optimize the topology.
To make the problem a bit simpler, we can at first assume only one hidden layer; in this case the network topology is shown as follows:
Fig 1.5: Multi-Layer Feed-Forward Neural Network with a single hidden layer.
Generally, there is no exact criterion for optimizing the number of hidden nodes, but it can be done by trial and error. One method is to consider Shannon's information criterion. Shannon, an American engineer, defined the information capacity of a channel in 1948; it can be taken as a measure of information, whose unit is the 'bit'. So we can calculate the amount of information being processed at the hidden layer and describe it in terms of bits. Generally:
Number of bits = Number of processing elements of the hidden layers
For a given NN structure we have to follow two steps, i.e.
 Learning
 Test (Recalling)
Learning Procedure for Multi-Layered Feed Forward Neural Network:
At first we will consider learning with target values (desired outputs). Consider a general NN having 'n' input nodes, 'm' output nodes and 'm' target values as shown below:
Fig 1.6: Multi-Layer Feed-Forward Neural Network with target values (desired outputs) T1, …, Tm attached to the outputs y1, …, ym.
In terms of vectors, we can define all variables as follows:
- Input vector:  X̃ = [x1, x2, …, xn]
- Output vector: Ỹ = [y1, y2, …, ym]
- Target vector (desired output): T̃ = [T1, T2, …, Tm]
- Weight matrices: W̃ = [Wji], one matrix whose size is set by the input and hidden layers and one whose size is set by the hidden and output layers (where k is the number of hidden nodes).
The error at the output-layer nodes can be calculated as follows:
εj = Tj − yj    (1.8)
Note that the error can be positive or negative; to avoid this we use the absolute value,
|εj| = |Tj − yj|    (1.9)
or the squared error,
εj² = (Tj − yj)²    (1.10)
As a whole it can be represented by an energy function:
E = (1/2) ∑j εj²    (1.11)
For the learning procedure we want to minimize the error E, and to do so we require
E → min    (1.12)
Since the outputs, and hence E, depend on the weight factors, the optimization is obtained by adjusting the weight factors with a learning algorithm. In this manner the most popular learning algorithm for feed-forward neural networks is the Back Propagation algorithm.
An important point to note is that the term Back Propagation is not related to feedback: information still flows from input to output, contrary to conventional control systems in which feedback flows from output to input. The name comes from the adjustment of the weights based on the backward propagation of the error, as shown in the figure below:
Fig 1.7: Information flows forward between the input and output layers, while the error between the outputs y1, …, ym and the targets T1, …, Tm is propagated backwards to adjust the weights W̃.
Hence, the information defined between the input and output layers is stored in the weight factors. Finally, the relationship between the input and output pairs is established by means of the learning algorithm, without an explicit mathematical functional relationship.
Note: In the conventional case we know the functional relationship f(·) in the equation y = f(x), and we can easily find the output y for a corresponding input x. If the functional relationship f(·) is not known, then we apply the neural network approach.
Introduction of Noise:
Sometimes, to obtain enhanced performance, a noise term is introduced as an additional term to the data, i.e.
x̃' = x̃ + noise    (1.13)
ỹ' = ỹ + noise    (1.14)
Taking the noise term as Gaussian noise¹ with zero mean and unit standard deviation, scaled by a gain factor k (usually small), we have:
x̃' = x̃ + k·N(0, 1)    (1.15)
ỹ' = ỹ + k·N(0, 1)    (1.16)
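A minimal MATLAB sketch of equations 1.15–1.16 is given below; the pattern sizes and the gain factor k = 0.05 are assumed values used only for illustration:

% Add zero-mean, unit-variance Gaussian noise scaled by a small gain k (eqs. 1.15-1.16).
% k = 0.05 and the 100x4 / 100x2 pattern sizes are illustrative assumptions.
p = 100; n = 4; m = 2;
X = rand(p, n);                % input patterns  (p x n)
Y = rand(p, m);                % output/target patterns (p x m)
k = 0.05;                      % small gain factor

Xn = X + k*randn(p, n);        % noisy inputs
Yn = Y + k*randn(p, m);        % noisy outputs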
Matrix Notation in Learning:
Fig 1.8: Matrix notation in learning — the p training patterns form a (p × n) input matrix [x_pn], which the fully connected network maps to a (p × m) output matrix [y_pm]; the corresponding (p × m) target matrix is [T_pm].
So, by back-propagating the error matrix and thereby adjusting the inner factors, namely the weight factors, the actual output will approach the target values at the end of the learning procedure.
Application of Data and Recalling:
Fig 1.9: Of the total pattern rows, 70 % are used for the learning step and the remaining 30 % (unknown, i.e. without targets) are used for the test step.
¹ See Appendix-1 for Gaussian noise.
2. Artificial Neuron & Its Mathematical Model
2.1 Biological and Artificial Neuron:
A neuron is a specialized type of cell found in the bodies of all eumetazoans².
Only sponges and a few other simpler animals lack neurons. The features that define a
neuron are electrical excitability and the presence of synapses, which are complex
membrane junctions that transmit signals to other cells.
Fig 2.1: Basic Biological Neuron Structure [URL 1]
As shown in Fig 2.1, a typical neuron is divided into three parts: the soma or cell
body, dendrites, and axon. The soma is usually compact; the axon and dendrites are
filaments that extrude from it. Dendrites typically branch profusely, getting thinner with
each branching, and extending their farthest branches a few hundred micrometers from
the soma. The axon leaves the soma at a swelling called the axon hillock, and can
extend for great distances, giving rise to hundreds of branches.
2.1.1 Neurotransmission
In neurotransmission, sometimes also called synaptic transmission, neurons communicate by sending an electrical charge down the axon and across the synapse to the next neuron, as shown in Fig 2.1. Because the neurons are not physically connected, chemical messengers called neurotransmitters cross the synaptic gap to carry the message to the next neuron. Communication is both electrical and chemical.
2
Eumetazoa is a clade comprising all major animal groups except sponges, placozoa, and several other
obscure or extinct life forms, such as Dickinsonia.
2.1.2 Electrical Neurotransmission
At electrical synapses, two neurons are physically connected to one another through gap junctions. Gap junctions permit changes in the electrical properties of one neuron to affect the other, and vice versa, so the two neurons essentially behave as one. Electrical neurotransmission is communication between two neurons at electrical synapses. [1]
Fig 2.2: Enlarged view of the synapse between a presynaptic and a postsynaptic neuron [URL 2].
2.1.3 Chemical Neurotransmission
In chemical neurotransmission, the presynaptic neuron and the postsynaptic
neuron are separated by a small gap - the synaptic cleft. The synaptic cleft is filled with
extracellular fluid (the fluid bathing all the cells in the brain). Although very small,
typically on the order of a few nanometers (a billionth of a meter), the synaptic cleft
creates a physical barrier for the electrical signal carried by one neuron to be
transferred to another neuron. In electrical terms, the synaptic cleft would be considered
a “short” in an electrical circuit. The function of neurotransmitter is to overcome this
electrical short. It does so by acting like a chemical messenger, thereby linking the
action potential of one neuron with a synaptic potential in another. [1]
Since the information transfer during the neurotransmission process has a non-linear structure, we can now describe a mathematical model of the neuron.
2.2.1 Mathematical Model of Biological Neuron:
Consider a Multi-Input Single-Output (MISO) system, where [x1, x2, …, xn] and yj are the discrete (n × 1) input vector and the output, respectively.
Fig 2.3: Mathematical model for the jth biological neuron topology — the inputs x1, …, xn, weighted by wj1, …, wjn, are summed in the cell body (linear summation aj) and passed through a non-linear activation function to give the output yj.
Mathematically:
aj = ∑_{i=1}^{n} wji xi    (2.1)
and
yj = f(aj)    (2.2)
Here, f(·) is a non-linear function. There are various types of non-linear activation functions, e.g.:
 Sigmoidal function (logistic function)
 tanh function
 arctan function
2.2.2 Modified Mathematical Model (as Learning System):
Fig 2.4: Neuron as a learning system — the inputs x0, x1, …, xn with weights wj0, wj1, …, wjn are summed to give aj, passed through the activation function faj to produce yj, which is compared with the target Tj; the error ej drives an adaptive learning algorithm.
Where, for the sigmoidal (logistic) activation function,
yj = f(aj) = 1 / (1 + e^(−aj))    (2.3)
Also we know that aj is the linear combination of the weight factors and the input:
aj = ∑_{i=0}^{n} wji xi    (2.4)
or
aj = wj0 x0 + ∑_{i=1}^{n} wji xi    (2.5)
Finally, taking the bias term as θj = wj0 x0,
aj = ∑_{i=1}^{n} wji xi + θj    (2.6)
Substituting the above into the sigmoidal function, we get
yj = f(∑_{i=1}^{n} wji xi + θj)    (2.7)
or
yj = 1 / (1 + exp(−(∑_{i=1}^{n} wji xi + θj)))    (2.8)
Fig 2.5: Sigmoidal activation function faj versus aj — approximately linear in a narrow band around aj = 0 (where faj = 0.5) and non-linear (saturating towards 0 and 1) for large |aj|.
For large values of the weights, the output of the activation function approaches the unit value asymptotically. Conversely, for small values of the weights the weighted sum stays around zero, so the variation of the activation function can be accepted as a linear variation within a very narrow band, as shown in Fig 2.5. It can therefore be inferred that for large weight values "the information is stored in the non-linearity", while for small weight values "the information is stored in the linearity".
Hence, based on the above explanation, it can be concluded that the behaviour of the neuron can be adjusted by selecting its mode of operation in the linear or the non-linear region, respectively.
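The two modes of operation can be reproduced with a short MATLAB sketch (the small and large weight scales, 0.01 and 10, are assumed values chosen only to contrast the linear and non-linear regions):

% One sigmoidal neuron y = f(w*x + theta), f(a) = 1/(1+exp(-a))  (eqs. 2.7-2.8).
% The small (0.01) and large (10) weight scales are illustrative assumptions.
f     = @(a) 1./(1 + exp(-a));
x     = randn(1000, 3);                 % standard-normal inputs, 3 per pattern
theta = 0;

w_small = 0.01*ones(3, 1);              % small weights -> a stays near 0 (linear region)
w_large = 10*ones(3, 1);                % large weights -> outputs saturate near 0 or 1
y_small = f(x*w_small + theta);
y_large = f(x*w_large + theta);

fprintf('std of y: small weights %.4f, large weights %.4f\n', std(y_small), std(y_large));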
2.2.3 Neuron Model with Normally Distributed Inputs:
Consider an input vector X = [x1, x2, …, xn] whose components are random variables with the standard normal distribution, i.e. xi ~ N(0, 1).
Fig 2.6: Neuron with bias input x0 = 1 — the weighted sum aj of the inputs is passed through the non-linear activation function to give yj.
As shown in Fig 2.5, the non-linear activation function can be divided into linear and non-linear regions of operation. The two regions respond differently to an input vector whose components are standard normal random variables.
2.3.1 Normal Distributed Variables:
Consider random variables having the standard normal distribution, with mean value (μ = 0), standard deviation (σ = 1), skewness (s = 0) and kurtosis (k = 3), as shown in Fig 2.7 below.
Fig 2.7 Standard Normal Distribution [URL 3]
Here, the skewness s is a measure of symmetry: it is always zero for a symmetrical distribution, as depicted by the percentage equality of the same-coloured regions in Fig 2.7 above. For an asymmetrical distribution the skewness takes a positive or negative value other than zero, as shown in Fig 2.8 below for a distribution skewed to the left (s < 0) and skewed to the right (s > 0).
Fig 2.8: Normal distribution (s = 0), distribution skewed to the right (s > 0) and distribution skewed to the left (s < 0).
2.3.2 Response of a Linear Transform on the Normally Distributed Variables:
Random variables having the standard normal distribution are transformed into another normal distribution with different statistical parameters. Consider the linear part of the neuron,
aj = ∑_i wji xi + θj    (2.9)
and compare it with the general linear equation
y = a + b·x    (2.10)
Comparing equations 2.9 and 2.10, it can be concluded that if x has the standard normal distribution N(0, 1), then under the linear transformation the new variable has the normal distribution N(a, b²) — for the neuron, N(θj, ∑_i wji²) — i.e. the statistical parameters change according to the parameters of the linear equation. The same cannot be observed in the non-linear operation.
However, it can be observed from Fig 2.9 below that, for the linear part of the sigmoidal function and for small weight factors, a Gaussian (symmetrical) input distribution results in a symmetrical output distribution whose statistical parameters are changed according to the parameters of the linear-region equation.
Fig 2.9: Linear region of the sigmoidal function f(∑ wji xi + θj) — a Gaussian (symmetrical) input distribution with parameters μ, σ applied in the linear region produces a Gaussian (symmetrical) output distribution.
2.3.3 Response of a Non-Linear Transform on the Normally Distributed Variables:
Finally we consider the non-linear transformation of normally distributed random variables in the neuron. As concluded above, we cannot expect a symmetrical distribution as in the linear transform; the result of the non-linear transform is an asymmetrical distribution, as shown in Fig 2.10 below:
Fig 2.10: Non-linear region of the sigmoidal function f(∑ wji xi + θj) — a Gaussian (symmetrical) input distribution with parameters μ, σ applied in the saturating, non-linear region produces a non-Gaussian (asymmetrical) output distribution.
2.3.4 Other Non-Linear Activation Functions
Until now we have considered the sigmoidal activation function, but other activation functions can also be used, e.g.:
2.3.4.1 Arctangent Function
f(x) = (2/π) arctan(αx)
(Fig 2.11)
2.3.4.2 Hyperbolic Tangent Function
f(x) = tanh(αx) = (e^(αx) − e^(−αx)) / (e^(αx) + e^(−αx))
(Fig 2.12)
2.3.4.3 Logistic (Sigmoidal) Function
f(x) = 1 / (1 + e^(−αx))
(Fig 2.13)
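For comparison, the three activation functions can be evaluated side by side in MATLAB (a sketch; the slope α = 1 and the input range are arbitrary choices):

% Evaluate the three activation functions of Section 2.3.4 (alpha = 1 assumed).
alpha = 1;
x = linspace(-5, 5, 201);

f_atan = (2/pi)*atan(alpha*x);            % arctangent, range (-1, 1)
f_tanh = tanh(alpha*x);                   % hyperbolic tangent, range (-1, 1)
f_sig  = 1./(1 + exp(-alpha*x));          % logistic (sigmoid), range (0, 1)

plot(x, f_atan, x, f_tanh, x, f_sig);
legend('arctan', 'tanh', 'logistic');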
2.4 Adaptive Learning Algorithm
2.4.1 Widrow-Hoff Delta Learning Rule
Consider a neuron without an activation function but with a target value T and error ε, as shown in Fig 2.14: the output is the weighted sum I = ∑_i wi xi, the error is ε = (T − I), and the squared error is ε² = (T − I)².
Fig 2.14: Linear neuron with inputs x1, …, xn, weights wj1, …, wjn, output I = ∑ wi xi, target T and error ε = T − I.
The gradient of the squared error with respect to the weights is
∂ε²/∂wi = −2(T − I) xi    (2.11)
For two inputs, the squared error is written as
ε² = (T − w1 x1 − w2 x2)² = T² − 2T(w1 x1 + w2 x2) + (w1 x1 + w2 x2)²    (2.12)
Hence, the derivatives are
∂ε²/∂w1 = −2x1 [T − (w1 x1 + w2 x2)]    (2.13)
∂ε²/∂w2 = −2x2 [T − (w1 x1 + w2 x2)]    (2.14)
Suppose ∂ε²/∂w1 = ∂ε²/∂w2 = 0; this equality requires T = w1 x1 + w2 x2, from which the optimum weights w1* (2.15) and w2* (2.16) are obtained.
By substituting w1* and w2* into equation 2.12 we get the minimum squared error (MSE), which is equal to zero. As a theoretical result this is correct, but in the real world the MSE is not equal to zero because of non-linearity, noise and imperfect data.
The minimization of the squared error by the Widrow-Hoff training algorithm can be shown graphically as in Fig 2.15:
Fig 2.15: Squared error ε² = (T − w1 x1)² as a function of the weight w1 — a parabola whose minimum ε²_min defines the optimum weight.
Hence, the Widrow-Hoff rule provides the following equality for the change in each weight:
Δwi = −k ∂(ε²)/∂wi    (2.17)
or
Δwi = 2k (T − I) xi    (2.18)
where k is a constant.
This rule is known as the delta rule; it moves the weight factor along the negative gradient of the error surface towards the ideal weight position. Because it follows the gradient, it is also called the Gradient Descent or Steepest Descent algorithm. Graphically it is shown in Fig 2.16.
Fig 2.16: Geometric representation of the delta rule — the delta vector moves the weight vector on the error surface towards the ideal weight vector.
Finally, normalizing by the squared norm |X|² of the input vector, the delta rule for the component xi is written as
Δwi = η ε xi / |X|²    (2.19)
so that the weight update is
wi(new) = wi(old) + Δwi    (2.20)
and, taking ε' = ε / |X|², it becomes
Δwi = η ε' xi    (2.21)
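A minimal MATLAB sketch of the delta rule for a single linear neuron is given below; the "true" weights, learning rate and number of passes are assumed values used only to illustrate equations 2.18–2.21:

% Widrow-Hoff (LMS) delta rule for a linear neuron I = w'*x  (eqs. 2.17-2.21).
% The true weights, learning rate and epoch count are illustrative assumptions.
rng(0);
w_true = [2; -1];                     % "unknown" weights generating the targets
X = randn(200, 2);                    % 200 input patterns
T = X*w_true;                         % target values
w = zeros(2, 1);                      % initial weights
eta = 0.1;                            % learning rate

for epoch = 1:20
    for k = 1:size(X, 1)
        x     = X(k, :).';
        eps_k = T(k) - w.'*x;                 % error  eps = T - I
        w     = w + eta*eps_k*x/(x.'*x);      % normalized delta-rule update (eq. 2.21)
    end
end
disp(w.');                            % should be close to w_true

After the training passes the weights approach the values used to generate the targets, which is the graphical descent towards the ideal weight vector shown in Fig 2.16.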
3. Information Flow or Information Content of Neural Network Structure
3.1 Shannon’s Information for Feed Forward Neural Network and Information
Measure as Number of Hidden Nodes
First we will consider the following two basic related concepts:
 Information: I(X)
 Entropy: H(X)
where X is a random variable with possible values [x1, x2, ……., xn].
Here, we can write the entropy in terms of the information I(X) as follows:
H(X) = E(I(X))    (3.1)
where E(·) is the mathematical operator that calculates the mean value. As a result this entropy value becomes:
H(X) = E(−log P(X))    (3.2)
where P(X) is the probability mass function of X. Taking finite samples, the entropy H(X) can be written as:
H(X) = ∑_i P(xi) I(xi)    (3.3)
or
H(X) = −∑_i P(xi) log_b P(xi)    (3.4)
where b is the base of the logarithm; for the binary case b = 2.
Hence, the mean information or entropy for the binary case becomes:
H(X) = −∑_i P(xi) log₂ P(xi)    (3.5)
Here, for equally likely events in probability theory, each probability value for the n states is given as:
P(xi) = 1/n    (3.6)
If so, equation 3.5 becomes:
H(X) = −∑_{i=1}^{n} (1/n) log₂(1/n)    (3.7)
or
H(X) = log₂ n    (3.8)
For a combination of information sets, e.g. two random variables X and Y, the conditional entropy is given as:
H(X|Y) = H(X, Y) − H(Y)    (3.9)
Fig 3.1: Venn-diagram representation of the joint entropy H(X, Y) and the individual entropies H(X) and H(Y).
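As a numerical check of equations 3.5–3.8 (a sketch; the example probability vectors are arbitrary):

% Entropy H(X) = -sum(p.*log2(p)) (eq. 3.5); for equally likely events H = log2(n) (eq. 3.8).
p_uniform = ones(1, 8)/8;                        % 8 equally likely states
H_uniform = -sum(p_uniform .* log2(p_uniform));  % = log2(8) = 3 bits

p = [0.5 0.25 0.125 0.125];                      % an arbitrary example distribution
H = -sum(p .* log2(p));                          % = 1.75 bits
fprintf('H_uniform = %.2f bits, H = %.2f bits\n', H_uniform, H);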
3.2 Interpretation of a Feed Forward Neural Network as a Communication System
and Shannon’s Channel Capacity:
A basic communication system comprises a transmitter, a receiver and an information channel. As an example:
Fig 3.2: Basic communication system — the information (voice, image, etc.) passes through a filter and modulator at the transmitter, over the communication channel (wire, coaxial cable, radio link, waveguide), which is subject to noise or disturbance, and through a filter and demodulator at the receiver.
For the system shown in Fig 3.2, the channel capacity as a measure of information can be defined by Shannon's formula:
C = B log₂(1 + S/N)    (3.10)
Where:
C : channel capacity, [C] = bit/sec
B : channel bandwidth, [B] = Hz
S/N : signal-to-noise ratio
For a noiseless channel, N → 0, so S/N → ∞ and C → ∞.
Using the structural similarity with the communication channel of Fig 3.2, we can define the feed-forward neural network topology. To do this we make the following assumptions.
Consider a feed-forward neural network with three layers, using only one hidden layer, as shown below:
Fig 3.3: Analogy between the three-layer network and a communication system — the input layer and its nodes act like the transmitter, the hidden layer like the communication channel, and the output layer and its nodes like the receiver.
Hence the hidden layer can be represented by a channel capacity, and the numerical value of this channel capacity is described by the number of hidden units, like a number of bits. This is an optimization problem.
3.3.1 An Analytical Approach based on Information Theory for NN Architecture:
For a feed-forward NN topology, the determination of the number of hidden nodes in the hidden layer is still an open problem. In this chapter we consider only one hidden layer between the input and output layers. The objective here is to optimize the number of hidden nodes. The number of hidden nodes can be represented by an information measure based on Shannon's information equality; hence it can be defined as a logarithmic measure in units of bits. The entropy value, defined as a measure of information, can be given for a binary state as below:
Hj(pj, qj) = −pj log₂ pj − qj log₂ qj    (3.11)
Hj(pj, qj) = −pj log₂ pj − (1 − pj) log₂(1 − pj)    (3.12)
where qj = 1 − pj. The graphical illustration of the entropy function Hj(pj, qj) for the binary state becomes:
Fig 3.4: Binary entropy function Hj versus pj — it reaches its maximum at pj = qj = 0.5.
Sometimes, instead of the binary base, the entropy can be expressed with the natural logarithm (base e); then equation 3.11 can be written as
Hj(pj, qj) = −pj ln pj − (1 − pj) ln(1 − pj)    (3.13)
From equations 3.11–3.13 we see that the entropy function uses probability values, whereas we do not have probabilities in a neural network topology. To give a probabilistic meaning to the neural network application, the following assumptions can be considered.
3.3.2 Assumptions and Definitions:
If a pattern set is applied to the neural network topology, the relationship of each input–output pair can be defined by a mapping function φ between the input and output vector spaces:
φ : X̃ → Ỹ    (3.14)
Indicating the state of each independent input vector by the index s, the N input patterns x^(s), s = 1, 2, …, N, can be collected into a square matrix of dimension N.    (3.15)
Fig 3.5: Information flow through the network — the N patterns form an (N × N) square matrix applied to the input layer (i = 1, 2, …, N) with weight matrix W̃1; the hidden layer has j = 1, 2, …, M nodes (M unknown) and, through the weight matrix W̃2, feeds the output layer k = 1, 2, …, K.
Note: the weight matrices [W_ji] and [W_kj] are constant matrices after the training procedure.
If each pattern is selected from the pattern set A with a constant (equally likely) probability P(s), then the value of P(s) is
P(s) = 1/N    (3.16)
We now consider the information flow from the input layer to the hidden-layer nodes. The weighted sum of the jth processing element of the hidden layer for state (s) is transferred by means of the sigmoidal function; then:
Oj(s) = 1 / {1 + exp[−(∑_i wji xi(s) + θj)]}    (3.17)
where wji are the optimal weights obtained after training. Here we consider the half topology of the neural network, i.e. the input and hidden-output pairs; hence the hidden-output node matrix becomes Õ together with the weight matrix W̃. Consequently the exponential term is given by
exp[−(∑_i wji xi(s) + θj)] = (1 − Oj(s)) / Oj(s)    (3.18)
3.3.3 Application of the Entropy Function:
Using the definition of the entropy function,
H(p, q) = −p ln p − (1 − p) ln(1 − p)    (3.19)
this entropy definition can be applied to the exponential term of the sigmoidal outputs at the hidden nodes. Taking the logarithm of equation 3.18, it becomes
−(∑_i wji xi(s) + θj) = ln qj(s) − ln pj(s)    (3.20)
where pj(s) = Oj(s) and qj(s) = 1 − Oj(s).
Equation 3.20 is now averaged (integrated) over the input and hidden-output states; this introduces the conditional probability P(x(s) | Oj(s)) (equations 3.21–3.22). Since all parameters wji and θj are constants after training, and since the input x and the hidden output Oj can be taken as independent, i.e. P(x(s) | Oj(s)) = P(x(s)), the double sum reduces to an expression in the input probabilities alone (equation 3.23). Using the conditional-entropy definition H(X|Y) = H(X, Y) − H(Y) of equation 3.9, the left-hand side can be written as an entropy function (equations 3.24–3.25), and comparing the first and second terms of the two sides relates the sum of the weighted hidden-node terms to the joint entropy (equation 3.26).
The entropy of the input with the constant probability P(s) = 1/N is
H(X) = −∑_{s=1}^{N} P(s) log₂ P(s) = log₂ N    (3.27–3.28)
After training, the bias terms θj have constant values like all the weights, so their mean value is
θ̄ = (1/M) ∑_{j=1}^{M} θj    (3.29–3.30)
With the approximations used above, and writing the entropy of the input explicitly in terms of the state probability P(s) = 1/N, the entropy balance reduces to a relation of the form
M · r ≈ log₂ N    (3.31–3.34)
where r is a constant; for r = 1 and the binary state this gives M ≈ log₂ N. Since the number of hidden nodes must be a positive integer, it is taken as the integer part,
M = ⟦log₂ N⟧    (3.35–3.37)
where ⟦·⟧ denotes the integer function and M is expressed in bits. For example, N = 100 training patterns would give M = ⟦log₂ 100⟧ = 6 hidden nodes.
As a conclusion, recall that the channel capacity of a communication system is
C = B log₂(1 + S/N)    (3.38)
The channel capacity can be interpreted as the number of hidden nodes M of the neural network; using this similarity, log₂ N plays the role of the bandwidth B of the communication system, and as a result we get
M = ⟦log₂ N⟧    (3.39)
4. Back Propagation Algorithm
4.1.1 Back Propagation Algorithm (B.P. Algorithm)
Let us consider a three-layer feed-forward neural network topology as shown below:
Fig 4.1: Three-layer feed-forward topology — input layer (i = 1, 2, …, N) receiving the pattern X_pn, hidden layer (j = 1, 2, …, M) with weights W_ji, and output layer (k = 1, 2, …, K) with weights W_kj producing the outputs Y_pk.
Where:
1. The input (i) for pattern p is given as
X_p = (x_p1, x_p2, x_p3, …, x_pN)    (4.1)
or, collecting all p patterns, as the input matrix [x_pn].
2. The net input to the hidden-layer neurons (j) is given as
net_pj = ∑_{i=1}^{N} w_ji x_pi    (4.2)
i.e., for pattern p = 1, the vector (net_11, …, net_1M) is obtained by multiplying the weight matrix [w_ji] with the input vector (x_11, …, x_1N).
3. The output of the hidden-layer neurons (j), based on the activation function (e.g. the sigmoidal function), is given for pattern p = 1 as
O_pj = f(net_pj)    (4.3)
4. The net input to the output-layer neurons (k) is given for pattern p = 1 as
net_pk = ∑_{j=1}^{M} w_kj O_pj    (4.4)
5. The output of the output-layer neurons (k), based on the activation function (e.g. the sigmoidal function), is given for pattern p = 1 as
O_pk = f(net_pk)    (4.5)
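Steps 1–5 above can be written compactly in matrix form. The sketch below assumes a single pattern, a 3–4–2 topology and random weights, none of which are prescribed by the text:

% Forward pass of the three-layer network, eqs. 4.1-4.5 (3-4-2 topology assumed).
f = @(a) 1./(1 + exp(-a));     % sigmoidal activation function

N = 3; M = 4; K = 2;           % input, hidden and output node counts (assumed)
x   = randn(N, 1);             % one input pattern            (eq. 4.1)
Wji = 0.5*randn(M, N);         % hidden-layer weights
Wkj = 0.5*randn(K, M);         % output-layer weights

net_j = Wji*x;                 % net input to hidden neurons  (eq. 4.2)
O_j   = f(net_j);              % hidden-layer outputs         (eq. 4.3)
net_k = Wkj*O_j;               % net input to output neurons  (eq. 4.4)
O_k   = f(net_k);              % network outputs              (eq. 4.5)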
4.1.2 Adjustment/Updating of Weight Factors between Output & Hidden Layer
Now we define an energy function for the output nodes of the neural network topology shown in Fig 4.1:
E_p = (1/2) ∑_k (t_pk − O_pk)²    (4.6)
The main goal of the B.P. algorithm is to deduce the conditions for adjusting the weight factors based on the minimum of this energy function, starting from
∂E_p/∂w_kj = (∂E_p/∂net_pk)(∂net_pk/∂w_kj)    (4.7)
Using equation 4.4,
∂net_pk/∂w_kj = O_pj    (4.8)
Equation 4.7 then becomes
∂E_p/∂w_kj = (∂E_p/∂net_pk) O_pj    (4.9)
Re-writing the first term on the right-hand side of equation 4.9 using the chain rule:
∂E_p/∂net_pk = (∂E_p/∂O_pk)(∂O_pk/∂net_pk)    (4.10)
Using equations 4.5 and 4.6, we have the following equalities:
∂E_p/∂O_pk = −(t_pk − O_pk)    (4.11)
and
∂O_pk/∂net_pk = f'(net_pk)    (4.12)
Hence, equation 4.10, using equations 4.11 and 4.12, becomes
∂E_p/∂net_pk = −(t_pk − O_pk) f'(net_pk)    (4.13)
Also, using the definition
δ_pk = −∂E_p/∂net_pk    (4.14)
we can define
δ_pk = (t_pk − O_pk) f'(net_pk)    (4.15)
Returning to equation 4.9 and substituting equation 4.15, it becomes
∂E_p/∂w_kj = −δ_pk O_pj    (4.16)
The above equation shows the decreasing nature of the energy function with respect to the weight factors of the output nodes. It can therefore be expressed as a proportionality for the change of the weights between the hidden and output nodes:
Δ_p w_kj ∝ δ_pk O_pj    (4.17)
Using the parameter known as the learning rate (η), this proportionality becomes an equality:
Δ_p w_kj = η δ_pk O_pj    (4.18)
In equation 4.18 the left-hand side is a finite difference, and it can be re-written in terms of the iteration number (n) as
w_kj(n+1) − w_kj(n) = η δ_pk O_pj    (4.19)
or
w_kj(n+1) = w_kj(n) + η δ_pk O_pj    (4.20)
For f chosen as the sigmoidal activation function, its derivative can be calculated as
f(net_pk) = [1 + e^(−λ net_pk)]^(−1)    (4.21)
f'(net_pk) = λ O_pk (1 − O_pk)    (4.22)
Using this result, equations 4.15 and 4.20 can finally be re-written as
δ_pk = λ (t_pk − O_pk) O_pk (1 − O_pk)    (4.23)
w_kj(n+1) = w_kj(n) + η λ (t_pk − O_pk) O_pk (1 − O_pk) O_pj    (4.24)
Here λ = 1 normally, so the λ parameter can be taken as unity.
4.1.3 Adjustment/Updating of Weight Factors between Hidden & Input Layer
A similar approach can be used for updating the weight factors defined between the hidden and input layers. To write the analogue of equation 4.20 for this new case, namely for w_ji and Δw_ji, we require δ_pj; this term can be calculated easily from δ_pk, and hence the algorithm is known as the Back-Propagation algorithm.
Here we consider the total energy E_p of the output nodes:
E_p = (1/2) ∑_k (t_pk − O_pk)²    (4.25)
And we will deduce the minimum condition using the parameters between the hidden and input layers:
∂E_p/∂w_ji = (∂E_p/∂net_pj)(∂net_pj/∂w_ji)    (4.26)
Equation 4.26 can be re-written as
∂E_p/∂w_ji = [∑_k (∂E_p/∂net_pk)(∂net_pk/∂net_pj)] (∂net_pj/∂w_ji)    (4.27)
Here the first term on the right-hand side can be calculated using the chain rule:
∂net_pk/∂net_pj = (∂net_pk/∂O_pj)(∂O_pj/∂net_pj) = w_kj f'(net_pj)    (4.28)
Putting net_pj from equation 4.2, we have
∂E_p/∂net_pj = ∑_k (∂E_p/∂net_pk) w_kj f'(net_pj)    (4.29)
or
∂E_p/∂net_pj = f'(net_pj) ∑_k (∂E_p/∂net_pk) w_kj    (4.30)
Inserting δ_pk as defined in equation 4.14 in the previous case, we have
∂E_p/∂net_pj = −f'(net_pj) ∑_k δ_pk w_kj    (4.31)
For the second term on the right-hand side of equation 4.27, we have
∂net_pj/∂w_ji = x_pi    (4.32)
Using equations 4.31 and 4.32, equation 4.26 becomes
∂E_p/∂w_ji = −f'(net_pj) (∑_k δ_pk w_kj) x_pi    (4.33)
or
∂E_p/∂w_ji = −(∑_k δ_pk w_kj) f'(net_pj) x_pi    (4.34)
Using a definition similar to δ_pk in equation 4.14, we can define
δ_pj = f'(net_pj) ∑_k δ_pk w_kj    (4.35)
Hence, equation 4.33 can be re-written as
∂E_p/∂w_ji = −δ_pj x_pi    (4.36)
This last equation, defined for δ_pj, can be used in the computation of Δ_p w_ji, the changes of the weights w_ji between the input and hidden layers, as in the previous case (refer to equations 4.17 and 4.18):
Δ_p w_ji = η δ_pj x_pi    (4.37)
In equation 4.37 the left-hand side is a finite difference, and it can be re-written in terms of the iteration number (n) as
w_ji(n+1) − w_ji(n) = η δ_pj x_pi    (4.38)
or
w_ji(n+1) = w_ji(n) + η δ_pj x_pi    (4.39)
For f chosen as the sigmoidal activation function, its derivative is
f(net_pj) = [1 + e^(−λ net_pj)]^(−1)    (4.40)
f'(net_pj) = λ O_pj (1 − O_pj)    (4.41)
Using this result, equations 4.36 and 4.39 can finally be re-written as
δ_pj = λ O_pj (1 − O_pj) ∑_k δ_pk w_kj    (4.42)
w_ji(n+1) = w_ji(n) + η λ O_pj (1 − O_pj) (∑_k δ_pk w_kj) x_pi    (4.43)
Here λ = 1 normally, so the λ parameter can be taken as unity. The parameter η (0 < η < 1) is the learning rate (or learning constant); for stability of the learning process, a lower value of η is normally chosen.
4.2 Back Propagation Method (Block Diagram)
Fig 4.2: Block diagram of the back-propagation method — information flows forward from the input through net_pj = ∑_i W_ji O_pi, O_pj = f(net_pj), net_pk = ∑_j W_kj O_pj and O_pk = f(net_pk) to the actual output, which is compared with the target output t_pk; the error δ_pk = f'_pk (t_pk − O_pk) is propagated backwards through δ_pj = f'_pj ∑_k δ_pk W_kj to update the weights W_kj and W_ji.
4.3 Back Propagation Algorithm & its Flow Chart
4.3.1 Algorithm
1) For all connections, the weight factors among layers can be initialized by small
random numbers using a random number generator (like standard normal
distributed random numbers).
2) Define a counter for number of iterations and read the first value as n=1
3) Define a counter for number of pattern and read the first value as p=1.
4) Read the first pattern to be executed
5) Using this first pattern, calculate the outputs of neuron by equations 4.2, 4.3, 4.4
& 4.5
6) Update the weights between the layers using equations 4.20 & 4.43
7) Repeat the calculations between steps 3 to 6 up to the end of the pattern set.
8) For the nth iteration, calculate the error function E(n):
E(n) = (1/2) ∑_p ∑_k (t_pk − O_pk)²    (4.44)
9) Consider a very small number ε and check whether the error (energy) function satisfies the condition
|E(n) − E(n−1)| < ε    (4.45)
10) If it does, stop and save all the updated weight factors. Otherwise, take the next iteration number, go to step 3 and repeat the operations between steps 3 and 9 until the error function reaches the acceptable value defined by the parameter ε.
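A compact MATLAB sketch of steps 1–10 is given below. It is an illustration only: the teacher-generated data set, the network size, the learning rate η = 0.5 and the threshold ε are assumed values, and the bias terms are omitted for brevity, as in equations 4.24 and 4.43.

% Back-propagation training loop following Section 4.3.1 (eqs. 4.2-4.5, 4.24, 4.43).
% The data set, topology, eta and epsilon are illustrative assumptions; biases omitted.
f = @(a) 1./(1 + exp(-a));                  % sigmoid, so f' = f.*(1-f)

rng(1);
N = 3; M = 5; K = 2; P = 50;                % nodes per layer and number of patterns
Aji = randn(M, N);  Akj = randn(K, M);      % "teacher" network used only to create targets
X = randn(P, N);                            % input patterns  (P x N)
T = f(f(X*Aji.')*Akj.');                    % target patterns (P x K)

Wji = 0.5*randn(M, N);                      % (1) initialize weights
Wkj = 0.5*randn(K, M);
eta = 0.5;  epsilon = 1e-8;  E_old = inf;

for n = 1:50000                             % (2) iteration counter
    for p = 1:P                             % (3)-(7) loop over the pattern set
        x  = X(p, :).';   t = T(p, :).';
        Oj = f(Wji*x);                      % forward pass, eqs. 4.2-4.3
        Ok = f(Wkj*Oj);                     % eqs. 4.4-4.5
        dk = (t - Ok).*Ok.*(1 - Ok);        % output-layer delta, eq. 4.23
        dj = (Wkj.'*dk).*Oj.*(1 - Oj);      % hidden-layer delta, eq. 4.42
        Wkj = Wkj + eta*dk*Oj.';            % weight updates, eqs. 4.24 and 4.43
        Wji = Wji + eta*dj*x.';
    end
    E = 0.5*sum(sum((T - f(f(X*Wji.')*Wkj.')).^2));   % (8) error function, eq. 4.44
    if abs(E - E_old) < epsilon, break; end           % (9)-(10) stopping condition, eq. 4.45
    E_old = E;
end
fprintf('stopped after n = %d iterations, E = %.6f\n', n, E);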
4.3.2 The Flow Chart of BP-Algorithm
Fig 4.3: Flow chart of the BP algorithm — Start; (1) initialize the weights; (2) set the iteration counter n = 1; (3) set the pattern counter p = 1; (4) read the pattern; (5) calculate the neuron outputs; (6) update the weights between the layers; if p < P, set p = p + 1 and return to step 4; when p = P, (8) calculate E(n); (9) if |E(n) − E(n−1)| ≥ ε, set n = n + 1 and return to step 3; otherwise Stop.
5. Recurrent Neural Networks
5.1.1 Recurrent Neural Networks (RNN)
A recurrent neural network has a feedback connection from the output layer or from the hidden layer(s). Depending on the feedback type it is called:
 JORDAN Type RNN
 ELMAN Type RNN
5.1.2 Jordan Type RNN
In the Jordan-type RNN the feedback connections come from the output layer back to the input side, where there is a special input layer of nodes called the context layer. In this manner it is a little different from the ordinary input nodes.
Fig 5.1: Jordan-type RNN — the outputs are fed back through delay elements z⁻¹ to a context layer on the input side. Here z⁻¹ is the delay element; this topology uses the B.P. algorithm for the training and learning process.
5.1.3 Elman Type RNN
In this case the feedback connections are provided from the hidden layer(s). Consider a simple case having only one hidden layer, as given below:
Fig 5.2: Elman-type RNN — the hidden-layer outputs are fed back through delay elements z⁻¹ to a context layer on the input side.
This topology also uses the B.P. algorithm for the learning and training steps. These topologies, or RNNs, are also known as dynamical neural networks.
As a comparison, the ELMAN-type RNN has an advantage over the JORDAN type because it takes the feedback connection from the hidden nodes and thus uses the extracted information one more time. These topologies can be used successfully in fault-detection problems.
6. System Identification and Use of the Neural Network
6.1.1 System Identification (Modeling)
Consider a general linear model:
Fig 6.1: System with input x[n] and output y[n].
y[n] = a1 y[n−1] + a2 y[n−2] + … + aN y[n−N] + b0 x[n] + b1 x[n−1] + … + bN x[n−N]    (6.1)
where N is the system order, or
y[n] = ∑_{k=1}^{N} a_k y[n−k] + ∑_{k=0}^{N} b_k x[n−k]    (6.2)
This is a linear combination of the input and output pairs. Here the orders of the two sums need not be equal; if so, it is written as
y[n] = ∑_{k=1}^{N} a_k y[n−k] + ∑_{k=0}^{M} b_k x[n−k]    (6.3)
The main problems are the determination of the system parameters and the determination of the correct system order N.
6.1.2 For Stochastic System
A stochastic system model can be represented by the following block diagram.
Fig 6.2: Stochastic system driven by an independent random error ε_t and producing the output y_t.
Here ε_t is an independent random error and {ε_t} is the set of random errors; statistically {ε_t} ~ N(0, σ²).
Hence, by observing the system outputs (current and former values), the modeling task can be used to characterize the process. The modeling error is
e_t = |y_t − ŷ_t|    (6.4)
where ŷ_t is the model output and y_t the actual output.
For an auto-regressive process,
y_t = a1 y_{t−1} + a2 y_{t−2} + … + a_p y_{t−p} + ε_t    (6.5)
Here ε_t can be interpreted as a white-noise process. For the determination of the system parameters a1, a2, …, a_p some special techniques/methods are used; the most famous one is the Yule-Walker method (equation). Here a1, a2, …, a_p are called the Auto-Regressive (AR) parameters. The Yule-Walker equation is given as:
[ R(0)     R(1)     …  R(p−1) ]   [ a1  ]   [ R(1) ]
[ R(1)     R(0)     …  R(p−2) ] · [ a2  ] = [ R(2) ]    (6.6)
[  …        …       …    …    ]   [  …  ]   [  …   ]
[ R(p−1)   R(p−2)   …  R(0)   ]   [ a_p ]   [ R(p) ]
Here the matrix on the left is the autocorrelation matrix of the data x1, x2, …, xN, and the vector on the right collects the autocorrelation coefficients.
Hence, each data sample can be predicted from its predecessors:
x̂_t = ∑_{i=1}^{p} â_i x_{t−i}    (6.7)
The residue ε̂_t then becomes
ε̂_t = x_t − x̂_t    (6.8)
For this computation the system order p can be calculated by Akaike's Information Criterion (AIC)³. The matrix elements R(k) are given as
R(k) = ∑_t x_t x_{t+k}    (6.9)
³ See Appendix-2
Finally, if we determine the system order and system parameters, the approach is
known as “Identification”.
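A small MATLAB sketch of the Yule-Walker computation (equations 6.6–6.9) is given below; the second-order test process and its coefficients a1 = 0.75, a2 = −0.5 are assumed purely for illustration:

% Yule-Walker estimation of AR(p) parameters (eqs. 6.6-6.9).
% The AR(2) test process with a1 = 0.75, a2 = -0.5 is an illustrative assumption.
rng(0);
Ns = 5000;  e = randn(Ns, 1);  x = zeros(Ns, 1);
for t = 3:Ns
    x(t) = 0.75*x(t-1) - 0.5*x(t-2) + e(t);   % synthetic data following eq. 6.5
end

p = 2;
R = zeros(p+1, 1);
for k = 0:p
    R(k+1) = sum(x(1:Ns-k).*x(1+k:Ns))/Ns;    % autocorrelation estimate, eq. 6.9
end
Rmat  = toeplitz(R(1:p));                     % autocorrelation matrix of eq. 6.6
a_hat = Rmat \ R(2:p+1);                      % AR parameters
disp(a_hat.');

For a sufficiently long record the estimated parameters approach the values 0.75 and −0.5 used to generate the data.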
6.1.3 Auto-Regressive-Moving-Average (ARMA) Process (Model)
Fig 6.3: ARMA process driven by the white noise ε_t and producing the output y_t.
y_t = ∑_{i=1}^{p} a_i y_{t−i} + ∑_{j=1}^{q} b_j ε_{t−j} + ε_t    (6.10)
where
ε_t ~ N(0, σ²)    (6.11)
is white noise; the estimated model is
ŷ_t = ∑_{i=1}^{p} â_i y_{t−i} + ∑_{j=1}^{q} b̂_j ε_{t−j}    (6.12)
Here p (and q) denote the model order.
6.2.1 Use of the Neural Networks (NNs) in System Modeling.
Here the NNs are feed-forward networks which use the Back Propagation algorithm.
Fig 6.4: Use of an NN in system modeling — the physical system and the NN receive the same input x_t; the NN output ŷ is compared with the system output y, and the error e = y − ŷ is used to train the NN.
If, at the end of the learning procedure, the modeling error reaches an acceptable value, then the neural network topology emulates the actual physical system. For this reason the NN behaves as an observer of the system.
6.3 Applications of Adaptive Neural Networks
6.3.1 Adaptive Process and its Components
Fig 6.5: Adaptive process and its components — the process maps the input to the output; a performance calculation evaluates the output and drives an adaptive algorithm that adjusts the process.
In this manner, an artificial neural network is an adaptive system. We can now define the configurations of adaptive neural network systems:
1. System Identification and Modeling
2. Inverse Modeling
3. Adaptive Interference Cancelling
4. Adaptive Prediction
6.3.2 System Identification and Modeling
A simple modeling technique is shown in the following block diagram.
Fig 6.6: System identification — the system and the NN share the same input; the difference between the system output and the NN output gives the error e used to adapt the NN.
Fig 6.7: Modeling a system with delay z⁻¹ and additive noise — the NN is adapted so that its output tracks the noisy system output; e is the difference or error.
6.3.3 Inverse Modeling
A simple inverse modeling system is given as:
Fig 6.8: Inverse modeling — the NN is placed in cascade with the system and adapted so that the cascade reproduces the system input.
Fig 6.9: Inverse modeling with a delay z⁻¹ and additive noise added to the configuration.
6.3.4 Adaptive Noise Cancelling System
Fig 6.10: Adaptive noise cancelling — the noisy signal S + N enters the system; the NN produces the output Y, which is used to cancel the noise component.
6.3.5 Adaptive Prediction
Fig 6.11: Adaptive prediction — the NN is trained with the delayed input S(t − Δ) to predict S(t); a slave copy of the trained NN then provides the prediction output.
6.4.1 Non-Parametric Identification
Non-parametric identification uses a "black-box" model of the input–output pairs. Neural networks for non-parametric process models can be interpreted as a non-linear extension of the system identification problem. As an example, the adaptive transversal filter structure can be implemented by a neural network, and it becomes a finite impulse response (FIR) network as a non-parametric identification system.
The FIR structure for a given input–output pair x_k and y_k is given by
y[k] = ∑_{m=0}^{M} w_m x[k−m]    (6.13)
for a single-input adaptive transversal filter with a target output.
Fig 6.12: Adaptive transversal (FIR) filter — the input x_k and its delayed versions x_{k−1}, …, x_{k−M} (obtained through z⁻¹ elements) are weighted and summed to give the output y_k; the error ε_k between the target T_r and the output drives the adaptive algorithm.
Here y_k (or y[k]) is the system output. In the FIR structure the parameters are represented by the weights of this model, and these parameters can be adjusted by the adaptive algorithm.
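The FIR structure of equation 6.13, combined with an LMS-type adaptive weight update, can be sketched as follows; the "unknown" impulse response, the filter length and the step size μ are assumed values:

% Adaptive transversal (FIR) filter identifying an unknown system (eq. 6.13).
% The unknown impulse response h_true, the filter length and mu are assumptions.
rng(0);
h_true = [0.5; -0.3; 0.2];        % "unknown" system to be identified
M  = 3;                           % number of taps
mu = 0.05;                        % adaptation step size
w  = zeros(M, 1);                 % adjustable weights

x = randn(2000, 1);               % input signal
d = filter(h_true, 1, x);         % target (system output)

for k = M:length(x)
    xk = x(k:-1:k-M+1);           % delayed input vector [x_k, x_{k-1}, x_{k-2}]
    yk = w.'*xk;                  % filter output, eq. 6.13
    ek = d(k) - yk;               % error between target and output
    w  = w + mu*ek*xk;            % weight update by the adaptive algorithm
end
disp(w.');                        % should approach h_true

The adapted weights converge to the taps of the unknown system, which is the non-parametric identification described above.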
6.4.2 Parametric Identification
A neural network trained through supervised learning can be used for both identification and parameter estimation. In this manner the approach is a little different from the non-parametric application.
Using the advantages of parametric identification modeling by NNs, we obtain the following figures for dynamic models.
Fig 6.13: Neural network with time-delayed inputs — the input x_k and its delayed versions (through z⁻¹ delay elements) are applied to the NN topology.
Sometimes a dynamical neural network with time-delayed recurrent inputs can be used, as below:
Fig 6.14: Neural network whose inputs are time-delayed versions of its own output y_k.
In this application, the output of the neural network is the current value of y_k, but it uses the previous values of y as inputs; this captures the dependency of the current value of y on its former values. The approach resembles an Infinite Impulse Response (IIR) structure or filter. As a different configuration of the IIR structure, both the delayed inputs and the delayed outputs can be applied as input information to the neural topology; here, of course, the previous outputs of the neural network are used as feedback connections. For this purpose, the following configuration is given:
Fig 6.15: Dynamic NN with direct and recurrent inputs — the delayed direct inputs x_k, x_{k−1}, … and the delayed (recurrent) outputs y_{k−1}, y_{k−2}, … are applied to a fully connected NN topology.
This is the dynamic neural network with time-delayed direct and recurrent inputs.
6.5 Models of Dynamical Systems
To obtain models of dynamic systems, neural network identification structures are used in the configuration of an observer, i.e. as a model connected in parallel to the real physical structure.
Fig 6.16: Non-recurrent parallel identification model — the NN runs in parallel with the physical system, receiving the delayed signals through z⁻¹ elements.
7. Neural Network for Applications of Control Theory
7.1 Classical Controllers
A classical block diagram of a control application is given below:
Fig 7.1: Classical unity-feedback control loop — the reference r(t) is compared with the output y(t) to give the error e(t); the control unit produces the control signal u(t), which drives the physical system.
Where:
r(t) : reference signal
e(t) = r(t) − y(t) : error signal
u(t) : control signal
y(t) : system output
There are three types of controller:
1. PID (Proportional-Integral-Derivative) Controller
2. PI (Proportional-Integral) Controller
3. PD (Proportional-Derivative) Controller
In this manner, the PID controller is defined as:
PID: u(t) = K_P e(t) + K_I ∫ e(t) dt + K_D de(t)/dt    (7.1)
where t is continuous time and K_P, K_I, K_D are the controller parameters for the proportional, integral and derivative terms, respectively.
The other controllers are written as:
PI: u(t) = K_P e(t) + K_I ∫ e(t) dt    (7.2)
PD: u(t) = K_P e(t) + K_D de(t)/dt    (7.3)
Applying the Laplace transform on these equations:
PID: U(s) = K_P E(s) + (K_I/s) E(s) + K_D s E(s)    (7.4)
In a similar way:
PI: U(s) = K_P E(s) + (K_I/s) E(s)    (7.5)
PD: U(s) = K_P E(s) + K_D s E(s)    (7.6)
Here the most important problem is to determine the controller parameters. One of the most famous methods is the Ziegler–Nichols method. Using the above equalities for each controller type in the s-domain, the transfer functions of the controllers are defined as:
PID: G_c(s) = U(s)/E(s) = K_P + K_I/s + K_D s    (7.7)
PI: G_c(s) = U(s)/E(s) = K_P + K_I/s    (7.8)
PD: G_c(s) = U(s)/E(s) = K_P + K_D s    (7.9)
Also, denoting the transfer function of the physical system by G_s(s), the first block diagram becomes:
Fig 7.2: Control loop in the s-domain — R(s) and E(s) enter the controller G_c(s), which produces U(s); the plant G_s(s) gives Y(s), with unity feedback.
Where R(s) and Y(s) are the representations of r(t) (reference signal) and y(t) (system output) in the s-domain, respectively.
We can now turn to the determination of the controller parameters. The Ziegler–Nichols method for the PID controller is given as follows:
K_P = 0.6 K_u    (7.10)
K_I = 1.2 K_u / T_u    (7.11)
K_D = 0.075 K_u T_u    (7.12)
where K_u is the gain at which the proportionally-controlled system oscillates (the ultimate gain), ω_u is the oscillation frequency and T_u = 2π/ω_u is the corresponding oscillation period.
Either root-locus or Bode plots can be used to determine K_u and ω_u. For example, a root locus is obtained from the plant transfer function: the gain at which the root locus crosses the jω-axis is K_u, and the frequency at that crossing on the jω-axis gives ω_u.
Alternatively, Bode plots are drawn for the given plant (physical system) transfer function. The GM (gain margin) is determined at the phase-crossover frequency ω_u, and
K_u = 10^(GM[dB]/20)    (7.13)
T_u = 2π / ω_u    (7.14)
Example
For the given plant
G_s(s) = 400 / [s (s² + 30 s + 200)]
find the PID controller parameters by the Ziegler–Nichols method.
Solution:
Using the following MATLAB commands:
s = tf('s');
g = 400/(s*(s^2+30*s+200));
sisotool(g)
Some system parameters are:
Closed-loop poles ≈ {−21.6, −4.2 ± j0.9}
Gain cross-over frequency ≈ 1.95 rad/sec
GM (gain margin) ≈ 23.5 dB
PM (phase margin) ≈ 73.4 degrees
The parameters obtained using equations 7.13 and 7.14 are K_u ≈ 15 and T_u ≈ 0.44 s, respectively.
The PID parameters obtained using equations 7.10, 7.11 and 7.12 are K_P ≈ 9, K_I ≈ 40.5 and K_D ≈ 0.5, respectively.
The closed-loop step response has no overshoot and the steady-state error to a unit ramp input is 0.5. The root locus and Bode plots for this plant are given below:
Fig 7.3: Root Locus
Fig 7.4: Bode Plots
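The same numbers can be reproduced without the interactive tool. The sketch below computes K_u and ω_u from the phase-crossover (Routh) condition, which holds for a plant of this particular form b0 / (s³ + a2 s² + a1 s); it is an illustrative calculation rather than a general procedure:

% Ziegler-Nichols parameters for G(s) = 400/(s(s^2+30s+200)), computed analytically.
b0 = 400; a2 = 30; a1 = 200;           % plant: b0 / (s^3 + a2*s^2 + a1*s)
wu = sqrt(a1);                          % phase-crossover frequency, rad/s
Ku = a2*a1/b0;                          % ultimate gain (marginal stability by Routh)
Tu = 2*pi/wu;                           % ultimate period, s
Kp = 0.6*Ku;                            % Ziegler-Nichols closed-loop rules (eqs. 7.10-7.12)
Ki = 1.2*Ku/Tu;
Kd = 0.075*Ku*Tu;
fprintf('Ku=%.2f  Tu=%.3f  Kp=%.2f  Ki=%.2f  Kd=%.3f\n', Ku, Tu, Kp, Ki, Kd);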
The effects of increasing each controller gain K_P, K_I and K_D on a closed-loop system are summarized in Table 7.1.
Gain    Rise Time       Overshoot   Settling Time   Steady-State Error
K_P     Decrease        Increase    Small change    Decrease
K_I     Decrease        Increase    Increase        Eliminate
K_D     Small change    Decrease    Decrease        Small change
Table 7.1
The computation of the controller parameters is sometimes difficult, and to eliminate this difficulty the neural network approach presents an alternative solution for control applications. In this sense, neuro-controller (NC) structures are defined.
The NC approach is a learning system and is independent of the calculation of the controller parameters. As a primitive (basic) model:
Fig 7.5: Primitive neuro-controller model — a PID controller and an NN are driven by the error e(t) between the reference r(t) and the system output y(t); the NN is trained with the input–output pairs of the PID controller and then provides the control signal u(t) for the system (NC).
Here we have the PID-controller structure, and by training the NN with the input and output pairs of the PID controller (the PID output being the target for the NN), the NN topology plays the role of the classical PID controller. This application is not independent of the classical controller structure, but it provides reliability of the controller: it can be accepted as a redundant system.
7.2 Neuro-Controller Applications
Fig 7.6: Neuro-controller in the loop — the error e(t) between the reference r(t) and the output y(t), together with delayed outputs y(t − 1), y(t − 2), … obtained through z⁻¹ delay elements, is applied to the neuro-controller, which produces the control signal u(t) for the physical system.
Fig 7.7: Neuro-controller configuration in which additional delayed signals, obtained through z⁻¹ elements, are fed back to the controller input.
 If we have a mathematical model of the physical system,
 this model is taught to the NN, and then the trained NN replaces it in the configuration as a neuro-controller.
7.3 Self-Tuning PID Controller by Neural Network
Fig 7.8: Self-tuning PID controller — an NN, fed with the reference r(t) and time-delayed (TD) signals, tunes the gains K_P, K_I and K_D of the PID controller, which generates u(t) for the physical system (z⁻¹ denotes the delay elements).
Fig 7.9: Self-tuning configuration — the NN adjusts the PID controller acting on the system, using time-delayed (TD) signals and the z⁻¹ feedback.
Fig 7.10: Combined neuro-controller NNC (NN + PID) driving the system directly with u(t), using time-delayed (TD) signals.
7.4 Sensor Validation Problem
Consider three sensors A, B and C.
Fig 7.11: Sensor validation — the NN is driven by the inputs x, y, z of the sensors A, B and C; its outputs are compared with the actual sensor outputs a, b and c to give the errors e_a, e_b and e_c.
1. The NN is trained with the input and output pairs of all sensors (x–a, y–b and z–c).
2. In any anomaly case the NN will produce a different error level at the corresponding output.
3. Hence the failure case is detected at the NN output that is connected with the actual sensor output.
7.5 Condition Monitoring Application of the NN as a Neuro-Detector
Fig 7.12: Neuro-detector for condition monitoring — the system output y(t) for input x(t) is transformed by the FFT (PSD) into |Y(jω)|; the amplitude at a specific frequency is compared with the amplitude expected at the same frequency, and the difference gives the error.
For different system behaviours we get different error signals; hence we detect the changed frequency components through the error variation in the spectral domain.
Fig 7.13 and Fig 7.14: Neuro-detector configurations — the NN output is compared with the measured output to produce the error used for detection.
Appendix-1
Gaussian Noise:
The expression for the Gaussian noise term is
noise = k·N(μ, σ)
Where:
k : gain factor
μ : mean value
σ : standard deviation
Appendix-2
Autoregressive (AR) Model
In statistics and signal processing, an autoregressive (AR) model is a type of random process which is often used to model and predict various types of natural and social phenomena.
AR models are shown in the form of AR(p) with p being the order of the modeled
system. The AR model is one of a group of linear prediction formulas that attempt to
predict an output of a system based on the previous outputs and inputs as shown in the
following equation.
X_t = c + ∑_{i=1}^{p} φ_i X_{t−i} + ε_t    (A)
or
X_t = ∑_{i=1}^{p} φ_i X_{t−i} + ε_t    (B)
where φ_1, φ_2, …, φ_p are the parameters of the model, c is a constant and ε_t is white noise. The constant term is omitted by many authors for simplicity.
An AR model can thus be viewed as the output of an all-pole infinite impulse response
filter whose input is white noise.
Some constraints are necessary on the values of the parameters of this model in order that the model remains wide-sense stationary. For example, processes in the AR(1) model with |φ_1| ≥ 1 are not stationary. More generally, for an AR(p) model to be wide-sense stationary, the roots of the polynomial z^p − ∑_{i=1}^{p} φ_i z^{p−i} must lie within the unit circle, i.e. each root z_i must satisfy |z_i| < 1.
A model which depends only on the previous outputs of the system is called an autoregressive (AR) model, while a model which depends only on the inputs to the system is called a moving-average (MA) model; a model based on both inputs and outputs is an autoregressive-moving-average (ARMA) model. Note that, by definition, the AR model has only poles while the MA model has only zeros.
Several methods and algorithms exist for calculation of the coefficients of the AR model, all of which can be implemented using the matlab command 'ar'.
Yule-Walker Algorithm
A number of techniques exist for the computation of the AR coefficients. The two main categories are least-squares methods and the Burg method. The most common least-squares method is based upon the Yule-Walker equations. Matlab supports a wide range of techniques; note that when comparing algorithms from different sources there are two common points of difference: the first is whether or not the mean is removed from the series, and the second is the sign of the coefficients returned (this depends on the definition used and is fixed by simply inverting the sign of all coefficients).
The most common method for deriving the coefficients involves multiplying the AR model given in equation (B) above by X_{t−d}, taking the expectation values and normalizing, which gives a set of linear equations called the Yule-Walker equations. They can be derived as follows:
(i) Multiplying the AR model in equation (B) above by X_{t−d} and taking c = 0:
X_t X_{t−d} = ∑_{i=1}^{p} φ_i X_{t−i} X_{t−d} + ε_t X_{t−d}    (C)
(ii) Taking the expectation on both sides:
E{X_t X_{t−d}} = ∑_{i=1}^{p} φ_i E{X_{t−i} X_{t−d}} + E{ε_t X_{t−d}}    (D)
Since the expectations E{X_t X_{t−d}} and E{X_{t−i} X_{t−d}} involve the data and a shifted version of the data, they are by definition the auto-covariances R(d) and R(d−i), respectively, while E{ε_t X_{t−d}} is zero for delays d greater than zero (a past value of the output is unrelated to the present value of the noise) and equals the noise variance for d = 0.
So, for d > 0, we have
R(d) = ∑_{i=1}^{p} φ_i R(d−i)    (E)
or
∑_{i=1}^{p} φ_i R(d−i) = R(d)    (F)
By expanding the above expression for d = 1, 2, …, p we have
[ R(0)     R(1)     …  R(p−1) ]   [ φ1  ]   [ R(1) ]
[ R(1)     R(0)     …  R(p−2) ] · [ φ2  ] = [ R(2) ]    (G)
[  …        …       …    …    ]   [  …  ]   [  …   ]
[ R(p−1)   R(p−2)   …  R(0)   ]   [ φp  ]   [ R(p) ]
Burg Algorithm
The parameter estimation approach that is nowadays regarded as the most appropriate is known as Burg's method. In contrast to the least-squares and Yule-Walker methods, which estimate the autoregressive parameters directly, Burg's method first estimates the reflection coefficients, which are defined as the last autoregressive-parameter estimate for each model order. From these, the parameter estimates are determined using the Levinson-Durbin algorithm. The reflection coefficients constitute unbiased estimates of the partial correlation coefficients. Usually, these estimation methods lead to approximately the same results for the autoregressive parameters. Once these have been estimated from the time series X_t, the autoregressive model can be applied to an independent prediction realization X̃_t of the same stochastic process. In terms of X_t, the AR process can be written as
X_t = φ1 X_{t−1} + φ2 X_{t−2} + … + φp X_{t−p} + ε_t    (H)
in which the innovation process ε_t is statistically identical to the innovation process of the Yule-Walker method. The corresponding AR model for the independent realization can be written as
X̃_t = φ̂1 X̃_{t−1} + φ̂2 X̃_{t−2} + … + φ̂p X̃_{t−p} + ε̃_t    (I)
in which φ̂_i are the AR parameters estimated from the realization of X_t and ε̃_t are the estimated innovations. Each data sample can be estimated from its predecessors:
X̂_t = ∑_{i=1}^{p} φ̂_i X̃_{t−i}    (J)
The difference between the measured value and the estimated value is now defined as the prediction error:
ẽ_t = X̃_t − X̂_t    (K)
The prediction error is therefore equal to the estimated innovation. Each prediction error can be calculated once the actual value of the data point is measured. A clear distinction should be made between the residue and the prediction error. The residual variance is a measure of the fit of the AR model to the data that have been used for the estimation of the AR parameters, and can be estimated from the realization used for the parameter estimation:
σ̂²(ε̂) = (1/N) ∑_t (x_t − x̂_t)²    (L)
For the prediction of future data, instead of the residual variance, the variance of the prediction error σ̂²(ẽ) is essential. If the independent prediction realization contains Ñ data samples, the prediction error variance can be estimated from the sample variance:
σ̂²(ẽ) = (1/Ñ) ∑_t (x̃_t − x̂_t)²    (M)
Akaike's Information Criterion
The Akaike Information Criterion (AIC) determines the model order by minimizing an information-theoretic function of p, AIC(p). For an AR process with Gaussian statistics, AIC(p) is defined as
AIC(p) = N ln(σ̂²(p)) + 2p    (N)
where N is the number of samples and σ̂²(p) is the estimated variance of the white driving noise (i.e., the prediction error), a decreasing function of p. The term 2p is a "penalty" for the use of extra AR coefficients that do not substantially reduce the prediction error.
The "AIC minimum" is only one of many criteria proposed for the selection of the AR order. Another popular criterion is the Final Prediction Error.
References
URL 1: http://science.kennesaw.edu/~jdirnber/Bio2108/Lecture/LecPhysio/PhysioNervous.html
URL 2: http://cla.calpoly.edu/~cslem/101/4-B.html
URL 3: http://superchargeretirementincome.com/wpcontent/uploads/2013/01/NormalDistributionWithPercentages1.png
1. Robert Stufflebeam, Neurons, Synapses, Action Potentials, and Neurotransmission,
http://www.mind.ilstu.edu/curriculum/neurons_intro/neurons_intro.php