Artificial Neural Networks: System Identification and Control
Prof. Dr. Serhat Şeker
Faculty of Electrical and Electronics
Istanbul Technical University, Ayazağa Campus
Maslak-34469, Istanbul, Turkey
sekers@itu.edu.tr
July 2014

Table of Contents
1. Introduction
2. Artificial Neuron and its Mathematical Model
3. Information Flow or Information Content of Neural Network Structure
4. Back Propagation Algorithm (Multi Layered Neural Network)
5. Recurrent Neural Networks
6. System Identification and Use of the Neural Network
7. Neural Network for Applications of Control Theory
7.1. Self-Tuning Neural Network Controller

Artificial Neural Networks: System Identification and Controls    i

1. Introduction

System Concept: The class of systems studied in this text is assumed to have input terminals and output terminals. We assume that an excitation is applied at the input terminals and the response is measured at the output terminals. A system with only one input and one output terminal is generally known as a Single-Input Single-Output (SISO) system. On the other hand, a system with two or more input terminals and/or two or more output terminals is called a Multi-Input Multi-Output (MIMO) system.

Fig 1.1: A SISO system as a black box with input x[n] and output y[n], and a MIMO system as a black box with inputs [x1, x2, ..., xp] and outputs [y1, y2, ..., ym].

A system can be a continuous-time or a discrete-time system. In this text we are more interested in discrete-time systems, considering their computer applications. A system is called a discrete-time system if it accepts discrete-time signals as input and generates discrete-time signals as output. For the SISO system shown in the figure above, x[n] is the input and y[n] is the output, where n is the discrete time.
In the time domain, the system output can be represented in terms of convolution as:

y[n] = h[n] * x[n]    (1.1)

or

y[n] = Σ_k h[k] x[n-k]    (1.2)

By applying the Z-transform to equation 1.2, we can convert it into the frequency domain, where the convolution becomes a simple multiplication:

Z{y[n]} = Z{h[n] * x[n]}    (1.3)

or

Y(z) = H(z) X(z)    (1.4)

In this text, a Neural Network (NN) is treated as a MIMO system, and we are interested in the analysis of the non-linear relationships of the system.

Important Differences: In classical system studies we know the system function, e.g. the impulse response h[n], whereas in the NN approach there is no need for a system function. A conventional system is implemented in hardware using summation, multiplication and delay operators, but a NN is an algorithm implemented in software.

Generally, all systems are non-linear in nature. In classical system theory, non-linear systems are linearized and analyzed through well-developed techniques, e.g. in state-space form. However, there are also direct methods for analyzing non-linear systems, e.g. the Lyapunov approach. There are several ways by which non-linear systems are linearized, but the simplest linearization is obtained by the Taylor expansion. The Taylor expansion of a non-linear function f(x) at a point x0 is given as:

f(x) = f(x0) + f'(x0)(x - x0) + (1/2!) f''(x0)(x - x0)^2 + ...    (1.5)

Consider a general linear equation

y = a + b x    (1.6)

where a is a constant and b is the slope of the line. If we take only the first two terms of the Taylor expansion above, ignoring the higher-order terms, we get:

f(x) ≈ f(x0) + f'(x0)(x - x0)    (1.7)

Comparing this with the general linear equation 1.6, it can be considered the approximated linear form of the non-linear function.
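As a quick numerical illustration of equation 1.7, the sketch below (plain Python; the function f(x) = e^x and the expansion point x0 = 0 are chosen arbitrarily for the example) linearizes a non-linear function and shows that the approximation is good near x0 but degrades away from it:

```python
import math

def linearize(f, df, x0):
    """First-order Taylor approximation of f around x0 (eq. 1.7):
    f(x) ~ f(x0) + f'(x0) * (x - x0)."""
    return lambda x: f(x0) + df(x0) * (x - x0)

# Example: f(x) = exp(x), whose derivative is also exp(x).
f, df = math.exp, math.exp
f_lin = linearize(f, df, x0=0.0)

# Near x0 the linear model is accurate; far from x0 the ignored
# higher-order terms introduce a large error.
err_near = abs(f(0.1) - f_lin(0.1))
err_far = abs(f(2.0) - f_lin(2.0))
```

The growing error far from x0 is exactly the price of dropping the higher-order terms, which motivates the direct (non-linearized) treatment used in the NN approach.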
By ignoring the higher-order terms we have introduced a large error into the model, but the approach is still used because of its easy computation and well-developed analysis tools. In the NN approach, however, we deal with non-linear systems in their direct form.

Basic Neural Network (NN) Configuration: In this text we will consider MIMO systems as shown in the figure below:

Fig 1.2: A combination of neurons with multi-input [x1, x2, ..., xn] and multi-output [y1, y2, ..., ym], the discrete (n x 1) input and (m x 1) output vectors respectively.

The combination of neurons has a parallel distributed structure:

Fig 1.3: Parallel distributed structure of neurons.

Advantages of the Parallel Distributed Structure:

Reliability: It is a highly reliable structure, because even if some connections (neurons) are removed the system will still work.

Optimization: The degree of information of each connection is related to its weight factor. Optimization can be achieved by applying a learning algorithm based on the sensitivity of the weight factors.

Speed: Fast processing can be achieved due to the parallel distributed nature of the network. Well-known algorithms for this purpose are Levenberg-Marquardt and Back Propagation.

Considering the above facts, the basic NN configuration, also known as a Multi-Layer Feed Forward NN (information flows from input to output), with n input nodes and m output nodes, is given in the following figure:

Fig 1.4: Multi-Layer Feed Forward Neural Network with an input layer (nodes x1, ..., xn), a 1st and a 2nd hidden layer, and an output layer (nodes y1, ..., ym).

Problem: Determination of Hidden Layers and Hidden Nodes. The number of hidden layers and nodes is directly linked to the processing time of the NN: the higher the number of hidden layers and nodes, the slower the NN becomes because of the larger number of computations.
On the other hand, if there are fewer hidden nodes, fast processing can be achieved but the learning process will not be optimal. So we want to optimize the topology. To make the problem a bit simpler, at first we can assume only one hidden layer; in this case the network topology is as follows:

Fig 1.5: Multi-Layer Feed Forward Neural Network with a single hidden layer.

Generally, there is no criterion for optimizing the number of hidden nodes, but we can do it by trial and error. One method is to consider Shannon's information criterion. Shannon, an American engineer, defined this criterion for channel capacity in 1948. It can be taken as a measure of information, whose unit is the bit. So we can calculate the amount of information being processed at the hidden layer and describe it in terms of bits. Generally:

Number of bits = number of processing elements of the hidden layer

For a given NN structure we have to follow two steps:

Learning
Test (Recalling)

Learning Procedure for the Multi-Layered Feed Forward Neural Network: At first we will consider learning with target values (desired outputs).
Consider a general NN with n input nodes, m output nodes and m target values as shown below:

Fig 1.6: Multi-Layer Feed Forward Neural Network with target values (desired outputs) T1, ..., Tm.

In terms of vectors and matrices, we can define all variables as follows:

x̃ = [x1, x2, ..., xn] : input vector
ỹ = [y1, y2, ..., ym] : output vector
T̃ = [T1, T2, ..., Tm] : target value (desired output) vector
W̃ = [W_ji], size (k x n) : weights between the input and hidden layer
W̃ = [W_kj], size (m x k) : weights between the hidden and output layer

(where k is the number of hidden nodes)

The error at the output layer nodes can be calculated as:

e_j = T_j - y_j    (1.8)

Note that the error can be positive or negative; to avoid this we use the absolute value

|e_j| = |T_j - y_j|    (1.9)

or the squared error

e_j^2 = (T_j - y_j)^2    (1.10)

As a whole it can be represented as an energy function:

E = (1/2) Σ_j (T_j - y_j)^2    (1.11)

For the learning procedure we want to minimize this error:

E → min    (1.12)

Since the outputs, and hence the errors, depend on the weight factors, the optimization is obtained by adjusting the weight factors with a learning algorithm. The most popular learning algorithm for Feed Forward Neural Networks is the Back Propagation Algorithm. An important point here is that the term "Back Propagation" is not related to feedback: information still flows from input to output, contrary to conventional control systems in which feedback flows from output to input. The name comes from the adjustment of the weights based on the backward propagation of the error, as shown below:

Fig 1.7: Information flows between input and output, while the error ẽ is propagated back to adjust the weights.

Hence, the information defined between the input and output layers is stored in the weight factors. Finally, the relationship between the input and output pairs is established by means of the learning algorithm, without a mathematical functional relationship.

Note: In the conventional case we know the functional relationship y = f(x).
We can then easily find the output y for a corresponding input x. But if the functional relationship f(x) is not known, we apply the Neural Network approach.

Introduction of Noise: Sometimes, to obtain enhanced performance, a noise term is added to the data:

x̃_new = x̃ + ñ    (1.13)
ỹ_new = ỹ + ñ    (1.14)

Considering the noise term as Gaussian noise(1) with zero mean and unit standard deviation, scaled by a gain factor k (usually small), we have:

x̃_new = x̃ + k N(0, 1)    (1.15)
ỹ_new = ỹ + k N(0, 1)    (1.16)

Matrix Notation in Learning: For p patterns, the input data form a (p x n) input matrix; the fully connected network produces a (p x m) output matrix, which is compared against the (p x m) target matrix:

Fig 1.8: Input matrix (p x n), fully connected network, output matrix (p x m) and target matrix (p x m).

So, by back propagation of the error matrix, and hence adjustment of the inner factors, namely the weight factors, the actual output approaches the target values at the end of the learning procedure.

Application of Data and Recalling:

Fig 1.9: 70 % of the total pattern rows are used for the learning step, and 30 % are used for the test; the test rows are applied without target values.

(1) See Appendix-I for Gaussian Noise.

2. Artificial Neuron & Its Mathematical Model

2.1 Biological and Artificial Neuron: A neuron is a specialized type of cell found in the bodies of all eumetazoans(2). Only sponges and a few other simpler animals lack neurons. The features that define a neuron are electrical excitability and the presence of synapses, which are complex membrane junctions that transmit signals to other cells.

Fig 2.1: Basic Biological Neuron Structure [URL 1]

As shown in Fig 2.1, a typical neuron is divided into three parts: the soma or cell body, dendrites, and axon. The soma is usually compact; the axon and dendrites are filaments that extrude from it.

(2) Eumetazoa is a clade comprising all major animal groups except sponges, placozoa, and several other obscure or extinct life forms, such as Dickinsonia.
Dendrites typically branch profusely, getting thinner with each branching, and extend their farthest branches a few hundred micrometers from the soma. The axon leaves the soma at a swelling called the axon hillock and can extend for great distances, giving rise to hundreds of branches.

2.1.1 Neurotransmission
In neurotransmission, sometimes also called synaptic transmission, neurons communicate by sending an electrical charge down the axon and across the synapse to the next neuron, as shown in Fig 2.1. Because the neurons are not physically connected, chemical messengers called neurotransmitters cross the synaptic gap to get the message to the next neuron. Communication is both electrical and chemical.

2.1.2 Electrical Neurotransmission
At electrical synapses, two neurons are physically connected to one another through gap junctions. Gap junctions permit changes in the electrical properties of one neuron to affect the other, and vice versa, so the two neurons essentially behave as one. Electrical neurotransmission is communication between two neurons at electrical synapses. [1]

Fig 2.2: Enlarged view of a synapse between two neurons (presynaptic and postsynaptic) [URL 2]

2.1.3 Chemical Neurotransmission
In chemical neurotransmission, the presynaptic neuron and the postsynaptic neuron are separated by a small gap, the synaptic cleft. The synaptic cleft is filled with extracellular fluid (the fluid bathing all the cells in the brain).
Although very small, typically on the order of a few nanometers, the synaptic cleft creates a physical barrier for the electrical signal carried by one neuron to be transferred to another neuron. In electrical terms, the synaptic cleft acts as a break in the circuit. The function of a neurotransmitter is to overcome this break: it acts as a chemical messenger, thereby linking the action potential of one neuron with a synaptic potential in another. [1]

Since the information transferred during the neurotransmission process has a non-linear structure, we can describe the mathematical model of the neuron accordingly.

2.2.1 Mathematical Model of the Biological Neuron: Consider a Multi-Input Single-Output (MISO) system, where [x1, x2, ..., xn] and y_j are the discrete (n x 1) inputs and the output respectively.

Fig 2.3: Mathematical model of the j-th biological neuron: the cell body performs a linear summation a_j of the weighted inputs (weights w_j1, ..., w_jn), followed by a non-linear actuation function giving the output y_j.

Mathematically:

a_j = Σ_i w_ji x_i    (2.1)

and

y_j = f(a_j)    (2.2)

Here f(·) is a non-linear function. There are various types of non-linear actuation functions, e.g.:

Sigmoidal function (logistic function)
tanh function
arctan function

2.2.2 Modified Mathematical Model (as a Learning System):

Fig 2.4: The neuron of Fig 2.3 extended with a bias input x0 and weight w_j0; the output y_j is compared with the target T_j, and the error e_j = T_j - y_j drives an adaptive learning algorithm.

Where:

x̃ = (x0, x1, ..., xn),  f(a_j) = 1 / (1 + e^(-a_j))  (for the sigmoidal, or logistic, actuation function)    (2.3)

Also, we know that a_j is the linear combination of the weight factors and the input:

a_j = Σ_{i=1}^{n} w_ji x_i + θ_j    (2.4)

or, taking x0 = 1 and w_j0 = θ_j,

a_j = Σ_{i=0}^{n} w_ji x_i    (2.5)

Finally,

a_j = Σ_{i=0}^{n} w_ji x_i = Σ_{i=1}^{n} w_ji x_i + θ_j    (2.6)
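The equivalence of equations 2.4 and 2.5 can be sketched in plain Python (the input and weight values below are arbitrary example numbers, not taken from the text):

```python
def net_input(x, w, theta):
    """Net input with an explicit bias: a_j = sum_{i=1..n} w_ji x_i + theta_j (eq. 2.4)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + theta

def net_input_bias_node(x, w, theta):
    """Equivalent form of eq. 2.5: absorb the bias as weight w_j0 = theta_j
    on a constant input x0 = 1, so a_j = sum_{i=0..n} w_ji x_i."""
    return sum(wi * xi for wi, xi in zip([theta] + w, [1.0] + x))

# Hypothetical 2-input neuron: both forms give the same net input.
a1 = net_input([0.5, -1.0], [0.2, 0.4], theta=0.1)
a2 = net_input_bias_node([0.5, -1.0], [0.2, 0.4], theta=0.1)
```

The bias-node trick of equation 2.5 is what lets the learning algorithm treat the threshold θ_j as just another weight to be adjusted.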
Substituting the above into the sigmoidal function, we get:

y_j = f(a_j) = 1 / (1 + exp(-(Σ_{i=1}^{n} w_ji x_i + θ_j)))    (2.7)

or

y_j = 1 / (1 + exp(-Σ_{i=0}^{n} w_ji x_i))    (2.8)

Fig 2.5: Sigmoidal actuation function f(a_j): an approximately linear region around f = 0.5 for small |a_j|, and non-linear (saturation) regions approaching 0 and 1 for large |a_j|.

For large values of the weights, the output of the actuation function asymptotically approaches the unit value. On the contrary, for small values of the weights the weighted sum stays around zero. For this reason, for very small values of the weights the variation of the actuation function can be accepted as linear within a very narrow band, as shown in Fig 2.5. So it can be inferred that for large values of the weights "the information is stored in the non-linearity", and for small values of the weights "the information is stored in the linearity". Hence, based on the above explanation, it can be concluded that the behavior of the neuron can be adjusted by choosing the mode of operation in the linear or the non-linear region respectively.

2.2.3 Response to Random Inputs: Consider an input vector X = [x1, x2, ..., xn] whose components are random variables with the standard normal distribution N(0, 1).

Fig 2.6: Neuron with bias input x0 = 1, linear summation a_j and non-linear actuation function, driven by random inputs.

As shown in Fig 2.5, the non-linear actuation function can be divided into linear and non-linear regions of operation. The two regions respond differently to an input vector of standard normally distributed random variables.

2.3.1 Normally Distributed Variables: Consider random variables having the standard normal distribution, with mean value μ = 0, standard deviation σ = 1, skewness s = 0 and kurtosis k = 3, as shown in Fig 2.7.

Fig 2.7: Standard Normal Distribution [URL 3]
Here, the skewness is a measure of symmetry, and it is always zero for a symmetrical distribution, as depicted by the percentage equality of the same-colored regions in Fig 2.7. For an asymmetrical distribution the skewness takes a positive or negative value other than zero. This is shown in Fig 2.8: a distribution skewed to the left has s < 0 and one skewed to the right has s > 0.

Fig 2.8: Normal (s = 0), skewed-to-the-right (s > 0) and skewed-to-the-left (s < 0) distributions.

The skewness parameter s thus defines the measure of symmetry of a given distribution: it is always zero for the symmetrical case, while it is different from zero for any asymmetrical distribution.

2.3.2 Response of a Linear Transform on the Normally Distributed Variables: Random variables having the standard normal distribution are transformed to another normal distribution with different statistical parameters, e.g.

y = μ + σ x    (2.9)

Consider the general linear equation:

y = a + b x    (2.10)

Comparing equations 2.9 and 2.10, it can be concluded that if x has the standard normal distribution N(0, 1), then under the linear transformation the new variable y will have the normal distribution N(a, b^2); the statistical parameters change according to the parameters of the linear equation, but the distribution remains Gaussian. We cannot observe the same in a non-linear operation. As the following figure shows, for the linear part of the sigmoidal function and for small weight factors, a Gaussian (symmetrical) input distribution results in a symmetrical output distribution whose statistical parameters are changed according to the equation of the linear region.
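The claim that x ~ N(0, 1) maps to y ~ N(a, b²) under equation 2.10 can be checked empirically (the values a = 2, b = 3, the seed, and the sample size are arbitrary choices for this sketch):

```python
import random
import statistics

# y = a + b*x maps x ~ N(0, 1) to y ~ N(a, b^2) (eqs. 2.9-2.10).
rng = random.Random(42)
a, b = 2.0, 3.0
x = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
y = [a + b * xi for xi in x]

# Sample statistics should match the predicted mean a and std |b|.
mean_y = statistics.fmean(y)
std_y = statistics.pstdev(y)
```

With 50 000 samples the sampling error of the mean is about b/√n ≈ 0.013, so the estimates land very close to the theoretical parameters.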
Fig 2.9: For small weights, the weighted sum Σ_i w_ji x_i + θ_j falls in the linear region of the sigmoid f, so a Gaussian (symmetrical) input distribution is mapped to a Gaussian (symmetrical) output distribution.

2.3.3 Response of a Non-Linear Transform on the Normally Distributed Variables: Finally, we consider the non-linear transformation of normally distributed random variables in the neuron. As concluded above, we cannot observe the symmetry-preserving behavior of the linear transform; the result of the non-linear transform is an asymmetrical distribution, as shown in Fig 2.10:

Fig 2.10: For large weights, the weighted sum Σ_i w_ji x_i + θ_j falls in the non-linear (saturation) region of the sigmoid, so a Gaussian (symmetrical) input distribution is mapped to a non-Gaussian (asymmetrical) output distribution.

2.3.4 Other Non-Linear Actuation Functions
Until now we have considered the sigmoidal actuation function, but other actuation functions can also be used, e.g.:

2.3.4.1 Arctangent Function

f(x) = (2/π) arctan(α x)

Fig 2.11: Arctangent actuation function.

2.3.4.2 Hyperbolic Tangent Function

f(x) = tanh(α x) = (e^(αx) - e^(-αx)) / (e^(αx) + e^(-αx))

Fig 2.12: Hyperbolic tangent actuation function.

2.3.4.3 Logistic (Sigmoidal) Function

f(x) = 1 / (1 + e^(-αx))

Fig 2.13: Logistic actuation function.

2.4 Adaptive Learning Algorithm
2.4.1 Widrow-Hoff Delta Learning Rule
Consider a neuron without an actuation function but with a target value T and error ε:

Fig 2.14: Linear unit with output I = Σ_i w_i x_i, target T and error ε = T - I, so that ε² = (T - I)².

The gradient of the squared error with respect to the weights is:

∂ε²/∂w_i = -2 (T - I) x_i    (2.11)

For two inputs, the squared error is written as:

ε² = [T - (w1 x1 + w2 x2)]²    (2.12)

Hence, the derivatives are:

∂ε²/∂w1 = -2 [T - w1 x1 - w2 x2] x1    (2.13)

∂ε²/∂w2 = -2 [T - w1 x1 - w2 x2] x2    (2.14)

Setting these to zero, ∂ε²/∂w1 = ∂ε²/∂w2 = 0, gives T - w1 x1 - w2 x2 = 0. From this equality we get:

w1 = (T - w2 x2) / x1    (2.15)

and

w2 = (T - w1 x1) / x2    (2.16)

By substituting w1 and w2 into equation 2.12, we get the Minimum Square Error (MSE), which is equal to zero.
In terms of the theoretical result this is correct, but in the real world the MSE is not equal to zero because of non-linearity, noise and imperfect data. The minimization of the squared error by the Widrow-Hoff training algorithm can be shown graphically:

Fig 2.15: The squared error ε² = (T - w x)² as a function of the weight w is a parabolic surface whose minimum defines the ideal weight.

Hence, the Widrow-Hoff rule provides the following equality for the change in each weight:

Δw_i ∝ -∂ε²/∂w_i    (2.17)

or

Δw_i = k (T - I) x_i    (2.18)

where k is a constant. This rule is known as the delta rule; it moves the weight factor along the negative gradient of the error surface towards the ideal weight position. Because it follows the gradient, it is also called the Gradient Descent or Steepest Descent algorithm. Graphically it is shown in Fig 2.16.

Fig 2.16: Geometric representation of the delta rule: the delta vector moves the weight vector w towards the ideal weight vector.

Finally, to normalize the input vector component x_i by |x̃|², the delta rule is written as:

Δw_i = k (T - I) x_i / |x̃|²    (2.19)

and, defining the error term

δ = T - I    (2.20)

it becomes:

Δw_i = η δ x_i / |x̃|²    (2.21)

3. Information Flow or Information Content of the Neural Network Structure

3.1 Shannon's Information for the Feed Forward Neural Network and the Information Measure as the Number of Hidden Nodes

First we will consider the following two basic related concepts:

Information: I(X)
Entropy: H(X)

where X is a random variable with possible values [x1, x2, ..., xN]. Here, we can write the entropy in terms of the information I(X) as follows:

H(X) = E(I(X))    (3.1)

where E(·) is the mathematical operator that calculates the mean value. As a result this entropy value becomes:

H(X) = E(-log p(X))    (3.2)

where p(x) is the probability mass function of X. By taking finite samples, the entropy H(X) can be written as:

H(X) = -Σ_i p(x_i) log p(x_i)    (3.3)

or

H(X) = -Σ_i p_i log_b p_i    (3.4)

where b is the base of the logarithm; for the binary case, b = 2.
Hence, the mean information, or entropy, for the binary case becomes:

H(X) = -Σ_{i=1}^{N} p_i log2 p_i    (3.5)

For equally likely events in probability theory, each probability value for N states is given as:

p_i = 1/N    (3.6)

If so, equation 3.5 becomes:

H(X) = -Σ_{i=1}^{N} (1/N) log2(1/N)    (3.7)

or

H(X) = log2 N    (3.8)

For a combination of information sets, the conditional entropy is given as:

H(X|Y) = H(X, Y) - H(Y)    (3.9)

Fig 3.1: Venn-diagram interpretation of the joint entropy H(X, Y) and the marginal and conditional entropies.

3.2 Interpretation of a Feed Forward Neural Network as a Communication System and Shannon's Channel Capacity:

A basic communication system is comprised of a transmitter, a receiver and an information channel. As an example:

Fig 3.2: Basic communication system: the transmitted information (voice, image, etc.) passes through a filter and modulator, travels over the communication channel (wire, coaxial cable, radio link, wave guide) subject to noise or disturbance, and is demodulated and filtered at the receiver.

For the system shown in Fig 3.2, the channel capacity as a measure of information can be defined by Shannon's formula:

C = B log2(1 + S/N)    (3.10)

where:
C : channel capacity, [C] = bit/sec
B : channel bandwidth, [B] = Hz
S/N : signal-to-noise ratio

For a noiseless channel, N → 0 and hence C → ∞.

Using the structural similarity with the communication channel, we can interpret the Feed Forward Neural Network topology. To do this, we make the following assumption: consider a feed forward Neural Network with three layers, i.e. with only one hidden layer, as shown below:

Fig 3.3: Analogy between the network layers and a communication system: the input layer acts like the transmitter, the hidden layer like the communication channel, and the output layer like the receiver.

Hence the hidden layer can be represented by the channel capacity, and the numerical value of the channel capacity is described by the number of hidden units, like a number of bits.
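The equally-likely case of equations 3.5-3.8 is easy to check numerically; a minimal sketch:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(X) = -sum_i p_i log2 p_i (eq. 3.5);
    terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Equally likely events, p_i = 1/N (eq. 3.6): H(X) = log2 N (eq. 3.8).
N = 8
H_uniform = entropy_bits([1.0 / N] * N)
```

For N = 8 equally likely states this gives H(X) = log2 8 = 3 bits, the maximum entropy attainable with 8 states.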
This is an optimization problem.

3.3.1 An Analytical Approach Based on Information Theory for the NN Architecture:
For a Feed-Forward NN topology, the determination of the number of hidden nodes in the hidden layer is still an open problem. In this chapter we consider only one hidden layer between the input and output layers. The objective here is to optimize the number of hidden nodes.

The number of hidden nodes can be represented by an information measure based on Shannon's information equality; hence, it can be defined as a logarithmic measure in units of bits. The entropy, defined as a measure of information, can be given for a binary state as:

H_j(p_j, q_j) = -p_j log2 p_j - q_j log2 q_j    (3.11)

with

p_j + q_j = 1    (3.12)

The graphical illustration of the entropy function for the binary state becomes:

Fig 3.4: Binary entropy H_j as a function of p_j; the maximum, max H_j = 1, occurs at p_j = q_j = 0.5.

Sometimes, instead of base 2, the natural logarithm with base e is used; equation 3.11 can then be written as:

H_j(p_j, q_j) = -p_j ln p_j - q_j ln q_j    (3.13)

From equation 3.11 we can see that the entropy function uses probability values, whereas we do not have probabilities in a Neural Network topology. To give a probabilistic meaning to the Neural Network application, the following assumptions can be considered.
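The binary entropy of equations 3.11-3.12 and its maximum at p = q = 0.5 (Fig 3.4) can be sketched as follows:

```python
import math

def H_binary(p):
    """Binary entropy H(p, q) = -p log2 p - q log2 q with q = 1 - p
    (eqs. 3.11-3.12); the 0*log(0) limits are taken as 0."""
    q = 1.0 - p
    h = 0.0
    if p > 0.0:
        h -= p * math.log2(p)
    if q > 0.0:
        h -= q * math.log2(q)
    return h

# The maximum, max H_j = 1 bit, occurs at p = q = 0.5 (Fig 3.4);
# any asymmetry between p and q lowers the entropy.
H_max = H_binary(0.5)
H_skew = H_binary(0.9)
```

This is the curve drawn in Fig 3.4: symmetric about p = 0.5, zero at p = 0 and p = 1.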
3.3.2 Assumptions and Definitions:
If a pattern set is applied to the Neural Network topology, the relationship of the input-output pair can be defined by a mapping function Φ between the input and output vector spaces:

Φ : x̃ → ỹ    (3.14)

Indicating the states of each independent input vector by the index s = 1, 2, ..., N, the pattern set can be arranged as a square matrix of dimension N whose rows x(s) = [x1(s), x2(s), ..., xN(s)] are the patterns flowing through the network    (3.15)

Fig 3.5: Information flow of the pattern set through the weight matrices W̃_ji and W̃_kj: input layer i = 1, 2, ..., N; hidden layer j = 1, 2, ..., M (unknown); output layer k = 1, 2, ..., K.

Note: Here the weight matrices [W_ji] and [W_kj] are constant matrices after the training procedure.

If each pattern is selected from the pattern set A according to a constant (equally likely) probability value p(s), then the value of p(s) is:

p(s) = 1/N    (3.16)

We accept information flow from the input layer to the hidden layer nodes. If the weighted sum of the j-th processing element of the hidden layer for state (s) is transferred by means of the sigmoidal function, then:

O_j(s) = 1 / {1 + exp[-(Σ_i w_ji x_i(s) + θ_j)]}    (3.17)

where w_ji are the optimal weights obtained after the training. Here we consider the half topology of the Neural Network, i.e. the input and hidden-output pairs; hence the hidden output node matrix becomes Õ. Consequently, the exponential term is given by:

exp[-(Σ_i w_ji x_i(s) + θ_j)] = 1/O_j(s) - 1    (3.18)

3.3.3 Application of the Entropy Function:
Using the definition of the entropy function

H_j(p_j, q_j) = -p_j ln p_j - q_j ln q_j    (3.19)

this entropy definition can be written for the exponential term of the sigmoidal outputs at the hidden nodes: via equation 3.18, the logarithmic terms are replaced by the weighted sums Σ_i w_ji x_i(s) + θ_j, so the entropy contains a double sum over the hidden nodes j and the states s (equation 3.20). Now equation 3.18 is integrated between the input and the hidden output.
As a result we get an expression (equation 3.21) containing the conditional probability p(O_j(s) | x_i(s)). Since all parameters w_ji and θ_j are constant after training, and if the input and the hidden output are taken as independent, i.e.

p(O_j(s) | x_i(s)) = p(O_j(s))    (3.22)

the double sum separates into entropy terms of the input and of the hidden output (equations 3.23-3.25). Comparing the first and second terms of both sides of this equality, the cross term corresponds to the joint entropy H(X, O) (equation 3.26), and the entropy of the input with the constant probability p(s) = 1/N is

H(X) = -Σ_s p(s) log p(s) = log N    (3.27)

With these substitutions the equality reduces to a relation between the number of hidden nodes M and the input entropy (equations 3.28-3.29). After training, the bias θ_j has a constant value like all the weights, so the mean value of the bias is

θ̄ = (1/M) Σ_j θ_j    (3.30)

and with the approximation θ̄ ≈ 0, writing the entropy of the input explicitly and considering the states of the input vector (s) in relation to the probability (equations 3.31-3.32), one gets for r = 1:

M = log2 N    (3.33, 3.34)

for the binary state. Since the value of M is a positive integer, it can be shown as

M = ⟦log2 N⟧    (3.35)

where ⟦·⟧ is the integer function.
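The integer-valued hidden-node count of equation 3.35 can be sketched directly (for inputs that are not a power of two the integer part is taken, as the text states):

```python
import math

def hidden_nodes(n_inputs):
    """Number of hidden nodes M = [[log2 N]] (eq. 3.35): the integer
    part of the entropy log2 N of N equally likely input states."""
    return int(math.floor(math.log2(n_inputs)))

M8 = hidden_nodes(8)    # log2 8 = 3 exactly
M10 = hidden_nodes(10)  # integer part of log2 10 = 3.32...
```

This gives a concrete starting topology (e.g. 3 hidden nodes for 8 inputs) that can then be refined by the trial-and-error procedure mentioned earlier.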
Also

M = ⟦H(X)⟧    (3.36)

We generalize the above result for any input dimension N:

M = ⟦log2 N⟧    (3.37)

where ⟦·⟧ indicates the integer part and [M] : bit (integer).

As a conclusion, we found that the channel capacity in a communication system was

C = B log2(1 + S/N)    (3.38)

So the channel capacity can be interpreted as the number of hidden nodes M in the Neural Network; using the similarity, H(X) plays the role of the bandwidth B in the communication system, and as a result we get:

M = ⟦H(X)⟧ = ⟦log2 N⟧    (3.39)

4. Back Propagation Algorithm

4.1.1 Back Propagation Algorithm (B.P. Algorithm)
Let us consider a three-layered feed forward neural network topology as shown below:

Fig 4.1: Three-layer topology with input layer (i = 1, 2, ..., N), hidden layer (j = 1, 2, ..., M) and output layer (k = 1, 2, ..., K), connected by the weight matrices W_ji and W_kj; for pattern p the input is X_pn and the output is Y_pk.

Where:

1. The input (i) for pattern p is given as:

x_p = (x_p1, x_p2, x_p3, ..., x_pN)    (4.1)

For p patterns, the inputs form a (p x N) matrix.

2. The net input to the hidden layer neurons (j) is given as:

net_pj = Σ_{i=1}^{N} w_ji x_pi    (4.2)

3. The output of the hidden layer neurons (j), based on the actuation function (e.g. the sigmoidal function), is given as:

O_pj = f(net_pj)    (4.3)

4. The net input to the output layer neurons (k) is given as:

net_pk = Σ_{j=1}^{M} w_kj O_pj    (4.4)

5. The output of the output layer neurons (k), based on the actuation function (e.g. the sigmoidal function), is given as:

O_pk = f(net_pk)    (4.5)

4.1.2 Adjustment/Updating of the Weight Factors between the Output and Hidden Layers
Now we will define an energy function for the output nodes of the neural network topology shown in Fig 4.1:

E_p = (1/2) Σ_k (t_pk - O_pk)²    (4.6)

The main goal of the B.P.
algorithm is to deduce the conditions for adjusting the weight factors based on the minimum value of the energy function. We need the gradient:

∂E_p/∂w_kj    (4.7)

Using equation 4.4,

∂net_pk/∂w_kj = O_pj    (4.8)

Equation 4.7 becomes:

∂E_p/∂w_kj = (∂E_p/∂net_pk)(∂net_pk/∂w_kj) = (∂E_p/∂net_pk) O_pj    (4.9)

Re-writing the first term of the right-hand side of equation 4.9 using the chain rule:

∂E_p/∂net_pk = (∂E_p/∂O_pk)(∂O_pk/∂net_pk)    (4.10)

Using equations 4.5 and 4.6, we have the following equalities:

∂E_p/∂O_pk = -(t_pk - O_pk)    (4.11)

and

∂O_pk/∂net_pk = f'(net_pk)    (4.12)

Hence, equation 4.10 becomes:

∂E_p/∂net_pk = -(t_pk - O_pk) f'(net_pk)    (4.13)

Also, using the definition

δ_pk = -∂E_p/∂net_pk    (4.14)

we can define:

δ_pk = (t_pk - O_pk) f'(net_pk)    (4.15)

Returning to equation 4.9 and substituting equation 4.15:

∂E_p/∂w_kj = -δ_pk O_pj    (4.16)

The above equation shows the decreasing direction of the energy function with respect to the weight factors of the output nodes. The change in the weights between the hidden and output nodes is then taken proportional to it:

Δw_kj ∝ δ_pk O_pj    (4.17)

Using the learning rate parameter η, this proportionality becomes an equality:

Δw_kj = η δ_pk O_pj    (4.18)

In equation 4.18 the left-hand side is a finite difference, and it can be re-written in terms of the iteration number n as follows:

w_kj(n+1) - w_kj(n) = η δ_pk O_pj    (4.19)

or

w_kj(n+1) = w_kj(n) + η δ_pk O_pj    (4.20)

For the sigmoidal function chosen as the actuation function, the derivative is:

f'(net_pk) = f(net_pk)[1 - f(net_pk)]    (4.21)

or

f'(net_pk) = O_pk (1 - O_pk)    (4.22)

Using this result, equations 4.15 and 4.20 can finally be re-written as:

δ_pk = (t_pk - O_pk) O_pk (1 - O_pk)    (4.23)

w_kj(n+1) = w_kj(n) + η (t_pk - O_pk) O_pk (1 - O_pk) O_pj    (4.24)

Here the slope parameter α of the sigmoid is normally taken as α = 1.

4.1.3 Adjustment/Updating of the Weight Factors between the Hidden and Input Layers
A similar approach can be used for updating the weight factors defined between the hidden and input layers. To write the equation corresponding to 4.20 for this new case, namely for w_ji, we require δ_pj, and this term can be easily calculated from δ_pk; hence the algorithm is known as the Back-Propagation Algorithm.
Here we consider the total energy of the output nodes:

    E_p = \frac{1}{2} \sum_{k=1}^{K} (t_{pk} - O_{pk})^2    (4.25)

and we will deduce the minimum condition with respect to the parameters between hidden and input layers, namely

    \partial E_p / \partial w_{ji}    (4.26)

Equation 4.26 can be re-written as

    \partial E_p / \partial w_{ji} = (\partial E_p / \partial O_{pj})(\partial O_{pj} / \partial w_{ji})    (4.27)

Here the first term on the right-hand side of the above equation can be calculated using the chain rule, summing over all output nodes fed by O_{pj}:

    \partial E_p / \partial O_{pj} = \sum_k (\partial E_p / \partial net_{pk})(\partial net_{pk} / \partial O_{pj})    (4.28)

Putting net_{pk} = \sum_j w_{kj} O_{pj} from equation 4.4, we have

    \partial E_p / \partial O_{pj} = \sum_k (\partial E_p / \partial net_{pk}) \, \partial \big( \sum_j w_{kj} O_{pj} \big) / \partial O_{pj}    (4.29)

or

    \partial E_p / \partial O_{pj} = \sum_k (\partial E_p / \partial net_{pk}) \, w_{kj}    (4.30)

Inserting δ_{pk} as defined in equation 4.14 in the previous case, we have

    \partial E_p / \partial O_{pj} = -\sum_k \delta_{pk} w_{kj}    (4.31)

For the second term on the right-hand side of equation 4.27, using equations 4.2 and 4.3, we have

    \partial O_{pj} / \partial w_{ji} = f'(net_{pj}) \, x_{pi}    (4.32)

Using equations 4.31 and 4.32, equation 4.26 becomes

    \partial E_p / \partial w_{ji} = -\big( \sum_k \delta_{pk} w_{kj} \big) f'(net_{pj}) \, x_{pi}    (4.33)

or

    \partial E_p / \partial w_{ji} = -f'(net_{pj}) \big( \sum_k \delta_{pk} w_{kj} \big) x_{pi}    (4.34)

Using the similar definition of δ in equation 4.14, we can define

    \delta_{pj} = f'(net_{pj}) \sum_k \delta_{pk} w_{kj}    (4.35)

Hence equation 4.33 can be re-written as

    \partial E_p / \partial w_{ji} = -\delta_{pj} x_{pi}    (4.36)

The last equation, defined for δ_{pj}, can be used in the computation of the changes of the weights between input and hidden layers, as done in the previous case (refer to equations 4.17 and 4.18):

    \Delta w_{ji} = \eta \, \delta_{pj} x_{pi}    (4.37)

In equation 4.37 the left-hand side is a finite difference, and it can be re-written in terms of the iteration number n as follows:

    w_{ji}(n+1) - w_{ji}(n) = \eta \, \delta_{pj} x_{pi}    (4.38)

or

    w_{ji}(n+1) = w_{ji}(n) + \eta \, \delta_{pj} x_{pi}    (4.39)

For the sigmoidal function (chosen as the activation function)

    f(net) = 1 / (1 + e^{-\lambda \, net})    (4.40)

we can again calculate the derivative as

    f'(net) = \lambda f(net) [1 - f(net)]    (4.41)

Using this calculated result, equations 4.35 and 4.39 can finally be re-written as

    \delta_{pj} = \lambda \, O_{pj} (1 - O_{pj}) \sum_k \delta_{pk} w_{kj}    (4.42)

    w_{ji}(n+1) = w_{ji}(n) + \eta \lambda \, O_{pj} (1 - O_{pj}) \big( \sum_k \delta_{pk} w_{kj} \big) x_{pi}    (4.43)

Here the parameter λ can normally be taken as λ = 1, and the parameter η is the learning rate (or learning constant). For a good learning level it is selected in the range 0 < η < 1; for stability of the learning process a lower value of η is normally chosen.
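Equations 4.2-4.5 together with the update rules 4.24 and 4.43 can be collected into a complete training sketch. The example below is illustrative only: the XOR pattern set, the layer sizes, the learning rate and the appended constant bias input (not part of equations 4.1-4.5) are all assumptions made for the demonstration, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(net, lam=1.0):
    # Sigmoidal activation function, eqs. 4.21 / 4.40
    return 1.0 / (1.0 + np.exp(-lam * net))

# Toy pattern set (XOR), used only for illustration.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

N, M, K = 2, 4, 1                      # input, hidden, output node counts
# A constant bias input of 1 is appended to the input and hidden vectors;
# this is a common practical addition, not part of eqs. 4.1-4.5.
W_ji = rng.normal(0.0, 0.5, (M, N + 1))
W_kj = rng.normal(0.0, 0.5, (K, M + 1))
eta, lam = 0.5, 1.0

def epoch_error():
    E = 0.0
    for x_p, t_p in zip(X, T):
        O_j = f(W_ji @ np.append(x_p, 1.0), lam)
        O_k = f(W_kj @ np.append(O_j, 1.0), lam)
        E += 0.5 * np.sum((t_p - O_k) ** 2)            # eq. 4.6
    return E

E0 = epoch_error()
for n in range(5000):
    for x_p, t_p in zip(X, T):
        x_a = np.append(x_p, 1.0)
        net_j = W_ji @ x_a                              # eq. 4.2
        O_j = f(net_j, lam)                             # eq. 4.3
        O_a = np.append(O_j, 1.0)
        net_k = W_kj @ O_a                              # eq. 4.4
        O_k = f(net_k, lam)                             # eq. 4.5
        delta_k = lam * (t_p - O_k) * O_k * (1 - O_k)                # eq. 4.23
        delta_j = lam * O_j * (1 - O_j) * (W_kj[:, :M].T @ delta_k)  # eq. 4.42
        W_kj += eta * np.outer(delta_k, O_a)            # eq. 4.24
        W_ji += eta * np.outer(delta_j, x_a)            # eq. 4.43
E1 = epoch_error()
print(E0, E1)   # the pattern-set error should fall as training proceeds
```

The per-epoch error is exactly the quantity monitored by the stopping criterion of Section 4.3.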
4.2 Back Propagation Method (Block Diagram)

Fig 4.2: Block diagram of the back-propagation method. Information flows from input to output through net_{pj} = \sum_i W_{ji} O_{pi}, O_{pj} = f(net_{pj}), net_{pk} = \sum_j W_{kj} O_{pj} and O_{pk} = f(net_{pk}); the error t_{pk} - O_{pk} between the target and actual outputs is propagated back through \delta_{pk} = f'(net_{pk})(t_{pk} - O_{pk}) and \delta_{pj} = f'(net_{pj}) \sum_k \delta_{pk} W_{kj}, which drive the updates of W_{kj} and W_{ji}.

4.3 Back Propagation Algorithm & its Flow Chart

4.3.1 Algorithm

1) For all connections, initialize the weight factors between the layers with small random numbers using a random number generator (e.g. standard normally distributed random numbers).
2) Define a counter for the number of iterations and read the first value as n = 1.
3) Define a counter for the number of patterns and read the first value as p = 1.
4) Read the first pattern to be executed.
5) Using this pattern, calculate the outputs of the neurons by equations 4.2, 4.3, 4.4 and 4.5.
6) Update the weights between the layers using equations 4.24 and 4.43.
7) Repeat the calculations of steps 3 to 6 up to the end of the pattern set.
8) For the nth iteration, calculate the error function E(n):

    E(n) = \frac{1}{2} \sum_p \sum_k (t_{pk} - O_{pk})^2    (4.44)

9) Consider a very small number ε (e.g. ε = 10^{-5}) and check whether the error (energy) function satisfies the condition

    |E(n) - E(n-1)| < \varepsilon    (4.45)

10) If it does, stop and save all the updated weight factors. Otherwise, take the next iteration number, go to step 3, and repeat the operations of steps 3 to 9 until the acceptable numerical value ε of the error function is reached.

4.3.2 The Flow Chart of BP-Algorithm

Fig 4.3: Flow chart of the BP-algorithm. Start → (1) initial values of weights → (2) iteration counter n = 1 → (3) pattern counter p = 1 → (4) read the pattern (p = p + 1) → (5) calculate the outputs of the neurons → (6) update the weights between the layers → (7) decision: p = P? (if No, return to step 4).
If Yes, (8) calculate E(n); then (9) test |E(n) - E(n-1)| < ε. If the condition holds, Stop; otherwise set n = n + 1 and return to step (3). (End of Fig 4.3.)

5. Recurrent Neural Networks

5.1.1 Recurrent Neural Networks (RNN)

A Recurrent Neural Network has feedback connections from the output layer or from the hidden layer(s). Depending on the feedback type, it is called:

- JORDAN type RNN
- ELMAN type RNN

5.1.2 Jordan Type RNN

In the Jordan-type RNN the feedback connections come from the output layer to the input side, where we have a special input layer of nodes. This layer is called the Context Layer, and in this manner it is a little bit different from the ordinary input nodes.

Fig 5.1: Jordan-type RNN: the outputs are fed back through delay elements z^{-1} to the context layer on the input side. This topology uses the B.P. algorithm for the training and learning process.

5.1.3 Elman Type RNN

In this case the feedback connections are provided from the hidden layer(s). Consider a simple case having only one hidden layer, as given below:

Fig 5.2: Elman-type RNN: the hidden-layer outputs are fed back through delay elements z^{-1} to the context layer.

This topology also uses the B.P. algorithm for the learning and training steps. These topologies, or RNNs, are also known as Dynamical Neural Networks. As a comparison, the ELMAN-type RNN has an advantage over the JORDAN type because it takes its feedback connections from the hidden nodes and therefore uses the extracted (internal) information one more time. These topologies can be used successfully in fault detection problems.
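The Elman topology's forward computation can be sketched as follows. All sizes, weights and the input sequence below are illustrative assumptions; only the forward (recall) step is shown, and training would again use the B.P. algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

N, M, K = 3, 5, 2                      # input, hidden, output sizes (illustrative)
W_ji = rng.normal(0, 0.5, (M, N + M))  # hidden weights see input + context
W_kj = rng.normal(0, 0.5, (K, M))      # output weights

context = np.zeros(M)                  # context layer: delayed hidden outputs (z^-1)

def elman_step(x, context):
    # Concatenate the current input with the delayed hidden state,
    # then compute the hidden and output activations.
    h = sigmoid(W_ji @ np.concatenate([x, context]))
    y = sigmoid(W_kj @ h)
    return y, h                        # h becomes the next step's context

for t in range(4):                     # feed a short random input sequence
    x = rng.random(N)
    y, context = elman_step(x, context)

print(y.shape, context.shape)
```

Because the context vector carries the previous hidden state forward, the output at each step depends on the whole input history, which is what makes the topology "dynamical".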
6. System Identification and Use of the Neural Network

6.1.1 System Identification (Modeling)

Consider a general linear model (Fig 6.1: a system with input x[n] and output y[n]):

    y[n] = a_1 y[n-1] + a_2 y[n-2] + ... + a_N y[n-N] + b_0 x[n] + b_1 x[n-1] + ... + b_M x[n-M]    (6.1)

where N is the system order, or

    y[n] = \sum_{k=1}^{N} a_k y[n-k] + \sum_{k=0}^{M} b_k x[n-k]    (6.2)

This is a linear combination of input and output pairs. Here the two orders may be taken equal, M = N. If so, it is written as

    y[n] = \sum_{k=1}^{N} a_k y[n-k] + \sum_{k=0}^{N} b_k x[n-k]    (6.3)

where the main problem is the determination of the system parameters a_k, b_k and also the determination of the correct system order N.

6.1.2 For Stochastic Systems

A stochastic system model can be represented by the block diagram of Fig 6.2: a system driven by ε_t, producing the output y_t. Here ε_t is an independent random error and {ε_t} is the set of random errors; statistically, {ε_t} ~ N(0, σ²).

Hence, by observing the system outputs (current and former values), the modeling task can be used to characterize the process:

    \hat{y}_t = E[\, y_t \mid y_{t-1}, y_{t-2}, ... \,]    (6.4)

where \hat{y}_t is the model output and y_t the actual output. For an auto-regressive process,

    y_t = a_1 y_{t-1} + a_2 y_{t-2} + ... + a_p y_{t-p} + \varepsilon_t    (6.5)

Here ε_t can be interpreted as a white noise process. For the determination of the system parameters a_1, a_2, ..., a_p some special techniques/methods are used; in this manner the most famous one is the Yule-Walker method (equation). Here a_1, ..., a_p are called the Auto-Regressive (AR) parameters. The Yule-Walker equation is given as

    \begin{bmatrix} R_0 & R_1 & \cdots & R_{p-1} \\ R_1 & R_0 & \cdots & R_{p-2} \\ \vdots & & & \vdots \\ R_{p-1} & R_{p-2} & \cdots & R_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_p \end{bmatrix}    (6.6)

i.e. (autocorrelation matrix) × (coefficient vector) = (autocorrelation vector), where the autocorrelation matrix is formed from the data x_1, x_2, ..., x_N.

Hence each data sample can be predicted from its predecessors:

    \hat{x}_t = \sum_{i=1}^{p} \hat{a}_i x_{t-i}    (6.7)

Then the residue e_t is the difference between the measured and predicted values:

    e_t = x_t - \hat{x}_t    (6.8)

For this computation the system order p can be calculated by Akaike's Information Criterion (AIC)³. The autocorrelation elements of the matrix are given as

    R_k = \frac{1}{N} \sum_{n=1}^{N-k} x_n x_{n+k}    (6.9)

³ See Appendix 2

Finally, if we determine the system order and the system parameters, the approach is known as "Identification".
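The Yule-Walker estimate of equations 6.6-6.9 can be sketched as follows. This is a minimal illustration: the AR(2) test signal and its coefficients (1.5, -0.7) are invented for the example.

```python
import numpy as np

def autocorr(x, k):
    # Eq. 6.9: biased autocorrelation estimate R_k = (1/N) * sum x_n x_{n+k}
    N = len(x)
    return float(np.dot(x[:N - k], x[k:]) / N)

def yule_walker(x, p):
    # Eq. 6.6: solve the Toeplitz system (autocorrelation matrix) a = r.
    R = np.array([[autocorr(x, abs(i - j)) for j in range(p)] for i in range(p)])
    r = np.array([autocorr(x, k) for k in range(1, p + 1)])
    return np.linalg.solve(R, r)

# Synthesize an AR(2) process x_t = 1.5 x_{t-1} - 0.7 x_{t-2} + eps_t (eq. 6.5).
rng = np.random.default_rng(0)
x = np.zeros(20000)
for t in range(2, len(x)):
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + rng.normal()

a = yule_walker(x, 2)
print(np.round(a, 2))   # estimates should be close to [1.5, -0.7]
```

With a long enough realization, the solved coefficients recover the AR parameters used to generate the data, which is exactly the "identification" task described above.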
6.1.3 Auto-Regressive-Moving-Average (ARMA) Process (Model)

Fig 6.3: An ARMA process driven by ε_t with output y_t.

    y_t = \sum_{i=1}^{p} a_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} b_j \varepsilon_{t-j}    (6.10)

where

    \varepsilon_t \sim N(0, \sigma^2)    (6.11)

is white noise, or, in estimated form,

    \hat{y}_t = \sum_{i=1}^{p} \hat{a}_i y_{t-i} + \sum_{j=1}^{q} \hat{b}_j \varepsilon_{t-j}    (6.12)

Here p and q are the model orders.

6.2.1 Use of Neural Networks (NNs) in System Modeling

Here the NNs are feed-forward networks which use the Back-Propagation algorithm.

Fig 6.4: The physical system and the NN are driven by the same input x_t; the modeling error is e = y - \hat{y}, where y is the physical system output and \hat{y} the NN output.

If, at the end of the learning procedure, the modeling error reaches an acceptable value, the Neural Network topology will emulate the actual physical system. For this reason the network behaves as an observer.

6.3 Applications of Adaptive Neural Networks

6.3.1 Adaptive Process and its Components

Fig 6.5: An adaptive process: the process output is evaluated by a performance calculation, which drives an adaptive algorithm acting on the process.

In this manner, an artificial neural network is an adaptive system. We can now define the configurations of Adaptive Neural Network Systems:

1. System Identification and Modeling
2. Inverse Modeling
3. Adaptive Interference Cancelling
4. Adaptive Prediction

6.3.2 System Identification and Modeling

A simple modeling technique is shown in Fig 6.6: the system and the NN receive the same input, and the NN is adapted to minimize the difference e between the system output and the NN output.

Fig 6.7 shows modeling of a system with delay and noise: noise is added to the system output, the NN is adapted against the noisy output, and e is the difference (error).

6.3.3 Inverse Modeling

A simple inverse modeling configuration is given in Fig 6.8: the NN is placed in cascade with the system and adapted so that the overall input-output map approximates the identity. By adding delay and noise, we obtain the configuration of Fig 6.9.
6.3.4 Adaptive Noise Cancelling

Fig 6.10: Adaptive noise cancelling: the primary input carries the noisy signal S + N; a correlated noise reference is filtered by the adaptive system to produce Y, and the output S + N - Y approximates the clean signal S.

6.3.5 Adaptive Prediction

Fig 6.11: Adaptive prediction: the NN receives the delayed signal S(t - Δ) and is adapted to reproduce the current value S(t); a SLAVE NN copy of the adapted network then provides the prediction.

6.4.1 Non-Parametric Identification

Non-parametric identification uses a "black-box" model of the input-output pairs. Neural networks for non-parametric process models can be interpreted as a non-linear extension of the system identification problem. As an example, the adaptive transversal filter structure can be implemented by a neural network, and it becomes a finite impulse response (FIR) network as a non-parametric identification system. The FIR structure for a given input-output pair (x[k], y[k]) is given by

    y[k] = \sum_{m=0}^{M} w_m x[k-m]    (6.13)

for a single-input adaptive transversal filter with a target output.

Fig 6.12: Single-input adaptive transversal filter: the tapped delay line x_k, x_{k-1}, ..., x_{k-M} (delay elements z^{-1}) is weighted and summed to give the output y_k; the error ε_k between the target and the output drives the adaptive algorithm.

Here y_k (or y[k]) is the system output. In the FIR structure the parameters are represented by the weights of this model, and these parameters can be adjusted by the adaptive algorithm.

6.4.2 Parametric Identification

A neural network trained through supervised learning can be used for both identification and parameter estimation. In this manner the approach is a little bit different from the non-parametric application. Using the advantages of parametric identification modeling by NNs, we get the following figures for dynamic models.

Fig 6.13: Neural network with time-delayed inputs: the input x_k and its delayed versions (delay elements z^{-1}) feed the NN topology.

Sometimes a dynamical neural network with time-delayed recurrent inputs can be used, as below:

Fig 6.14: Dynamical neural network whose own delayed outputs y_k are fed back through delay elements z^{-1} as inputs.
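Returning to the adaptive transversal filter of equation 6.13: a common way to realize the adaptive algorithm of Fig 6.12 is the least-mean-squares (LMS) rule, which the text does not specify; the unknown system's coefficients and the step size below are likewise illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 3                                      # filter order (number of taps - 1)
w = np.zeros(M + 1)                        # adjustable weights w_0 ... w_M
h_true = np.array([0.5, -0.3, 0.2, 0.1])   # unknown FIR system (made up)
mu = 0.05                                  # LMS step size

x = rng.normal(size=5000)
for k in range(M, len(x)):
    tap = x[k - M:k + 1][::-1]             # [x_k, x_{k-1}, ..., x_{k-M}]
    y = w @ tap                            # eq. 6.13: filter output
    target = h_true @ tap                  # output of the unknown system
    eps = target - y                       # error epsilon_k
    w += mu * eps * tap                    # LMS adaptation of the weights

print(np.round(w, 3))                      # should approach h_true
```

After adaptation, the filter weights match the unknown system's impulse response, which is the non-parametric identification described above.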
In this application, the output of the neural network is the current value y_k, but it uses the previous values of y as inputs; this captures the dependency of the current value of y on its former values. The approach looks like an Infinite Impulse Response (IIR) structure or filter of the system. As a different configuration of the IIR structure, both the delayed inputs and the delayed outputs can be applied as input information to the neural topology; here, of course, the previous outputs of the neural network are used as feedback connections. For this purpose the following configuration is given:

Fig 6.15: Dynamic NN with time-delayed direct inputs (x_k and its delays z^{-1}) and recurrent inputs (delayed outputs y_k) feeding a fully connected NN topology.

6.5 Models of Dynamical Systems

To obtain models of dynamic systems, neural network identification structures are used in the configuration of an observer, i.e. as a model connected to the real physical structure.

Fig 6.16: Non-recurrent parallel identification model: the physical system and the NN receive the same input through delay elements z^{-1}, and the NN output is compared with the system output.

7. Neural Network for Applications of Control Theory

7.1 Classical Controllers

A classical block diagram of a control application is given below:

Fig 7.1: Classical feedback control loop: reference r(t), error e(t), control unit output u(t) driving the physical system with output y(t), and unit feedback.

Where:

r(t): reference signal
e(t) = r(t) - y(t): error signal
u(t): control signal
y(t): system output

There are three types of controller:

1. PID (Proportional-Integral-Derivative) Controller
2. PI (Proportional-Integral) Controller
3.
PD (Proportional-Derivative) Controller

In this manner, the PID controller is defined as:

PID:  u(t) = K_P e(t) + K_I \int_0^t e(\tau) d\tau + K_D \frac{de(t)}{dt}    (7.1)

where t is continuous time and K_P, K_I, K_D are the controller parameters for the proportional, integral and derivative terms respectively. The other controllers are written as:

PI:  u(t) = K_P e(t) + K_I \int_0^t e(\tau) d\tau    (7.2)

PD:  u(t) = K_P e(t) + K_D \frac{de(t)}{dt}    (7.3)

Applying the Laplace transform to these equations:

PID:  U(s) = K_P E(s) + \frac{K_I}{s} E(s) + K_D s E(s)    (7.4)

and in a similar way:

PI:  U(s) = K_P E(s) + \frac{K_I}{s} E(s)    (7.5)

PD:  U(s) = K_P E(s) + K_D s E(s)    (7.6)

Here the most important problem is to determine the controller's parameters. One of the most famous methods is the Ziegler-Nichols method. Also, using the equalities for each controller type in the s-domain, the transfer functions of the controllers are defined:

PID:  G_c(s) = \frac{U(s)}{E(s)} = K_P + \frac{K_I}{s} + K_D s    (7.7)

PI:  G_c(s) = \frac{U(s)}{E(s)} = K_P + \frac{K_I}{s}    (7.8)

PD:  G_c(s) = \frac{U(s)}{E(s)} = K_P + K_D s    (7.9)

Also, considering the transfer function of the physical system, G_s(s), the first block diagram becomes:

Fig 7.2: Control loop in the s-domain: R(s) → E(s) → G_c(s) → U(s) → G_s(s) → Y(s), with unit feedback.

Here R(s) and Y(s) are the representations of r(t) (reference signal) and y(t) (system output) in the s-domain, respectively. After that, we can consider the determination of the controller parameters. In this manner the Ziegler-Nichols method for the PID controller is given as follows:

    K_P = 0.6 \, K_{cr}    (7.10)

    K_I = \frac{K_P \, \omega_{cr}}{\pi}    (7.11)

    K_D = \frac{K_P \, \pi}{4 \, \omega_{cr}}    (7.12)

where K_{cr} is the gain at which the proportional-only system oscillates and ω_{cr} is the oscillation frequency. Either the root locus or Bode plots can be used to determine K_{cr} and ω_{cr}. For example, a root locus is obtained from the plant transfer function; the gain at which the root locus crosses the jω-axis is K_{cr}, and the frequency at that crossing gives ω_{cr}. Alternatively, Bode plots are drawn for the given plant (physical system) transfer function; the GM (Gain Margin) is determined at the phase-crossover frequency ω_{cr}, and then
    K_{cr} = 10^{GM(dB)/20} = \frac{1}{|G_s(j\omega_{cr})|}    (7.13)

    \omega_{cr} = \omega_{-180^{\circ}} \; \text{(the frequency at which the phase of } G_s(j\omega) \text{ is } -180^{\circ})    (7.14)

Example

For a given plant

    G_s(s) = \frac{400}{s (s^2 + 30 s + 200)}

find the PID controller parameters by the Ziegler-Nichols method.

Solution: Using the following MATLAB commands:

s = tf('s');
g = 400/(s*(s^2+30*s+200));
sisotool(g)

some system parameters are obtained: the gain crossover frequency is about 1.95 rad/sec, GM (Gain Margin) ≈ 23.5 dB and PM (Phase Margin) ≈ 73.5 degrees. The parameters from equations 7.13 and 7.14 are K_{cr} = 15 and ω_{cr} ≈ 14.1 rad/sec respectively, and the PID parameters from equations 7.10, 7.11 and 7.12 are K_P = 9, K_I ≈ 40.5 and K_D ≈ 0.5 respectively.

The closed-loop step response has no overshoot, and the steady-state error to a unit ramp input is 0.5 (for the uncompensated plant, K_v = 400/200 = 2). The root locus and Bode plots for this plant are shown in Fig 7.3 (Root Locus) and Fig 7.4 (Bode Plots).

The effects of K_P, K_I and K_D on a closed-loop system are summarized in Table 7.1:

            Rise Time     Overshoot   Settling Time   Steady-State Error
    K_P     Decrease      Increase    Small change    Decrease
    K_I     Decrease      Increase    Increase        Eliminate
    K_D     Small change  Decrease    Decrease        Small change

Table 7.1

The computation of the controller parameters is sometimes difficult, and to eliminate this difficulty the neural network approach presents an alternative solution for control applications. In this sense, neuro-controller (NC) structures are defined. The NC approach is a learning system and is independent of the calculation of the controller parameters. As a primitive (basic) model:

Fig 7.5: A PID controller and an NN model in parallel: the NN is trained with the input-output pairs (e(t), u(t)) of the PID controller, so that the NN topology plays the role of the classical controller.

Here we have the PID controller structure, and by training the NN with the input and output (target for the NN) pairs of the PID controller, the NN topology plays the role of the classical PID controller. This application is not independent of the classical controller structure, but it provides reliability of the controller.
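Returning to the Ziegler-Nichols example above, its numbers can be reproduced without sisotool. The short sketch below finds ω_cr and K_cr analytically for this particular plant and then applies equations 7.10-7.12:

```python
import math

# Plant G_s(s) = 400 / (s (s^2 + 30 s + 200)).
# At the phase crossover the frequency response is purely (negative) real:
# the denominator jw(200 - w^2 + 30jw) = -30 w^2 + jw(200 - w^2) is real
# when 200 - w^2 = 0, i.e. w_cr = sqrt(200).
w_cr = math.sqrt(200.0)

# There |G(j w_cr)| = 400 / (30 * w_cr^2); the critical gain is its
# reciprocal (eq. 7.13).
mag = 400.0 / (30.0 * w_cr**2)
K_cr = 1.0 / mag
GM_dB = 20.0 * math.log10(K_cr)

# Ziegler-Nichols PID settings, eqs. 7.10-7.12.
K_P = 0.6 * K_cr
K_I = K_P * w_cr / math.pi
K_D = K_P * math.pi / (4.0 * w_cr)

print(round(w_cr, 2), round(K_cr, 1), round(GM_dB, 1))   # 14.14 15.0 23.5
print(round(K_P, 1), round(K_I, 1), round(K_D, 2))       # 9.0 40.5 0.5
```

The analytic values agree with those reported from sisotool, which is a useful cross-check on the gain margin reading.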
It can be accepted as a redundant system.

7.2 Neuro-Controller Applications

Fig 7.6: A neuro-controller in the loop: the reference r(t), the error e(t) and the delayed system outputs y(t-1), y(t-2), ... (through delay elements z^{-1}) feed the neuro-controller, which produces the control signal u(t) for the physical system.

Fig 7.7: A variant in which the delayed signals are fed back to the neuro-controller through delay elements z^{-1}.

If we have a mathematical model of the physical system, this model is taught to the NN, and then the trained NN is placed into the configuration as a neuro-controller.

7.3 Self-Tuning PID Controller by Neural Network

Fig 7.8: Self-tuning PID control: an NN fed with time-delayed (TD) signals of the reference r(t) and of the loop (through delay elements z^{-1}) produces the gains K_P, K_I and K_D, which parameterize the PID block generating u(t) for the physical system.

Fig 7.9: Compact form of the same scheme: the NN tunes the PID block acting on the system.

Fig 7.10: The NN and the PID can also be merged into a single neuro-controller block NNC (NN + PID) acting on the system.

7.4 Sensor Validation Problem

Consider three sensors A, B and C with inputs x, y, z and outputs a, b, c respectively. An NN fed with x, y, z produces estimates of the three sensor outputs, which are compared with the actual outputs a, b, c to form the errors e_a, e_b, e_c (Fig 7.11).

1. The NN is trained with the input and output pairs of all sensors (x-a, y-b and z-c).
2. In any anomaly case, the NN will produce a different error level at the corresponding output.
3. Hence we detect the failure case at the NN output which is connected with the actual sensor output.

7.5 Condition Monitoring Application of the NN as a Neuro-Detector

Fig 7.12: Neuro-detector for condition monitoring: the output y(t) of the physical system is transformed by the FFT (PSD); the amplitude |Y(jω)| at a specific frequency is compared with the NN's prediction of the same amplitude at the same frequency, and the difference is the error signal.

For different system behaviors we get different error signals. Hence we detect the changed frequency components by the error variation in the spectral domain.
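The neuro-detector idea of Fig 7.12 can be sketched in simplified form. Here a stored "healthy" spectrum stands in for the trained NN's prediction, and the signal frequencies, amplitudes and sampling rate are invented for the illustration:

```python
import numpy as np

fs = 1000                          # sampling frequency, Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)

def spectrum(sig):
    # One-sided amplitude spectrum via the FFT.
    return np.abs(np.fft.rfft(sig)) / len(sig)

# "Healthy" reference spectrum (plays the role of the trained NN's prediction).
healthy = spectrum(np.sin(2 * np.pi * 50 * t))

# Measured signal with an extra 120 Hz component, e.g. a developing fault.
measured = spectrum(np.sin(2 * np.pi * 50 * t) + 0.4 * np.sin(2 * np.pi * 120 * t))

error = np.abs(measured - healthy)          # error per frequency bin
freqs = np.fft.rfftfreq(len(t), 1.0 / fs)
fault_freq = freqs[np.argmax(error)]        # frequency with the largest deviation

print(fault_freq)   # -> 120.0
```

The error spectrum is flat except at the new component, so thresholding it in the spectral domain localizes the anomalous frequency, as described above.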
Fig 7.13: The NN is driven in parallel with the monitored system; the difference between its output and the measured output forms the error signal used for detection.

Fig 7.14: The same detector drawn with the error fed back to the NN input side.

Appendix-2

Gaussian Noise

The expression for Gaussian noise is given as

    k \cdot N(\mu, \sigma)

where k is a gain factor, μ the mean value and σ the standard deviation.

Autoregressive (AR) Model

In statistics and signal processing, an autoregressive (AR) model is a type of random process which is often used to model and predict various kinds of natural and social phenomena. AR models are written in the form AR(p), with p being the order of the modeled system. The AR model is one of a group of linear prediction formulas that attempt to predict the output of a system based on the previous outputs and inputs, as shown in the following equations:

    x_t = \sum_{i=1}^{p} a_i x_{t-i} + \varepsilon_t    (A)

or

    x_t = c + \sum_{i=1}^{p} a_i x_{t-i} + \varepsilon_t    (B)

where a_1, a_2, ..., a_p are the parameters of the model, c is a constant and ε_t is white noise. The constant term is omitted by many authors for simplicity.

An AR model can thus be viewed as the output of an all-pole infinite impulse response filter whose input is white noise. Some constraints are necessary on the values of the parameters of this model in order that the model remain wide-sense stationary. For example, processes in the AR(1) model with |a_1| ≥ 1 are not stationary. More generally, for an AR(p) model to be wide-sense stationary, the roots of the polynomial z^p - \sum_{i=1}^{p} a_i z^{p-i} must lie within the unit circle, i.e. each root z_i must satisfy |z_i| < 1.

A model which depends only on the previous outputs of the system is called an autoregressive (AR) model, while a model which depends only on the inputs to the system is called a moving-average (MA) model, and of course a model based on both inputs and outputs is an autoregressive-moving-average (ARMA) model. Note that, by definition, the AR model has only poles while the MA model has only zeros. Several methods and algorithms exist for calculating the coefficients of the AR model, all of which can be implemented using the MATLAB command 'ar'.
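The wide-sense stationarity condition above is easy to check numerically. In this small sketch (the coefficient sets are made up for the illustration) the roots of the AR characteristic polynomial are computed directly:

```python
import numpy as np

def ar_is_stationary(a):
    # AR(p) coefficients a = [a_1, ..., a_p] for x_t = sum a_i x_{t-i} + eps_t.
    # The process is wide-sense stationary iff all roots of
    # z^p - a_1 z^(p-1) - ... - a_p lie strictly inside the unit circle.
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    return bool(np.all(np.abs(roots) < 1.0))

print(ar_is_stationary([0.5]))         # AR(1) with |a_1| < 1
print(ar_is_stationary([1.5, -0.7]))   # complex roots with |z| = sqrt(0.7)
print(ar_is_stationary([1.2]))         # AR(1) with |a_1| >= 1: not stationary
```

This check corresponds exactly to the |z_i| < 1 condition stated in the text.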
Yule-Walker Algorithm

A number of techniques exist for the computation of the AR coefficients. The two main categories among them are least-squares methods and the Burg method. The most common least-squares method is based upon the Yule-Walker equations. MATLAB supports a wide range of techniques; note that, when comparing algorithms from different sources, there are two common points of difference: first, whether or not the mean is removed from the series; second, the sign of the coefficients returned (this depends on the definition and is fixed by simply inverting the sign of all coefficients).

The most common method for deriving the coefficients involves multiplying the AR model given in equation (B) above by x_{t-\tau}, taking expectation values and normalizing, which gives a set of linear equations called the Yule-Walker equations. They can be derived as follows:

(i) Multiplying the AR model in equation (B) above by x_{t-\tau} and taking c = 0:

    x_t x_{t-\tau} = \sum_{i=1}^{p} a_i x_{t-i} x_{t-\tau} + \varepsilon_t x_{t-\tau}    (C)

(ii) Taking the expectation of both sides:

    E\{x_t x_{t-\tau}\} = \sum_{i=1}^{p} a_i E\{x_{t-i} x_{t-\tau}\} + E\{\varepsilon_t x_{t-\tau}\}    (D)

The expectations E\{x_t x_{t-\tau}\} and E\{x_{t-i} x_{t-\tau}\} involve the data and a shifted version of the data, which is the definition of the autocovariance R(\tau) at lags τ and τ - i respectively, while E\{\varepsilon_t x_{t-\tau}\} is zero for delays τ greater than zero, as a past value of the output is unrelated to the present value of the noise. So for τ > 0 we have

    R(\tau) = \sum_{i=1}^{p} a_i R(\tau - i)    (E)

or

    \sum_{i=1}^{p} a_i R(\tau - i) = R(\tau)    (F)

By expanding this expression for τ = 1, 2, ..., p we have

    \begin{bmatrix} R(0) & R(1) & \cdots & R(p-1) \\ R(1) & R(0) & \cdots & R(p-2) \\ \vdots & & & \vdots \\ R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix}    (G)

Burg Algorithm

The parameter-estimation approach that is nowadays regarded as the most appropriate is known as Burg's method. In contrast to the least-squares and Yule-Walker methods, which estimate the autoregressive parameters directly, Burg's method first estimates the reflection coefficients, which are defined as the last autoregressive parameter estimate for each model order.
From these, the parameter estimates are determined using the Levinson-Durbin algorithm. The reflection coefficients constitute unbiased estimates of the partial correlation coefficients. Usually, these estimation methods lead to approximately the same results for the autoregressive parameters. Once these have been estimated from the time series, the autoregressive model can be applied to an independent prediction realization of the same stochastic process.

The AR(p) process can be written as

    x_t = a_1 x_{t-1} + a_2 x_{t-2} + ... + a_p x_{t-p} + \varepsilon_t    (H)

in which the innovation process ε_t is statistically identical to the innovation process of the Yule-Walker method. The corresponding AR model can be written as

    x_t = \tilde{a}_1 x_{t-1} + \tilde{a}_2 x_{t-2} + ... + \tilde{a}_p x_{t-p} + \tilde{\varepsilon}_t    (I)

in which \tilde{a}_i are the AR parameters estimated from a realization of the process and \tilde{\varepsilon}_t are the estimated innovations. Each data sample can be estimated from its predecessors:

    \tilde{x}_t = \sum_{i=1}^{p} \tilde{a}_i x_{t-i}    (J)

The difference between the measured value and the estimated value is now defined as the prediction error:

    \tilde{\varepsilon}_t = x_t - \tilde{x}_t    (K)

The prediction error is therefore equal to the estimated innovation, and each prediction error can be calculated once the actual value of the data point is measured. A clear distinction should be made between the residue and the prediction error. The variance of the residue is a measure of the fit of the AR model to the data that have been used for the estimation of the AR parameters, and can be estimated from the realization used for the parameter estimation:

    \tilde{\sigma}^2(\tilde{\varepsilon}) = \frac{1}{N - p} \sum_t \tilde{\varepsilon}_t^2    (K')

For the prediction of future data, instead of the residual variance, the variance of the prediction error (\tilde{\varepsilon}) is essential. If the independent prediction realization contains
data samples, the prediction-error variance can be estimated from the sample variance

    \tilde{\sigma}^2(\tilde{\varepsilon}) = \frac{1}{N} \sum_t \tilde{\varepsilon}_t^2    (L)

where N here is the number of samples in the prediction realization.

Akaike's Information Criterion

The Akaike Information Criterion (AIC) determines the model order by minimizing an information-theoretic function of p, AIC(p). For an AR process with Gaussian statistics, AIC(p) is defined as

    AIC(p) = N \ln(\hat{\sigma}_p^2) + 2p    (M)

where N is the number of samples and \hat{\sigma}_p^2 is the estimated variance of the white driving noise (i.e. the prediction error), a decreasing function of p. The term 2p is a "penalty" for the use of extra AR coefficients that do not substantially reduce the prediction error. The AIC minimum is only one of many criteria proposed for the selection of the AR order; another popular criterion is the Final Prediction Error (FPE).