Adaptive Critic Based Neural Networks
For Nonlinear Automatic Flight Control Systems
Sergio Esteban Roncero and Dr. S. N. Balakrishnan
Department of Mechanical and Aerospace Engineering and Engineering Mechanics
University of Missouri-Rolla
Rolla, Missouri, USA
In this study, an adaptive critic based neural network was developed to obtain near-optimal control laws for a nonlinear automatic flight control system. The results of the existing paper by Garrard and Jordan¹ are used as a benchmark for the accuracy of the adaptive critic based neural network results. Reducing the altitude loss during stall, and increasing the magnitude of the angle of attack from which the aircraft can recover from stall, are complex tasks due to the nonlinearities of the dynamic plant and the variety of initial conditions.
The goal of this study is to demonstrate that, for such complex problems, properly trained adaptive critic based neural networks can effectively reproduce the control solution over a wider range of initial conditions, including both stall and no-stall conditions. This considerably reduces the time required to compute the control solution, since the neural networks can easily be adapted to different plants, and therefore to different aircraft models, by simply changing the plant model inside the neural network routines.
The study is divided into three sections:
- Introduction to the problem: Nonlinear Automatic Flight Control Systems.
- Purpose of this paper and general introduction to Adaptive Critic Neural Networks.
- Preliminary analysis and results.
Introduction to the problem: Nonlinear Automatic Flight Control Systems.
Garrard and Jordan¹ express the nonlinear equations of motion that describe the longitudinal motion of the F-8 Crusader in the form:

$\dot{X} = AX + \phi(X) + bu$    (1)

where X describes the states of the system:

$X = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}^T$    (2)
where x1 is the angle of attack, x2 is the pitch angle, and x3 is the pitch rate. Similarly, A represents the linearized plant of the system:

$A = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix}$    (3)
and b represents the forcing function of the system:

$b = \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix}$    (4)
The modeled nonlinearities of the system are described by the extra term $\phi(X)$:

$\phi(X) = \begin{bmatrix} -x_1^2 x_3 - 0.088\,x_1 x_3 - 0.019\,x_2^2 + 0.47\,x_1^2 + 3.846\,x_1^3 \\ 0 \\ -0.47\,x_1^2 - 3.564\,x_1^3 \end{bmatrix}$    (5)
Hence the resulting nonlinear equations of motion are described by:

$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -x_1^2 x_3 - 0.088\,x_1 x_3 - 0.019\,x_2^2 + 0.47\,x_1^2 + 3.846\,x_1^3 \\ 0 \\ -0.47\,x_1^2 - 3.564\,x_1^3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u$    (6)
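For readers who wish to experiment with the model, the plant of equation (6) translates directly into code. The following is a minimal sketch in Python with NumPy; the language and the function names are our illustrative choices, as the original implementation is in MATLAB:

    import numpy as np

    # Linearized plant A and forcing vector b, equations (3) and (4)
    A = np.array([[-0.877, 0.0,  1.0],
                  [ 0.0,   0.0,  1.0],
                  [-4.208, 0.0, -0.396]])
    b = np.array([-0.215, 0.0, -20.967])

    def phi(x):
        """Nonlinear term phi(X) of equation (5)."""
        x1, x2, x3 = x
        return np.array([-x1**2*x3 - 0.088*x1*x3 - 0.019*x2**2
                         + 0.47*x1**2 + 3.846*x1**3,
                         0.0,
                         -0.47*x1**2 - 3.564*x1**3])

    def f8_dynamics(x, u):
        """Right-hand side of equation (6): x_dot = A x + phi(x) + b u."""
        return A @ x + phi(x) + b * u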
The control in the referenced paper is selected to minimize the quadratic performance index:

$J = \frac{1}{2}\int_0^\infty \left( X^T Q X + u^T R u \right) dt$    (7)

where the quadratic weighting matrix Q is defined as:

$Q = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}$    (8)

and the control weighting is R = 1. The linearized model of the equations of motion yields:
$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u$    (9)
and the corresponding linear control law obtained by solving the matrix Riccati equation is:

$\delta = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3$    (10)
Garrard and Jordan¹ introduce second- and third-order controllers to reduce the loss of altitude during stall:

$\delta_{2nd} = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 + 0.04\,x_1^2 - 0.048\,x_1 x_2$    (11)

$\delta_{3rd} = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 + 0.04\,x_1^2 - 0.048\,x_1 x_2 + 0.374\,x_1^3 - 0.312\,x_1^2 x_2$    (12)
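Coded as functions, the three controllers read as follows; the linear gains can also be cross-checked by solving the Riccati equation for the linearized plant (9) with the weights of equations (7) and (8). This is a sketch only: SciPy is our tool choice, the signs follow the reconstruction of equations (10) through (12) above, and A and np are those of the dynamics sketch after equation (6).

    from scipy.linalg import solve_continuous_are

    def delta_linear(x):
        """Linear control law, equation (10)."""
        x1, x2, x3 = x
        return -0.053*x1 + 0.5*x2 + 0.521*x3

    def delta_2nd(x):
        """Second-order controller, equation (11)."""
        x1, x2, x3 = x
        return delta_linear(x) + 0.04*x1**2 - 0.048*x1*x2

    def delta_3rd(x):
        """Third-order controller, equation (12)."""
        x1, x2, x3 = x
        return delta_2nd(x) + 0.374*x1**3 - 0.312*x1**2*x2

    # Cross-check of the linear gains: LQR for the linearized plant (9)
    B = np.array([[-0.215], [0.0], [-20.967]])
    Q, R = 0.25*np.eye(3), np.array([[1.0]])
    P = solve_continuous_are(A, B, Q, R)   # A'P + PA - PBR^{-1}B'P + Q = 0
    K = np.linalg.solve(R, B.T @ P)        # u = -K x; compare with equation (10)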
The referenced paper shows in its conclusions and remarks that the use of the nonlinear controllers along with the nonlinear plant significantly increases the range of recoverable stall angles.
Purpose of this paper and General introduction to Adaptive Critic Neural Networks
The purpose of this study is to demonstrate that, for complex problems such as the one described in the introduction section, properly trained adaptive critic based neural networks can effectively reproduce the control solution over a wider range of initial conditions, including both stall and no-stall conditions. This considerably reduces the time required to compute the control solution, since the neural networks can easily be adapted to different plants, and therefore to different aircraft models, by simply changing the plant model inside the neural network routines.
An early mention must be made of the two people who made possible this introduction to what Adaptive Critic Based Neural Networks (ACNN) for control mean, and to the potential such an architecture has for simulating and formulating near-optimal control for a given system. These two people are Dr. S. N. Balakrishnan, my advisor, and Victor Lynn Biega. In 1995 they wrote the paper that set the basis for many investigations conducted by graduate students like myself. Their paper has become the cornerstone for the construction of different architectures, here at the University of Missouri-Rolla and at other universities, capable of modeling a wide variety of systems and problems. The neural network notation used in this introduction and throughout this paper is taken from their paper, "Adaptive Critic Based Neural Networks for Control"².
The adaptive critic based neural network architecture presented in this study applies an emerging area of study, neural network based architectures, together with the dynamic programming methodology and the Hamiltonian formulation, to develop near-optimal control laws for the problem presented here. Neural networks try to emulate the complex behavior of the biological neural networks that govern most everyday human activities: reading, breathing, motion, and thinking. The next section describes the general basis of the adaptive critic neural network programming theory.
Statement of the Problem and Hamiltonian Formulation Theory
The Hamiltonian formulation allows the use of the continuous-time nonlinear equations of motion as described in equation (6), repeated here:

$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -x_1^2 x_3 - 0.088\,x_1 x_3 - 0.019\,x_2^2 + 0.47\,x_1^2 + 3.846\,x_1^3 \\ 0 \\ -0.47\,x_1^2 - 3.564\,x_1^3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u$    (6)
Similarly, the Hamiltonian for such a system is defined as

$H = \frac{1}{2} X^T Q X + \frac{1}{2} u^T R u + \lambda^T \left[ AX + \phi(X) + bu \right]$    (13)
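Because the co-state equations (15) through (17) below follow mechanically from $\dot{\lambda} = -\partial H / \partial X$, they can also be generated symbolically from equation (13). A sketch using SymPy; the tool and the variable names are our illustrative choices:

    import sympy as sp

    x1, x2, x3, u, l1, l2, l3 = sp.symbols('x1 x2 x3 u l1 l2 l3')
    x = sp.Matrix([x1, x2, x3])
    lam = sp.Matrix([l1, l2, l3])
    A = sp.Matrix([[-0.877, 0, 1], [0, 0, 1], [-4.208, 0, -0.396]])
    b = sp.Matrix([-0.215, 0, -20.967])
    phi = sp.Matrix([-x1**2*x3 - 0.088*x1*x3 - 0.019*x2**2
                     + 0.47*x1**2 + 3.846*x1**3,
                     0,
                     -0.47*x1**2 - 3.564*x1**3])

    # Equation (13) with Q = 0.25 I and R = 1:
    # (1/2) x'Qx = x'x/8, and (1/2) u'Ru = u^2/2
    H = x.dot(x)*sp.Rational(1, 8) + u**2/2 + lam.dot(A*x + phi + b*u)

    lam_dot = [-sp.diff(H, xi) for xi in (x1, x2, x3)]  # equations (15)-(17)
    for expr in lam_dot:
        print(sp.simplify(expr))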
Expanded, the Hamiltonian of equation (13) becomes equation (14):

$H = \frac{1}{2}\left(0.25\,x_1^2 + 0.25\,x_2^2 + 0.25\,x_3^2\right) + \frac{1}{2} u^2 + \lambda^T \left( \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -x_1^2 x_3 - 0.088\,x_1 x_3 - 0.019\,x_2^2 + 0.47\,x_1^2 + 3.846\,x_1^3 \\ 0 \\ -0.47\,x_1^2 - 3.564\,x_1^3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u \right)$    (14)
The differential co-state equations of the system are defined by

$\dot{\lambda} = -\frac{\partial H}{\partial X}$
and the individual co-state differential equations become:

$\dot{\lambda}_1 = -\frac{\partial H}{\partial x_1} = -\left[\,0.25\,x_1 + \lambda_1\left(-0.877 - 2\,x_1 x_3 - 0.088\,x_3 + 2(0.47\,x_1) + 3(3.846\,x_1^2)\right) + \lambda_3\left(-4.208 - 2(0.47\,x_1) - 3(3.564\,x_1^2)\right)\right]$    (15)

$\dot{\lambda}_2 = -\frac{\partial H}{\partial x_2} = -\left[\,0.25\,x_2 - \lambda_1\left(2(0.019\,x_2)\right)\right]$    (16)

$\dot{\lambda}_3 = -\frac{\partial H}{\partial x_3} = -\left[\,0.25\,x_3 + \lambda_1\left(1 - x_1^2 - 0.088\,x_1\right) + \lambda_2 - 0.396\,\lambda_3\right]$    (17)

which is subject to the cost function of equation (7):

$J = \frac{1}{2}\int_0^\infty \left( X^T Q X + u^T R u \right) dt$
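In code, the co-state dynamics of equations (15) through (17) take the following form; this is a sketch reusing the NumPy conventions of the earlier dynamics function:

    def costate_dot(x, lam):
        """Co-state dynamics, equations (15)-(17): lam_dot = -dH/dx."""
        x1, x2, x3 = x
        l1, l2, l3 = lam
        ld1 = -(0.25*x1
                + l1*(-0.877 - 2*x1*x3 - 0.088*x3 + 2*0.47*x1 + 3*3.846*x1**2)
                + l3*(-4.208 - 2*0.47*x1 - 3*3.564*x1**2))
        ld2 = -(0.25*x2 - l1*2*0.019*x2)
        ld3 = -(0.25*x3 + l1*(1.0 - x1**2 - 0.088*x1) + l2 - 0.396*l3)
        return np.array([ld1, ld2, ld3])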
From equations (6) and (15) through (17), the Meta State Vector Z can be formed:

$\dot{Z} = \begin{bmatrix} \dot{X} \\ \dot{\lambda} \end{bmatrix} = \begin{bmatrix} f \\ -\frac{\partial H}{\partial X} \end{bmatrix} = F(Z)$    (18)

$U(x(t), u(t))$ is the utility function denoting the cost of going from time t to time t+1:

$U(x(t), u(t)) = x^T Q x + u^T R u$    (19)
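Equation (18) is what is integrated backwards in time to produce the co-state targets, as discussed below. A sketch using SciPy's solve_ivp, reusing f8_dynamics and costate_dot from the earlier sketches; the one-step horizon dt is an illustrative choice:

    from scipy.integrate import solve_ivp

    def meta_state_rhs(t, z, u):
        """Z_dot = F(Z), equation (18): stacked state and co-state dynamics."""
        x, lam = z[:3], z[3:]
        return np.concatenate([f8_dynamics(x, u), costate_dot(x, lam)])

    def back_integrate(x_next, lam_next, u, dt=0.05):
        """Integrate Z backwards from t+dt down to t, holding the control fixed."""
        z_next = np.concatenate([x_next, lam_next])
        sol = solve_ivp(meta_state_rhs, [dt, 0.0], z_next, args=(u,))
        return sol.y[:3, -1], sol.y[3:, -1]   # x(t), lambda(x(t))*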
The co-state $\lambda(x(t))$ is defined as

$\lambda(x(t)) = \frac{\partial J(x(t))}{\partial x(t)}$    (20)
then the optimal critic $\lambda(x(t))^*$ is defined by

$\lambda(x(t))^* = \frac{\partial U(x(t),u(t))}{\partial x(t)} + \frac{\partial U(x(t),u(t))}{\partial u(t)}\,\frac{\partial u(x(t))}{\partial x(t)} + \lambda(x(t+1))\,\frac{\partial x(t+1)}{\partial x(t)} + \lambda(x(t+1))\,\frac{\partial x(t+1)}{\partial u(t)}\,\frac{\partial u(t)}{\partial x(t)}$    (21)
which, for the system considered here, can be expressed as:

$\lambda(x(t))^* = 2\,Q\,x(t) + \lambda(x(t+1))\,A + \frac{\partial u(x(t))}{\partial x(t)}\left[\,2\,R\,u(t) + \lambda(x(t+1))\,B\,\right]$    (22)
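Given the sensitivity $\partial u / \partial x$ of the action network, equation (22) can be evaluated directly. The sketch below is illustrative only: du_dx is assumed to be supplied by the caller, and A, b and np are those of the earlier dynamics sketch:

    def critic_target(x, u, lam_next, du_dx):
        """Equation (22): lam* = 2Qx + lam(t+1)A + (du/dx)[2Ru + lam(t+1)B]."""
        Q, R = 0.25*np.eye(3), 1.0
        return 2.0*(Q @ x) + lam_next @ A + du_dx*(2.0*R*u + lam_next @ b)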
It can be seen that the co-state equation develops backwards in time; as will be shown later, it is obtained by back-integrating the differential Meta State Vector $\dot{Z}$. The goal is to determine the control law of the continuous system through the Hamiltonian formulation. The optimal control law is determined by Bellman's optimality equation³:
$0 = \frac{\partial J(x(t))}{\partial u(t)} = \frac{\partial U(x(t),u(t))}{\partial u(t)} + \lambda(x(t+1))\,\frac{\partial x(t+1)}{\partial u(t)}$    (23)

$0 = 2\,R\,u(t)^* + \lambda(x(t+1))\,B$    (24)
thus the optimal control becomes

$u(t)^* = -(2R)^{-1}\,\lambda(x(t+1))\,B$    (25)
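Equation (25) is the update used later to generate the action-network training targets. As code, reusing b from the dynamics sketch (a one-line sketch):

    def optimal_control(lam_next, R=1.0):
        """Optimal control of equation (25): u* = -(2R)^{-1} lambda(t+1) . B."""
        return -float(lam_next @ b) / (2.0 * R)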
Action and Critic Neural Network Architecture
The adaptive critic neural network is a feedforward, backpropagation-trained architecture divided into an Action Neural Network (Action NN) and a Critic Neural Network (Critic NN). Each network has its own independent architectural characteristics, but at the same time their intrinsic relationship is the key to obtaining the near-optimal control law for the system under analysis. Both the Action NN and the Critic NN consist of two hidden layers with a hyperbolic tangent sigmoid transfer function (tansig) and an output layer with a linear transfer function (purelin). The general structure can be seen in Figure 1. The output of both networks can be defined as:
$a^3 = W^3 f^2\left(W^2 f^1\left(W^1 p + b^1\right) + b^2\right) + b^3$    (26)
where for the Action NN the output becomes

$u(t) = W_A^3 \left( \frac{2}{1 + e^{-2\left(W_A^2\left(\frac{2}{1 + e^{-2\left(W_A^1 x(t) + b_A^1\right)}} - 1\right) + b_A^2\right)}} - 1 \right) + b_A^3$    (27)
where the superscript indicates the layer and the subscript indicates A for the Action NN and C for the Critic NN. The Critic NN output becomes

$\lambda(t) = W_C^3 \left( \frac{2}{1 + e^{-2\left(W_C^2\left(\frac{2}{1 + e^{-2\left(W_C^1 x(t) + b_C^1\right)}} - 1\right) + b_C^2\right)}} - 1 \right) + b_C^3$    (28)
Each of the two hidden layers of both networks has l neurons, where l is the designer's choice. The output layer of the Action NN has m neurons, where m is the number of controls, and the Critic NN output layer has n neurons, where n is the number of states of the problem, in this case 3 neurons.
For the preliminary results of this paper, the Action NN has an N_{3,4,4,1} architecture, i.e., 3 neurons corresponding to the three state inputs, 1 neuron corresponding to the single control output, and 4 neurons in each of the first and second hidden layers. The Critic NN has an N_{3,6,6,3} architecture, i.e., 3 neurons corresponding to the three state inputs and outputs, and 6 neurons in each of the first and second hidden layers. The training algorithm is implemented in MATLAB using the Neural Network Toolbox.
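A minimal NumPy version of the forward pass of equations (26) through (28) is sketched below; the initialization scheme and the helper names are our choices, since the original networks are built with the MATLAB Neural Network Toolbox:

    def tansig(n):
        """Hyperbolic tangent sigmoid of equations (27)-(28): 2/(1+e^{-2n}) - 1."""
        return 2.0 / (1.0 + np.exp(-2.0*n)) - 1.0

    def init_net(sizes, rng):
        """Random weights and biases for a feedforward net, e.g. sizes=(3,4,4,1)."""
        return [(0.1*rng.standard_normal((m, n)), 0.1*rng.standard_normal(m))
                for n, m in zip(sizes[:-1], sizes[1:])]

    def forward(net, p):
        """Equation (26): two tansig hidden layers, purelin output layer."""
        (W1, b1), (W2, b2), (W3, b3) = net
        a1 = tansig(W1 @ p + b1)
        a2 = tansig(W2 @ a1 + b2)
        return W3 @ a2 + b3

    rng = np.random.default_rng(0)
    action_net = init_net((3, 4, 4, 1), rng)   # N_{3,4,4,1}
    critic_net = init_net((3, 6, 6, 3), rng)   # N_{3,6,6,3}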
Training of the Action and Critic Neural Network
Two networks, the Action NN and the Critic NN, are initialized with random weights and trained using the following algorithm:
1) The initial Critic NN is assumed to be optimal.
2) The initial output u(t) is obtained by feeding random values of the states X(t) to the Action NN.
3) The continuous-time nonlinear equations of motion (6) are used to integrate the next state X(t+1) from the states X(t) and the initial output u(t) of the Action NN.
4) The Critic NN is fed the output from step 3, X(t+1), to calculate λ(t+1).
5) The Action NN is then trained using X(t) as input and the optimal control u(t)* of equation (25) as target.
6) Steps 2 through 5 are repeated until the desired level of accuracy for the Action NN is achieved.
7) The Action NN is assumed to be optimal.
8) The initial output u(t) is obtained by feeding random values of the states X(t) to the Action NN.
9) The continuous-time nonlinear equations of motion (6) are used to back-integrate the Meta State Vector $\dot{Z}$ to obtain λ(x(t))*, using as inputs X(t+1) from step 3, λ(t+1) from step 4, and the output u(t) from step 8.
10) The Critic NN is then trained using X(t) as input and λ(x(t))* from step 9 as target.
11) Steps 7 through 10 are repeated until the desired level of accuracy for the Critic NN is achieved.
Step 11 marks the end of one training cycle. The training cycles are continued until there is no appreciable change in the outputs of either the Action NN or the Critic NN. At this point the output u(t)* of the Action NN is the near-optimal control.
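One such training cycle can be sketched in code as follows. This skeleton reuses the functions from the earlier sketches; train_to_target is a deliberately simplified stand-in for the MATLAB Neural Network Toolbox training routines actually used (it adjusts only the output layer by gradient steps), and the sampling range, step size and learning rate are illustrative assumptions:

    def train_to_target(net, X, T, lr=1e-3, epochs=100):
        """Simplified stand-in trainer: gradient steps on the output layer only;
        the real implementation trains all three layers."""
        (W1, b1), (W2, b2), (W3, b3) = net
        for _ in range(epochs):
            for p, t in zip(X, T):
                a1 = tansig(W1 @ p + b1)
                a2 = tansig(W2 @ a1 + b2)
                e = (W3 @ a2 + b3) - np.atleast_1d(t)
                W3 = W3 - lr * np.outer(e, a2)
                b3 = b3 - lr * e
        return [(W1, b1), (W2, b2), (W3, b3)]

    def training_cycle(action_net, critic_net, rng, n=200, dt=0.05):
        X = rng.uniform(-0.25, 0.25, size=(n, 3))          # random states, step 2
        # Steps 1-6: hold the critic fixed, retrain the action network.
        u_targets = []
        for x in X:
            u = float(forward(action_net, x))              # step 2
            x_next = x + dt*f8_dynamics(x, u)              # step 3, Euler sketch
            lam_next = forward(critic_net, x_next)         # step 4
            u_targets.append(optimal_control(lam_next))    # step 5, equation (25)
        action_net = train_to_target(action_net, X, np.array(u_targets))
        # Steps 7-11: hold the action network fixed, retrain the critic.
        lam_targets = []
        for x in X:
            u = float(forward(action_net, x))              # step 8
            x_next = x + dt*f8_dynamics(x, u)
            lam_next = forward(critic_net, x_next)
            _, lam_star = back_integrate(x_next, lam_next, u, dt)   # step 9
            lam_targets.append(lam_star)                   # step 10 targets
        critic_net = train_to_target(critic_net, X, np.array(lam_targets))
        return action_net, critic_net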
Preliminary analysis and results.
The response of the aircraft to the three different controllers derived by Garrard and Jordan¹ was tested against the preliminary neural network optimal control solution. The three controllers introduced by Garrard and Jordan¹ are the linear control law obtained by solving the matrix Riccati equation, and the second- and third-order controllers:

$\delta = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3$    (10)

$\delta_{2nd} = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 + 0.04\,x_1^2 - 0.048\,x_1 x_2$    (11)

$\delta_{3rd} = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 + 0.04\,x_1^2 - 0.048\,x_1 x_2 + 0.374\,x_1^3 - 0.312\,x_1^2 x_2$    (12)

At the flight conditions considered in this paper, Mach 0.85 and 30,000 feet (9,000 m), the F-8 stalls when the angle of attack is 23.5º. Figure 2 shows the time response of the three states and of the control (the tail elevator deflection) for an initial angle of attack of 25.69º and a pitch angle and pitch rate of 0º. The initial angle of attack of Figure 2 corresponds to the largest angle from which the linear controller (10) can recover from stall; beyond this angle the linear quadratic solution cannot effectively recover from the stall.
It can be seen that the neural network solution reaches equilibrium faster than any of the three compared controllers. Figure 3 shows the cost associated with recovering from stall for each of the controllers; the neural network solution has the smallest cost. Figure 4 shows the time response of the three states and the control (the tail elevator deflection) for an initial angle of attack of 25.99º and a pitch angle and pitch rate of 0º. The initial angle of attack of Figure 4 corresponds to the largest angle from which the second-order controller (11) can recover from stall; beyond this angle the second-order controller cannot effectively recover. It can be seen that the neural network solution reaches equilibrium faster than either of the two remaining controllers. Figure 5 shows the cost associated with recovering from stall for each of the controllers; again the neural network solution has the smallest cost. Figure 6 shows the time response of the three states and the control for an initial angle of attack of 27º and a pitch angle and pitch rate of 0º. The initial angle of attack of Figure 6 corresponds to the largest angle from which the third-order controller (12) can recover from stall; beyond this angle the third-order controller cannot effectively recover. It can be seen that the neural network solution reaches equilibrium faster than the remaining controller. Figure 7 shows the cost associated with recovering from stall for each of the controllers; once more the neural network solution has the smallest cost.
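For reference, the costs plotted in Figures 3, 5 and 7 are evaluations of the performance index (7) along the simulated recovery trajectory. A sketch of such an evaluation, reusing the dynamics and controllers defined earlier; the simulation horizon and the radian conversion of the initial angle of attack are our assumptions:

    from scipy.integrate import solve_ivp

    def recovery_cost(controller, x0, t_final=10.0):
        """Accumulate J = 1/2 int (x'Qx + u^2) dt along x_dot = f(x, u(x))."""
        Q = 0.25 * np.eye(3)
        def rhs(t, z):
            x = z[:3]
            u = controller(x)
            return np.concatenate([f8_dynamics(x, u),
                                   [0.5*(x @ Q @ x + u**2)]])
        z0 = np.concatenate([x0, [0.0]])    # state augmented with running cost
        sol = solve_ivp(rhs, [0.0, t_final], z0, max_step=0.01)
        return sol.y[3, -1]

    # Example: cost of the linear controller (10) from a 25.69 deg angle of attack
    x0 = np.array([np.deg2rad(25.69), 0.0, 0.0])
    print(recovery_cost(delta_linear, x0))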
Beyond this initial angle of attack of 27º, none of the three controllers compared against the neural network solution can recover from stall. Figure 8 shows the time response of the three states and the control (the tail elevator deflection) for an initial angle of attack of 30º and a pitch angle and pitch rate of 0º; the neural network solution is able to recover from stall at this angle of attack. Figure 9 shows the cost associated with recovering from stall for the neural network controller. Figure 10 shows the time response of the three states and the control for an initial angle of attack of 35º and a pitch angle and pitch rate of 0º; again the neural network solution is able to recover from stall. Figure 11 shows the corresponding cost for the neural network controller. Beyond an initial angle of attack of 35º, the neural network controller cannot recover from stall without exceeding the maximum tail deflection of 25º. The effectiveness of the neural network controller in recovering from stall, compared with the three reference controllers, has thus been demonstrated. It must be noted that the training of the neural network controller used here is far from complete, since the training region for the solution presented covers angles of attack only up to 15º. It is expected that once the training range of the neural network controller is broadened to cover the stall conditions, its effectiveness will improve considerably.
Future Work
Currently, the envelope of the training range is being increased to include the stall conditions in the neural network training. In addition, robustness to input uncertainties, such as the lag time between the pilot control input and the actual tail deflection, is being incorporated into the neural network algorithm. Both of these improvements will be included in the next revision of this paper.
8
Figures
Figure 1
9
Plots
Figure2
Figure3
10
Figure4
Figure5
11
Figure6
Figure7
12
Figure8
Figure9
13
Figure10
Figure11
14
Bibliography
1. Garrard, W. L. and Jordan, J. M., "Design of Nonlinear Automatic Flight Control Systems," Automatica, Vol. 13, pp. 497-505, Pergamon Press, Great Britain, 1977.
2. Balakrishnan, S. N. and Biega, V., "Adaptive Critic Based Neural Networks for Control," Proceedings of the American Control Conference, Seattle, WA, 1995.
3. Bellman, R. E., Introduction to the Mathematical Theory of Control Processes, Vol. 1: Linear Equations and Quadratic Criteria, Academic Press, New York, 1967.