Adaptive Critic Based Neural Networks for Nonlinear Automatic Flight Control Systems

Sergio Esteban Roncero and Dr. S. N. Balakrishnan
Department of Mechanical and Aerospace Engineering and Engineering Mechanics
University of Missouri-Rolla, Rolla, Missouri, USA

In this study an adaptive critic based neural network is developed to obtain near-optimal control laws for a nonlinear automatic flight control system. The results published by Garrard and Jordan1 are used as the benchmark against which the accuracy of the adaptive critic based neural network is compared. Reducing the altitude lost during stall, and increasing the magnitude of the angle of attack from which the aircraft can recover, are complex tasks because of the nonlinearities of the dynamic plant and the variety of initial conditions. The goal of this study is to demonstrate that for such complex problems a properly trained adaptive critic based neural network can effectively reproduce the control solution over a wider range of initial conditions, including both stall and no-stall conditions. This considerably reduces the time required to compute the control solution, since neural networks can easily be adapted to different plants, and likewise to different aircraft models, by simply changing the plant model in the neural network routines. The study is divided into three sections:

- Introduction to the problem: nonlinear automatic flight control systems.
- Purpose of this paper and general introduction to adaptive critic neural networks.
- Preliminary analysis and results.

Introduction to the Problem: Nonlinear Automatic Flight Control Systems

Garrard and Jordan1 express the nonlinear equations of motion that describe the longitudinal motion of the F-8 Crusader in the form

$$\dot{X} = AX + \phi(X) + bu \qquad (1)$$

where X is the state vector of the system,

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad (2)$$

with x1 the angle of attack, x2 the pitch angle, and x3 the pitch rate. A represents the linearized plant of the system,

$$A = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix} \qquad (3)$$

and b represents the forcing (control distribution) vector of the system,

$$b = \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} \qquad (4)$$

The modeled nonlinearities of the system are collected in the extra term

$$\phi(X) = \begin{bmatrix} -x_1^2 x_3 - 0.088\, x_1 x_3 - 0.019\, x_2^2 + 0.47\, x_1^2 + 3.846\, x_1^3 \\ 0 \\ -0.47\, x_1^2 - 3.564\, x_1^3 \end{bmatrix} \qquad (5)$$

Hence the resulting nonlinear equations of motion are

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -x_1^2 x_3 - 0.088\, x_1 x_3 - 0.019\, x_2^2 + 0.47\, x_1^2 + 3.846\, x_1^3 \\ 0 \\ -0.47\, x_1^2 - 3.564\, x_1^3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u \qquad (6)$$

The control is selected to minimize the quadratic performance index

$$J = \frac{1}{2} \int_0^{\infty} \left( X^T Q X + R u^2 \right) dt \qquad (7)$$

where the state weighting matrix Q is defined as

$$Q = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix} \qquad (8)$$

and the control weight is R = 1. The linearized model of the equations of motion is

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u \qquad (9)$$

and the corresponding linear control law, obtained by solving the matrix Riccati equation, is

$$u = 0.053\, x_1 + 0.5\, x_2 + 0.521\, x_3 \qquad (10)$$

Garrard and Jordan1 introduce second and third order controllers to reduce the altitude lost during stall:

$$u_{2nd} = 0.053\, x_1 + 0.5\, x_2 + 0.521\, x_3 + 0.04\, x_1^2 + 0.048\, x_1 x_2 \qquad (11)$$

$$u_{3rd} = 0.053\, x_1 + 0.5\, x_2 + 0.521\, x_3 + 0.04\, x_1^2 + 0.048\, x_1 x_2 + 0.374\, x_1^3 + 0.312\, x_1^2 x_2 \qquad (12)$$

Their concluding remarks show that the use of the nonlinear controllers with the nonlinear plant significantly increases the range of stall angles from which the aircraft can recover.
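As an illustration of the plant model and the benchmark controllers, the following is a minimal simulation sketch of Eq. (6) together with the controllers of Eqs. (10) through (12). It is written in Python/NumPy rather than the MATLAB used in this study; the RK4 integrator, the step size and horizon, and the zero-order hold on the control over each step are assumptions of the sketch, not part of the original papers. The sketch also accumulates an approximation of the performance index of Eq. (7) along the trajectory.

```python
# Simulation sketch of the F-8 model of Eq. (6) with the Garrard-Jordan
# controllers of Eqs. (10)-(12). Illustrative only; step size, horizon,
# and the RK4 integrator are assumed choices.
import numpy as np

A = np.array([[-0.877, 0.0, 1.0],
              [0.0,    0.0, 1.0],
              [-4.208, 0.0, -0.396]])
b = np.array([-0.215, 0.0, -20.967])
Q = 0.25 * np.eye(3)   # Eq. (8)
R = 1.0

def phi(x):
    """Modeled nonlinearities, Eq. (5)."""
    x1, x2, x3 = x
    return np.array([-x1**2 * x3 - 0.088 * x1 * x3 - 0.019 * x2**2
                     + 0.47 * x1**2 + 3.846 * x1**3,
                     0.0,
                     -0.47 * x1**2 - 3.564 * x1**3])

def f(x, u):
    """Nonlinear equations of motion, Eq. (6): xdot = A x + phi(x) + b u."""
    return A @ x + phi(x) + b * u

def u_linear(x):      # Eq. (10)
    return 0.053 * x[0] + 0.5 * x[1] + 0.521 * x[2]

def u_second(x):      # Eq. (11)
    return u_linear(x) + 0.04 * x[0]**2 + 0.048 * x[0] * x[1]

def u_third(x):       # Eq. (12)
    return u_second(x) + 0.374 * x[0]**3 + 0.312 * x[0]**2 * x[1]

def simulate(controller, alpha0_deg, dt=0.01, t_final=10.0):
    """RK4 from [alpha0, 0, 0]; control held constant over each step.

    Returns the trajectory and the accumulated cost J of Eq. (7).
    """
    x = np.array([np.radians(alpha0_deg), 0.0, 0.0])
    J, traj = 0.0, [x.copy()]
    for _ in range(int(t_final / dt)):
        u = controller(x)
        k1 = f(x, u)
        k2 = f(x + 0.5 * dt * k1, u)
        k3 = f(x + 0.5 * dt * k2, u)
        k4 = f(x + dt * k3, u)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        J += 0.5 * (x @ Q @ x + R * u**2) * dt
        traj.append(x.copy())
    return np.array(traj), J

# Example: recovery attempt from a 25.69 deg initial angle of attack.
traj, J = simulate(u_linear, 25.69)
print("final alpha (deg):", np.degrees(traj[-1, 0]), " cost J:", J)
```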
Purpose of This Paper and General Introduction to Adaptive Critic Neural Networks

The purpose of this study is to demonstrate that for complex problems such as the one described in the introduction, a properly trained adaptive critic based neural network can effectively reproduce the control solution over a wider range of initial conditions, including both stall and no-stall conditions. This considerably reduces the time required to compute the control solution, since neural networks can easily be adapted to different plants, and likewise to different aircraft models, by simply changing the plant model in the neural network routines.

Early mention must be made of the two people who made possible my introduction to adaptive critic based neural networks (ACNN) for control, and to the potential that such an architecture has for simulating and formulating near-optimal control laws for a given system: Dr. S. N. Balakrishnan, my advisor, and Victor Lynn Biega. In 1995 they wrote the paper that set the basis for investigations conducted by many graduate students like myself. Their paper has become the foundation on which different architectures have been built, here at the University of Missouri-Rolla and at other universities, to model a wide variety of systems and problems. The neural network notation used in this introduction and throughout this paper is taken from their paper "Adaptive Critic Based Neural Networks for Control"2.

The adaptive critic based neural network architecture presented in this study applies an emerging area of study, neural network based architectures, together with the dynamic programming methodology and the Hamiltonian formulation, to develop near-optimal control laws for the problem presented here. Neural networks try to emulate the complex behavior of the biological neural networks that govern most everyday human activities: reading, breathing, motion, and thinking. The next section describes the general basis of the adaptive critic neural network formulation.

Statement of the Problem and Hamiltonian Formulation Theory

The Hamiltonian formulation allows the direct use of the continuous-time nonlinear equations of motion given in Eq. (6). The Hamiltonian for this system is defined as

$$H = \frac{1}{2} X^T Q X + \frac{1}{2} u^T R u + \lambda^T \left[ A X + \phi(X) + b u \right] \qquad (13)$$

which, expanded, becomes

$$H = \frac{1}{2}\left(0.25\, x_1^2 + 0.25\, x_2^2 + 0.25\, x_3^2\right) + \frac{1}{2} u^2 + \lambda^T \left( \begin{bmatrix} -0.877 & 0 & 1 \\ 0 & 0 & 1 \\ -4.208 & 0 & -0.396 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} -x_1^2 x_3 - 0.088\, x_1 x_3 - 0.019\, x_2^2 + 0.47\, x_1^2 + 3.846\, x_1^3 \\ 0 \\ -0.47\, x_1^2 - 3.564\, x_1^3 \end{bmatrix} + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u \right) \qquad (14)$$
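For concreteness, the Hamiltonian of Eq. (13) can be evaluated numerically. The short sketch below does so, reusing A, b, Q, R, f, and u_linear from the simulation sketch above, and checks the co-state rates of Eqs. (15) through (17) below by central finite differences. The sample state and co-state values are arbitrary illustrations.

```python
import numpy as np

def hamiltonian(x, u, lam):
    """H = 0.5 x'Qx + 0.5 R u^2 + lam' (A x + phi(x) + b u), Eq. (13)."""
    return 0.5 * (x @ Q @ x) + 0.5 * R * u**2 + lam @ f(x, u)

def costate_rate(x, u, lam, eps=1e-6):
    """lam_dot = -dH/dx (Eqs. (15)-(17) below), by central differences."""
    g = np.zeros(3)
    for i in range(3):
        dx = np.zeros(3)
        dx[i] = eps
        g[i] = (hamiltonian(x + dx, u, lam)
                - hamiltonian(x - dx, u, lam)) / (2 * eps)
    return -g

x = np.array([0.1, 0.0, 0.05])    # arbitrary state (rad, rad, rad/s)
lam = np.array([0.2, 0.1, 0.0])   # arbitrary co-state
u = u_linear(x)                   # linear control law, Eq. (10)
print("H =", hamiltonian(x, u, lam), " lam_dot =", costate_rate(x, u, lam))
```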
The co-state differential equations of the system are defined by the relation $\dot{\lambda} = -\partial H / \partial X$, which componentwise becomes

$$\dot{\lambda}_1 = -\frac{\partial H}{\partial x_1} = -0.25\, x_1 - \lambda_1 \left[ -0.877 - 2 x_1 x_3 - 0.088\, x_3 + 2(0.47) x_1 + 3(3.846) x_1^2 \right] - \lambda_3 \left[ -4.208 - 2(0.47) x_1 - 3(3.564) x_1^2 \right] \qquad (15)$$

$$\dot{\lambda}_2 = -\frac{\partial H}{\partial x_2} = -0.25\, x_2 + \lambda_1\, 2(0.019)\, x_2 \qquad (16)$$

$$\dot{\lambda}_3 = -\frac{\partial H}{\partial x_3} = -0.25\, x_3 - \lambda_1 \left( 1 - x_1^2 - 0.088\, x_1 \right) - \lambda_2 + 0.396\, \lambda_3 \qquad (17)$$

subject to the cost function of Eq. (7). From Eqs. (6) and (15) through (17) the meta state vector Z can be formed:

$$\dot{Z} = \begin{bmatrix} \dot{X} \\ \dot{\lambda} \end{bmatrix} = \begin{bmatrix} f \\ -\partial H / \partial X \end{bmatrix} = F(Z) \qquad (18)$$

U(x(t), u(t)) is the utility function denoting the cost of going from time t to time t+1,

$$U(x(t), u(t)) = x^T Q x + R u^2 \qquad (19)$$

The co-state λ(x(t)) is defined as

$$\lambda(x(t)) = \frac{\partial J(x(t))}{\partial x(t)} \qquad (20)$$

Then the optimal critic output λ(x(t))* is defined by

$$\lambda(x(t))^* = \frac{\partial U(x(t), u(t))}{\partial x(t)} + \left[ \frac{\partial x(t+1)}{\partial x(t)} \right]^T \lambda(x(t+1)) + \left[ \frac{\partial u(x(t))}{\partial x(t)} \right]^T \left\{ \frac{\partial U(x(t), u(t))}{\partial u(t)} + \left[ \frac{\partial x(t+1)}{\partial u(t)} \right]^T \lambda(x(t+1)) \right\} \qquad (21)$$

which for the system considered here can be expressed as

$$\lambda(x(t))^* = 2 Q x(t) + A^T \lambda(x(t+1)) + \left[ \frac{\partial u(x(t))}{\partial x(t)} \right]^T \left[ 2 R u(t) + B^T \lambda(x(t+1)) \right] \qquad (22)$$

It can be seen that the co-state equation develops backwards in time; as will be shown later, it is obtained by back-integrating the meta state vector derivative $\dot{Z}$. The goal is to determine the control law of the continuous system through the Hamiltonian formulation. The optimal control law is determined by Bellman's optimality equation3:

$$0 = \frac{\partial J(x(t))}{\partial u(t)} = \frac{\partial U(x(t), u(t))}{\partial u(t)} + \left[ \frac{\partial x(t+1)}{\partial u(t)} \right]^T \lambda(x(t+1)) \qquad (23)$$

$$0 = 2 R u(t)^* + B^T \lambda(x(t+1)) \qquad (24)$$

thus the optimal control becomes

$$u(t)^* = -(2R)^{-1} B^T \lambda(x(t+1)) \qquad (25)$$

Action and Critic Neural Network Architecture

The adaptive critic neural network is a feed-forward, back-propagation architecture divided into an action neural network (Action NN) and a critic neural network (Critic NN). Each network has its own independent architecture, but at the same time their intrinsic relationship is the key to obtaining the near-optimal control law for the system under analysis. Both the Action NN and the Critic NN consist of two hidden layers with hyperbolic tangent sigmoid transfer functions (tansig) and an output layer with a linear transfer function (purelin). The general structure can be seen in Figure 1. The output of both networks can be written as

$$a^3 = W^3 f^2\left( W^2 f^1\left( W^1 p + b^1 \right) + b^2 \right) + b^3 \qquad (26)$$

where the superscript denotes the layer and the subscript denotes A for the Action NN and C for the Critic NN. For the Action NN the output becomes

$$u(t) = W_A^3 \left( \frac{2}{1 + e^{-2\left( W_A^2 \left( \frac{2}{1 + e^{-2\left( W_A^1 x(t) + b_A^1 \right)}} - 1 \right) + b_A^2 \right)}} - 1 \right) + b_A^3 \qquad (27)$$

and the Critic NN output becomes

$$\lambda(t) = W_C^3 \left( \frac{2}{1 + e^{-2\left( W_C^2 \left( \frac{2}{1 + e^{-2\left( W_C^1 x(t) + b_C^1 \right)}} - 1 \right) + b_C^2 \right)}} - 1 \right) + b_C^3 \qquad (28)$$

Each of the two hidden layers of both networks has l neurons, where l is the designer's choice. The output layer of the Action NN has m neurons, where m is the number of controls, and the output layer of the Critic NN has n neurons, where n is the number of states of the problem, in this case 3. For the preliminary results of this paper, the Action NN has an N3,4,4,1 architecture, i.e. 3 neurons corresponding to the three state inputs, 4 neurons in each of the two hidden layers, and 1 neuron corresponding to the single control output. The Critic NN has an N3,6,6,3 architecture, i.e. 3 neurons corresponding to the three state inputs and the three co-state outputs, and 6 neurons in each of the two hidden layers. The training algorithm is implemented in MATLAB using the Neural Network Toolbox.
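The two-hidden-layer networks of Eqs. (26) through (28) are straightforward to express in code. The sketch below is again in Python/NumPy for illustration only, since the study's networks were built and trained with the MATLAB Neural Network Toolbox; the uniform weight initialization range is an assumed choice.

```python
# Sketch of the two-hidden-layer tansig/purelin networks of Eqs. (26)-(28).
import numpy as np

def tansig(n):
    """MATLAB's tansig: 2/(1 + exp(-2n)) - 1, equivalent to tanh(n)."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

class TwoHiddenLayerNet:
    """Feed-forward net with layers tansig, tansig, purelin, as in Eq. (26)."""
    def __init__(self, sizes, rng):
        # sizes = (inputs, hidden1, hidden2, outputs), e.g. (3, 4, 4, 1)
        self.W = [rng.uniform(-0.5, 0.5, (sizes[i + 1], sizes[i]))
                  for i in range(3)]
        self.b = [rng.uniform(-0.5, 0.5, sizes[i + 1]) for i in range(3)]

    def forward(self, p):
        a1 = tansig(self.W[0] @ p + self.b[0])
        a2 = tansig(self.W[1] @ a1 + self.b[1])
        return self.W[2] @ a2 + self.b[2]     # linear output layer

rng = np.random.default_rng(0)
action_nn = TwoHiddenLayerNet((3, 4, 4, 1), rng)   # u(t), Eq. (27)
critic_nn = TwoHiddenLayerNet((3, 6, 6, 3), rng)   # lambda(t), Eq. (28)

x = np.array([0.1, 0.0, 0.05])
print("u(t) =", action_nn.forward(x), " lambda(t) =", critic_nn.forward(x))
```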
Training of the Action and Critic Neural Networks

The Action NN and the Critic NN are initialized with random weights and trained using the following algorithm (a code sketch of one training cycle follows the list):

1) The initial Critic NN is assumed to be optimal.
2) The initial output u(t) is obtained by feeding random values of the states X(t) to the Action NN.
3) The continuous-time nonlinear equations of motion (6) are integrated to obtain the next state X(t+1) from the states X(t) and the Action NN output u(t).
4) The Critic NN is fed the output of step 3, X(t+1), to calculate λ(t+1).
5) The Action NN is then trained using X(t) as input and the optimal control u(t)* of Eq. (25) as target.
6) Steps 1 through 5 are repeated until the desired level of accuracy for the Action NN is achieved.
7) The Action NN is assumed to be optimal.
8) The initial output u(t) is obtained by feeding random values of the states X(t) to the Action NN.
9) The continuous-time nonlinear equations of motion (6) are used to back-integrate the meta state vector derivative $\dot{Z}$ to obtain λ(x(t))*, using as inputs X(t+1) from step 3, λ(t+1) from step 4, and the output u(t) from step 8.
10) The Critic NN is then trained using X(t) as input and λ(x(t))* from step 9 as target.
11) Steps 7 through 10 are repeated until the desired level of accuracy for the Critic NN is achieved.

Step 11 marks the end of one training cycle. Training cycles are continued until there is no appreciable change in the outputs of either the Action NN or the Critic NN. At this point the output u(t)* of the Action NN is the optimal control.
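The sketch below walks through training cycles of the algorithm above, reusing the dynamics (f, A, b, Q, R) and the action_nn and critic_nn objects from the earlier sketches. It departs from the paper's implementation in several labeled ways: a crude finite-difference gradient step stands in for the MATLAB toolbox training, a single Euler step stands in for the integration of Eq. (6), the co-state target follows Eq. (22) with Euler-discretized Jacobians while neglecting the du/dx term, and the learning rate, state sampling range, and sample counts are arbitrary.

```python
# One-pass sketch of the adaptive critic training cycle (steps 1-11).
import numpy as np

dt, lr = 0.01, 0.05

def train_toward(net, p, target, lr=lr, eps=1e-6):
    """Crude coordinate-wise finite-difference gradient step pulling
    net.forward(p) toward target (stand-in for toolbox training)."""
    for W in net.W + net.b:
        flat = W.ravel()
        for i in range(flat.size):
            old = flat[i]
            flat[i] = old + eps
            e_plus = np.sum((net.forward(p) - target) ** 2)
            flat[i] = old - eps
            e_minus = np.sum((net.forward(p) - target) ** 2)
            flat[i] = old - lr * (e_plus - e_minus) / (2 * eps)

for cycle in range(10):                        # training cycles
    for _ in range(20):                        # steps 1-6: train Action NN
        x_t = rng.uniform(-0.3, 0.3, 3)        # random states, step 2
        u_t = action_nn.forward(x_t)[0]
        x_next = x_t + dt * f(x_t, u_t)        # step 3: Euler step of Eq. (6)
        lam_next = critic_nn.forward(x_next)   # step 4
        # step 5: target from Eq. (25); dt scales b because the discrete
        # transition sensitivity dx(t+1)/du is approximately b*dt (assumption)
        u_star = -(1.0 / (2 * R)) * (b @ lam_next) * dt
        train_toward(action_nn, x_t, np.array([u_star]))
    for _ in range(20):                        # steps 7-11: train Critic NN
        x_t = rng.uniform(-0.3, 0.3, 3)
        u_t = action_nn.forward(x_t)[0]        # step 8
        x_next = x_t + dt * f(x_t, u_t)
        lam_next = critic_nn.forward(x_next)
        # step 9: co-state target from Eq. (22) with dx(t+1)/dx ~ I + A*dt,
        # neglecting the du/dx term (both are sketch assumptions)
        lam_star = 2 * (Q @ x_t) * dt + (np.eye(3) + A.T * dt) @ lam_next
        train_toward(critic_nn, x_t, lam_star)  # step 10
```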
Preliminary Analysis and Results

The response of the aircraft to the three controllers derived by Garrard and Jordan1 was tested against the preliminary neural network optimal control solution. The three controllers are the linear control law obtained by solving the matrix Riccati equation, Eq. (10), and the second and third order controllers, Eqs. (11) and (12). At the flight condition considered in this paper, Mach 0.85 at 30,000 feet (9,000 m), the F-8 stalls when the angle of attack reaches 23.5º.

Figure 2 shows the time response of the three states and the control (the tail elevator deflection) for an initial angle of attack of 25.69º and a pitch angle and pitch rate of 0º. This initial angle of attack is the largest from which the linear controller (10) can recover from stall; beyond this angle the linear quadratic solution cannot recover effectively. The neural network solution reaches equilibrium faster than any of the three compared controllers. Figure 3 shows the cost associated with recovering from stall for each of the controllers; the neural network solution has the smallest cost.

Figure 4 shows the time response of the three states and the control for an initial angle of attack of 25.99º and a pitch angle and pitch rate of 0º. This initial angle of attack is the largest from which the second order controller (11) can recover from stall; beyond this angle the second order solution cannot recover effectively. The neural network solution reaches equilibrium faster than either of the two remaining compared controllers. Figure 5 shows the cost associated with recovering from stall for each of the controllers; again the neural network solution has the smallest cost.

Figure 6 shows the time response of the three states and the control for an initial angle of attack of 27º and a pitch angle and pitch rate of 0º. This initial angle of attack is the largest from which the third order controller (12) can recover from stall; beyond this angle the third order solution cannot recover effectively. The neural network solution reaches equilibrium faster than the remaining compared controller. Figure 7 shows the cost associated with recovering from stall for each of the controllers; once more the neural network solution has the smallest cost.

Beyond this initial angle of attack of 27º, none of the three controllers compared against the neural network solution can recover from stall. Figure 8 shows the time response of the three states and the control for an initial angle of attack of 30º and a pitch angle and pitch rate of 0º; the neural network solution is able to recover from stall at this angle of attack. Figure 9 shows the associated recovery cost for the neural network controller. Figure 10 shows the time response of the three states and the control for an initial angle of attack of 35º and a pitch angle and pitch rate of 0º; the neural network solution is again able to recover from stall. Figure 11 shows the associated recovery cost for the neural network controller. Beyond an initial angle of attack of 35º, the neural network controller cannot recover from stall without exceeding the maximum tail deflection of 25º.

The efficiency of the neural network controller relative to the three compared controllers in recovering from stall has thus been demonstrated. It must be noted that the training of the neural network controller used here is far from complete, since the training region for the solution presented extends only up to 15º. It is expected that once the training range of the neural network controller is broadened to cover the stall conditions, its effectiveness will improve considerably.

Future Work

Currently, the envelope of the training range is being increased to include the stall conditions in the neural network training. In addition, robustness to input uncertainties, such as the lag time between the pilot control input and the actual tail deflection, is being incorporated into the neural network algorithm. Both of these improvements will be included in the next revision of this paper.

Figures

Figure 1: General structure of the Action and Critic neural networks.
Figure 2: State and control time histories for an initial angle of attack of 25.69º.
Figure 3: Cost of stall recovery for each controller at an initial angle of attack of 25.69º.
Figure 4: State and control time histories for an initial angle of attack of 25.99º.
Figure 5: Cost of stall recovery for each controller at an initial angle of attack of 25.99º.
Figure 6: State and control time histories for an initial angle of attack of 27º.
Figure 7: Cost of stall recovery for each controller at an initial angle of attack of 27º.
Figure 8: State and control time histories for an initial angle of attack of 30º (neural network controller).
Figure 9: Cost of stall recovery for the neural network controller at an initial angle of attack of 30º.
Figure 10: State and control time histories for an initial angle of attack of 35º (neural network controller).
Figure 11: Cost of stall recovery for the neural network controller at an initial angle of attack of 35º.

Bibliography

1 Garrard, W. L. and Jordan, J. M., "Design of Nonlinear Automatic Flight Control Systems," Automatica, Vol. 13, 1977, pp. 497-505, Pergamon Press, Great Britain.
2 Balakrishnan, S. N. and Biega, V., "Adaptive Critic Based Neural Networks for Control," Proceedings of the American Control Conference, Seattle, WA, 1995.
3 Bellman, R. E., Introduction to the Mathematical Theory of Control Processes, Vol. 1: Linear Equations and Quadratic Criteria, Academic Press, New York, 1967.