Rensselaer Polytechnic Institute
35476 Computer Applications Laboratory

EXPERIMENTS IN OPTIMAL CONTROL

Number of Sessions – 4

INTRODUCTION

In modern control systems design it is sometimes necessary to design controllers that not only effectively control the behavior of a system, but also minimize or maximize some user-defined criterion, such as energy or time conservation, or obey other physical or time constraints imposed by the environment. Optimal Control Theory provides the mathematical tools for solving problems like these, either analytically or through iterative computer methods, by formulating the user criteria as a cost function and using the state equation representation of the system dynamics. The optimal control experiments in this lab involve the use of a PC-AT to illustrate the efficacy of microprocessors as system controllers. Two types of control will be considered:

1) Continuous (analog)
2) Discrete (digital)

The general formulation of the optimal control problem is as follows: Let the system states (i.e. the state variables) be represented by an n-dimensional vector x, where n is the order of the system. Let the control variables (input) be represented by an m-dimensional vector u, and the system output by an r-dimensional vector y. The system can, in general, be modeled by a set of differential equations of the form:

    ẋ = f(x, u) ,  y = g(x, u) ,  x(t0) = c        (1)

where f and g are general (possibly nonlinear and time varying) expressions of x and u, t0 is the initial time, and c is an n-dimensional set of initial conditions. The objective is to find a control input u(t) that will drive the system from the initial point c ( = x(t0) ) in the state space to the final point x(tf), and at the same time minimize a cost functional J (formally called the performance index/criterion) given by equation (2):

    J = h(x(tf), tf) + ∫_{t0}^{tf} g(x(t), u(t), t) dt        (2)

where tf represents the end of the control interval, and h and g are user-defined penalty expressions.
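As a concrete illustration of equations (1) and (2), the short sketch below simulates a simple system and evaluates its cost functional numerically. Python is an assumption here (the lab's own tools are MATLAB and a PC program), and the particular choices f(x, u) = -x + u, integrand x² + u², terminal penalty x(tf)², candidate input u = 0, and x(t0) = 1 are hypothetical, picked only to make the ingredients of the formulation tangible:

```python
import numpy as np

# Hypothetical first order example: f(x, u) = -x + u
def f(x, u):
    return -x + u

dt = 1e-3
t0, tf = 0.0, 5.0
x, J = 1.0, 0.0                   # x(t0) = 1, cost accumulator
for k in range(int((tf - t0) / dt)):
    u = 0.0                       # a candidate (not optimal) control input
    J += (x**2 + u**2) * dt       # running cost: the integrand of equation (2)
    x += f(x, u) * dt             # Euler step of the state equation (1)
J += x**2                         # terminal penalty h(x(tf), tf)

print(J)   # ~ 0.5 (analytically, (1 - e^-10)/2 + e^-10 for this choice)
```

A different candidate u(t) would drive x differently and produce a different J; the optimal control problem is precisely the search for the u(t) that makes J smallest.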
PART I – CONTINUOUS CONTROL

MATHEMATICAL FORMULATION – RESULTS

Even though the general problem is very hard to solve, there are very useful simplifications that lead to closed form (analytic) solutions. The most common simplification (and the one used for this experiment) is the LQR problem, where the system is Linear and the controller (Regulator) must satisfy a Quadratic cost functional. Assuming that the system is also time invariant, the state and output equations (1) become:

    ẋ(t) = Ax(t) + Bu(t) ,  y(t) = Cx(t) + Du(t) ,  x(0) = x0        (3)

where:
x(t) is the (n x 1) state vector, x0 is the initial state vector,
u(t) is the (m x 1) control input vector,
y(t) is the (r x 1) output vector,
A is the (n x n) state dynamics matrix,
B is the (n x m) control dynamics matrix,
C is the (r x n) state-output matrix,
D is the (r x m) input-output matrix (for all practical purposes assumed to be 0 hereafter).

Corresponding to the system is a performance index represented in quadratic form as:

    J(u(t), x(t), t) = (1/2) x'(tf)Hx(tf) + (1/2) ∫_{0}^{tf} { x'(t)Qx(t) + u'(t)Ru(t) } dt        (4)

where:
H is the (n x n) terminal state penalty matrix,
Q is the (n x n) state penalty matrix,
R is the (m x m) control penalty matrix.

It is important to emphasize that the H, Q, and R matrices are user selectable, and it is through the proper selection of these that the various environment constraints are to be satisfied. If H, Q, and R are positive definite and symmetric, then it can be shown that a closed loop optimal control function u(t) exists (called u*(t)), and is uniquely given by:

    u*(t) = -R⁻¹B'P(t)x(t) = G(t)x(t) ,  G(t) = -R⁻¹B'P(t)        (5)

where:
G(t) is the (m x n) optimal feedback gain matrix,
P(t) is an (n x n) symmetric and positive definite matrix that satisfies the continuous matrix differential Riccati equation given by:
    Ṗ(t) = -P(t)A - A'P(t) + P(t)BR⁻¹B'P(t) - Q ,  P(tf) = H        (6)

From the above it is obvious that, even for the LQR problem, the set of coupled differential equations (6) must be solved before the controller gains can be found. It should be noted that these gains are functions of time. The significance of the result of equation (5) is that the gains can be computed ahead of time, and the optimal controller is easily implemented on any digital computer. In all but the simplest real life applications, iterative computer methods are utilized to precalculate and store the P matrix and the gains, resulting in very efficient and robust open and closed loop controllers.

A further simplification to the above solution takes place when tf approaches ∞, or in more realistic terms when the control is applied for a very long period of time. In this case it is shown that if (1) the system is controllable, (2) H is zero, and (3) A, B, Q, and R are all constant matrices, then the P(t) matrix converges to a constant symmetric, real, positive definite matrix Pss. Then Ṗ(t) is zero, and the matrix differential Riccati equation (6) is transformed into a matrix algebraic equation (called the Steady State Riccati Equation) given by:

    0 = -PA - A'P + PBR⁻¹B'P - Q        (7)

The steady state solution of (7) need not be unique. However, only one real positive definite solution can exist. Since P is of dimensions (n x n), we obtain n² equations to solve for the n² components of P; yet because P is symmetric, only n(n+1)/2 of the components are distinct, and the number of computations is greatly reduced. Note that the steady state Riccati equation for the continuous case is easier to solve without the aid of a computer than its discrete equivalent, to be considered later, because no inverse of the unknown P appears in the equation. Once the P matrix is determined, the optimal control gain matrix G can be found from equation (5). Of course the gains are now constants, as expected.
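The convergence of P(t) to Pss as the horizon grows can be observed numerically. The sketch below (Python with NumPy/SciPy, an assumption, since the handout's own tools are MATLAB and the lab PC program) integrates equation (6) backward in time from P(tf) = H = 0, using the second order system of the example that follows, and checks the settled value against a direct steady state solver:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Data of the worked example that follows: A = [0 1; -1 0], B = [0; 1], Q = I, R = 1
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Rinv = np.linalg.inv(R)

# Equation (6): Pdot = -PA - A'P + P B R^-1 B' P - Q, with P(tf) = H = 0.
# In backward time tau = tf - t, dP/dtau = -Pdot, so take simple Euler steps:
P = np.zeros((2, 2))
dt = 1e-3
for _ in range(int(15.0 / dt)):           # 15 s of backward time is ample here
    Pdot = -P @ A - A.T @ P + P @ B @ Rinv @ B.T @ P - Q
    P -= dt * Pdot

# The settled value solves the steady state equation (7). Compare with SciPy,
# which solves the equivalent form A'P + PA - P B R^-1 B' P + Q = 0:
Pss = solve_continuous_are(A, B, Q, R)
print(np.round(P, 3))                     # ~ [[1.912, 0.414], [0.414, 1.352]]
print(np.round(-Rinv @ B.T @ Pss, 3))     # gain G of equation (5)
```

The printed matrix matches the positive definite solution obtained by hand in the example below, which is the point: for a long control interval the time-varying gains of (5) collapse to constants.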
Solving the general (non steady state) LQR problem requires substantial storage for the gains, which depends on tf and the sampling interval, and exceeds the scope of this experiment, which will hereafter focus on the steady state case.

Another obvious but nevertheless important result of the steady state case is that the system states must converge to zero regardless of the initial condition vector x0. An intuitive proof by contradiction can be derived using the following notion: the performance index is always positive, being a sum of positive factors, since all the matrices are positive definite. If it is to be minimized over an infinitely large time, then this minimum must be finite. Yet if the states do not approach zero, then from equation (5) neither does the control, and we end up integrating two positive quantities over an infinite time. This can never give a finite (minimum) value, therefore the states must go to zero. Applications exploiting this result are controllers designed to keep the states always near zero, counteracting random abrupt changes in the state values (or any other causes that can be modeled as impulses).

The following example sets up a typical infinite time LQR problem and analytically solves the steady state Riccati equation, giving the student sufficient experience to deal with the actual experiment.

EXAMPLE

Consider the following second order system:

    ẋ(t) = [0 1; -1 0] x(t) + [0; 1] u(t)
    y(t) = [1 0] x(t)

with the corresponding performance index given by:

    J = ∫_{0}^{∞} { x'(t) [1 0; 0 1] x(t) + u²(t) } dt

The continuous steady state Riccati equation (7) can be written as:

    PBR⁻¹B'P = PA + A'P + Q        (8)

and using this example's data, with P = [P11 P12; P21 P22], it becomes:

    P [0; 1] (1) [0 1] P = P [0 1; -1 0] + [0 -1; 1 0] P + [1 0; 0 1]

This matrix equation can be rewritten as three simultaneous algebraic equations (remember P12 = P21):

    P12² = 1 - 2P12
    P12 P22 = P11 - P22
    P22² = 1 + 2P12

Solving the three equations, two real solutions for P are obtained:

    [1.912 0.4142; 0.4142 1.352]  and  [-1.912 0.4142; 0.4142 -1.352]

By checking the principal minors of each of the two matrices, it is found that only the first is positive definite*, therefore it is the accepted solution.

* A matrix is positive definite if all its principal minors are positive. The principal minors are the determinants of the submatrices along the principal diagonal of the matrix which include the topmost left element of the diagonal. Thus the first principal minor is the element (1,1), the second is the determinant of the 2x2 submatrix in the upper left corner, and so on.

The feedback gain matrix is then determined from equation (5) to be:

    G = [-0.4142  -1.352]

Thus, the optimal control input u(t) is given by:

    u(t) = -0.4142 x1(t) - 1.352 x2(t)        (9)

The student is expected to work through this example prior to attempting the problem for the experiment, to ensure a thorough understanding of the procedure.

FIGURE 1. Block diagram of the open loop system.

PROBLEM FORMULATION

This experiment is based on the steady state case of the LQR problem. Here the plant to be controlled is a 2nd order linear and time invariant system implemented on the Comdyna analog computer, the penalty matrices Q and R are constant, and H is zero.
The optimal control input u(t) will drive the system states x1 and x2 from the initial values of -3 volts and 0 volts respectively to zero, and it will be generated either using the Comdyna analog computer, or a PC-AT running the appropriate program. The second order system to be controlled is modeled by the transfer function:

    H(s) = Output/Input = Y(s)/U(s) = 0.5 / ( s(s + 0.5) )        (10)

which is to be implemented on the Comdyna analog computer following a suitably designed simulation diagram. The state variable representation of the system transfer function (10) is of the same general form as equations (3) with D equal to zero, and is given by:

    ẋ(t) = [0 1; 0 -0.5] x(t) + [0; 0.5] u(t) ,  y(t) = [1 0] x(t) ,  x(0) = [-3; 0]        (11)

Given the state variable equations, the open loop system block diagram can be obtained as in FIG. 1 and the actual analog computer simulation as in FIG. 2. The student is urged to verify both results, paying particular attention to the initial conditions in FIG. 2. Note that the integrators invert the voltage on the IC input, so +3 volts will give the desired initial output voltage of -3 volts.

FIGURE 2. Analog computer simulation of the open loop system.

The performance index for this system is given in accordance with equation (4) by:

    J = ∫_{0}^{∞} { x'(t) [2 0; 0 1] x(t) + 0.4u²(t) } dt        (12)

EXPERIMENTAL PROCEDURE

The aim of the experiment is to design a closed loop optimal controller that will drive the states to zero using the feedback gains obtained via equations (5). For this purpose the following steps must be implemented sequentially:

1) The open loop system is to be built on the Comdyna analog computer in accordance with the simulation diagram of FIG. 2. Special care must be taken when implementing the initial conditions and the gains because of the sign inversions at the output of the amplifiers.
The strip chart recorder must be thoroughly calibrated and its various scales mastered before any useful work can be done. Remember, the Comdyna dial must be on the Pot Set position and the push button labeled IC pressed in during setup; during operation the dial must always be on the Oper position, and the push button labeled OP must be pressed in just before each run starts.

2) The impulse response of the system is to be simulated using the Comdyna and the PC-AT, and the natural (uncontrolled) modes of the states x1, x2 are to be plotted on the strip chart. The concept behind this is that when an asymptotically stable system is excited with an impulse function δ(t), the states converge to zero regardless of their initial values, producing the same final result as the optimal controller does. Hence the plots of the state trajectories obtained in this part are to be compared with the ones produced by applying the closed loop optimal control. Before running this part it is necessary to verify that both states are indeed stable by solving the system differential equations. However, it should be noted that this system is not strictly asymptotically stable due to the pole at the origin. To implement the impulse function, connect the D/A0 port of the PC-AT to the input of the system, and select the Uncontrolled State Response option from the program menu (details on running the PC-AT program are provided later). It is important to understand that since the δ(t) function is a theoretical pulse of infinite amplitude and infinitesimal duration, no physically realizable input can reproduce it. Hence a +10 volt step function is applied for a period of 296 msec to produce the same net result. The duration was found experimentally and is valid only for the -3 volt initial condition of state x1. Also, you may observe that state x1 reaches the desired zero as theoretically expected, yet it then continues increasing linearly as time passes.
This should be attributed to leaks within the Comdyna integrators (from the first to the second integrator) rather than to a theory inconsistency.

FIGURE 3. Block diagram of the closed loop system.

3) The algebraic Riccati equation for the above system is to be solved, and the P matrix elements and the optimal feedback gains g1, g2 are to be computed using equations (5) and the example. Even though MATLAB or any other equivalent math package can be used for verification purposes, a detailed analytical solution for the P matrix elements and the gains is mandatory. Having found the gains, the optimal controller is given by equation (13) as:

    u*(t) = g1 x1(t) + g2 x2(t)        (13)

The block diagram of the complete closed loop system is given in FIG. 3, and its simulation diagram is given in FIG. 4. The control calculation (equation (13)) of FIG. 4 is to be implemented either on the Comdyna by the student, or through the PC-AT program upon selecting the relevant menu option.

FIGURE 4. Analog computer simulation of the closed loop system.

4) The complete closed loop system, both the plant and the feedback controller, is to be built on the Comdyna analog computer using FIG. 4, and a strip chart recording of the state trajectories obtained for comparison with the results from the other runs. This controller should produce the best results for a given set of dynamics and penalties. (Why?) The implementation on the Comdyna can be done either using the multipliers, or by passing the states individually through the .1 inputs of adders to effectively increase them by a factor of 10, and then through potentiometers set to .1 times the gains. Selecting the latter method requires the use of two analog computers.

5) The experiment is to be run using the PC-AT program to implement the closed loop controller. For this part the system built on the Comdyna is to be connected to the PC-AT as in FIG. 4.
State x1 must be fed to the PC-AT input port A/D0, state x2 to the PC-AT input port A/D1, and the system input u to the PC-AT output port D/A0. All other ports are to be left free. The control algorithm is to be executed using different sampling periods (at least ten runs ranging from 50 to 1000 msec), and the plots of the state trajectories are to be compared with the previous results. To access the PC-AT program do the following:

• Turn the PC-AT on (if off) and go to the OPTIMAL subdirectory by typing: CD \CAL_LAB\OPTIMAL.
• Type OPTIMAL to run the program. After the introductory screen a menu will appear. Select your option using the arrow keys and ENTER, or by pressing the highlighted character.
• A Data Input Menu will appear with the variable names on the left and editable fields on the right. If the cursor does not appear press the Insert key. To move among the fields and edit the data within the fields use the cursor arrows and the regular editing keys (ENTER, Backspace, Delete).
• Unacceptable keystrokes are met with error messages, and a basic help screen is available by pressing F1. Be warned that whenever ranges appear next to variable names, they act in an advisory capacity, and it is at the user's discretion to enter meaningful data in the fields.
• When done editing, press the END key to start execution, or ESC to exit the Input menu. During control execution no other key is to be pressed, except CTRL-BREAK which is used to abort the current run.
• Aborting control execution brings you back to the Input Menu for more data entry. For the Continuous Case the gains precalculated in step 3 need be entered in the relevant fields only once, since they are constant, and only the sampling period is to be changed before each new run. Since the computer program does not invert the controller sign in the output, the actual calculated gains must be entered in the relevant fields (compare with the analog implementation of step 3).
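The effect of the sampling period can be previewed in simulation before the runs. The sketch below (Python with NumPy/SciPy, an assumption; the handout itself suggests MATLAB only for verification) computes the continuous gains for the plant and penalties of equations (11) and (12), then applies them at sample instants through a zero order hold, exactly as the PC-AT program does:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, expm

# Lab plant of equation (11) and penalties of equation (12)
Ac = np.array([[0.0, 1.0], [0.0, -0.5]])
Bc = np.array([[0.0], [0.5]])
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
R = np.array([[0.4]])

P = solve_continuous_are(Ac, Bc, Q, R)
G = -np.linalg.solve(R, Bc.T @ P)           # continuous gains [g1, g2]

def run(T, t_end=10.0):
    """Simulate the plant with u sampled every T seconds and held constant."""
    n, m = Bc.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = Ac, Bc
    E = expm(M * T)                         # exact ZOH step: x+ = Ad x + Bd u
    Ad, Bd = E[:n, :n], E[:n, n:]
    x = np.array([[-3.0], [0.0]])
    for _ in range(int(round(t_end / T))):
        u = G @ x                           # gain applied to the sampled states
        x = Ad @ x + Bd @ u
    return x

for T in (0.05, 0.2, 1.0):
    print(T, run(T).ravel())                # states approach zero in every case
```

Comparing the intermediate trajectories for small and large T gives a feel for what the strip chart recordings should show as the sampling period grows.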
Record your observations and make suitable strip chart recordings of the states.

WRITE-UP AND ANALYSIS

Your report should contain:

1) Solution of the continuous steady state Riccati equation and calculation of the feedback gain. (Show that the solution to the steady state Riccati equation is positive definite.)
2) Comments on the asymptotic stability of the system.
3) Various strip chart recordings for procedure parts 1), 3), and 4). These should be appropriately labeled.
4) Discussion of results, including a comparison of the strip chart recordings and the significance of the sampling times (especially as the sampling time increases). When discussing the sampling times, reference must be made to the strip chart recordings.
5) Conclusion.

PART II – DISCRETE CONTROL

MATHEMATICAL FORMULATION – RESULTS

In the case of digital control, the optimization must be treated as a discrete time problem. Thus, the sampled system is modeled by the difference equations:

    x(k + 1) = Ax(k) + Bu(k) ,  y(k) = Cx(k) + Du(k) ,  x(0) = c        (14)

and the discrete performance index in quadratic form is given by:

    J = (1/2) x'(N)Hx(N) + (1/2) Σ_{k=0}^{N} { x'(k)Qx(k) + u'(k)Ru(k) }        (15)

where A, B, C, D, H, Q and R are similar to those in the continuous case and N is a fixed number of time intervals. By applying the conditions of optimality to the problem, the optimal control is found to be:

    u*(k) = G(k)x(k)        (16)

where G is again the (m x n) feedback gain matrix, given by:

    G(k) = -R⁻¹B'(A')⁻¹[P(k) - Q]        (17)

and P is the (n x n) real positive definite solution of the discrete matrix difference Riccati equation given by:

    P(k) = Q + A'[P⁻¹(k + 1) + BR⁻¹B']⁻¹A        (18)

When N approaches ∞ and the same conditions mentioned in the continuous case apply, the P(k) matrix converges to a constant symmetric, real, positive definite matrix P. Hence:

    P(k) = P(k + 1) = P        (19)

and the matrix difference Riccati equation becomes a matrix algebraic equation as before.
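The fixed point of the recursion can be located numerically by iterating equation (18) until P(k) stops changing. The sketch below (Python with NumPy/SciPy, an assumption) does this for the rotation system of the example that follows, checks the result against a standard discrete Riccati solver, and also confirms that the gain formula (17) agrees with the more common discrete LQR gain -(R + B'PB)⁻¹B'PA, to which it is algebraically equivalent whenever A is invertible:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

T = np.pi / 40                            # sampling interval of the example below
A = np.array([[np.cos(T), np.sin(T)], [-np.sin(T), np.cos(T)]])
B = np.array([[1 - np.cos(T)], [np.sin(T)]])
Q = np.eye(2)
R = np.array([[1.0]])
BRB = B @ np.linalg.solve(R, B.T)         # B R^-1 B'

# Iterate equation (18): P(k) = Q + A' [P^-1(k+1) + B R^-1 B']^-1 A
P = Q.copy()
for _ in range(5000):
    P = Q + A.T @ np.linalg.inv(np.linalg.inv(P) + BRB) @ A

Pss = solve_discrete_are(A, B, Q, R)      # direct steady state solution
assert np.allclose(P, Pss, atol=1e-6)

G17 = -np.linalg.solve(R, B.T @ np.linalg.solve(A.T, P - Q))   # equation (17)
Gstd = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
assert np.allclose(G17, Gstd)
print(P, G17)                             # compare with the example's P and G
```

This brute-force iteration is essentially what the lab PC program does when it "solves the steady state discrete Riccati iteratively."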
Yet even in the steady state case equation (18) is very difficult to solve analytically, hence iterative methods implemented on computers handle this task. Once the P matrix is found, the gains and the controller are easily computed using equations (17) and (16). The following example should provide some reference as to the procedure for obtaining the feedback gain matrix for a simple system.

EXAMPLE

Given a discrete system described by the state and output equations:

    x(k + 1) = [cos T  sin T; -sin T  cos T] x(k) + [1 - cos T; sin T] u(k)
    y(k) = [1 0] x(k)

where T is the sampling interval in seconds. The discrete performance index is given by:

    J = Σ_{k=0}^{∞} { x'(k) [1 0; 0 1] x(k) + u²(k) }

If we let the sampling interval T be equal to π/40 seconds, the discrete equations become:

    x(k + 1) = [0.997  0.0785; -0.0785  0.997] x(k) + [0.00308; 0.0785] u(k)

The discrete Riccati equation is not easily solved by hand, because a complicated cubic equation has to be solved. Using the PC program the solution is found to be:

    P = [24.6  5.15; 5.15  17.5]

Thus, from equation (17) the feedback gain matrix is:

    G(k) = [-0.331  -1.28]

and from (16) the discrete optimal control is:

    u(k) = -0.331 x1(k) - 1.28 x2(k)

PROBLEM FORMULATION

This part of the experiment implements the steady state case of the Discrete Linear Quadratic Regulator problem. The same linear and time invariant system as before is used, but now for all calculations it is treated as a sampled data system modeled by the following state and output equations:

    x(k + 1) = [1  2 - 2e^(-0.5T); 0  e^(-0.5T)] x(k) + [T - 2 + 2e^(-0.5T); 1 - e^(-0.5T)] u(k) ,
    y(k) = [1 0] x(k) ,  x(0) = [-3; 0]        (20)

where T is the sampling interval.
The discrete performance index for this system is given by:

    J = Σ_{k=0}^{∞} { x'(k) [2 0; 0 1] x(k) + 0.4u²(k) }        (21)

It should be noted here that the discrete state dynamics matrix A and control dynamics matrix B have been chosen (out of the infinitely many possibilities representing the given system) so that the discrete states x(k) equal the continuous states x(t) at the sampling instants. Given a continuous system to be modeled by a discrete system with a zero order hold on the input, the discrete matrices Ad and Bd may be obtained from the continuous matrices Ac and Bc by:

    Ad = e^(Ac T) ,  Bd = [Ad - I]Ac⁻¹Bc

where T is the sample period. (The formula for Bd assumes Ac is invertible; for the present plant, which has a pole at the origin, Bd must instead be obtained from the defining integral Bd = ∫_{0}^{T} e^(Ac τ) Bc dτ, which yields equation (20).) Note that Cc = Cd and Dc = Dd. (Why?) The section "Discrete State Equations" in the chapter "Open-Loop Discrete-Time Systems" of Digital Control System Analysis and Design by C. L. Phillips and H. T. Nagle provides more details on this continuous to discrete transformation.

EXPERIMENTAL PROCEDURE

The procedure for this section is similar to that of the PC-AT based continuous control section, except that now the system and input dynamics (A, B) vary with the sampling period, hence they must be calculated in advance. Use any tool you feel comfortable with (e.g. Lotus 1-2-3) to compute them. The system is to be implemented on the Comdyna analog computer and connected to the PC-AT according to FIG. 4 as before. Run the program and select the Discrete Optimal Control option from the menu. Enter/edit the data in the various fields and press END when done. At this point the program solves the steady state discrete Riccati equation iteratively, and returns the P matrix elements as well as the gains G. Care must be taken to avoid typos when entering the data, especially R, because divisions by zero will crash the program. It is recommended that a hard copy (screen dump) be taken before starting the actual run. To do this press the Print Screen button after the Riccati solution is on the screen. Pressing END will again start the control execution until CTRL-BREAK stops it.
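The A and B entries required at each sampling period, computed above with a spreadsheet, can equally be generated programmatically. A convenient method that does not require inverting Ac (which is singular for this plant) is the augmented matrix exponential: the top rows of exp([Ac Bc; 0 0] T) contain Ad and Bd. A sketch (Python with SciPy, an assumption), verified against the closed forms of equation (20):

```python
import numpy as np
from scipy.linalg import expm

def discretize(Ac, Bc, T):
    """Zero order hold discretization via the augmented matrix exponential.

    Works even when Ac is singular, unlike Bd = (Ad - I) Ac^-1 Bc."""
    n, m = Bc.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = Ac
    M[:n, n:] = Bc
    E = expm(M * T)
    return E[:n, :n], E[:n, n:]           # Ad, Bd

# The lab plant of equation (11)
Ac = np.array([[0.0, 1.0], [0.0, -0.5]])
Bc = np.array([[0.0], [0.5]])
T = 0.1                                   # first of the ten sampling periods

Ad, Bd = discretize(Ac, Bc, T)

# Compare with the closed forms of equation (20)
e = np.exp(-0.5 * T)
assert np.allclose(Ad, [[1.0, 2 - 2 * e], [0.0, e]])
assert np.allclose(Bd, [[T - 2 + 2 * e], [1 - e]])
print(Ad, Bd)
```

Calling discretize with T = 0.1, 0.2, ..., 1.0 produces the ten (A, B) pairs needed for the runs and for item 1) of the write-up below.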
Record your observations, and make appropriate strip chart recordings.

WRITE-UP AND ANALYSIS

Your report should contain:

1) Parameters that are input for each of the ten sampling times, and the P matrix for each of the ten sampling times (from 100 msec to 1000 msec in steps of 100 msec).
2) Strip chart recordings of x1 and x2 for each of the ten sampling times.
3) Discussion of results, which should include a comparison of the discrete control to the continuous control problem, with regard to sampling times, etc. References must be made to the strip chart recordings.
4) Conclusions.

PART III – DISCRETE CONTROL SUPPLEMENT

This is a variation of PART II, where the Q matrix and the value of R in the discrete performance index are changed. As mentioned at the beginning, these matrices are "freely" selected by the designer. Thus it becomes of great importance to observe and analyze how their values affect the states and the control input. Given are four sets of Q and R, selected in such a way that each time a different state or the control will be heavily penalized. The procedure for PART II, with the exception of the sampling times (only 100 msec is used for this part), must be followed for each of the sets below. The write-up format should also be adhered to.

    Q = [2 0; 0 1] ,  R = 0.1
    Q = [2 0; 0 1] ,  R = 1.0
    Q = [9 0; 0 1] ,  R = 0.4
    Q = [2 0; 0 9] ,  R = 0.4

The effects of changing Q and R are to be noted in the write-up. The behavior of the responses for the different cases must be explained (for example, the difference in response times).

REFERENCES

For the state variables and feedback control only introductory material is necessary. Suggested reading:

Frederick, D. K. and Carlson, A. B., Linear Systems in Communication and Control, Section 3.5, pp. 79-88.

DeRusso, P. M., Roy, R. J., and Close, C. M., State Variables for Engineers, Sections 8.1 - 8.5.
For the continuous and discrete optimal control two independent approaches exist: the Dynamic Programming methodology and the Calculus of Variations approach. Kirk covers both approaches from a tutorial point of view, yet the material is still extremely difficult to digest. The prospective reader should concentrate on the notions and the examples rather than trying to verify the mathematical derivations. Sage is more application-oriented and extends his results to the Estimation Problem. Linear Algebra and Advanced Calculus are prerequisites for anything other than browsing. Suggested reading:

Kirk, D., Optimal Control Theory, Prentice-Hall, NJ, 1970.
  Chapter 1, pp. 3-21: general introduction to systems and terminology.
  Chapter 2, pp. 34-42: typical 2nd order LQR example, very useful to read.
  Chapter 3, pp. 84-86: example of the Discrete LQR, detailed in section 3.10.
  Chapter 4: complete introduction to Calculus of Variations, contains proofs of all subsequent results.
  Chapter 5, pp. 209-218: the LQR problem with examples, state transition matrix and Kalman solutions.

Sage, A. P. and White, C. C., Optimum Systems Control, Prentice-Hall, NJ, 1977.
  Chapter 5, pp. 88-100: the LQR problem with examples, only the Kalman solution.
  Chapter 6, pp. 128-132 & pp. 135-136: the Discrete LQR derivation and examples.

APPENDIX – ANALOG COMPUTER WIRING DIAGRAMS

[Two wiring diagrams, not reproduced in this text version. The first (open loop system) shows integrators 1 and 2, summer 3, amplifiers 1, 2, and 7, and pots 1, 2, and 3; the second (closed loop system) additionally shows amplifier 3 and pots 4 and 5.]