OPTIMISATION AND OPTIMAL CONTROL

AIM
To provide an understanding of the principles of optimisation techniques in the static and dynamic contexts.

LEARNING OBJECTIVES
On completion of the module the student should be able to demonstrate:
• an understanding of the basic principles of optimisation and the ability to apply them to linear and non-linear, unconstrained and constrained static problems;
• an understanding of the fundamentals of optimal control, system identification and parameter estimation.

CONTENT (20 lectures common to BEng and MSc)
• Principles of optimisation: essential features and examples of application. (1 lecture)
• Basic principles of static optimisation theory: unconstrained and constrained optimisation, Lagrange multipliers, necessary and sufficient conditions for optimality, limitations of analytical methods. (3 lectures)
• Numerical solution of static optimisation problems: unidimensional search methods; unconstrained multivariable optimisation, direct and indirect methods, gradient and Newton-type approaches; non-linear programming with constraints. (4 lectures)
• Introduction to genetic optimisation algorithms. (1 lecture)
• Basic concepts of linear programming. (1 lecture)
• On-line optimisation, integrated system optimisation and parameter estimation. (1 lecture)
• The optimal control problem for continuous dynamic systems: calculus of variations, the maximum principle, and two-point boundary value problems. The linear quadratic regulator problem and the matrix Riccati equation. (4 lectures)
• Introduction to system identification, parameter estimation and self-adaptive control. (3 lectures)
• Introduction to the Kalman filter.
(2 lectures)

(Plus 10 lectures for MSc students only.)

LABORATORY WORK
The module will be illustrated by laboratory exercises and demonstrations on the use of MATLAB and the associated Optimization and Control Toolboxes for solving unconstrained and constrained static optimisation problems and for solving linear quadratic regulator problems.

ASSESSMENT
Via written examination. MSc only: also a laboratory session and report.

READING LIST
• P.E. Gill, W. Murray and M.H. Wright: "Practical Optimization" (Academic Press, 1981)
• T.F. Edgar, D.M. Himmelblau and L.S. Lasdon: "Optimization of Chemical Processes", 2nd Edition (McGraw-Hill, 2001)
• M.S. Bazaraa, H.D. Sherali and C.M. Shetty: "Nonlinear Programming - Theory and Algorithms", 2nd Edition (Wiley Interscience, 1993)
• J. Nocedal and S.J. Wright: "Numerical Optimization" (Springer, 1999)
• J.E. Dennis, Jr. and R.B. Schnabel: "Numerical Methods for Unconstrained Optimization and Nonlinear Equations" (SIAM Classics in Applied Mathematics, 1996; originally Prentice Hall, 1983)
• K.J. Astrom and B. Wittenmark: "Adaptive Control", 2nd Edition (Prentice Hall, 1993)
• F.L. Lewis and V.S. Syrmos: "Optimal Control", 2nd Edition (Wiley, 1995)

PRINCIPLES OF OPTIMISATION

A typical engineering problem: you have a process that can be represented by a mathematical model, and you also have a performance criterion such as minimum cost. The goal of optimisation is to find the values of the variables in the process that yield the best value of the performance criterion.

Two ingredients of an optimisation problem:
(i) a process or model;
(ii) a performance criterion.

Some typical performance criteria: maximum profit, minimum cost, minimum effort, minimum error, minimum waste, maximum throughput, best product quality. Note the need to express the performance criterion in mathematical form.

Static optimisation: the variables have numerical values, fixed with respect to time.
Dynamic optimisation: the variables are functions of time.
Essential Features

Every optimisation problem contains three essential categories:
1. at least one objective function to be optimised;
2. equality constraints;
3. inequality constraints.

By a feasible solution we mean a set of variables which satisfies categories 2 and 3. The region of feasible solutions is called the feasible region.

[Figure: a feasible region in the (x1, x2) plane, bounded by linear and nonlinear equality and inequality constraints.]

An optimal solution is a set of values of the variables that is contained in the feasible region and also provides the best value of the objective function in category 1. For a meaningful optimisation problem the model needs to be underdetermined.

Mathematical Description

Minimise:    f(x)         (objective function)
subject to:  h(x) = 0     (equality constraints)
             g(x) ≥ 0     (inequality constraints)

where x ∈ R^n is a vector of n variables (x1, x2, ..., xn), h(x) is a vector of equalities of dimension m1, and g(x) is a vector of inequalities of dimension m2.

Steps Used To Solve Optimisation Problems
1. Analyse the process in order to make a list of all the variables.
2. Determine the optimisation criterion and specify the objective function.
3. Develop the mathematical model of the process to define the equality and inequality constraints. Identify the independent and dependent variables to obtain the number of degrees of freedom.
4. If the problem formulation is too large or complex, simplify it if possible.
5. Apply a suitable optimisation technique.
6. Check the result and examine its sensitivity to changes in model parameters and assumptions.
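The standard form above maps directly into code. A minimal sketch (the toy objective and constraints below are my own illustration, not from the notes): f, h and g are supplied as functions, and a point is feasible when h(x) = 0 and g(x) ≥ 0 hold to within a tolerance.

```python
# Toy instance of: min f(x) s.t. h(x) = 0, g(x) >= 0 (my own example).
def f(x):  # objective function
    return (x[0] - 1)**2 + (x[1] - 2)**2

def h(x):  # equality constraints, each entry must be 0
    return [x[0] + x[1] - 3]

def g(x):  # inequality constraints, each entry must be >= 0
    return [x[0], x[1]]

def is_feasible(x, tol=1e-9):
    # Categories 2 and 3 of the text: equalities hold, inequalities hold
    return all(abs(v) <= tol for v in h(x)) and all(v >= -tol for v in g(x))

print(is_feasible([1.0, 2.0]), is_feasible([4.0, -1.0]))  # -> True False
```

The feasible region is exactly the set of x for which `is_feasible` returns True; the optimal solution is the feasible x with the smallest f(x).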
Classification of Optimisation Problems

Properties of f(x):
• single variable or multivariable
• linear or nonlinear
• sum of squares
• quadratic
• smooth or non-smooth
• sparsity

Properties of h(x) and g(x):
• simple bounds
• smooth or non-smooth
• sparsity
• linear or nonlinear
• no constraints

Properties of the variables x:
• time variant or invariant
• continuous or discrete
• take only integer values
• mixed

Obstacles and Difficulties
• The objective function and/or the constraint functions may have finite discontinuities in the continuous parameter values.
• The objective function and/or the constraint functions may be non-linear functions of the variables.
• The objective function and/or the constraint functions may be defined in terms of complicated interactions of the variables. This may prevent calculation of unique values of the variables at the optimum.
• The objective function and/or the constraint functions may exhibit nearly "flat" behaviour for some ranges of variables, or exponential behaviour for other ranges. This causes the problem to be insensitive, or too sensitive.
• The problem may exhibit many local optima whereas the global optimum is sought. A solution may be obtained that is less satisfactory than another solution elsewhere.
• Absence of a feasible region.
• Model-reality differences.

Typical Examples of Application

Static optimisation:
• Plant design (sizing and layout).
• Operation (best steady-state operating condition).
• Parameter estimation (model fitting).
• Allocation of resources.
• Choice of controller parameters (e.g. gains, time constants) to minimise a given performance index (e.g. overshoot, settling time, integral of error squared).

Dynamic optimisation:
• Determination of a control signal u(t) to transfer a dynamic system from an initial state to a desired final state so as to satisfy a given performance index.
• Optimal plant start-up and/or shut-down.
• Minimum time problems.

BASIC PRINCIPLES OF STATIC OPTIMISATION THEORY

Continuity of Functions

Functions containing discontinuities can cause difficulty in solving optimisation problems.

Definition: a function of a single variable x is continuous at a point x0 if:
(a) f(x0) exists;
(b) lim_{x→x0} f(x) exists;
(c) lim_{x→x0} f(x) = f(x0).

If f(x) is continuous at every point in a region R, then f(x) is said to be continuous throughout R.

[Figure: one function with a jump, so f(x) is discontinuous; another that is continuous everywhere although its derivative df(x)/dx is not.]

Unimodal and Multimodal Functions

A unimodal function f(x) (in the range specified for x) has a single extremum (minimum or maximum). A multimodal function f(x) has two or more extrema.

If f'(x) = 0 at an extremum, the point is called a stationary point. There is a distinction between the global extremum (the biggest or smallest of a set of extrema) and local extrema (any extremum). Note: many numerical procedures terminate at a local extremum.

[Figure: a multimodal function f(x), showing a local max (stationary), a global max at the boundary (not stationary), a stationary saddle point, a local min (stationary) and a global min (stationary).]

Multivariate Functions - Surface and Contour Plots

We shall be concerned with basic properties of a scalar function f(x) of n variables (x1, ..., xn). If n = 1, f(x) is a univariate function; if n > 1, f(x) is a multivariate function. For any multivariate function, the equation z = f(x) defines a surface in the (n+1)-dimensional space R^(n+1).

In the case n = 2, the points z = f(x1, x2) represent a three-dimensional surface. Let c be a particular value of f(x1, x2). Then f(x1, x2) = c defines a curve in x1 and x2 on the plane z = c. If we consider a selection of different values of c, we obtain a family of curves which provide a contour map of the function z = f(x1, x2).
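Unimodality is exactly the property that the unidimensional search methods listed in the module content rely on: an interval-reduction scheme such as golden-section search is guaranteed to close in on the single extremum. A sketch (the quadratic test function and tolerance are my own choices, not from the notes):

```python
import math

def golden_section_min(f, a, b, tol=1e-8):
    """Minimise a unimodal f on [a, b] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi ~ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while (b - a) > tol:
        # Keep the sub-interval that must contain the minimum
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# Unimodal test: f(x) = (x - 2)^2 + 1 has its single minimum at x = 2
x_star = golden_section_min(lambda x: (x - 2)**2 + 1, 0.0, 5.0)
print(round(x_star, 6))  # -> 2.0
```

On a multimodal function the same routine still terminates, but only at one local extremum inside the starting interval, which is the caveat noted above.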
[Figure: contour map of z = exp(x1)(4x1^2 + 2x2^2 + 4x1x2 + 2x2 + 1); the contours reveal a local minimum and a saddle point.]

Example: Surface and Contour Plots of the "Peaks" Function

z = 3(1 - x1)^2 exp(-x1^2 - (x2 + 1)^2) - 10(x1/5 - x1^3 - x2^5) exp(-x1^2 - x2^2) - (1/3) exp(-(x1 + 1)^2 - x2^2)

This function is multimodal!

[Figure: surface and contour plots of the peaks function, showing a global max, two local maxima, a local min, a global min and a saddle point.]

Gradient Vector

The slope of f(x) at a point x = x̄ in the direction of the i-th co-ordinate axis is ∂f(x)/∂xi evaluated at x = x̄. The n-vector of these partial derivatives is termed the gradient vector of f, denoted by:

∇f(x) = [∂f(x)/∂x1, ..., ∂f(x)/∂xn]^T     (a column vector)

The gradient vector at a point x = x̄ is normal to the contour through that point, in the direction of increasing f. At a stationary point:

∇f(x) = 0     (the null vector)

Example

f(x) = x1 x2^2 - x2 cos x1

∇f(x) = [∂f/∂x1, ∂f/∂x2]^T = [x2^2 + x2 sin x1, 2x1x2 - cos x1]^T

and the stationary point (or points) are given by the simultaneous solution(s) of:

x2^2 + x2 sin x1 = 0
2x1x2 - cos x1 = 0

Note: if ∇f(x) is a constant vector, f(x) is then linear, e.g. f(x) = c^T x gives ∇f(x) = c.

Hessian Matrix (Curvature Matrix)

The second derivative of an n-variable function is defined by the n^2 partial derivatives:

∂/∂xi (∂f(x)/∂xj),  i = 1, ..., n;  j = 1, ..., n

written as:

∂^2 f(x)/∂xi∂xj for i ≠ j, and ∂^2 f(x)/∂xi^2 for i = j.
These n^2 second partial derivatives are usually represented by a square, symmetric matrix, termed the Hessian matrix, denoted by:

H(x) = ∇^2 f(x) = [ ∂^2f/∂x1^2     ...   ∂^2f/∂x1∂xn ]
                  [     ...        ...       ...      ]
                  [ ∂^2f/∂xn∂x1    ...   ∂^2f/∂xn^2  ]

Example: for the previous example,

∇^2 f(x) = [ x2 cos x1       2x2 + sin x1 ]
           [ 2x2 + sin x1    2x1          ]

Note: if the Hessian matrix of f(x) is a constant matrix, f(x) is then quadratic, expressed as:

f(x) = (1/2) x^T H x + c^T x,   ∇f(x) = Hx + c,   ∇^2 f(x) = H

Convex and Concave Functions

A function is called concave over a given region R if:

f(θxa + (1 - θ)xb) ≥ θ f(xa) + (1 - θ) f(xb)

where xa, xb ∈ R and 0 ≤ θ ≤ 1. The function is strictly concave if ≥ is replaced by >. A function is called convex (strictly convex) if ≥ is replaced by ≤ (<).

[Figure: a concave function, with f''(x) ≤ 0, and a convex function, with f''(x) ≥ 0, each shown with the chord joining xa and xb.]

If f''(x) ≤ 0, then f(x) is concave. If f''(x) ≥ 0, then f(x) is convex. For a multivariate function f(x) the conditions are:

f(x)                  Hessian matrix H(x)
strictly convex       +ve def
convex                +ve semi def
concave               -ve semi def
strictly concave      -ve def

Tests for Convexity and Concavity

H is +ve def (+ve semi def) iff x^T H x > 0 (≥ 0) for all x ≠ 0.
H is -ve def (-ve semi def) iff x^T H x < 0 (≤ 0) for all x ≠ 0.

Convenient tests. H(x) is +ve def (f strictly convex) or +ve semi def (f convex) if:
1. all eigenvalues of H(x) are > 0 (≥ 0), or
2. all principal determinants of H(x) are > 0 (≥ 0).

H(x) is -ve def (f strictly concave) or -ve semi def (f concave) if:
1. all eigenvalues of H(x) are < 0 (≤ 0), or
2. the principal determinants of H(x) alternate in sign: Δ1 < 0, Δ2 > 0, Δ3 < 0, ... (Δ1 ≤ 0, Δ2 ≥ 0, Δ3 ≤ 0, ...).

Example

f(x) = 2x1^2 + 3x1x2 + 2x2^2

∂f/∂x1 = 4x1 + 3x2,   ∂^2f/∂x1^2 = 4,   ∂^2f/∂x1∂x2 = 3
∂f/∂x2 = 3x1 + 4x2,   ∂^2f/∂x2^2 = 4

H(x) = [ 4  3 ]      Δ1 = 4,  Δ2 = 16 - 9 = 7
       [ 3  4 ]

eigenvalues: |λI - H| = (λ - 4)^2 - 9 = λ^2 - 8λ + 7 = 0, giving λ1 = 1, λ2 = 7.

Hence f(x) is strictly convex.
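The gradient, Hessian and eigenvalue test above can all be checked numerically. A sketch using the two worked examples from the text (the probe point and finite-difference step are my own choices):

```python
import math

# The text's example f(x) = x1*x2^2 - x2*cos(x1), with the analytical
# gradient derived above, checked against central finite differences.
def f(x1, x2):
    return x1 * x2**2 - x2 * math.cos(x1)

def grad(x1, x2):
    return (x2**2 + x2 * math.sin(x1),   # df/dx1
            2 * x1 * x2 - math.cos(x1))  # df/dx2

x1, x2, step = 0.7, -1.3, 1e-5
fd1 = (f(x1 + step, x2) - f(x1 - step, x2)) / (2 * step)
fd2 = (f(x1, x2 + step) - f(x1, x2 - step)) / (2 * step)
print(abs(fd1 - grad(x1, x2)[0]) < 1e-6 and
      abs(fd2 - grad(x1, x2)[1]) < 1e-6)  # -> True

# Eigenvalue test on the constant Hessian H = [[4, 3], [3, 4]] of the
# quadratic example: both eigenvalues positive => strictly convex.
a, b, c = 4.0, 3.0, 4.0  # symmetric 2x2 matrix [[a, b], [b, c]]
disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
eigs = ((a + c) / 2 - disc, (a + c) / 2 + disc)
print(eigs)  # -> (1.0, 7.0)
```

The closed-form eigenvalues of a symmetric 2x2 matrix keep the example dependency-free; for larger Hessians one would use a library eigensolver instead.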
Convex Region

A convex set of points exists if, for any two points xa and xb in a region, all points:

x = θxa + (1 - θ)xb,  0 ≤ θ ≤ 1

on the straight line joining xa and xb are in the set. If a region is completely bounded by concave functions then the functions form a convex region.

[Figure: a convex region, in which the line joining any xa and xb stays inside, and a non-convex region, in which it does not.]

Necessary and Sufficient Conditions for an Extremum of an Unconstrained Function

A condition N is necessary for a result R if R can be true only if N is true (R ⇒ N).
A condition S is sufficient for a result R if R is true whenever S is true (S ⇒ R).
A condition T is necessary and sufficient for a result R if R is true if and only if T is true (T ⇔ R).

There are two necessary conditions and a single sufficient condition to guarantee that x* is an extremum of a function f(x) at x = x*:
1. f(x) is twice continuously differentiable at x*.
2. ∇f(x*) = 0, i.e. a stationary point exists at x*.
3. ∇^2 f(x*) = H(x*) is +ve def for a minimum to exist at x*, or -ve def for a maximum to exist at x*.

Conditions 1 and 2 are necessary; 3 is a sufficient condition. Note: an extremum may exist at x* even though it is not possible to demonstrate the fact using the three conditions.
Example: consider

f(x) = 4 + 4.5x1 - 4x2 + x1^2 + 2x2^2 - 2x1x2 + x1^4 - 2x1^2 x2

The gradient vector is:

∇f(x) = [ 4.5 + 2x1 - 2x2 + 4x1^3 - 4x1x2 ]
        [ -4 + 4x2 - 2x1 - 2x1^2          ]

yielding three stationary points, located by setting ∇f(x) = 0 and solving numerically:

x* = (x1, x2)      f(x*)   eigenvalues of ∇^2 f(x*)   classification
A. (-1.05, 1.03)   -0.51   10.5, 3.5                  global min
B. (1.94, 3.85)     0.98   37.0, 0.97                 local min
C. (0.61, 1.49)     2.83   7.0, -2.56                 saddle

where:

∇^2 f(x) = [ 2 + 12x1^2 - 4x2   -2 - 4x1 ]
           [ -2 - 4x1            4       ]

[Figure: contour map of f(x) over x1 ∈ [-4, 4], x2 ∈ [-1, 6], showing the two minima A and B and the saddle point C.]

Interpretation of the Objective Function in Terms of its Quadratic Approximation

If a function of two variables can be approximated within a region of a stationary point by a quadratic function:

f(x1, x2) ≈ (1/2) [x1 x2] [ h11 h12 ] [x1]  +  [c1 c2] [x1]
                          [ h12 h22 ] [x2]             [x2]

         = (1/2) h11 x1^2 + (1/2) h22 x2^2 + h12 x1 x2 + c1 x1 + c2 x2

then the eigenvalues and eigenvectors of:

H(x1*, x2*) = ∇^2 f(x1*, x2*) = [ h11 h12 ]
                                [ h12 h22 ]

can be used to interpret the nature of f(x1, x2) at x1 = x1*, x2 = x2*. They provide information on the shape of f(x1, x2) there. If H(x1*, x2*) is +ve def, the eigenvectors are at right angles (orthogonal) and correspond to the principal axes of the elliptical contours of f(x1, x2). A valley or ridge lies in the direction of the eigenvector associated with a relatively small eigenvalue.
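The classification in the table above follows from the sign pattern of the Hessian eigenvalues at each stationary point, and can be reproduced with a few lines of code (a sketch; the closed-form 2x2 eigenvalue formula stands in for a library eigensolver):

```python
import math

def hessian(x1, x2):
    # Hessian of f = 4 + 4.5*x1 - 4*x2 + x1**2 + 2*x2**2
    #                - 2*x1*x2 + x1**4 - 2*x1**2*x2, from the text
    a = 2 + 12 * x1**2 - 4 * x2   # d2f/dx1^2
    b = -2 - 4 * x1               # d2f/dx1dx2
    return a, b, 4.0              # d2f/dx2^2 = 4

def classify(x1, x2):
    a, b, c = hessian(x1, x2)
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lo, hi = (a + c) / 2 - disc, (a + c) / 2 + disc
    if lo > 0:
        return "min"      # both eigenvalues positive
    if hi < 0:
        return "max"      # both eigenvalues negative
    return "saddle"       # eigenvalues of opposite sign

for pt in [(-1.05, 1.03), (1.94, 3.85), (0.61, 1.49)]:
    print(pt, classify(*pt))
# A and B classify as minima, C as a saddle, matching the table
```

Because the stationary points are quoted to two decimals, the eigenvalues computed here differ slightly from the tabulated ones, but the signs, and hence the classifications, agree.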
These interpretations can be generalised to the multivariate quadratic approximation:

f(x) ≈ (1/2) x^T H x + c^T x

Case 1: equal eigenvalues - circular contours, interpreted as a circular hill (max, -ve eigenvalues) or circular valley (min, +ve eigenvalues).

Case 2: unequal eigenvalues of the same sign - elliptical contours, interpreted as an elliptical hill (max, -ve eigenvalues) or elliptical valley (min, +ve eigenvalues).

Case 3: eigenvalues of opposite sign but equal in magnitude - a symmetrical saddle.

Case 4: eigenvalues of opposite sign and unequal in magnitude - an asymmetrical saddle.

[Figures: contour plots in the (x1, x2) plane illustrating the four cases.]

Optimisation with Equality Constraints

min f(x),  x ∈ R^n
subject to: h(x) = 0,  m constraints (m < n)

Elimination of variables, example:

min f(x) = 4x1^2 + 5x2^2     (a)
s.t. 2x1 + 3x2 = 6           (b)

Using (b) to eliminate x1 gives:

x1 = (6 - 3x2)/2             (c)

and substituting into (a):

f(x2) = (6 - 3x2)^2 + 5x2^2

At a stationary point f'(x2) = 0:

-6(6 - 3x2) + 10x2 = 0   =>   28x2 - 36 = 0   =>   x2* = 9/7 ≈ 1.286

Then using (c): x1* = (6 - 3x2*)/2 = 15/14 ≈ 1.071.

Hence the stationary point (a minimum) is (1.071, 1.286).

The Lagrange Multiplier Method

Consider a two-variable problem with a single equality constraint:

min f(x1, x2)   s.t.  h(x1, x2) = 0

At a stationary point we may write:

df = (∂f/∂x1) dx1 + (∂f/∂x2) dx2 = 0     (a)
dh = (∂h/∂x1) dx1 + (∂h/∂x2) dx2 = 0     (b)

or, in matrix form:

[ ∂f/∂x1  ∂f/∂x2 ] [dx1]   [0]
[ ∂h/∂x1  ∂h/∂x2 ] [dx2] = [0]

If:

(∂f/∂x1)(∂h/∂x2) - (∂f/∂x2)(∂h/∂x1) = 0

nontrivial, non-unique solutions for dx1 and dx2 will exist. This is achieved by introducing a scalar λ such that:

∂f/∂x1 + λ ∂h/∂x1 = 0   and   ∂f/∂x2 + λ ∂h/∂x2 = 0

where λ is known as a Lagrange multiplier.
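The elimination-of-variables example above reduces to one line of arithmetic once the constraint has been substituted, and is easily replayed (a sketch following the text's algebra):

```python
# Substituting x1 = (6 - 3*x2)/2 reduces min 4*x1^2 + 5*x2^2
# s.t. 2*x1 + 3*x2 = 6 to minimising f(x2) = (6 - 3*x2)^2 + 5*x2^2.
# Setting f'(x2) = -6*(6 - 3*x2) + 10*x2 = 28*x2 - 36 = 0:
x2 = 36 / 28            # = 9/7  ~ 1.286
x1 = (6 - 3 * x2) / 2   # = 15/14 ~ 1.071
print(round(x1, 3), round(x2, 3))  # -> 1.071 1.286

# Sanity checks: the constraint holds and the point is stationary
print(abs(2 * x1 + 3 * x2 - 6) < 1e-12)       # -> True
print(abs(28 * x2 - 36) < 1e-12)              # -> True
```

The same point is recovered by the Lagrange multiplier route described next, which avoids having to solve the constraint for one variable explicitly.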
If an augmented objective function, called the Lagrangian, is defined as:

L(x1, x2, λ) = f(x1, x2) + λ h(x1, x2)

we can solve the constrained optimisation problem by solving:

∂L/∂x1 = ∂f/∂x1 + λ ∂h/∂x1 = 0  }
∂L/∂x2 = ∂f/∂x2 + λ ∂h/∂x2 = 0  }  provides equations (a) and (b)
∂L/∂λ  = h(x1, x2) = 0             re-statement of the equality constraint

Generalising, to solve the problem:

min f(x), x ∈ R^n,  subject to h(x) = 0,  m constraints (m < n)

define the Lagrangian:

L(x, λ) = f(x) + λ^T h(x),  λ ∈ R^m

and the stationary point (or points) are obtained from:

∇x L(x, λ) = ∇x f(x) + (∂h(x)/∂x)^T λ = 0
∇λ L(x, λ) = h(x) = 0

Example: consider the previous example again. The Lagrangian is:

L = 4x1^2 + 5x2^2 + λ(2x1 + 3x2 - 6)

∂L/∂x1 = 8x1 + 2λ = 0          (a)
∂L/∂x2 = 10x2 + 3λ = 0         (b)
∂L/∂λ  = 2x1 + 3x2 - 6 = 0     (c)

Substituting (a) and (b) into (c) gives:

2(-λ/4) + 3(-3λ/10) - 6 = 0   =>   λ = -60/14 ≈ -4.286

Hence x1 = 15/14 ≈ 1.071 and x2 = 9/7 ≈ 1.286, which agrees with the previous result.

[Figure: contours of f(x) with the constraint line 2x1 + 3x2 = 6; the constrained minimum is where a contour touches the line.]

Necessary Conditions for a Local Extremum of an Optimisation Problem Subject to Equality and Inequality Constraints (Kuhn-Tucker Conditions)

Consider the problem:

min f(x),  x ∈ R^n
s.t. h(x) = 0,  m equalities (m < n)
     g(x) ≥ 0,  p inequalities

Here, we define the Lagrangian:

L(x, λ, μ) = f(x) + λ^T h(x) - μ^T g(x),  λ ∈ R^m, μ ∈ R^p

(the minus sign on the inequality term goes with the g(x) ≥ 0, μ ≥ 0 convention).

The necessary conditions for x* to be a local extremum of f(x) are:
(a) f(x), the hj(x) and the gj(x) are all twice differentiable at x*.
(b) The Lagrange multipliers exist.
(c) All constraints are satisfied at x*: h(x*) = 0 and g(x*) ≥ 0.
(d) The Lagrange multipliers μj* for the inequality constraints are not negative: μj* ≥ 0.
(e) The binding (active) inequality constraints are zero at x*; the inactive inequality constraints are > 0 with the associated μj* equal to zero; i.e. μj* gj(x*) = 0.
(f) The Lagrangian function is at a stationary point:

∇x L(x, λ, μ) = 0

Notes:
1. Further analysis or investigation is required to determine whether the extremum is a minimum (or a maximum).
2. If f(x) is convex, the h(x) are linear and the g(x) are concave, x* will be a global extremum.
Limitations of Analytical Methods

The computations needed to evaluate the above conditions can be extensive and intractable. Furthermore, the resulting simultaneous equations required for solving x*, λ* and μ* are often nonlinear and cannot be solved without resorting to numerical methods, and the results may be inconclusive. For these reasons, we often have to resort to numerical methods for solving optimisation problems, using computer codes (e.g. MATLAB).

Example

Determine whether the potential minimum x* = (1.00, 4.90) satisfies the Kuhn-Tucker conditions for the problem:

min f(x) = 4x1 - x2^2 - 12
s.t. h1(x) = 25 - x1^2 - x2^2 = 0
     g1(x) = 10x1 - x1^2 + 10x2 - x2^2 - 34 ≥ 0
     g2(x) = (x1 - 3)^2 + (x2 - 1)^2 ≥ 0
     g3(x): x1 ≤ 2
     g4(x): x2 ≥ 0

[Figure: contours of f(x) with the constraint boundaries h1(x) = 0 and g1(x) = 0; the candidate point (1.00, 4.90) lies on both.]

We test each Kuhn-Tucker condition in turn:
(a) All functions are seen by inspection to be twice differentiable.
(b) We assume the Lagrange multipliers exist.
(c) Are the constraints satisfied?

h1: 25 - (1.00)^2 - (4.90)^2 = -0.01 ≈ 0                              yes
g1: 10(1.00) - (1.00)^2 + 10(4.90) - (4.90)^2 - 34 = -0.01 ≈ 0        yes, binding
g2: (1.00 - 3)^2 + (4.90 - 1)^2 = 19.21 > 0                           yes, not active
g3: 1.00 ≤ 2                                                          yes, not active
g4: 4.90 ≥ 0                                                          yes, not active

To test the rest of the conditions we need to determine the Lagrange multipliers using the stationarity conditions. First we note that from condition (e) we require μj* gj(x*) = 0, j = 1, 2, 3, 4:

μ1* can have any (non-negative) value, because g1(x*) = 0;
μ2* must be zero, because g2(x*) > 0;
μ3* must be zero, because g3 is not active;
μ4* must be zero, because g4(x*) > 0.

Now consider the stationarity condition ∇x L(x, λ, μ) = 0 where, since μ2* = μ3* = μ4* = 0:

L = 4x1 - x2^2 - 12 + λ1(25 - x1^2 - x2^2) - μ1(10x1 - x1^2 + 10x2 - x2^2 - 34)

Hence:

∂L/∂x1 = 4 - 2λ1 x1* - μ1(10 - 2x1*) = 4 - 2λ1 - 8μ1 = 0
∂L/∂x2 = -2x2* - 2λ1 x2* - μ1(10 - 2x2*) = -9.8 - 9.8λ1 - 0.2μ1 = 0

i.e.

[ 2    8   ] [λ1*]   [  4  ]
[ 9.8  0.2 ] [μ1*] = [ -9.8]

giving λ1* = -1.015
and μ1* = 0.754.

Now we can check the remaining conditions:
(d) Are the μj* ≥ 0, j = 1, 2, 3, 4? We have μ1* = 0.754 and μ2* = μ3* = μ4* = 0, so yes.
(e) Is μj* gj(x*) = 0, j = 1, 2, 3, 4? Yes, because we have already used this above.
(f) Is the Lagrangian function at a stationary point? Yes, because we have already used this above.

Hence, all the Kuhn-Tucker conditions are satisfied and we can have confidence in the solution:

x* = (1.00, 4.90)
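The arithmetic of this Kuhn-Tucker check can be replayed numerically. A sketch: it assumes the objective f(x) = 4x1 - x2^2 - 12 and the sign convention L = f + λ^T h - μ^T g (both reconstructed here, since they are what reproduce the quoted multipliers λ1 ≈ -1.015 and μ1 ≈ 0.754 with μ1 ≥ 0):

```python
# Kuhn-Tucker quantities at the candidate point x* = (1.00, 4.90).
# Assumed reconstruction: f = 4*x1 - x2**2 - 12, L = f + lam*h1 - mu*g1.
x1, x2 = 1.00, 4.90
lam, mu = -1.015, 0.754   # multipliers quoted in the text

h1 = 25 - x1**2 - x2**2                       # equality constraint
g1 = 10*x1 - x1**2 + 10*x2 - x2**2 - 34       # binding inequality

# Gradient of the Lagrangian at x*
dLdx1 = 4 - 2*lam*x1 - mu*(10 - 2*x1)
dLdx2 = -2*x2 - 2*lam*x2 - mu*(10 - 2*x2)

# h1 ~ 0 (feasible), g1 ~ 0 (binding), grad L ~ 0 (stationary),
# mu >= 0; all within rounding of the two-decimal data
print(all(abs(v) < 0.05 for v in (h1, g1, dLdx1, dLdx2)))  # -> True
print(mu >= 0)                                             # -> True
```

The residuals are of order 0.01 rather than exactly zero because x* and the multipliers are quoted to two or three decimals in the notes.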