Introduction to Optimization (Part 1)
Daniel Kirschen
© 2011 D. Kirschen and University of Washington

Economic dispatch problem

[Figure: three generating units A, B and C supplying a load L]

• Several generating units serving the load
• What share of the load should each generating unit produce?
• Consider the limits of the generating units
• Ignore the limits of the network

Characteristics of the generating units

[Figure: boiler-turbine-generator unit, and its input/output curve: fuel input (J/h) versus electric power output (MW), defined between $P^{min}$ and $P^{max}$]

• Thermal generating units
• Consider the running costs only
• Input/output curve: fuel input versus electric power output
• Fuel consumption is measured by its energy content
• Upper and lower limits on the output of the generating unit

Cost Curve

• Multiply the fuel input by the fuel cost
• No-load cost: the cost of keeping the unit running if it could produce zero MW

[Figure: cost curve, cost ($/h) versus output (MW) between $P^{min}$ and $P^{max}$; the no-load cost is the intercept at zero output]

Incremental Cost Curve

• Incremental cost curve: $\dfrac{\Delta(\text{fuel cost})}{\Delta(\text{power})}$ versus power
• The derivative of the cost curve
• Expressed in $/MWh
• The cost of the next MWh

[Figure: cost curve ($/h versus MW) with local slope $\Delta F / \Delta P$, and the corresponding incremental cost curve ($/MWh versus MW)]

Mathematical formulation

• Objective function: $C = C_A(P_A) + C_B(P_B) + C_C(P_C)$
• Constraints:
  – Load/generation balance: $L = P_A + P_B + P_C$
  – Unit constraints:
    $P_A^{min} \le P_A \le P_A^{max}$
    $P_B^{min} \le P_B \le P_B^{max}$
    $P_C^{min} \le P_C \le P_C^{max}$

This is an optimization problem.

Introduction to Optimization

“An engineer can do with one dollar what any bungler can do with two.”
A. M. Wellington (1847-1895)

Objective

• Most engineering activities have an objective:
  – Achieve the best possible design
  – Achieve the most economical operating conditions
• This objective is usually quantifiable
• Examples:
  – Minimise the cost of building a transformer
  – Minimise the cost of supplying power
  – Minimise the losses in a power system
  – Maximise the profit from a bidding strategy

Decision Variables

• The value of the objective is a function of some decision variables:
  $F = f(x_1, x_2, x_3, \ldots, x_n)$
• Examples of decision variables:
  – Dimensions of the transformer
  – Output of the generating units, position of the taps
  – Parameters of the bids for selling electrical energy

Optimization Problem

What values should the decision variables take so that $F = f(x_1, x_2, x_3, \ldots, x_n)$ is minimum or maximum?

Example: function of one variable

[Figure: a function f(x) reaching its maximum value f(x*) at x = x*]

f(x) is maximum for $x = x^*$

Minimization and Maximization

[Figure: f(x) with a maximum at x*, and -f(x) with a minimum at the same x*]

• If $x = x^*$ maximises $f(x)$, then it minimises $-f(x)$
• Maximising $f(x)$ is thus the same thing as minimising $g(x) = -f(x)$
• Minimization and maximization problems are thus interchangeable
• Depending on the problem, the optimum is either a maximum or a minimum

Necessary Condition for Optimality

[Figure: f(x) with a maximum at x*; the slope df/dx is positive to the left of x* and negative to the right]

If $x = x^*$ maximises $f(x)$, then:
$f(x) < f(x^*)$ for $x < x^*$ $\;\Rightarrow\; \dfrac{df}{dx} > 0$ for $x < x^*$
$f(x) < f(x^*)$ for $x > x^*$ $\;\Rightarrow\; \dfrac{df}{dx} < 0$ for $x > x^*$

Hence, if $x = x^*$ maximises $f(x)$, then $\dfrac{df}{dx} = 0$ at $x = x^*$.
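Before the example on the next slide, note that this necessary condition can be applied mechanically. A minimal sympy sketch (the cubic $f(x) = x^3 - 3x$ is a made-up illustration, not a function from the slides):

```python
# Find the stationary points of a made-up function f(x) = x^3 - 3x
# by solving the necessary condition df/dx = 0.
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 - 3*x                       # hypothetical objective function

df = sp.diff(f, x)                   # df/dx = 3*x**2 - 3
print(sp.solve(sp.Eq(df, 0), x))     # [-1, 1]: the stationary points
```

Whether each stationary point is a maximum or a minimum is the subject of the next few slides.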
Example

[Figure: a function f(x) with several stationary points]

For what values of x is $\dfrac{df}{dx} = 0$?
In other words, for what values of x is the necessary condition for optimality satisfied?

[Figure: the same function with its stationary points labelled A, B, C and D]

• A, B, C, D are stationary points
• A and D are maxima
• B is a minimum
• C is an inflexion point

How can we distinguish minima and maxima?

• For $x = A$ and $x = D$, we have $\dfrac{d^2 f}{dx^2} < 0$: the objective function is concave around a maximum
• For $x = B$, we have $\dfrac{d^2 f}{dx^2} > 0$: the objective function is convex around a minimum
• For $x = C$, we have $\dfrac{d^2 f}{dx^2} = 0$: the objective function is flat around an inflexion point

Necessary and Sufficient Conditions of Optimality

• Necessary condition: $\dfrac{df}{dx} = 0$
• Sufficient condition:
  – For a maximum: $\dfrac{d^2 f}{dx^2} < 0$
  – For a minimum: $\dfrac{d^2 f}{dx^2} > 0$

Isn’t all this obvious?

• Can’t we tell all this by looking at the objective function?
  – Yes, for a simple, one-dimensional case where we know the shape of the objective function
  – For complex, multi-dimensional cases (i.e. with many decision variables) we cannot visualize the shape of the objective function
  – We must then rely on mathematical techniques

Feasible Set

• The values that the decision variables can take are usually limited
• Examples:
  – The physical dimensions of a transformer must be positive
  – The active power output of a generator may be limited to a certain range (e.g. 200 MW to 500 MW)
  – The reactive power output of a generator may be limited to a certain range (e.g. -100 MVAr to 150 MVAr)

[Figure: f(x) with the feasible set $x_{MIN} \le x \le x_{MAX}$; the stationary points A and D lie inside it]

The values of the objective function outside the feasible set do not matter.

Interior and Boundary Solutions

[Figure: f(x) over the feasible set $[x_{MIN}, x_{MAX}]$, with stationary points A, B, D and E in the interior]

• A and D are interior maxima
• B and E are interior minima
• $x_{MIN}$ is a boundary minimum
• $x_{MAX}$ is a boundary maximum

Boundary optima do not satisfy the optimality conditions!

Two-Dimensional Case

[Figure: a surface $f(x_1, x_2)$ with a minimum at $(x_1^*, x_2^*)$]

$f(x_1, x_2)$ is minimum for $x_1^*, x_2^*$

Necessary Conditions for Optimality

$\left. \dfrac{\partial f(x_1, x_2)}{\partial x_1} \right|_{x_1^*, x_2^*} = 0 \qquad \left. \dfrac{\partial f(x_1, x_2)}{\partial x_2} \right|_{x_1^*, x_2^*} = 0$

Multi-Dimensional Case

At a maximum or minimum value of $f(x_1, x_2, x_3, \ldots, x_n)$ we must have:

$\dfrac{\partial f}{\partial x_1} = 0, \quad \dfrac{\partial f}{\partial x_2} = 0, \quad \ldots, \quad \dfrac{\partial f}{\partial x_n} = 0$

A point where these conditions are satisfied is called a stationary point.

Sufficient Conditions for Optimality

[Figure: surfaces $f(x_1, x_2)$ with a minimum and with a maximum]

[Figure: a surface $f(x_1, x_2)$ with a saddle point]
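The saddle point shows why the first-order conditions alone cannot classify a stationary point: the gradient vanishes there, yet the point is neither a maximum nor a minimum. A minimal numerical illustration (the quadratic $f(x_1, x_2) = x_1^2 - x_2^2$ is a made-up example):

```python
# f(x1, x2) = x1^2 - x2^2 has a stationary point at the origin (both
# partial derivatives, 2*x1 and -2*x2, vanish there), but it is a saddle:
# f increases along the x1 axis and decreases along the x2 axis.
def f(x1, x2):
    return x1**2 - x2**2

h = 1e-3
print(f(h, 0) > f(0, 0))   # True: the origin looks like a minimum along x1
print(f(0, h) < f(0, 0))   # True: the origin looks like a maximum along x2
```

The Hessian matrix introduced next detects this situation systematically.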
Sufficient Conditions for Optimality

Calculate the Hessian matrix at the stationary point:

$$\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$

• Calculate the eigenvalues of the Hessian matrix at the stationary point
• If all the eigenvalues are greater than or equal to zero:
  – The matrix is positive semi-definite
  – The stationary point is a minimum
• If all the eigenvalues are less than or equal to zero:
  – The matrix is negative semi-definite
  – The stationary point is a maximum
• If some of the eigenvalues are positive and others are negative:
  – The stationary point is a saddle point

Contours

[Figure: a surface $f(x_1, x_2)$ cut at levels $F_1$ and $F_2$, and the corresponding contours in the $(x_1, x_2)$ plane]

A contour is the locus of all the points that give the same value to the objective function.

[Figure: nested contours closing in on a minimum or maximum]

Example 1

Minimise $C = x_1^2 + 4x_2^2 - 2x_1 x_2$

Necessary conditions for optimality:
$\dfrac{\partial C}{\partial x_1} = 2x_1 - 2x_2 = 0$
$\dfrac{\partial C}{\partial x_2} = -2x_1 + 8x_2 = 0$
$\Rightarrow (x_1, x_2) = (0, 0)$ is a stationary point.

Sufficient conditions for optimality: the Hessian matrix

$$\begin{pmatrix}
\dfrac{\partial^2 C}{\partial x_1^2} & \dfrac{\partial^2 C}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 C}{\partial x_2 \partial x_1} & \dfrac{\partial^2 C}{\partial x_2^2}
\end{pmatrix}
= \begin{pmatrix} 2 & -2 \\ -2 & 8 \end{pmatrix}$$

must be positive definite (i.e. all eigenvalues must be positive):

$\begin{vmatrix} \lambda - 2 & 2 \\ 2 & \lambda - 8 \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - 10\lambda + 12 = 0 \;\Rightarrow\; \lambda = \dfrac{10 \pm \sqrt{52}}{2} > 0$

The stationary point is a minimum.

[Figure: elliptical contours C = 1, C = 4 and C = 9 around the minimum at the origin, where C = 0]

Example 2

Minimise $C = -x_1^2 + 3x_2^2 + 2x_1 x_2$

Necessary conditions for optimality:
$\dfrac{\partial C}{\partial x_1} = -2x_1 + 2x_2 = 0$
$\dfrac{\partial C}{\partial x_2} = 2x_1 + 6x_2 = 0$
$\Rightarrow (x_1, x_2) = (0, 0)$ is a stationary point.

Sufficient conditions for optimality: the Hessian matrix

$$\begin{pmatrix}
\dfrac{\partial^2 C}{\partial x_1^2} & \dfrac{\partial^2 C}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 C}{\partial x_2 \partial x_1} & \dfrac{\partial^2 C}{\partial x_2^2}
\end{pmatrix}
= \begin{pmatrix} -2 & 2 \\ 2 & 6 \end{pmatrix}$$

$\begin{vmatrix} \lambda + 2 & -2 \\ -2 & \lambda - 6 \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - 4\lambda - 16 = 0$
$\Rightarrow\; \lambda = \dfrac{4 + \sqrt{80}}{2} > 0 \quad \text{or} \quad \lambda = \dfrac{4 - \sqrt{80}}{2} < 0$

One eigenvalue is positive and the other negative: the stationary point is a saddle point (both Hessians are checked numerically in the sketch below).

[Figure: hyperbolic contours C = 0, ±1, ±4, ±9 around the saddle point at the origin]

Optimization with Constraints

Optimization with Equality Constraints

• There are usually restrictions on the values that the decision variables can take:

Minimise $f(x_1, x_2, \ldots, x_n)$  (objective function)
subject to:
$\omega_1(x_1, x_2, \ldots, x_n) = 0$
$\quad\vdots$
$\omega_m(x_1, x_2, \ldots, x_n) = 0$  (equality constraints)

Number of Constraints

• N decision variables, M equality constraints
• If M > N, the problem is over-constrained
  – There is usually no solution
• If M = N, the problem is determined
  – There may be a solution
• If M < N, the problem is under-constrained
  – There is usually room for optimization
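Before working through the constrained examples, here is the promised numerical check of the eigenvalue test applied to Examples 1 and 2, a minimal numpy sketch:

```python
# Classify the stationary points of Examples 1 and 2 from the
# eigenvalues of their Hessian matrices.
import numpy as np

H1 = np.array([[2.0, -2.0], [-2.0, 8.0]])  # Example 1: C = x1^2 + 4*x2^2 - 2*x1*x2
H2 = np.array([[-2.0, 2.0], [2.0, 6.0]])   # Example 2: C = -x1^2 + 3*x2^2 + 2*x1*x2

print(np.linalg.eigvalsh(H1))  # [1.39..., 8.60...] -> all positive: minimum
print(np.linalg.eigvalsh(H2))  # [-2.47..., 6.47...] -> mixed signs: saddle point
```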
Example 1

Minimise $f(x_1, x_2) = 0.25 x_1^2 + x_2^2$
subject to $\omega(x_1, x_2) \equiv 5 - x_1 - x_2 = 0$

[Figure: elliptical contours of $f(x_1, x_2) = 0.25 x_1^2 + x_2^2$ in the $(x_1, x_2)$ plane, and the constraint line $5 - x_1 - x_2 = 0$; the minimum is where the line touches a contour]

Example 2: Economic Dispatch

[Figure: two generating units G1 and G2, with outputs $x_1$ and $x_2$, supplying a load L]

$C_1 = a_1 + b_1 x_1^2$ : cost of running unit 1
$C_2 = a_2 + b_2 x_2^2$ : cost of running unit 2
$C = C_1 + C_2 = a_1 + a_2 + b_1 x_1^2 + b_2 x_2^2$ : total cost

Optimization problem:
Minimise $C = a_1 + a_2 + b_1 x_1^2 + b_2 x_2^2$
subject to $x_1 + x_2 = L$

Solution by substitution

$x_1 + x_2 = L \;\Rightarrow\; x_2 = L - x_1$
$\Rightarrow\; C = a_1 + a_2 + b_1 x_1^2 + b_2 (L - x_1)^2$  (an unconstrained minimization)

$\dfrac{dC}{dx_1} = 2 b_1 x_1 - 2 b_2 (L - x_1) = 0 \;\Rightarrow\; x_1 = \dfrac{b_2 L}{b_1 + b_2} \quad \left( \Rightarrow x_2 = \dfrac{b_1 L}{b_1 + b_2} \right)$

$\dfrac{d^2 C}{dx_1^2} = 2 b_1 + 2 b_2 > 0 \;\Rightarrow\;$ minimum

(see the sympy sketch below)

Drawbacks of solution by substitution:
• Difficult
• Usually impossible when the constraints are nonlinear
• Provides little or no insight into the solution
⇒ Solution using Lagrange multipliers

Gradient

Consider a function $f(x_1, x_2, \ldots, x_n)$. The gradient of f is the vector

$$\nabla f = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

Properties of the Gradient

• Each component of the gradient vector indicates the rate of change of the function in that direction
• The gradient points in the direction in which a function of several variables increases most rapidly
• The magnitude and direction of the gradient usually depend on the point considered
• At each point, the gradient is perpendicular to the contour of the function

Example 3

$f(x, y) = ax^2 + by^2 \qquad \nabla f = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix} = \begin{pmatrix} 2ax \\ 2by \end{pmatrix}$

[Figure: elliptical contours with the gradient drawn at points A, B, C and D, perpendicular to the contour at each point]

Example 4

$f(x, y) = ax + by \qquad \nabla f = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}$

[Figure: parallel straight contours $f = f_1, f_2, f_3$ with the constant gradient $\nabla f$ perpendicular to them]

Lagrange multipliers

Minimise $f(x_1, x_2) = 0.25 x_1^2 + x_2^2$
subject to $\omega(x_1, x_2) \equiv 5 - x_1 - x_2 = 0$

[Figure: the contours $f = 5$ and $f = 6$ and the constraint line $\omega(x_1, x_2) = 5 - x_1 - x_2 = 0$]

[Figure: the same picture with the gradient $\nabla f$ drawn at several points, and the constraint gradient $\nabla \omega$ drawn along the constraint line]

• The solution must be on the constraint
• To reduce the value of f, we must move in a direction opposite to the gradient
• We stop when the gradient of the function is perpendicular to the constraint, because moving further would increase the value of the function

[Figure: moving along the constraint from points A and B towards the optimum C, where $\nabla f$ and $\nabla \omega$ are parallel]

At the optimum, the gradient of the function is parallel to the gradient of the constraint:
$\nabla f \parallel \nabla \omega$, which can be expressed as $\nabla f + \lambda \nabla \omega = 0$

In terms of the coordinates:
$\dfrac{\partial f}{\partial x_1} + \lambda \dfrac{\partial \omega}{\partial x_1} = 0$
$\dfrac{\partial f}{\partial x_2} + \lambda \dfrac{\partial \omega}{\partial x_2} = 0$

The constraint must also be satisfied: $\omega(x_1, x_2) = 0$

$\lambda$ is called the Lagrange multiplier.
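Here is the sympy sketch promised above: it reproduces the solution by substitution for the two-unit dispatch symbolically (the symbols are those of the slides; the tool choice is illustrative):

```python
# Solution by substitution for:
#   minimise C = a1 + a2 + b1*x1**2 + b2*x2**2  subject to  x1 + x2 = L
import sympy as sp

x1, L, a1, a2, b1, b2 = sp.symbols('x1 L a1 a2 b1 b2', positive=True)

C = a1 + a2 + b1*x1**2 + b2*(L - x1)**2        # substitute x2 = L - x1
print(sp.solve(sp.Eq(sp.diff(C, x1), 0), x1))  # [L*b2/(b1 + b2)]
print(sp.diff(C, x1, 2))                       # 2*b1 + 2*b2 > 0 -> a minimum
```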
Lagrangian function

To simplify the writing of the conditions for optimality, it is useful to define the Lagrangian function:

$\mathcal{L}(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda\, \omega(x_1, x_2)$

The necessary conditions for optimality are then given by the partial derivatives of the Lagrangian:

$\dfrac{\partial \mathcal{L}(x_1, x_2, \lambda)}{\partial x_1} = \dfrac{\partial f}{\partial x_1} + \lambda \dfrac{\partial \omega}{\partial x_1} = 0$
$\dfrac{\partial \mathcal{L}(x_1, x_2, \lambda)}{\partial x_2} = \dfrac{\partial f}{\partial x_2} + \lambda \dfrac{\partial \omega}{\partial x_2} = 0$
$\dfrac{\partial \mathcal{L}(x_1, x_2, \lambda)}{\partial \lambda} = \omega(x_1, x_2) = 0$

Example

Minimise $f(x_1, x_2) = 0.25 x_1^2 + x_2^2$ subject to $\omega(x_1, x_2) \equiv 5 - x_1 - x_2 = 0$

$\mathcal{L}(x_1, x_2, \lambda) = 0.25 x_1^2 + x_2^2 + \lambda (5 - x_1 - x_2)$

$\dfrac{\partial \mathcal{L}}{\partial x_1} = 0.5 x_1 - \lambda = 0 \;\Rightarrow\; x_1 = 2\lambda$
$\dfrac{\partial \mathcal{L}}{\partial x_2} = 2 x_2 - \lambda = 0 \;\Rightarrow\; x_2 = \dfrac{\lambda}{2}$
$\dfrac{\partial \mathcal{L}}{\partial \lambda} = 5 - x_1 - x_2 = 0 \;\Rightarrow\; 5 - 2\lambda - \dfrac{\lambda}{2} = 0$

$\Rightarrow\; \lambda = 2, \quad x_1 = 4, \quad x_2 = 1$

[Figure: the contour $f(x_1, x_2) = 5$ touching the constraint line $5 - x_1 - x_2 = 0$ at the minimum $(x_1, x_2) = (4, 1)$]

Important Note!

If the constraint is of the form $a x_1 + b x_2 = L$, it must be included in the Lagrangian as follows:
$\mathcal{L} = f(x_1, \ldots, x_n) + \lambda (L - a x_1 - b x_2)$
and not as follows:
$\mathcal{L} = f(x_1, \ldots, x_n) + \lambda (a x_1 + b x_2)$

Application to Economic Dispatch

[Figure: two generating units G1 and G2, with outputs $x_1$ and $x_2$, supplying a load L]

Minimise $f(x_1, x_2) = C_1(x_1) + C_2(x_2)$ s.t. $\omega(x_1, x_2) \equiv L - x_1 - x_2 = 0$

$\mathcal{L}(x_1, x_2, \lambda) = C_1(x_1) + C_2(x_2) + \lambda (L - x_1 - x_2)$

$\dfrac{\partial \mathcal{L}}{\partial x_1} = \dfrac{dC_1}{dx_1} - \lambda = 0$
$\dfrac{\partial \mathcal{L}}{\partial x_2} = \dfrac{dC_2}{dx_2} - \lambda = 0$
$\dfrac{\partial \mathcal{L}}{\partial \lambda} = L - x_1 - x_2 = 0$

$\Rightarrow\; \dfrac{dC_1}{dx_1} = \dfrac{dC_2}{dx_2} = \lambda$ : the equal incremental cost solution

Equal incremental cost solution

[Figure: the cost curves $C_1(x_1)$ and $C_2(x_2)$ and the corresponding incremental cost curves $dC_1/dx_1$ and $dC_2/dx_2$]

Interpretation of this solution

[Figure: the two incremental cost curves side by side; a horizontal line at level $\lambda$ intersects them at the outputs $x_1^*$ and $x_2^*$]

Check the mismatch $L - x_1^* - x_2^*$:
• If it is < 0, reduce $\lambda$
• If it is > 0, increase $\lambda$
(a numerical sketch of this λ-iteration follows the Generalization slide below)

Physical interpretation

$\dfrac{dC}{dx} = \lim_{\Delta x \to 0} \dfrac{\Delta C}{\Delta x}$

For $\Delta x$ sufficiently small: $\Delta C \approx \dfrac{dC}{dx} \times \Delta x$

If $\Delta x = 1$ MW: $\Delta C \approx \dfrac{dC}{dx}$

[Figure: the cost curve C(x) and the corresponding incremental cost curve dC(x)/dx]

The incremental cost is the cost of one additional MW for one hour. This cost depends on the output of the generator.

$\dfrac{dC_1}{dx_1}$ : cost of one more MW from unit 1
$\dfrac{dC_2}{dx_2}$ : cost of one more MW from unit 2

Suppose that $\dfrac{dC_1}{dx_1} > \dfrac{dC_2}{dx_2}$ :
• Decrease the output of unit 1 by 1 MW ⇒ decrease in cost $= \dfrac{dC_1}{dx_1}$
• Increase the output of unit 2 by 1 MW ⇒ increase in cost $= \dfrac{dC_2}{dx_2}$
• Net change in cost $= \dfrac{dC_2}{dx_2} - \dfrac{dC_1}{dx_1} < 0$

It pays to increase the output of unit 2 and decrease the output of unit 1 until we have:
$\dfrac{dC_1}{dx_1} = \dfrac{dC_2}{dx_2} = \lambda$

The Lagrange multiplier $\lambda$ is thus the cost of one more MW at the optimal solution. This is a very important result with many applications in economics.

Generalization

Minimise $f(x_1, x_2, \ldots, x_n)$
subject to:
$\omega_1(x_1, x_2, \ldots, x_n) = 0$
$\quad\vdots$
$\omega_m(x_1, x_2, \ldots, x_n) = 0$

Lagrangian:
$\mathcal{L} = f(x_1, \ldots, x_n) + \lambda_1 \omega_1(x_1, \ldots, x_n) + \cdots + \lambda_m \omega_m(x_1, \ldots, x_n)$

• One Lagrange multiplier for each constraint
• n + m variables: $x_1, \ldots, x_n$ and $\lambda_1, \ldots, \lambda_m$
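Here is the promised λ-iteration sketch for the two-unit dispatch. The bisection rule is exactly the one on the interpretation slide; the cost coefficients and the load value are made up for illustration:

```python
# Lambda iteration for two units with made-up quadratic costs
# C_i = a_i + b_i * x_i**2, so dC_i/dx_i = 2*b_i*x_i and, at a given
# lambda, each unit produces x_i(lambda) = lambda / (2*b_i).
b1, b2 = 0.01, 0.02    # hypothetical cost coefficients ($/MW^2/h)
L = 600.0              # hypothetical load (MW)

lam_lo, lam_hi = 0.0, 100.0     # assumed bracket for lambda ($/MWh)
for _ in range(60):
    lam = 0.5 * (lam_lo + lam_hi)
    x1, x2 = lam / (2*b1), lam / (2*b2)
    if L - x1 - x2 > 0:         # not enough generation: increase lambda
        lam_lo = lam
    else:                       # too much generation: reduce lambda
        lam_hi = lam

print(lam, x1, x2)   # converges to lambda = 8 $/MWh, x1 = 400 MW, x2 = 200 MW
```

Note that the no-load costs $a_i$ do not affect the dispatch; they only shift the total cost.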
, xn ) • One Lagrange multiplier for each constraint • n + m variables: x1, …, xn and λ1, …, λm © 2011 D. Kirschen and University of Washington 70 Optimality conditions = f ( x1,.. , xn ) + l1w 1 ( x1,.. , xn ) + ¶w 1 ¶f ¶ = + l1 + ¶x 1 ¶x 1 ¶x 1 + lmw m ( x1 ,.. , xn ) ¶w m +lm =0 ¶x 1 n equations ¶w 1 ¶f ¶ = + l1 + ¶x n ¶x n ¶x n +lm ¶w m ¶x n =0 ¶ = w1 ( x1 , ,x n ) = 0 ¶l 1 m equations ¶ = w m ( x1 , ,x n ) = 0 ¶l m © 2011 D. Kirschen and University of Washington n + m equations in n + m variables 71