MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Civil and Environmental Engineering
1.731 Water Resource Systems
Lectures 19, 20: Real-Time Optimization, Dynamic Programming
Nov. 14, 16, 2006

Real-time optimization

Real-time optimization problems rely on decision rules that specify how decisions should maximize future benefit, given the current state of a system. State dependence provides a convenient way to deal with uncertainty. Some examples:

• Reservoir releases – Decision rule specifies how the current release should depend on current storage. The primary uncertainty is future reservoir inflow.
• Water treatment – Decision rule specifies how current operating conditions (e.g. temperature or chemical inputs) should depend on the current concentration in the treatment tank. The primary uncertainty is future influent concentration.
• Irrigation management – Decision rule specifies how the current applied irrigation water should depend on current soil moisture and temperature. The primary uncertainties are future meteorological variables.

Real-time optimization can be viewed as a feedback control process:

[Figure: feedback control loop. The input $I_t$ enters the system, whose state evolves as $x_{t+1} = g(x_t, u_t, I_t)$; the control $u_t$ is fed back from the decision rule $u_t(x_t)$, and the time loop advances $t \to t+1$. At each time step, for $t = 0, \ldots, T-1$: observe the state, derive the control from the decision rule, and apply the control.]

• State variables $x_t$: decision variables that depend on controls and inputs.
• Control variables $u_t$: decision variables, selected to maximize benefit.
• Input variables $I_t$: inputs, assigned specified values.
• Decision rule $u_t(x_t)$: a function that gives $u_t$ for any $x_t$.

State equation: $x_{t+1} = g(x_t, u_t, I_t)$, with initial condition $x_0$.

Dynamic Programming

Dynamic programming provides a general framework for deriving decision rules. Most dynamic programming problems are divided into stages (e.g. time periods, spatial intervals, etc.):

[Figure: staged timeline. Stage t connects time t-1 (state $x_{t-1}$, control $u_{t-1}$, input $I_{t-1}$) to time t (state $x_t$) and accrues benefit $f_{t-1}(u_{t-1}, x_{t-1}, x_t)$; the final Stage T ends at state $x_T$ with terminal benefit $V_T(x_T)$.]

The benefit accrued over Stage t+1 is $f_t(u_t, x_t, x_{t+1})$.

Optimization problem: Select the $u_t, \ldots, u_{T-1}$ that maximize the benefit-to-go (benefit accrued from the current time t through the terminal time T) at each time t:

$\max_{u_t, \ldots, u_{T-1}} F_t(x_t, \ldots, x_T, u_t, \ldots, u_{T-1})$ ;  t = 0, …, T-1

where the benefit-to-go at t is the terminal benefit (salvage value) $V_T(x_T)$ plus the sum of benefits for stages t through T-1:

$F_t(x_t, \ldots, x_T, u_t, \ldots, u_{T-1}) = \sum_{i=t}^{T-1} f_i(u_i, x_i, x_{i+1}) + V_T(x_T)$

(the sum is the benefit from the remaining stages; $V_T(x_T)$ is the terminal benefit), subject to the state equation

$x_{i+1} = g_i(x_i, u_i, I_i)$ ;  i = t, …, T-1

and other constraints on the decision variables: $\{x_t, u_t\} \in \Gamma_t$, $\{x_T\} \in \Gamma_T$ (decision variables lie within some set $\Gamma_t$ at t = 0, …, T).

The objective may be rewritten if we repeatedly apply the state equation to express each $x_i$ ($i > t$) as a function of $x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1}$:

$F_t(x_t, \ldots, x_T, u_t, \ldots, u_{T-1}) = F_t(x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1})$

The decision rule $u_t(x_t)$ at each t is obtained by finding the sequence of controls $u_t, \ldots, u_{T-1}$ that maximizes $F_t(x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1})$ for a given state $x_t$ and a given set of specified inputs $I_t, \ldots, I_{T-1}$.
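To make the feedback-control loop concrete, here is a minimal Python sketch of the time loop described above (observe the state, derive the control from the decision rule, apply it, advance the state equation). The function name `simulate`, the toy state equation, and the half-storage release rule are illustrative assumptions, not part of the lecture.

```python
# Minimal sketch of the real-time feedback loop: at each step, observe the
# state, evaluate the decision rule, apply the control, and advance
# x_{t+1} = g(x_t, u_t, I_t).  All specifics here are hypothetical examples.

def simulate(x0, inputs, decision_rule, g):
    """Run the feedback loop for t = 0, ..., T-1 and record (u_t, x_{t+1})."""
    x = x0
    trajectory = [(None, x0)]
    for t, I_t in enumerate(inputs):      # inputs = [I_0, ..., I_{T-1}]
        u_t = decision_rule(t, x)         # derive control from current state
        x = g(x, u_t, I_t)                # apply control, advance the state
        trajectory.append((u_t, x))
    return trajectory

if __name__ == "__main__":
    # Toy reservoir: storage evolves as x - u + I; rule releases half of
    # current storage (purely illustrative numbers).
    g = lambda x, u, I: x - u + I
    rule = lambda t, x: 0.5 * x
    print(simulate(x0=10.0, inputs=[2.0, 0.0, 3.0], decision_rule=rule, g=g))
```

The point of the structure is that the rule needs only the current state, not a forecast of future inputs; uncertainty in $I_t$ enters only through the simulated state equation.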
Backward Recursion for Benefit-to-go

Dynamic programming uses a recursion to construct the decision rule $u_t(x_t)$. Define the return function $V_t(x_t)$ to be the maximum benefit-to-go from t through T:

$V_t(x_t) = \max_{u_t, \ldots, u_{T-1}} [F_t(x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1})]$

Separate out the benefit term for Stage t+1:

$V_t(x_t) = \max_{u_t} \left[ f_t(u_t, x_t, x_{t+1}) + \max_{u_{t+1}, \ldots, u_{T-1}} F_{t+1}(x_{t+1}, u_{t+1}, \ldots, u_{T-1}, I_{t+1}, \ldots, I_{T-1}) \right]$

Replace the second term in brackets with the definition of $V_{t+1}(x_{t+1})$:

$V_t(x_t) = \max_{u_t} [f_t(u_t, x_t, x_{t+1}) + V_{t+1}(x_{t+1})]$

Substitute the state equation for $x_{t+1}$:

$V_t(x_t) = \max_{u_t} \{ f_t[u_t, x_t, g_t(x_t, u_t, I_t)] + V_{t+1}[g_t(x_t, u_t, I_t)] \}$

This equation is a backward recursion for $V_t(x_t)$, initialized with the terminal benefit $V_T(x_T)$. The expression in braces depends only on $u_t$ (which is varied to find the maximum), $x_t$ (the argument of $V_t(x_t)$), and $I_t$ (the specified input). At each stage, find the $u_t$ that maximizes $V_t(x_t)$ for all possible $x_t$. Store the results (e.g. as a function or table) to obtain the desired decision rule $u_t(x_t)$.

Example 1: Aqueduct Diversions

Maximize benefits from water diverted at 2 locations along an aqueduct. Here the stages correspond to aqueduct sections rather than time periods (T = 2), and stage numbers refer to a diversion index rather than time.

[Figure: aqueduct with inflow $x_0$, diversions $u_0$ and $u_1$ with benefits $f_0(u_0)$ and $f_1(u_1)$, intermediate state $x_1$, and outflow $x_2$ with terminal benefit $V_2(x_2)$.]

State equation: $x_{t+1} = x_t - u_t$ ;  t = 0, 1

The benefit at each stage depends only on the control (diversion), not on the state. The terminal benefit depends on the system outflow. No input appears in the state equation.

The total benefit from diverted water and outflow $x_2$ is:

$F(x_0, u_0, u_1) = f_0(u_0) + f_1(u_1) + V_2(x_2)$

The stage benefits and terminal benefit are:

$f_0(u_0) = \frac{1}{2} u_0$   $f_1(u_1) = u_1$   $V_2(x_2) = x_2(1 - x_2)$

Additional constraints are:

$u_0 \ge 0$   $u_1 \ge 0$   $0 \le x_2 \le 1$

Solve the problem by proceeding backward, starting with the last stage (Stage 2):

Stage 2: Find $V_1(x_1)$ for specified $x_1$:

$V_1(x_1) = \max_{u_1} [f_1(u_1) + V_2(x_2)]$
$= \max_{u_1} [f_1(u_1) + V_2(x_1 - u_1)]$
$= \max_{u_1} [u_1 + (x_1 - u_1)(1 - x_1 + u_1)]$
$= \max_{u_1} [-u_1^2 + 2 x_1 u_1 + x_1(1 - x_1)]$

Solution that satisfies the constraints: $u_1 = x_1$,  $V_1(x_1) = x_1$

Stage 1: Find $V_0(x_0)$ for specified $x_0$:

$V_0(x_0) = \max_{u_0} [f_0(u_0) + V_1(x_1)] = \max_{u_0} \left[ \frac{1}{2} u_0 + x_0 - u_0 \right]$

Solution that satisfies the constraints: $u_0 = 0$,  $V_0(x_0) = x_0$

The solution specifies that all water is withdrawn at the second diversion point, where it is most valuable.
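As a quick numeric check on this result (my own illustration, not part of the notes), a brute-force grid search over feasible diversions confirms that taking all the water at the second diversion point is optimal. The grid resolution and the sample inflow x0 = 0.8 are arbitrary choices; the inflow is kept in [0, 1] so the outflow constraint can be satisfied.

```python
# Brute-force check of the aqueduct example: the DP solution u0 = 0, u1 = x1
# should beat every feasible (u0, u1) pair on a fine grid.
import numpy as np

def total_benefit(u0, u1, x0):
    x2 = x0 - u0 - u1                        # outflow after both diversions
    return 0.5 * u0 + u1 + x2 * (1.0 - x2)   # f0 + f1 + V2

x0 = 0.8
best = None
for u0 in np.linspace(0.0, x0, 201):
    for u1 in np.linspace(0.0, x0 - u0, 201):
        if 0.0 <= x0 - u0 - u1 <= 1.0:       # outflow constraint
            b = total_benefit(u0, u1, x0)
            if best is None or b > best[0]:
                best = (b, u0, u1)

print(best)   # expect approximately (x0, 0.0, x0)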
Discretization of Variables

In more complex problems the states, controls, and inputs are frequently discretized into a finite number of compatible levels. Simplest case:

$x_t, u_t, I_t \to x_t^k, u_t^j, I_t^l$ ;  j, k, l = 1, 2, …, L

where L = number of discrete levels. Then the optimization problem at each stage reduces to an exhaustive search through all feasible levels $u_t^j$ to find the one that maximizes:

$f_t[u_t^j, x_t^k, g_t(x_t^k, u_t^j, I_t^l)] + V_{t+1}[g_t(x_t^k, u_t^j, I_t^l)]$

for each feasible $x_t^k$ and a single specified input level $I_t^l$. This is discrete dynamic programming.

Example 2: Reservoir Operations

Maximize benefits from water released from a reservoir with variable inflows. Stages correspond to 3 time periods (months, seasons, etc.; T = 3).

[Figure: reservoir with inflow $I_t$, storage $x_t$, and release $u_t$.]

State equation: $x_{t+1} = x_t - u_t + I_t$ ;  t = 0, 1, 2

The total benefit from released water and final storage $x_3$ is:

$F(x_0, u_0, \ldots, u_2) = f_0(u_0) + f_1(u_1) + f_2(u_2) + V_3(x_3)$

Discretize all variables into compatible levels (t = 0, 1, 2):

x_t ∈ {0, 1, 2}   u_t ∈ {0, 1, 2}   I_t ∈ {0, 1}

Inflows: I_0 = 1, I_1 = 0, I_2 = 1

Terminal (outflow) benefit: V_3(x_3) = 0 for all x_3 values

Benefits for each release:

u_t   f_0(u_0)   f_1(u_1)   f_2(u_2)
 0       0          0          0
 1       3          4          1
 2       2          5          3

Possible state transitions are derived from the state equation, the inputs, and the permissible variable values. Show the possible transitions with a diagram where each feasible state level $x_t^k$ is a circle and each feasible control level $u_t^j$ is a line connecting circles, with the benefit shown in parentheses after each feasible control value:

[Figure: transition diagram with control, benefit, and return values for Stages 1-3.]

Solve the series of 3 optimization problems defined by the recursion equation for t = 2, 1, 0. Start at the last stage and move backward:

Stage 3: Find $V_2(x_2)$ for each level of $x_2$:

$V_2(x_2) = \max_{u_2} [f_2(u_2) + V_3(x_3)] = \max_{u_2} [f_2(u_2) + V_3(x_2 - u_2 + I_2)]$

Identify the optimum $u_2(x_2)$ value for each possible $x_2$, with $V_3(x_3)$ specified as an input:

x_2   u_2(x_2)   f_2(u_2) + V_3(x_3)
 0       0        0 + 0 = 0
         1        1 + 0 = 1 = V_2(x_2)   Optimum
 1       0        0 + 0 = 0
         1        1 + 0 = 1
         2        3 + 0 = 3 = V_2(x_2)   Optimum
 2       1        1 + 0 = 1
         2        3 + 0 = 3 = V_2(x_2)   Optimum

Stage 2: Find $V_1(x_1)$ for each level of $x_1$:

$V_1(x_1) = \max_{u_1} [f_1(u_1) + V_2(x_2)] = \max_{u_1} [f_1(u_1) + V_2(x_1 - u_1 + I_1)]$

Identify the optimum $u_1(x_1)$ value for each possible $x_1$, obtaining $V_2(x_2)$ from Stage 3:

x_1   u_1(x_1)   f_1(u_1) + V_2(x_2)
 0       0        0 + 1 = 1 = V_1(x_1)   Optimum
 1       0        0 + 3 = 3
         1        4 + 1 = 5 = V_1(x_1)   Optimum
 2       0        0 + 3 = 3
         1        4 + 3 = 7 = V_1(x_1)   Optimum
         2        5 + 1 = 6

Stage 1: Find $V_0(x_0)$ for each level of $x_0$:

$V_0(x_0) = \max_{u_0} [f_0(u_0) + V_1(x_1)] = \max_{u_0} [f_0(u_0) + V_1(x_0 - u_0 + I_0)]$

Identify the optimum $u_0(x_0)$ value for each possible $x_0$, obtaining $V_1(x_1)$ from Stage 2:

x_0   u_0(x_0)   f_0(u_0) + V_1(x_1)
 0       0        0 + 5 = 5 = V_0(x_0)   Optimum
         1        3 + 1 = 4
 1       0        0 + 7 = 7
         1        3 + 5 = 8 = V_0(x_0)   Optimum
         2        2 + 1 = 3
 2       1        3 + 7 = 10 = V_0(x_0)  Optimum
         2        2 + 5 = 7

The optimum $u_t(x_t)$ decision rules for t = 0, 1, 2 define a complete optimum decision strategy:

x_0   u_0      x_1   u_1      x_2   u_2
 0     0        0     0        0     1
 1     1        1     1        1     2
 2     1        2     1        2     2

[Figure: transition diagram annotated with the optimal control leaving each state and the corresponding return function value at each state.]

Note that there is a path leaving every state value. The optimum paths give a strategy for maximizing the benefit-to-go from t onward, for any value of the state $x_t$. The optimal benefit for each possible initial storage is $V_0(x_0)$.
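The backward recursion is easy to mechanize. Below is a compact Python sketch of discrete dynamic programming (my own illustration of the method, not code from the lecture) that reproduces the Example 2 tables; the variable names (STATES, BENEFIT, rule, etc.) are my choices, but the levels, inflows, and benefits are exactly the values given above.

```python
# Discrete DP for Example 2: backward sweep over t = 2, 1, 0, building the
# return function V_t(x_t) and the decision rule u_t(x_t) stage by stage.
STATES = [0, 1, 2]
CONTROLS = [0, 1, 2]
INFLOW = [1, 0, 1]                       # I_0, I_1, I_2
BENEFIT = [                              # BENEFIT[t][u] = f_t(u)
    {0: 0, 1: 3, 2: 2},
    {0: 0, 1: 4, 2: 5},
    {0: 0, 1: 1, 2: 3},
]

V = {3: {x: 0 for x in STATES}}          # terminal benefit V_3(x_3) = 0
rule = {}                                # decision rule u_t(x_t)

for t in (2, 1, 0):                      # backward recursion
    V[t], rule[t] = {}, {}
    for x in STATES:
        best = None
        for u in CONTROLS:               # exhaustive search over controls
            x_next = x - u + INFLOW[t]   # state equation
            if x_next in STATES:         # feasibility of the transition
                val = BENEFIT[t][u] + V[t + 1][x_next]
                if best is None or val > best[0]:
                    best = (val, u)
        V[t][x], rule[t][x] = best

print(V[0])     # {0: 5, 1: 8, 2: 10}, matching the Stage 1 table
print(rule[0])  # {0: 0, 1: 1, 2: 1},  matching u_0(x_0) above
```

Storing `rule[t]` as a lookup table is exactly the "store the results" step of the recursion: once the sweep finishes, operating the reservoir in real time only requires reading the table at the observed state.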
Computational Effort

The solution to the discretized optimization problem could also be found by exhaustive enumeration (by comparing the benefit-to-go for all possible $u_t(x_t)$ combinations). Dynamic programming is much more efficient than enumeration since it divides the original T-stage optimization problem into T smaller problems, one for each stage. To compare the computational effort of enumeration and dynamic programming, assume:

• State dimension = M, stages = T, levels = L
• Equal number of levels for $u_t$ and $x_t$ at every stage
• All possible state transitions are permissible (i.e. $L^{2M}$ transitions at each stage)

Then the total number of $V_t$ evaluations required is:

Exhaustive enumeration: $L^{M(T+1)}$
Dynamic programming: $T L^{2M}$

For M = 1, L = 10, T = 10 the number of $V_t$ evaluations required is:

Exhaustive enumeration: $10^{11}$
Dynamic programming: $10^3$
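For a concrete (if tiny) comparison, the sketch below, using the Example 2 data and my own helper `enumerate_best`, finds the optimum by trying every control sequence. It reproduces $V_0(x_0) = 5, 8, 10$, but examines $L^T = 27$ sequences per initial state, whereas the DP sweep above evaluated at most $L^2 = 9$ transitions per stage.

```python
# Exhaustive enumeration for Example 2: try all (u_0, u_1, u_2) sequences,
# discard infeasible trajectories, and keep the best total benefit.
from itertools import product

STATES, CONTROLS, INFLOW = [0, 1, 2], [0, 1, 2], [1, 0, 1]
BENEFIT = [{0: 0, 1: 3, 2: 2}, {0: 0, 1: 4, 2: 5}, {0: 0, 1: 1, 2: 3}]

def enumerate_best(x0):
    """Return the best total benefit over all feasible release sequences."""
    best = None
    for seq in product(CONTROLS, repeat=3):   # L^T candidate sequences
        x, total, feasible = x0, 0, True
        for t, u in enumerate(seq):
            x = x - u + INFLOW[t]             # state equation
            if x not in STATES:               # infeasible transition
                feasible = False
                break
            total += BENEFIT[t][u]
        if feasible and (best is None or total > best):
            best = total
    return best

print({x0: enumerate_best(x0) for x0 in STATES})  # {0: 5, 1: 8, 2: 10}
```

The gap is negligible here but grows exponentially with T: enumeration scales as $L^{M(T+1)}$ while the DP work grows only linearly in the number of stages.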