MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Civil and Environmental Engineering
1.731 Water Resource Systems
Lectures 19 & 20: Real-Time Optimization, Dynamic Programming
Nov. 14 & 16, 2006
Real-time optimization
Real-time optimization problems rely on decision rules that specify, given the current state of
a system, the decisions that maximize future benefit. State dependence provides a convenient
way to deal with uncertainty. Some examples:
• Reservoir releases – Decision rule specifies how the current release should depend on current
storage. The primary uncertainty is future reservoir inflow.
• Water treatment – Decision rule specifies how current operating conditions (e.g. temperature
or chemical inputs) should depend on the current concentration in the treatment tank. The
primary uncertainty is future influent concentration.
• Irrigation management – Decision rule specifies how the current applied irrigation water
should depend on current soil moisture and temperature. The primary uncertainties are future
meteorological variables.
Real-time optimization can be viewed as a feedback control process:
[Figure: feedback control loop. The decision rule u_t(x_t) maps the observed state x_t to the control u_t; the system then advances according to the state equation x_{t+1} = g(x_t, u_t, I_t) under input I_t, and the loop repeats for t = 0, ..., T-1.]

At each time step:
• Observe the state x_t
• Derive the control u_t from the decision rule u_t(x_t)
• Apply the control
State variables: x_t (decision variables; depend on controls and inputs)
Control variables: u_t (decision variables; selected to maximize benefit)
Input variables: I_t (inputs; assigned specified values)
Decision rule: u_t(x_t) (function that gives u_t for any x_t)
State equation: $x_{t+1} = g(x_t, u_t, I_t)$, with initial condition $x_0$.
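A minimal Python sketch of this loop may help fix ideas; `decision_rule`, `g`, and the input sequence below are hypothetical placeholders, not part of the lecture notes:

```python
def simulate(x0, decision_rule, g, inputs):
    """Run the feedback control loop for t = 0, ..., T-1.

    decision_rule(t, x) -> u: maps the observed state to a control.
    g(x, u, I) -> x_next: the state equation x_{t+1} = g(x_t, u_t, I_t).
    inputs: the specified input sequence I_0, ..., I_{T-1}.
    """
    x, trajectory = x0, []
    for t, I in enumerate(inputs):
        u = decision_rule(t, x)   # observe state, derive control from decision rule
        trajectory.append((t, x, u))
        x = g(x, u, I)            # apply control; the system advances to x_{t+1}
    trajectory.append((len(inputs), x, None))  # terminal state x_T
    return trajectory

# Hypothetical usage: a reservoir-style mass balance with a
# "release half the storage" rule (illustrative values only).
print(simulate(x0=2.0,
               decision_rule=lambda t, x: 0.5 * x,
               g=lambda x, u, I: x - u + I,
               inputs=[1.0, 0.0, 1.0]))
```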
Dynamic Programming
Dynamic programming provides a general framework for deriving decision rules. Most dynamic
programming problems are divided into stages (e.g. time periods, spatial intervals, etc.):
[Figure: staged decision diagram. Stage t+1 carries the system from state x_t to state x_{t+1} under control u_t and input I_t, accruing benefit f_t(u_t, x_t, x_{t+1}); stages 1 through T connect x_0 to the terminal state x_T, which carries terminal benefit V_T(x_T).]
The benefit accrued over stage t+1 is $f_t(u_t, x_t, x_{t+1})$.
Optimization problem:
Select $u_t, \ldots, u_{T-1}$ to maximize the benefit-to-go (benefit accrued from the current time t through terminal time T) at each time t:

$$\max_{u_t, \ldots, u_{T-1}} F_t(x_t, \ldots, x_T, u_t, \ldots, u_{T-1}), \qquad t = 0, \ldots, T-1$$

where the benefit-to-go at t is the terminal benefit (salvage value) $V_T(x_T)$ plus the sum of benefits for stages t through T-1:

$$\underbrace{F_t(x_t, \ldots, x_T, u_t, \ldots, u_{T-1})}_{\text{benefit-to-go}} = \underbrace{\sum_{i=t}^{T-1} f_i(u_i, x_i, x_{i+1})}_{\text{benefit from remaining stages}} + \underbrace{V_T(x_T)}_{\text{terminal benefit}}$$
subject to:

$$x_{i+1} = g_i(x_i, u_i, I_i), \qquad i = t, \ldots, T-1 \quad \text{(state equation)}$$

and other constraints on the decision variables:

$$\{x_t, u_t\} \in \Gamma_t, \qquad \{x_T\} \in \Gamma_T$$

(the decision variables lie within some set $\Gamma_t$ at t = 0, ..., T).
The objective may be rewritten if we repeatedly apply the state equation to write all $x_i$ ($i > t$) as functions of $x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1}$:

$$F_t(x_t, \ldots, x_T, u_t, \ldots, u_{T-1}) = F_t(x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1})$$

The decision rule $u_t(x_t)$ at each t is obtained by finding the sequence of controls $u_t, \ldots, u_{T-1}$ that maximizes $F_t(x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1})$ for a given state $x_t$ and a given set of specified inputs $I_t, \ldots, I_{T-1}$.
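Operationally, this rewriting just says the states can be generated on the fly while the benefits are summed. A minimal sketch, with hypothetical function names standing in for the problem data:

```python
def benefit_to_go(t, x_t, u_seq, I_seq, f, g, V_T):
    """F_t = sum_{i=t}^{T-1} f_i(u_i, x_i, x_{i+1}) + V_T(x_T).

    u_seq and I_seq hold u_0, ..., u_{T-1} and I_0, ..., I_{T-1}; the states
    x_i for i > t are generated by iterating the state equation from x_t.
    """
    x, total = x_t, 0.0
    for i in range(t, len(u_seq)):
        x_next = g(x, u_seq[i], I_seq[i])    # state equation x_{i+1} = g(x_i, u_i, I_i)
        total += f(i, u_seq[i], x, x_next)   # stage benefit f_i(u_i, x_i, x_{i+1})
        x = x_next
    return total + V_T(x)                    # terminal (salvage) benefit V_T(x_T)
```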
Backward Recursion for Benefit-to-go
Dynamic programming uses a recursion to construct the decision rule $u_t(x_t)$.
Define the return function $V_t(x_t)$ as the maximum benefit-to-go from t through T:

$$V_t(x_t) = \max_{u_t, \ldots, u_{T-1}} \left[ F_t(x_t, u_t, \ldots, u_{T-1}, I_t, \ldots, I_{T-1}) \right]$$
Separate the benefit term for stage t+1:

$$V_t(x_t) = \max_{u_t} \left[ f_t(u_t, x_t, x_{t+1}) + \max_{u_{t+1}, \ldots, u_{T-1}} F_{t+1}(x_{t+1}, u_{t+1}, \ldots, u_{T-1}, I_{t+1}, \ldots, I_{T-1}) \right]$$
Replace the second term in brackets with the definition of $V_{t+1}(x_{t+1})$:

$$V_t(x_t) = \max_{u_t} \left[ f_t(u_t, x_t, x_{t+1}) + V_{t+1}(x_{t+1}) \right]$$

Substitute the state equation for $x_{t+1}$:

$$V_t(x_t) = \max_{u_t} \left\{ f_t[u_t, x_t, g_t(x_t, u_t, I_t)] + V_{t+1}[g_t(x_t, u_t, I_t)] \right\}$$
This equation is a backward recursion for $V_t(x_t)$, initialized at the final stage with the terminal benefit $V_T(x_T)$. The expression in braces depends only on $u_t$ (which is varied to find the maximum), $x_t$ (the argument of $V_t(x_t)$), and $I_t$ (the specified input).
At each stage, find the $u_t$ that maximizes $V_t(x_t)$ for all possible $x_t$.
Store the results (e.g. as a function or table) to obtain the desired decision rule $u_t(x_t)$.
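For concreteness, the first step of the recursion can be written out explicitly; this is just the general equation above evaluated at t = T-1, not an additional result:

$$V_{T-1}(x_{T-1}) = \max_{u_{T-1}} \left\{ f_{T-1}[u_{T-1}, x_{T-1}, g_{T-1}(x_{T-1}, u_{T-1}, I_{T-1})] + V_T[g_{T-1}(x_{T-1}, u_{T-1}, I_{T-1})] \right\}$$

The known terminal benefit $V_T(x_T)$ starts the recursion, which then proceeds backward through t = T-2, ..., 0.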
Example 1: Aqueduct Diversions
Maximize benefits from water diverted from 2 locations along an aqueduct.
Here the stages correspond to aqueduct sections rather than time (T = 2).
State equation:

$$x_{t+1} = x_t - u_t, \qquad t = 0, 1$$
[Figure: aqueduct schematic. Inflow x_0 passes two diversion points with diversions u_0 and u_1 and benefits f_0(u_0) and f_1(u_1); the intermediate flow is x_1 and the outflow x_2 carries terminal benefit V_2(x_2).]
Here stage numbers refer to diversion index rather than time.
Benefit at each stage depends only on control (diversion), not on state.
Terminal benefit depends on system outflow. No input included in state equation.
The total benefit from the diverted water and the outflow $x_2$ is:

$$F(x_0, u_0, u_1) = f_0(u_0) + f_1(u_1) + V_2(x_2)$$
The stage benefits and terminal benefit are:

$$f_0(u_0) = \tfrac{1}{2} u_0, \qquad f_1(u_1) = u_1, \qquad V_2(x_2) = x_2(1 - x_2)$$

Additional constraints are:

$$u_0 \geq 0, \qquad u_1 \geq 0, \qquad 0 \leq x_2 \leq 1$$
Solve the problem by proceeding backward, starting with the last stage:

Stage 2: Find V1(x1) for a specified x1:

$$V_1(x_1) = \max_{u_1} [f_1(u_1) + V_2(x_2)] = \max_{u_1} [f_1(u_1) + V_2(x_1 - u_1)]$$
$$= \max_{u_1} [u_1 + (x_1 - u_1)(1 - x_1 + u_1)] = \max_{u_1} \left[ -u_1^2 + 2 x_1 u_1 + x_1(1 - x_1) \right]$$

The solution that satisfies the constraints:

$$u_1 = x_1, \qquad V_1(x_1) = x_1$$
Stage 1: Find V0(x0) for a specified x0:

$$V_0(x_0) = \max_{u_0} [f_0(u_0) + V_1(x_1)] = \max_{u_0} \left[ \tfrac{1}{2} u_0 + x_0 - u_0 \right] = \max_{u_0} \left[ x_0 - \tfrac{1}{2} u_0 \right]$$

The solution that satisfies the constraints:

$$u_0 = 0, \qquad V_0(x_0) = x_0$$
The solution specifies that all water is withdrawn at the second diversion point, where it is most valuable.
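The analytic solution can be checked numerically by brute-force search over a grid of feasible diversions. A sketch, where the inflow value and grid resolution are arbitrary choices rather than data from the notes:

```python
import numpy as np

x0 = 1.0                                      # assumed aqueduct inflow
best = (-np.inf, None, None)
for u0 in np.linspace(0.0, x0, 101):          # u0 >= 0 and x1 = x0 - u0 >= 0
    x1 = x0 - u0
    for u1 in np.linspace(0.0, x1, 101):      # u1 >= 0 and x2 = x1 - u1 >= 0
        x2 = x1 - u1
        if 0.0 <= x2 <= 1.0:                  # outflow constraint
            F = 0.5 * u0 + u1 + x2 * (1.0 - x2)   # f0(u0) + f1(u1) + V2(x2)
            if F > best[0]:
                best = (F, u0, u1)
print(best)  # (1.0, 0.0, 1.0): u0 = 0, u1 = x1 = x0, so V0(x0) = x0
```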
Discretization of Variables
In more complex problems the states, controls, and inputs are frequently discretized into a finite number of compatible levels. Simplest case:

$$x_t, u_t, I_t \;\to\; x_t^k, u_t^j, I_t^l; \qquad j, k, l = 1, 2, \ldots, L$$

where L = number of discrete levels.
Then the optimization problem at each stage reduces to an exhaustive search through all feasible levels of $u_t^j$ to find the one that maximizes:

$$f_t[u_t^j, x_t^k, g_t(x_t^k, u_t^j, I_t^l)] + V_{t+1}[g_t(x_t^k, u_t^j, I_t^l)]$$

for each feasible $x_t^k$ and a single specified input level $I_t^l$.
This is discrete dynamic programming.
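A minimal Python sketch of this procedure, assuming the problem data are supplied as functions (all names here are illustrative, not from the notes):

```python
def discrete_dp(states, controls, inputs, f, g, V_T, T):
    """Backward recursion over discretized state and control levels.

    states, controls: the discrete levels for x_t and u_t.
    inputs[t]: the single specified input level I_t at stage t.
    f(t, u, x, x1): stage benefit f_t(u_t, x_t, x_{t+1}).
    g(x, u, I): state equation; transitions landing outside `states`
        are treated as infeasible.
    Returns the return functions V[t][x] and decision rules policy[t][x].
    """
    V = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    V[T] = {x: V_T(x) for x in states}     # terminal benefit starts the recursion
    for t in range(T - 1, -1, -1):         # move backward through the stages
        for x in states:
            best_u, best_val = None, float("-inf")
            for u in controls:             # exhaustive search over control levels
                x1 = g(x, u, inputs[t])
                if x1 in V[t + 1]:         # keep only feasible transitions
                    val = f(t, u, x, x1) + V[t + 1][x1]
                    if val > best_val:
                        best_u, best_val = u, val
            V[t][x], policy[t][x] = best_val, best_u
    return V, policy
```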
Example 2: Reservoir Operations
Maximize benefits from water released from reservoir with variable inflows.
Stages correspond to 3 time periods (months, seasons, etc.; T = 3).

[Figure: reservoir schematic with inflow I_t, storage x_t, and release u_t.]
State equation:

$$x_{t+1} = x_t - u_t + I_t, \qquad t = 0, 1, 2$$
Total benefit from released water and final storage $x_3$:

$$F(x_0, u_0, \ldots, u_2) = f_0(u_0) + f_1(u_1) + f_2(u_2) + V_3(x_3)$$
Discretize all variables into compatible levels:

$$x_t \in \{0, 1, 2\}, \qquad u_t \in \{0, 1, 2\}, \qquad I_t \in \{0, 1\}; \qquad t = 0, 1, 2$$
Inflows: $I_0 = 1$, $I_1 = 0$, $I_2 = 1$

Terminal (outflow) benefit: $V_3(x_3) = 0$ for all $x_3$ values.
Benefits for each release:

u_t    f0(u0)    f1(u1)    f2(u2)
0        0         0         0
1        3         4         1
2        2         5         3
Possible state transitions are derived from the state equation, the inputs, and the permissible variable values; the benefit is shown in parentheses after each feasible control value. The transitions can be shown with a diagram where each feasible state level $x_t^k$ is a circle and each feasible control level $u_t^j$ is a line connecting circles:
[Figure: state transition diagram for Stages 1-3. Circles are feasible state levels at t = 0, ..., 3; lines are feasible controls, each labeled with its value and benefit in parentheses (e.g. 1(4) for a release of 1 with benefit 4); control, benefit, and return values are annotated on the network.]
Solve the series of 3 optimization problems defined by the recursion equation for t = 2, 1, 0. Start at the last stage and move backward:
Stage 3: Find V2(x2) for each level of x2:

$$V_2(x_2) = \max_{u_2} [f_2(u_2) + V_3(x_3)] = \max_{u_2} [f_2(u_2) + V_3(x_2 - u_2 + I_2)]$$

Identify the optimum u2(x2) values for each possible x2, with V3(x3) specified as an input:

x2    u2    f2(u2) + V3(x3)
0     0     0 + 0 = 0
      1     1 + 0 = 1 = V2(x2)   Optimum
1     0     0 + 0 = 0
      1     1 + 0 = 1
      2     3 + 0 = 3 = V2(x2)   Optimum
2     1     1 + 0 = 1
      2     3 + 0 = 3 = V2(x2)   Optimum
Stage 2: Find V1(x1) for each level of x1:

$$V_1(x_1) = \max_{u_1} [f_1(u_1) + V_2(x_2)] = \max_{u_1} [f_1(u_1) + V_2(x_1 - u_1 + I_1)]$$

Identify the optimum u1(x1) value for each possible x1, obtaining V2(x2) from Stage 3:

x1    u1    f1(u1) + V2(x2)
0     0     0 + 1 = 1 = V1(x1)   Optimum
1     0     0 + 3 = 3
      1     4 + 1 = 5 = V1(x1)   Optimum
2     0     0 + 3 = 3
      1     4 + 3 = 7 = V1(x1)   Optimum
      2     5 + 1 = 6
Stage 1: Find V0(x0) for each level of x0:

$$V_0(x_0) = \max_{u_0} [f_0(u_0) + V_1(x_1)] = \max_{u_0} [f_0(u_0) + V_1(x_0 - u_0 + I_0)]$$

Identify the optimum u0(x0) values for each x0, obtaining V1(x1) from Stage 2:

x0    u0    f0(u0) + V1(x1)
0     0     0 + 5 = 5 = V0(x0)   Optimum
      1     3 + 1 = 4
1     0     0 + 7 = 7
      1     3 + 5 = 8 = V0(x0)   Optimum
      2     2 + 1 = 3
2     1     3 + 7 = 10 = V0(x0)  Optimum
      2     2 + 5 = 7
The optimum ut(xt) decision rules for t = 0, 1, 2 define a complete optimum decision strategy:
x0    u0(x0)        x1    u1(x1)        x2    u2(x2)
0     0             0     0             0     1
1     1             1     1             1     2
2     1             2     1             2     2
[Figure: optimum controls and corresponding return values. For state levels (0, 1, 2) the return function values are V_0 = (5, 8, 10), V_1 = (1, 5, 7), and V_2 = (1, 3, 3); the optimal control is marked on the path leaving each state.]
Note that there is a path leaving every state value. The optimum paths give a strategy for
maximizing benefit-to-go from t onward, for any value of state xt.
Optimal benefit for each possible initial storage is V0(x0).
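The whole calculation can be reproduced with a short script that applies the backward recursion to the data of this example; this is a sketch (variable names are arbitrary), and its printed output matches the tables above:

```python
states = controls = (0, 1, 2)
inflow = (1, 0, 1)                               # I_0, I_1, I_2
benefit = ((0, 3, 2),                            # f_0(u_0) for u_0 = 0, 1, 2
           (0, 4, 5),                            # f_1(u_1)
           (0, 1, 3))                            # f_2(u_2)

V = {3: {x: 0 for x in states}}                  # terminal benefit V_3(x_3) = 0
policy = {}
for t in (2, 1, 0):                              # backward recursion
    V[t], policy[t] = {}, {}
    for x in states:
        feasible = [(benefit[t][u] + V[t + 1][x - u + inflow[t]], u)
                    for u in controls if x - u + inflow[t] in V[t + 1]]
        V[t][x], policy[t][x] = max(feasible)    # exhaustive search over releases
for t in (0, 1, 2):
    print(t, policy[t], V[t])
# 0 {0: 0, 1: 1, 2: 1} {0: 5, 1: 8, 2: 10}
# 1 {0: 0, 1: 1, 2: 1} {0: 1, 1: 5, 2: 7}
# 2 {0: 1, 1: 2, 2: 2} {0: 1, 1: 3, 2: 3}
```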
Computational Effort
The solution to the discretized optimization problem can be found by exhaustive enumeration (comparing the benefit-to-go for all possible $u_t(x_t)$ combinations).
Dynamic programming is much more efficient than enumeration since it divides the original T-stage optimization problem into T smaller problems, one for each stage.
To compare the computational effort of enumeration and dynamic programming, assume:
• State dimension = M, stages = T, levels = L
• Equal number of levels for u_t and x_t at every stage
• All possible state transitions are permissible (i.e. $L^{2M}$ transitions at each stage)
Then the total number of $V_t$ evaluations required is:

Exhaustive enumeration: $L^{M(T+1)}$
Dynamic programming: $T L^{2M}$

For M = 1, L = 10, T = 10 the number of $V_t$ evaluations required is:

Exhaustive enumeration: $10^{11}$
Dynamic programming: $10^3$
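These counts are easy to verify (just the two formulas above evaluated in Python):

```python
M, L, T = 1, 10, 10
enumeration = L ** (M * (T + 1))   # L^(M(T+1)) benefit-to-go evaluations
dp = T * L ** (2 * M)              # T * L^(2M) evaluations
print(enumeration, dp)             # 100000000000 (10^11) vs 1000 (10^3)
```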