DP Examples (ver. Power Point 2000)

DP can give complete quantitative solution
Example 1: Discrete, finite capacity, inventory control problem
• Sk = Ck = Dk = {0, 1, 2}
• xk + uk ≤ 2 : finite capacity
• xk+1 = max(0, xk + uk – wk) : no backlogging
• xk + uk ≤ 2 ⇒ uk ≤ 2 – xk ⇒ U(xk) = {0, …, 2 – xk}
• Prob{wk=0}=0.1, Prob{wk=1}=0.7, Prob{wk=2}=0.2
ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm
DP can give complete quantitative solution
Example 1 continued: Inventory control problem
• N = 3
• gN(xN) = 0
• gk(xk, uk, wk) = uk + 1∙max(0, xk + uk – wk) + 3∙max(0, wk – xk – uk)
  (terms in order: order cost, holding cost, lost-demand cost)
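The backward recursion for this example is small enough to run by hand or by machine. The following is a minimal Python sketch, not the course's own code. It assumes the lost-demand term is 3∙max(0, wk – xk – uk), i.e. demand in excess of the stock on hand; the names `stage_cost` and `solve_inventory` are illustrative.

```python
# Backward DP for the 3-stage inventory example (a sketch under the stated assumptions).
DEMAND_PROBS = {0: 0.1, 1: 0.7, 2: 0.2}   # Prob{wk = w} from the slides
CAPACITY = 2                               # xk + uk <= 2
HORIZON = 3                                # N = 3


def stage_cost(x, u, w):
    """Order cost + holding cost + lost-demand cost (lost-demand form is assumed)."""
    return u + 1 * max(0, x + u - w) + 3 * max(0, w - x - u)


def solve_inventory():
    """Return (J, policy): J[k][x] is the cost-to-go, policy[k][x] the optimal order."""
    J = [[0.0] * (CAPACITY + 1) for _ in range(HORIZON + 1)]  # J[N][x] = gN(x) = 0
    policy = [[0] * (CAPACITY + 1) for _ in range(HORIZON)]
    for k in reversed(range(HORIZON)):
        for x in range(CAPACITY + 1):
            best_u, best_value = None, float("inf")
            for u in range(CAPACITY - x + 1):      # U(x) = {0, ..., 2 - x}
                value = sum(
                    prob * (stage_cost(x, u, w) + J[k + 1][max(0, x + u - w)])
                    for w, prob in DEMAND_PROBS.items()
                )
                if value < best_value:
                    best_u, best_value = u, value
            J[k][x], policy[k][x] = best_value, best_u
    return J, policy


J, policy = solve_inventory()
for k in range(HORIZON):
    print(k, [round(v, 3) for v in J[k]], policy[k])
```

Each printed row lists the stage, the cost-to-go for stock levels 0, 1, 2, and the corresponding optimal orders, which is the "complete quantitative solution" the slide title refers to.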
DP can give closed-form solution
Example 2: A gambling model
A gambler is going to bet in N successive plays. The gambler can
bet any (nonnegative) amount up to his present fortune. What
betting strategy maximizes his final fortune?
P(win) = p, P(lose) = 1 – p = q
: Bernoulli
Solution: For convenience, and with no loss in generality, we look
to maximize the log of the final fortune. The model is as
follows.
• Marginal utility of fortune ∝ 1 / wealth ⇒ U(x) = log(x)
: also Bernoulli!
DP can give closed-form solution
Example 2 continued: Variable definitions
• xk = fortune at beginning of kth play (after outcome of (k – 1)th
play, before kth)
• uk = bet for kth play as a percentage of xk
• wk = +1 : win w.p. p
       –1 : lose w.p. q = 1 – p
• gk(xk, uk, wk) = 0, 0 ≤ k ≤ N – 1
• gN(xN) = –log(xN) : minimizing this cost maximizes log(xN)
• xk+1 = xk + wk uk xk
DP can give closed-form solution
Example 2 continued: DP algorithm for the problem
$$J_N(x_N) = \log(x_N)$$

$$\begin{aligned}
J_k(x_k) &= \max_{0 \le u_k \le 1} E_{w_k}\{0 + J_{k+1}(x_{k+1})\} \\
&= \max_{0 \le u_k \le 1} E_{w_k}\{J_{k+1}(x_k + w_k u_k x_k)\} \\
&= \max_{0 \le u_k \le 1} \{p\,J_{k+1}(x_k + u_k x_k) + q\,J_{k+1}(x_k - u_k x_k)\}, \qquad k = 0, 1, \dots, N-1
\end{aligned}$$
DP can give closed-form solution
Example 2 continued: Solving the DP at k=N-1
Thus,
$$\begin{aligned}
J_{N-1}(x_{N-1}) &= \max_{0 \le u_{N-1} \le 1} \{p\log(x_{N-1} + u_{N-1}x_{N-1}) + q\log(x_{N-1} - u_{N-1}x_{N-1})\} \\
&= \max_{0 \le u_{N-1} < 1} \{p\log(1 + u_{N-1}) + q\log(1 - u_{N-1}) + \log(x_{N-1})\}
\end{aligned}$$
Setting the derivative with respect to u_{N-1} to zero:
$$\frac{p}{1 + u_{N-1}} - \frac{q}{1 - u_{N-1}} = \frac{(p - q) - u_{N-1}}{1 - u_{N-1}^2} = 0
\;\Rightarrow\; u^*_{N-1} = p - q = 2p - 1$$
: feasible if 0 ≤ p – q ≤ 1 ⇔ 0 ≤ 2p – 1 ≤ 1 ⇔ 1/2 ≤ p ≤ 1
DP can give closed-form solution
Example 2 continued: Solving the DP at k=N-1
Thus,
$$J_{N-1}(x_{N-1}) = \max_{0 \le u_{N-1} < 1} \{p\log(1 + u_{N-1}) + q\log(1 - u_{N-1}) + \log(x_{N-1})\}$$
: consider uN-1 = 1 separately
if p = 1 (q = 0) ⇒ u*N-1 = 1 : bet it all! ⇒ u*N-1 = p – q still holds
if 0 ≤ p < ½, then u*N-1 = 0
(p < q ⇒ q∙log(1 – uN-1) dominates)
⇒ p∙log(1 + uN-1) + q∙log(1 – uN-1) < q∙log(1 – u²N-1) ≤ 0 for 0 < uN-1 < 1
DP can give closed-form solution
Example 2 continued: Closed-form solution for k=N-1
Hence,
$$J_{N-1}(x_{N-1}) =
\begin{cases}
p\log(2p) + q\log(2 - 2p) + \log x_{N-1} \\
\quad = p\log p + q\log q + \log x_{N-1} + \log 2 \\
\quad = \log x_{N-1} + \underbrace{p\log p + q\log q + \log 2}_{C}, & 1/2 \le p \le 1 \\[4pt]
\log(1) + \log x_{N-1} = \log x_{N-1}, & 0 \le p < 1/2
\end{cases}$$
$$= \log x_{N-1} + [1/2 \le p \le 1]\,C$$
where the bracket [1/2 ≤ p ≤ 1] equals 1 when the condition holds and 0 otherwise.
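A quick way to sanity-check the constant C is to evaluate it directly and compare it with the objective at u* = 2p – 1. The sketch below is illustrative; the function names are assumptions, and the p = 1 branch uses the limit q∙log q → 0.

```python
import math


def optimal_growth_constant(p):
    """[1/2 <= p <= 1] * C, with C = p*log(p) + q*log(q) + log(2)."""
    q = 1.0 - p
    if p < 0.5:
        return 0.0
    if q == 0.0:
        return math.log(2.0)  # q*log(q) -> 0 as q -> 0
    return p * math.log(p) + q * math.log(q) + math.log(2.0)


def cost_to_go_last_stage(x, p):
    """J_{N-1}(x) = log(x) + [1/2 <= p <= 1] * C."""
    return math.log(x) + optimal_growth_constant(p)


p = 0.7
at_optimum = p * math.log(2 * p) + (1 - p) * math.log(2 - 2 * p)
print(optimal_growth_constant(p), at_optimum)
```

The two printed numbers agree up to rounding, and the constant vanishes at p = 1/2, where betting nothing is already optimal.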
DP can give closed-form solution
Example 2 continued: Closed-form solution for k=N-1
Hence,
$$u^*_{N-1} =
\begin{cases}
p - q, & 1/2 \le p \le 1 \\
0, & 0 \le p < 1/2
\end{cases}$$
We can view these as constant functions (controls = percentage of fortune) or as feedback policies (total bet = u*k xk).
DP can give closed-form solution
Example 2 continued: Solving the DP at k=N-2
Proceeding one stage (play) back:
$$\begin{aligned}
J_{N-2}(x_{N-2}) &= \max_{0 \le u_{N-2} \le 1} E_{w_{N-2}}\{0 + J_{N-1}(x_{N-2} + w_{N-2}u_{N-2}x_{N-2})\} \\
&= \max_{0 \le u_{N-2} \le 1} \{p\log(x_{N-2} + u_{N-2}x_{N-2}) + q\log(x_{N-2} - u_{N-2}x_{N-2}) + 1\cdot[1/2 \le p \le 1]\,C\}
\end{aligned}$$
But except for the constant C, this is the same equation as for k = N – 1
⇒ the solution is the same, plus the constant C
DP can give closed-form solution
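Example 2 continued: General closed-form DP solution
$$\Rightarrow\; J_{N-k}(x_{N-k}) =
\begin{cases}
\log x_{N-k} + kC, & 1/2 \le p \le 1 \\
\log x_{N-k}, & 0 \le p < 1/2
\end{cases}$$
$$\Rightarrow\; u^*_{N-k} = \mu^*_{N-k}(x_{N-k}) =
\begin{cases}
p - q, & 1/2 \le p \le 1 \\
0, & 0 \le p < 1/2
\end{cases}$$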
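The general formula can be checked without simulation: under the constant-fraction policy the final fortune depends only on the number of wins, so the expected log fortune is a finite binomial sum. The sketch below is illustrative, assumes 0 < p < 1 so that no logarithm of zero occurs, and uses hypothetical function names.

```python
import math


def expected_log_final_fortune(x0, p, n_plays):
    """Exact E[log x_N] when the fraction u* = max(0, p - q) is bet on every play."""
    q = 1.0 - p
    u = max(0.0, p - q)
    total = 0.0
    for wins in range(n_plays + 1):
        losses = n_plays - wins
        prob = math.comb(n_plays, wins) * p**wins * q**losses
        fortune = x0 * (1.0 + u) ** wins * (1.0 - u) ** losses
        total += prob * math.log(fortune)
    return total


def closed_form(x0, p, n_plays):
    """log(x0) + k*C with k = n_plays plays remaining (the C term drops out for p < 1/2)."""
    q = 1.0 - p
    if p < 0.5:
        return math.log(x0)
    growth = p * math.log(p) + q * math.log(q) + math.log(2.0)
    return math.log(x0) + n_plays * growth


print(expected_log_final_fortune(100.0, 0.7, 5), closed_form(100.0, 0.7, 5))
```

The two values agree up to floating-point rounding, which matches J_{N-k}(x) = log x + kC with k plays remaining.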
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3: A stock option model
• xk : price of a given stock at beginning of kth day
• $x_{k+1} = x_k + w_k = x_0 + \sum_{l=0}^{k} w_l$
• {wk} i.i.d., wk ~ F(·), with finite mean $\bar{w} = \int w\,dF(w) < \infty$
⇒ Random Walk
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: A stock option model
• Actions: You hold an option to buy one share of the stock at a fixed price c, with N days in which to exercise it. If you buy when the stock's price is s:
⇒ s – c = profit (can be negative)
What strategy maximizes profit?
⇒ Terminating process (Bertsekas, Prob. 8, Ch. 1)
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: Solution

$$u_k =
\begin{cases}
B\ (1) & \text{buy} \\
DB\ (0) & \text{don't buy}
\end{cases}$$

$$r_k(x_k, u_k, w_k) =
\begin{cases}
x_k - c; & u_k = B \\
0; & u_k = DB
\end{cases}$$
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: Solution
However, the process terminates (see Prob. 8, Ch. 1) when uk = B
⇒ introduce a fictitious termination state T s.t.
$$x_{k+1} =
\begin{cases}
x_k + w_k; & u_k = DB,\ x_k \ne T \\
T; & u_k = DB,\ x_k = T \\
T; & u_k = B
\end{cases}$$
mixed symbolic and numeric states ⇒ discrete event system
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: Solution
Cost structure changed to:
$$\tilde{r}_k(x_k, u_k, w_k) =
\begin{cases}
r_k; & x_k \ne T \\
0; & \text{otherwise}
\end{cases}$$
There is no simple analytical solution for Jk(xk) or u*k = μ*k(xk), but we can obtain some qualitative properties (structure) of solutions.
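The mixed symbolic/numeric state space is easy to mirror in code. The sketch below is illustrative only: it encodes the termination state as the string "T" and implements the state update and the modified reward. The names `next_state` and `reward`, the strike value and the sample price moves are all hypothetical.

```python
TERMINATED = "T"              # fictitious termination state
BUY, DONT_BUY = "B", "DB"


def next_state(x, u, w):
    """State update: the process moves to (and stays at) T once the option is exercised."""
    if x == TERMINATED or u == BUY:
        return TERMINATED
    return x + w


def reward(x, u, strike):
    """Modified reward: x - c when buying from a live state, 0 otherwise."""
    if x == TERMINATED:
        return 0
    return x - strike if u == BUY else 0


# One sample path with hypothetical price moves and a hypothetical strike c = 8.
x, total = 10, 0
for u, w in [(DONT_BUY, 1), (BUY, -2), (DONT_BUY, 3)]:
    total += reward(x, u, strike=8)
    x = next_state(x, u, w)
print(x, total)
```

After the buy decision the state is absorbed at T, so later decisions and price moves no longer change the accumulated profit.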
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: DP algorithm for the problem
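$$J_{N-1}(x_{N-1}) = \max\{\underbrace{x_{N-1} - c}_{u_{N-1} = B},\ \underbrace{0}_{u_{N-1} = DB}\} \qquad (x_N = T)$$

$$J_k(x_k) = \max\Big\{\underbrace{x_k - c}_{u_k = B},\ \underbrace{\int J_{k+1}(x_k + w_k)\,dF(w_k)}_{u_k = DB:\ \text{expected “profit-to-go”}}\Big\}$$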
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: Lemma (Ross)
(i) Jk+1(xk) – xk + c is decreasing in xk (the constant c does not affect the property)
⇒ beyond a certain stock price, the gain from waiting over buying immediately becomes negative ⇒ buy now
(ii) Jk(xk) is increasing and continuous in xk (backward induction)
DP can be used to obtain qualitative properties
(structure) of optimal solutions
Example 3 continued: Theorem (Ross)
There exist numbers s1 ≤ s2 ≤ … ≤ sN-k ≤ … ≤ sN such that
$$u^*_{N-k} =
\begin{cases}
B; & x_{N-k} \ge s_k \\
DB; & x_{N-k} < s_k
\end{cases}$$
(sk : critical stock price values, with k periods remaining)
where
$$s_k = \min\{s : J_{N-k}(s) = s - c\}$$
These results can be used to solve the problem numerically, or to
gain insight into the process.
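As one way to use the theorem numerically, the sketch below computes the profit-to-go and the critical prices for a small integer-valued random walk. Everything specific here is an assumption made for illustration: the strike c = 10, the three-point step distribution with negative mean, and the function names. Exact rational arithmetic is used so that the test J(s) = s – c is not affected by rounding.

```python
from fractions import Fraction
from functools import lru_cache

STRIKE = 10  # option price c (hypothetical value)
# Hypothetical step distribution F: P(w=-1)=0.4, P(w=0)=0.3, P(w=+1)=0.3.
STEP_PROBS = {-1: Fraction(4, 10), 0: Fraction(3, 10), 1: Fraction(3, 10)}


@lru_cache(maxsize=None)
def profit_to_go(periods_left, price):
    """J_{N-k}(price) with k = periods_left, in exact rational arithmetic."""
    if periods_left == 0:
        return Fraction(0)                      # option has expired
    wait = sum(prob * profit_to_go(periods_left - 1, price + w)
               for w, prob in STEP_PROBS.items())
    return max(Fraction(price - STRIKE), wait)  # buy now vs. don't buy


def critical_price(periods_left):
    """s_k = min{s : J_{N-k}(s) = s - c}, searched upward from s = c."""
    price = STRIKE
    while profit_to_go(periods_left, price) != price - STRIKE:
        price += 1
    return price


thresholds = [critical_price(k) for k in range(1, 5)]
print(thresholds)
```

The search can start at s = c because the profit-to-go is never negative, so no price below c satisfies J(s) = s – c. The printed list is nondecreasing in the number of periods remaining, as the theorem states.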
DP for deterministic problems
Example 3 continued: Remark
For a deterministic situation, optimizing over policies (feedback)
results in no advantage over optimizing over actions (sequences
of controls/decisions)
Hence, the optimization problem can be solved using
linear/nonlinear programming. Furthermore, for a finite state
and action deterministic problem, we can equivalently
formulate the problem as a shortest path problem for an acyclic
graph.
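A minimal sketch of the shortest-path view, on a hypothetical four-stage graph with made-up arc costs: the cost-to-go of each node is the cheapest arc cost plus the cost-to-go of the successor, and an optimal control sequence is read off by following the minimizing arcs. Node names and costs are illustrative only.

```python
from functools import lru_cache

# Hypothetical arc costs c_ij on a small staged (acyclic) graph.
ARC_COSTS = {
    "start": {"a": 2, "b": 5},
    "a": {"c": 4, "d": 1},
    "b": {"c": 1, "d": 3},
    "c": {"end": 3},
    "d": {"end": 4},
    "end": {},
}


@lru_cache(maxsize=None)
def cost_to_go(node):
    """Backward DP: J(i) = min_j [c_ij + J(j)], with J(end) = 0."""
    if node == "end":
        return 0
    return min(cost + cost_to_go(nxt) for nxt, cost in ARC_COSTS[node].items())


def shortest_path(node="start"):
    """Recover an optimal sequence of decisions by following the minimizing arcs."""
    path = [node]
    while node != "end":
        arcs = ARC_COSTS[node]
        node = min(arcs, key=lambda nxt: arcs[nxt] + cost_to_go(nxt))
        path.append(node)
    return path


print(cost_to_go("start"), shortest_path())
```

Because the problem is deterministic, the recovered path is an open-loop sequence of decisions that does as well as any feedback policy.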
DP for deterministic problems
Example 3 continued: Forward search
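[Figure: staged graph for forward search. A start node at stage k=0 connects to nodes 1, 2, 3 at stage k=1 with arc costs c01, c02, c03 (cij in general); stages k=2, …, k=N-1 follow, and zero-cost arcs lead to an artificial End node at k=N.]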
There are efficient ways to find the shortest path, e.g. branch-and-bound algorithms.
However, DP has some advantages:
• always leads to the global optimum
• can handle difficult constraint sets
DP can handle difficult constraint sets
Example 4: Integer-valued variables
$$\min\ \{u_0^2 + x_1^2 - \tfrac{1}{2}u_1^2\}$$
: no cost at final stage N=2
s.t.
$$x_k,\ u_k \in \mathbb{Z}$$
$$x_{k+1} = x_k + u_k, \quad k = 0, 1$$
$$x_0 = 1, \quad x_2 = 0 \qquad \text{(boundary values)}$$
Remark: reachable set from x0 = 1 is Z2
DP can handle difficult constraint sets
Example 4 continued: Solution
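k=2
$$J_2(x_2) = 0$$
k=1
$$J_1(x_1) = \min_{u_1 \in U_1(x_1)} \{\underbrace{x_1^2 - \tfrac{1}{2}u_1^2}_{\text{one-stage cost}} + \underbrace{0}_{J_2}\}$$
$$U_1(x_1) = \{u_1 \in \mathbb{Z} : 0 = x_1 + u_1\} \qquad \text{(singleton)}$$
$$\Rightarrow\; u_1^* = \mu_1^*(x_1) = -x_1$$
$$\Rightarrow\; J_1(x_1) = \tfrac{1}{2}x_1^2$$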
DP can handle difficult constraint sets
Example 4 continued: Solution
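k=0
$$\begin{aligned}
J_0(x_0) &= \min_{u_0 \in U_0(x_0)} \{u_0^2 + J_1(x_1)\} \\
&= \min_{u_0 \in U_0(x_0)} \{u_0^2 + \tfrac{1}{2}(x_0 + u_0)^2\} \\
&= \min_{u_0 \in U_0(x_0)} \{\tfrac{3}{2}u_0^2 + \tfrac{1}{2}x_0^2 + x_0u_0\}
\end{aligned}$$
: x0 = 1
$$\Rightarrow\; J_0(1) = \min_{u_0 \in \mathbb{Z}} \{\tfrac{3}{2}u_0^2 + u_0 + \tfrac{1}{2}\} \;\Rightarrow\; u_0^* = 0$$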
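Because u0 is restricted to the integers, the k = 0 minimization cannot be done by simply setting a derivative to zero, but it can be done by enumeration. The sketch below is illustrative: it takes J1(x1) = x1²/2 from the k = 1 step and searches a finite window of integers, which is assumed to be wide enough to contain the minimizer. The function names and the window size are hypothetical.

```python
from fractions import Fraction


def cost_to_go_stage1(x1):
    """J1(x1) = x1^2 / 2, taken from the k = 1 step of the slides."""
    return Fraction(x1 * x1, 2)


def solve_stage0(x0, search_radius=10):
    """Minimize u0^2 + J1(x0 + u0) over integers u0 in a finite window."""
    candidates = range(-search_radius, search_radius + 1)
    best_u0 = min(candidates, key=lambda u0: u0 * u0 + cost_to_go_stage1(x0 + u0))
    return best_u0, best_u0 * best_u0 + cost_to_go_stage1(x0 + best_u0)


u0_star, j0 = solve_stage0(1)
print(u0_star, j0)
```

The unconstrained minimizer of (3/2)u0² + u0 + 1/2 is u0 = –1/3, which is not an integer; the enumeration compares the integer candidates directly instead of rounding.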
DP can handle difficult constraint sets
Example 4 continued: Optimal Policy
k=0
$$\Rightarrow\; u_0^* = \mu_0^*(1) = 0 \;\Rightarrow\; x_1 = 0 + x_0 = 1$$
$$u_1^* = \mu_1^*(x_1) = -x_1 \;\Rightarrow\; u_1^* = -1$$
$$x_2 = 0$$
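The optimal policy can be rolled forward from x0 = 1 to confirm that it meets the boundary condition x2 = 0. This is a small illustrative sketch; `mu0`, `mu1` and `rollout` are hypothetical names, and `mu0` is only defined for the initial state treated in the slides.

```python
def mu0(x0):
    """Stage-0 decision from the slides, derived for x0 = 1 only."""
    if x0 != 1:
        raise ValueError("the slides derive u0* for x0 = 1 only")
    return 0


def mu1(x1):
    """Stage-1 policy: the terminal constraint x2 = 0 forces u1 = -x1."""
    return -x1


def rollout(x0):
    """Apply x_{k+1} = x_k + u_k for k = 0, 1 and return the state and control paths."""
    u0 = mu0(x0)
    x1 = x0 + u0
    u1 = mu1(x1)
    x2 = x1 + u1
    return [x0, x1, x2], [u0, u1]


states, controls = rollout(1)
print(states, controls)
```

The state path ends at x2 = 0, so the policy is feasible for the stated boundary values.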