ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm

DP can give complete quantitative solution

Example 1: Discrete, finite-capacity inventory control problem
• $S_k = C_k = D_k = \{0, 1, 2\}$
• $x_k + u_k \le 2$ : finite capacity
• $x_{k+1} = \max(0,\ x_k + u_k - w_k)$ : no backlogging
• $x_k + u_k \le 2 \Rightarrow u_k \le 2 - x_k \Rightarrow U(x_k) = \{0, \dots, 2 - x_k\}$
• $\text{Prob}\{w_k = 0\} = 0.1$, $\text{Prob}\{w_k = 1\} = 0.7$, $\text{Prob}\{w_k = 2\} = 0.2$

Example 1 continued: Inventory control problem
• $N = 3$
• $g_N(x_N) = 0$
• $g_k(x_k, u_k, w_k) = u_k + 1 \cdot \max(0,\ x_k + u_k - w_k) + 3 \cdot \max(0,\ w_k - x_k - u_k)$
  (the three terms are the order, holding, and lost-demand costs, in that order)

DP can give closed-form solution

Example 2: A gambling model
A gambler is going to bet in N successive plays. The gambler can bet any nonnegative amount up to his present fortune. What betting strategy maximizes his final fortune?
• $P(\text{win}) = p$, $P(\text{lose}) = 1 - p = q$ : Bernoulli
Solution: For convenience, we take the objective to be the expected log of the final fortune (logarithmic utility). The model is as follows.
• Marginal utility of fortune $\propto 1/\text{wealth}$ $\Rightarrow$ $U(x) = \log(x)$ : also Bernoulli!

Example 2 continued: Variable definitions
• $x_k$ = fortune at the beginning of the kth play (after the outcome of the (k – 1)th play, before the kth)
• $u_k$ = bet for the kth play as a fraction (percentage) of $x_k$, $0 \le u_k \le 1$
• $w_k = +1$ (win) w.p. $p$; $w_k = -1$ (lose) w.p. $q = 1 - p$
• $g_k(x_k, u_k, w_k) = 0$, $0 \le k \le N - 1$
• $g_N(x_N) = -\log(x_N)$ : minimizing $g_N$ is the same as maximizing $\log(x_N)$
• $x_{k+1} = x_k + w_k u_k x_k$

Example 2 continued: DP algorithm for the problem
$$J_N(x_N) = \log(x_N)$$
$$J_k(x_k) = \max_{0 \le u_k \le 1} E_{w_k}\{0 + J_{k+1}(x_{k+1})\} = \max_{0 \le u_k \le 1} E_{w_k}\{J_{k+1}(x_k + w_k u_k x_k)\}$$
$$\phantom{J_k(x_k)} = \max_{0 \le u_k \le 1} \{p\, J_{k+1}(x_k + u_k x_k) + q\, J_{k+1}(x_k - u_k x_k)\}, \qquad k = 0, 1, \dots, N-1$$

Example 2 continued: Solving the DP at k = N – 1
Thus,
$$J_{N-1}(x_{N-1}) = \max_{0 \le u_{N-1} \le 1} \{p \log(x_{N-1} + u_{N-1} x_{N-1}) + q \log(x_{N-1} - u_{N-1} x_{N-1})\}$$
$$\phantom{J_{N-1}(x_{N-1})} = \max_{0 \le u_{N-1} \le 1} \{p \log(1 + u_{N-1}) + q \log(1 - u_{N-1}) + \log(x_{N-1})\}$$
Setting the derivative with respect to $u_{N-1}$ to zero:
$$\frac{p}{1 + u_{N-1}} - \frac{q}{1 - u_{N-1}} = \frac{(p - q) - u_{N-1}}{1 - u_{N-1}^2} = 0 \ \Rightarrow\ u^*_{N-1} = p - q = 2p - 1$$
• This is feasible if $0 \le p - q \le 1 \Leftrightarrow 0 \le 2p - 1 \le 1 \Leftrightarrow \tfrac{1}{2} \le p \le 1$.
• Consider $u_{N-1} = 1$ separately: if $p = 1$ ($q = 0$), then $u^*_{N-1} = 1$ : bet it all!
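The stationary point $u^*_{N-1} = 2p - 1$ derived above can be checked numerically. The following is a minimal sketch, not part of the original slides: it grid-searches the bet fraction that maximizes $p \log(1 + u) + q \log(1 - u)$ for a few win probabilities in $[\tfrac{1}{2}, 1]$. The function name `best_fraction` and the `grid` resolution are hypothetical choices made for illustration.

```python
import math


def best_fraction(p, grid=10001):
    """Grid-search u in [0, 1] maximizing p*log(1+u) + q*log(1-u).

    `best_fraction` and `grid` are illustrative names, not from the slides.
    """
    q = 1.0 - p

    def value(u):
        win = p * math.log(1.0 + u)
        if q == 0.0:
            # With p = 1 the losing branch never occurs, so it contributes 0.
            return win
        if u >= 1.0:
            # Betting everything with q > 0 risks log(0) = -infinity.
            return -math.inf
        return win + q * math.log(1.0 - u)

    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=value)


# Compare the numerical maximizer with the closed form 2p - 1.
for p in (0.5, 0.6, 0.75, 1.0):
    print(p, best_fraction(p), 2 * p - 1)
```

The term $\log(x_{N-1})$ is left out because it does not depend on $u_{N-1}$ and so does not change the maximizer.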
• Interior stationary point: $u^*_{N-1} = p - q$.
• If $0 \le p < \tfrac{1}{2}$, then $u^*_{N-1} = 0$: since $p < q$, the term $q \log(1 - u_{N-1})$ dominates, and for $0 < u_{N-1} < 1$
$$p \log(1 + u_{N-1}) + q \log(1 - u_{N-1}) < q \log(1 - u_{N-1}^2) \le 0$$

Example 2 continued: Closed-form solution for k = N – 1
Hence, for $\tfrac{1}{2} \le p \le 1$,
$$J_{N-1}(x_{N-1}) = p \log(2p) + q \log(2 - 2p) + \log x_{N-1} = p \log p + q \log q + \log 2 + \log x_{N-1}$$
and for $0 \le p < \tfrac{1}{2}$,
$$J_{N-1}(x_{N-1}) = p \log(1) + q \log(1) + \log x_{N-1} = \log x_{N-1}$$
With $C = p \log p + q \log q + \log 2$:
$$J_{N-1}(x_{N-1}) = \begin{cases} \log x_{N-1} + C, & \tfrac{1}{2} \le p \le 1 \\ \log x_{N-1}, & 0 \le p < \tfrac{1}{2} \end{cases} \;=\; \log x_{N-1} + \mathbf{1}[\tfrac{1}{2} \le p \le 1]\, C$$

Example 2 continued: Closed-form solution for k = N – 1
Hence,
$$u^*_{N-1} = \begin{cases} p - q, & \tfrac{1}{2} \le p \le 1 \\ 0, & 0 \le p < \tfrac{1}{2} \end{cases}$$
• Can view these as constant functions (controls = percentage) or as feedback policies (total bet $u^*_k x_k$).

Example 2 continued: Solving the DP at k = N – 2
Proceeding one stage (play) back:
$$J_{N-2}(x_{N-2}) = \max_{0 \le u_{N-2} \le 1} E_{w_{N-2}}\{0 + J_{N-1}(x_{N-2} + w_{N-2} u_{N-2} x_{N-2})\}$$
$$\phantom{J_{N-2}(x_{N-2})} = \max_{0 \le u_{N-2} \le 1} \{p \log(x_{N-2} + u_{N-2} x_{N-2}) + q \log(x_{N-2} - u_{N-2} x_{N-2}) + \mathbf{1}[\tfrac{1}{2} \le p \le 1]\, C\}$$
But except for the constant $C$, this is the same equation as for k = N – 1 $\Rightarrow$ the solution is the same, plus the constant $C$.

Example 2 continued: General closed-form DP solution
$$J_{N-k}(x_{N-k}) = \begin{cases} \log x_{N-k} + kC, & \tfrac{1}{2} \le p \le 1 \\ \log x_{N-k}, & 0 \le p < \tfrac{1}{2} \end{cases}$$
$$u^*_{N-k} = \mu^*_{N-k}(x_{N-k}) = \begin{cases} p - q, & \tfrac{1}{2} \le p \le 1 \\ 0, & 0 \le p < \tfrac{1}{2} \end{cases}$$

DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3: A stock option model
• $x_k$ : price of a given stock at the beginning of the kth day
• $x_{k+1} = x_k + w_k = x_0 + \sum_{l=0}^{k} w_l$
• $\{w_k\}$ i.i.d., $w_k \sim F(\cdot)$, with mean $\int w\, dF(w)$ : random walk

Example 3 continued: A stock option model
• Actions: Have an option to buy one share of the stock at a fixed price $c$; N days to exercise the option.
• If you buy when the stock's price is $s$: $s - c$ = profit (can be negative).
• What strategy maximizes profit?
• Terminating process (Bertsekas, Prob. 8, Ch. 1)

Example 3 continued: Solution
$$u_k = \begin{cases} B\ (1) & \text{buy} \\ DB\ (0) & \text{don't buy} \end{cases} \qquad r_k(x_k, u_k, w_k) = \begin{cases} x_k - c, & u_k = B \\ 0, & u_k = DB \end{cases}$$

Example 3 continued: Solution
However, the process terminates (see Prob. 8, Ch. 1) when $u_k = B$ $\Rightarrow$ introduce a fictitious termination state $T$ s.t.
$$x_{k+1} = \begin{cases} x_k + w_k, & u_k = DB,\ x_k \ne T \\ T, & u_k = DB,\ x_k = T \\ T, & u_k = B \end{cases}$$
• Mixed symbolic and numeric states $\Rightarrow$ discrete event system.

Example 3 continued: Solution
Cost structure changed to:
$$\tilde{r}_k(x_k, u_k, w_k) = \begin{cases} r_k, & x_k \ne T \\ 0, & \text{otherwise} \end{cases}$$
There is no simple analytical solution for $J_k(x_k)$ or $u^*_k = \mu^*_k(x_k)$, but we can obtain some qualitative properties (structure) of solutions.

Example 3 continued: DP algorithm for the problem
For $x_k \ne T$ (with $J_k(T) = 0$):
$$J_{N-1}(x_{N-1}) = \max \begin{cases} x_{N-1} - c, & u_{N-1} = B \\ 0, & u_{N-1} = DB \end{cases}$$
$$J_k(x_k) = \max \begin{cases} x_k - c, & u_k = B \\ \int J_{k+1}(x_k + w_k)\, dF(w_k), & u_k = DB \end{cases}$$
where the integral is the expected "profit-to-go".

Example 3 continued: Lemma (Ross)
(i) $J_{k+1}(x_k) - x_k + c$ is decreasing in $x_k$ (the constant $c$ does not affect the property) $\Rightarrow$ beyond a certain stock price, the net gain from waiting rather than buying is negative $\Rightarrow$ buy.
(ii) $J_k(x_k)$ is increasing and continuous in $x_k$ (backward induction).

Example 3 continued: Theorem (Ross)
There exist numbers $s_1 \le s_2 \le \dots \le s_{N-k} \le \dots \le s_N$ such that
$$u^*_{N-k} = \begin{cases} B, & x_{N-k} \ge s_k \\ DB, & x_{N-k} < s_k \end{cases}$$
where the $s_k$ are the critical stock price values with k periods remaining,
$$s_k = \min\{s : J_{N-k}(s) = s - c\}$$
These results can be used to solve the problem numerically, or to gain insight into the process.
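As an illustration of solving the stock option problem numerically, the backward recursion can be run on a price grid and the critical prices $s_k$ read off directly. The sketch below is not from the slides and rests on several assumptions made only for illustration: prices are integers, the increment $w_k$ takes values in a hypothetical three-point distribution `steps`, and the option price is `c = 10` with `N = 3`. The function name `option_thresholds` is likewise hypothetical. The grid is padded by the largest step at each remaining stage so that no boundary truncation enters the recursion.

```python
def option_thresholds(c, N, steps, lo, hi):
    """Backward DP for the option model on an integer price grid.

    steps maps each price increment w to its probability (assumed discrete).
    Returns [s_1, ..., s_N]: smallest price in [lo, hi] at which buying is
    optimal with 1, ..., N periods remaining (None if there is none).
    """
    max_step = max(abs(w) for w in steps)
    pad = N * max_step
    # J_N = 0: the option expires worthless if never exercised.
    J = {x: 0.0 for x in range(lo - pad, hi + pad + 1)}
    thresholds = []
    for remaining in range(1, N + 1):  # stage k = N - remaining
        margin = (N - remaining) * max_step
        new_J, buy_prices = {}, []
        for x in range(lo - margin, hi + margin + 1):
            buy = x - c  # profit from exercising now
            wait = sum(prob * J[x + w] for w, prob in steps.items())
            new_J[x] = max(buy, wait)
            if lo <= x <= hi and buy >= wait:
                buy_prices.append(x)
        thresholds.append(min(buy_prices) if buy_prices else None)
        J = new_J
    return thresholds


# Hypothetical data: downward-drifting walk with dyadic probabilities.
print(option_thresholds(c=10, N=3, steps={-1: 0.5, 0: 0.25, 1: 0.25}, lo=0, hi=20))
# prints [10, 11, 11]
```

For this made-up data the computed critical prices are nondecreasing in the number of periods remaining, which is the ordering the theorem states.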
DP for deterministic problems

Example 3 continued: Remark
For a deterministic situation, optimizing over policies (feedback) gives no advantage over optimizing over actions (sequences of controls/decisions). Hence, the optimization problem can be solved using linear/nonlinear programming.
Furthermore, for a deterministic problem with finitely many states and actions, we can equivalently formulate the problem as a shortest path problem on an acyclic graph.

Example 3 continued: Forward search
[Figure: stage-wise acyclic graph. A start node at k = 0 connects to nodes 1, 2, 3 at k = 1 through arcs with costs $c_{01}$, $c_{02}$, $c_{03}$; arcs with costs $c_{ij}$ connect successive stages k = 1, 2, ..., N – 1; zero-cost arcs lead from the nodes at k = N – 1 to an artificial end node at k = N.]
There are efficient ways to find the shortest path, e.g. branch-and-bound algorithms. However, DP has some advantages:
• always leads to the global optimum
• can handle difficult constraint sets

DP can handle difficult constraint sets

Example 4: Integer-valued variables
$$\min \{u_0^2 + g_1(x_1, u_1)\} \qquad \text{(no cost at final stage, } N = 2\text{)}$$
s.t.
• $x_k, u_k \in \mathbb{Z}$
• $x_{k+1} = x_k + u_k$, $k = 0, 1$
• $x_0 = 1$, $x_2 = 0$ : boundary values
Remark: the reachable set from $x_0 = 1$ is $\mathbb{Z}$.

Example 4 continued: Solution
k = 2:
$$J_2(x_2) = 0$$
k = 1:
$$J_1(x_1) = \min_{u_1 \in U_1(x_1)} \{g_1(x_1, u_1) + J_2(x_2)\}, \qquad U_1(x_1) = \{u_1 \in \mathbb{Z} : 0 = x_1 + u_1\}$$
where $g_1$ is the one-stage cost and $J_2 = 0$. The constraint set is a singleton, so
$$u_1^*(x_1) = -x_1, \qquad J_1(x_1) = \tfrac{1}{2} x_1^2$$

Example 4 continued: Solution
k = 0:
$$J_0(x_0) = \min_{u_0 \in U_0(x_0)} \{u_0^2 + J_1(x_1)\} = \min_{u_0 \in U_0(x_0)} \{u_0^2 + \tfrac{1}{2}(x_0 + u_0)^2\} = \min_{u_0 \in U_0(x_0)} \{\tfrac{3}{2} u_0^2 + \tfrac{1}{2} x_0^2 + x_0 u_0\}$$
With $x_0 = 1$:
$$J_0(1) = \min_{u_0 \in \mathbb{Z}} \{\tfrac{1}{2} + u_0 + \tfrac{3}{2} u_0^2\} \ \Rightarrow\ u_0^* = 0$$

Example 4 continued: Optimal policy
• k = 0: $u_0^*(1) = 0 \Rightarrow x_1^* = x_0 + u_0^* = 1$
• k = 1: $u_1^*(x_1^*) = -x_1^* \Rightarrow u_1^* = -1 \Rightarrow x_2^* = 0$
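The stage-0 minimization in Example 4 can be confirmed by brute force over the integers. The sketch below is an illustration only: it takes $J_1(x_1) = \tfrac{1}{2} x_1^2$ and $u_1^* = -x_1$ from the k = 1 step as given, and it assumes the search over $u_0$ can be restricted to a finite window (the hypothetical `window` parameter), which is harmless here because the stage-0 objective is a convex quadratic in $u_0$. The function name `solve_example4` is also hypothetical.

```python
def solve_example4(x0=1, window=10):
    """Enumerate integer u0 and apply the k = 1 policy u1 = -x1."""

    def J1(x1):
        # Cost-to-go from the k = 1 step: J_1(x1) = x1^2 / 2.
        return 0.5 * x1 ** 2

    # Assumption: the minimizer lies within [-window, window].
    candidates = range(-window, window + 1)
    u0 = min(candidates, key=lambda u: u ** 2 + J1(x0 + u))
    x1 = x0 + u0
    u1 = -x1  # only feasible control, since x2 must equal 0
    x2 = x1 + u1
    J0 = u0 ** 2 + J1(x1)
    return u0, u1, x1, x2, J0


print(solve_example4())  # prints (0, -1, 1, 0, 0.5)
```

The unconstrained real-valued minimizer of $\tfrac{1}{2} + u_0 + \tfrac{3}{2} u_0^2$ is $u_0 = -\tfrac{1}{3}$, which is not an integer; enumerating the integer candidates picks $u_0^* = 0$, with cost $\tfrac{1}{2}$, over $u_0 = -1$, with cost $1$.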