Uploaded by Xrisdwhyner Devs

hw1 partial sols

MATH 6367 Homework 1
Name and ID:
1. Consider the system
$$x_{k+1} = x_k + u_k + w_k, \qquad k = 0, 1, 2, 3,$$
with initial state $x_0 = 5$, and the cost function
$$\sum_{k=0}^{3} \left( x_k^2 + u_k^2 \right).$$
Apply the DP algorithm for the following two cases:
(a) The control constraint set $U_k(x_k)$ is $\{u \mid 0 \le x_k + u \le 5,\ u \text{ integer}\}$ for all $x_k$ and $k$, and the
disturbance $w_k$ is equal to zero for all $k$.
(b) The control constraint is as in part (a) and the disturbance $w_k$ takes the values $-1$ and $1$ with
equal probability $1/2$ for all $x_k$ and $u_k$, except if $x_k + u_k$ is equal to 0 or 5, in which case $w_k = 0$
with probability 1.
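The backward recursion for both cases can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original solutions: the names `solve`, `no_noise`, and `coin_flip_noise` are made up here, the terminal cost is assumed to be zero (the cost function above has no term for $k = 4$), and the minimization is written over $y = x_k + u_k$ rather than over $u_k$ directly.

```python
from fractions import Fraction

STATES = range(6)   # the constraint 0 <= x_k + u_k <= 5 keeps the state in {0, ..., 5}
N = 4               # stages k = 0, 1, 2, 3

def no_noise(y):
    """Case (a): w_k = 0 with probability 1."""
    return [(0, Fraction(1))]

def coin_flip_noise(y):
    """Case (b): w_k = -1 or +1 with probability 1/2, unless y = x_k + u_k is 0 or 5."""
    if y in (0, 5):
        return [(0, Fraction(1))]
    return [(-1, Fraction(1, 2)), (1, Fraction(1, 2))]

def solve(disturbance):
    """Return (J_0, policy) where policy[k][x] is a minimizing u_k at state x."""
    J = {x: Fraction(0) for x in STATES}      # assumed terminal cost J_4 = 0
    policy = []
    for _ in range(N):                        # stages 3, 2, 1, 0
        J_new, mu = {}, {}
        for x in STATES:
            best = None
            for y in STATES:                  # y = x + u, so u = y - x is an integer
                u = y - x
                expected_tail = sum(p * J[y + w] for w, p in disturbance(y))
                cost = x * x + u * u + expected_tail
                if best is None or cost < best[0]:
                    best = (cost, u)
            J_new[x], mu[x] = best
        J = J_new
        policy.insert(0, mu)
    return J, policy

J_a, policy_a = solve(no_noise)
J_b, policy_b = solve(coin_flip_noise)
print("case (a): J_0(5) =", J_a[5], " first control:", policy_a[0][5])
print("case (b): J_0(5) =", J_b[5], " first control:", policy_b[0][5])
```

Exact fractions are used so that ties between controls are detected without rounding; when several controls attain the minimum, the sketch keeps the first one it finds.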
2. A game of the blackjack variety is played by two players as follows: Both players throw a die. The
first player, knowing his opponent’s result, may stop or may throw the die again and add the result to
the result of his previous throw. He then may stop or throw again and add the result of the new throw
to the sum of his previous throws. He may repeat this process as many times as he wishes. If his sum
exceeds seven (i.e., he busts), he loses the game. If he stops before exceeding seven, the second player
takes over and throws the die successively until the sum of his throws is four or higher. If the sum of
the second player is over seven, he loses the game. Otherwise the player with the larger sum wins, and
in case of a tie the second player wins. The problem is to determine a stopping strategy for the first
player that maximizes his probability of winning for each possible initial throw of the second player.
Formulate the problem in terms of DP and find an optimal stopping strategy for the case where the
second player’s initial throw is three.
Hint: Take N = 6 and a state space consisting of the following 14 states:
$x^1$: busted,
$x^{1+i}$: already stopped at sum $i$ ($1 \le i \le 7$),
$x^{8+i}$: current sum is $i$ but the player has not yet stopped ($1 \le i \le 6$).
The optimal strategy is to throw until the sum is four or higher.
Solution: Let the state be the status of the first player. The state space then consists of the
states:
$x^1$: busted,
$x^{1+i}$: already stopped at sum $i$ ($1 \le i \le 7$),
$x^{8+i}$: current sum is $i$ but the player has not yet stopped ($1 \le i \le 6$).
• The first player can roll at most 6 more times after the initial throw. Before he rolls the
die he is in one of the above states. If in state $x^8$ or $x^1$, he has no choice. If in states $x^2$ to
$x^7$, he has already stopped. If in states $x^9$ to $x^{14}$, he can apply a control (i.e., roll or stop),
chosen to maximize $\mathrm{Prob}(\mathrm{Win} \mid x^i)$, $i = 9, \dots, 14$.
• For a given throw of the second player we can compute $\mathrm{Prob}(\mathrm{Win} \mid x^1), \dots, \mathrm{Prob}(\mathrm{Win} \mid x^8)$.
• Then, going backwards in time (from the 6th roll), we calculate the strategy which maximizes
$\mathrm{Prob}(\mathrm{Win} \mid x^i)$, $i = 14, \dots, 9$. Let
$$P^*(\mathrm{Win} \mid x^i) = \max_u \mathrm{Prob}(\mathrm{Win} \mid x^i, u).$$
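Given that the initial throw of the second player is three:
$$\begin{array}{ll}
\mathrm{Prob}(\mathrm{Win} \mid x^1) = 0, & \mathrm{Prob}(\mathrm{Win} \mid x^5) = 1/3, \\
\mathrm{Prob}(\mathrm{Win} \mid x^2) = 1/3, & \mathrm{Prob}(\mathrm{Win} \mid x^6) = 1/2, \\
\mathrm{Prob}(\mathrm{Win} \mid x^3) = 1/3, & \mathrm{Prob}(\mathrm{Win} \mid x^7) = 2/3, \\
\mathrm{Prob}(\mathrm{Win} \mid x^4) = 1/3, & \mathrm{Prob}(\mathrm{Win} \mid x^8) = 5/6.
\end{array}$$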
Let $x_i$ be the state after the $i$th roll by player 1.
Stage 6
$x_6 \in \{x^1, x^8\}$; no controls can be applied.
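Stage 5
$x_5 \in \{x^1, x^7, x^8, x^{14}\}$; control possible only for $x^{14}$.
$$\mathrm{Prob}(\mathrm{Win} \mid x^{14}, u : \text{stop}) = \mathrm{Prob}(\mathrm{Win} \mid x^7) = 2/3$$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{14}, u : \text{roll}) = \frac{\mathrm{Prob}(\mathrm{Win} \mid x^8)}{6} = 5/36$$
(Note that $\mu_j(x^k)$ is independent of $j$.) We have, then:
$$P^*(\mathrm{Win} \mid x^{14}) = 2/3, \qquad \mu(x^{14}) : \text{stop}$$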
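Stage 4
$x_4 \in \{x^1, x^6, x^7, x^8, x^{13}, x^{14}\}$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{13}, u : \text{stop}) = \mathrm{Prob}(\mathrm{Win} \mid x^6) = 1/2$$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{13}, u : \text{roll}) = \frac{\mathrm{Prob}(\mathrm{Win} \mid x^8) + P^*(\mathrm{Win} \mid x^{14})}{6} < 1/2$$
We have, then:
$$P^*(\mathrm{Win} \mid x^{13}) = 1/2, \qquad \mu(x^{13}) : \text{stop}$$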
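Stage 3
$x_3 \in \{x^1, x^5, x^6, x^7, x^8, x^{12}, x^{13}, x^{14}\}$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{12}, u : \text{stop}) = \mathrm{Prob}(\mathrm{Win} \mid x^5) = 1/3$$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{12}, u : \text{roll}) = \frac{\mathrm{Prob}(\mathrm{Win} \mid x^8) + P^*(\mathrm{Win} \mid x^{13}) + P^*(\mathrm{Win} \mid x^{14})}{6} = 1/3$$
We have, then:
$$P^*(\mathrm{Win} \mid x^{12}) = 1/3, \qquad \mu(x^{12}) : \text{stop or roll}$$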
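Stage 2
$x_2 \in \{x^1, x^4, x^5, x^6, x^7, x^8, x^{11}, x^{12}, x^{13}, x^{14}\}$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{11}, u : \text{stop}) = \mathrm{Prob}(\mathrm{Win} \mid x^4) = 1/3$$
$$\mathrm{Prob}(\mathrm{Win} \mid x^{11}, u : \text{roll}) = \frac{\mathrm{Prob}(\mathrm{Win} \mid x^8) + P^*(\mathrm{Win} \mid x^{12}) + \cdots + P^*(\mathrm{Win} \mid x^{14})}{6} = 7/18$$
We have, then:
$$P^*(\mathrm{Win} \mid x^{11}) = 7/18, \qquad \mu(x^{11}) : \text{roll}$$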
Finally:
$$P^*(\mathrm{Win} \mid x^{10}) = 49/108, \qquad \mu(x^{10}) : \text{roll}$$
$$P^*(\mathrm{Win} \mid x^{9}) = 343/648, \qquad \mu(x^{9}) : \text{roll}$$
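The stage-by-stage values above can be cross-checked numerically. The following is a minimal sketch, assuming the rules as stated in the problem; the function names `opponent_final_sums` and `solve_first_player` are illustrative only, and the state is indexed by the first player's current sum rather than by the labels $x^9, \dots, x^{14}$.

```python
from fractions import Fraction

SIXTH = Fraction(1, 6)

def opponent_final_sums(start, target=4):
    """Distribution of player 2's final sum when he throws until reaching `target`."""
    dist = {}

    def walk(total, prob):
        if total >= target:
            dist[total] = dist.get(total, Fraction(0)) + prob
        else:
            for d in range(1, 7):
                walk(total + d, prob * SIXTH)

    walk(start, Fraction(1))
    return dist

def solve_first_player(opponent_first_throw=3, limit=7):
    """Backward recursion over player 1's current sum s = limit, ..., 1."""
    dist = opponent_final_sums(opponent_first_throw)

    def stop_value(i):
        # Player 1 wins if player 2 busts or finishes strictly below i
        # (a tie goes to player 2).
        return sum(p for s, p in dist.items() if s > limit or s < i)

    value, policy = {}, {}
    for s in range(limit, 0, -1):
        stop = stop_value(s)
        # Rolling d leads to sum s + d; sums above `limit` are busts worth 0.
        roll = sum(SIXTH * value[s + d] for d in range(1, 7) if s + d <= limit)
        value[s] = max(stop, roll)
        policy[s] = "roll" if roll > stop else "stop"   # ties are reported as "stop"
    return value, policy

value, policy = solve_first_player(opponent_first_throw=3)
for s in sorted(value):
    print(s, value[s], policy[s])
```

At sum 4 rolling and stopping are equally good, which the sketch reports as "stop"; this is consistent with the hint that throwing until the sum is four or higher is optimal.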
3. Assume that we have a vessel whose maximum weight capacity is $z$ and whose cargo is to consist of
different quantities of $N$ different items. Let $v_i$ denote the value of the $i$th type of item, $w_i$ the weight
of the $i$th type of item, and $x_i$ the number of items of type $i$ that are loaded in the vessel. The problem is
to find the most valuable cargo, i.e., to maximize $\sum_{i=1}^{N} x_i v_i$ subject to the constraints $\sum_{i=1}^{N} x_i w_i \le z$
and $x_i = 0, 1, 2, \dots$. Formulate this problem in terms of DP.
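One possible formulation (a sketch, not an official solution) treats the item types as stages and the remaining capacity as the state: with $J_{N+1}(z) = 0$, the recursion is $J_k(z) = \max_{0 \le x_k \le \lfloor z / w_k \rfloor} \{ x_k v_k + J_{k+1}(z - x_k w_k) \}$. The Python below illustrates this under the extra assumption that the weights and the capacity are positive integers; the function name `most_valuable_cargo` and the sample data are hypothetical.

```python
def most_valuable_cargo(values, weights, capacity):
    """Stages = item types, state = remaining capacity (integers assumed)."""
    n = len(values)
    # J[k][z]: best value obtainable from item types k, ..., n-1 with capacity z left.
    J = [[0] * (capacity + 1) for _ in range(n + 1)]
    best_x = [[0] * (capacity + 1) for _ in range(n)]
    for k in range(n - 1, -1, -1):
        for z in range(capacity + 1):
            for x in range(z // weights[k] + 1):
                candidate = x * values[k] + J[k + 1][z - x * weights[k]]
                if candidate > J[k][z]:
                    J[k][z], best_x[k][z] = candidate, x
    # Forward pass: recover one optimal loading from the stored maximizers.
    z, load = capacity, []
    for k in range(n):
        load.append(best_x[k][z])
        z -= load[-1] * weights[k]
    return J[0][capacity], load

# Hypothetical data: three item types, capacity 10.
total, load = most_valuable_cargo(values=[4, 5, 1], weights=[3, 4, 1], capacity=10)
print(total, load)
```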
4. A farmer annually producing xk units of a certain crop stores (1−uk )xk units of his production, where
0 ≤ uk ≤ 1, and invests the remaining uk xk units, thus increasing the next year’s production to a
level xk+1 given by
xk+1 = xk + wk uk xk , k = 0, 1, · · · , N − 1.
The scalars wk are independent random variables with identical probability distributions that do not
depend either on xk or uk . Furthermore, E{wk } = w̄ > 0. The problem is to find the optimal
investment policy that maximizes the total expected product stored over N years
$$E_{\substack{w_k \\ k = 0, 1, \dots, N-1}} \left\{ x_N + \sum_{k=0}^{N-1} (1 - u_k) x_k \right\}.$$
Show the optimality of the following policy that consists of constant functions:
(a) If $\bar{w} > 1$, $\mu_0^*(x_0) = \cdots = \mu_{N-1}^*(x_{N-1}) = 1$.
(b) If $0 < \bar{w} < 1/N$, $\mu_0^*(x_0) = \cdots = \mu_{N-1}^*(x_{N-1}) = 0$.
(c) If $1/N \le \bar{w} \le 1$,
$$\mu_0^*(x_0) = \cdots = \mu_{N-\bar{k}-1}^*(x_{N-\bar{k}-1}) = 1,$$
$$\mu_{N-\bar{k}}^*(x_{N-\bar{k}}) = \cdots = \mu_{N-1}^*(x_{N-1}) = 0,$$
where $\bar{k}$ is such that $1/(\bar{k} + 1) < \bar{w} \le 1/\bar{k}$.
Solution: The DP algorithm is:
$$J_N(x_N) = x_N,$$
$$J_k(x_k) = \max_{0 \le u_k \le 1} \left\{ (1 - u_k) x_k + E_{w_k} \left\{ J_{k+1}\big( (1 + w_k u_k) x_k \big) \right\} \right\}.$$
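• Case 1: $\bar{w} > 1$. Claim:
$$J_{N-k}(x_{N-k}) = x_{N-k} (1 + \bar{w})^k, \quad k = 1, \dots, N, \qquad \mu_0^*(x_0) = \cdots = \mu_{N-1}^*(x_{N-1}) = 1.$$
The proof follows by induction.
$$\begin{aligned}
J_{N-1}(x_{N-1}) &= \max_{0 \le u_{N-1} \le 1} \left\{ (1 - u_{N-1}) x_{N-1} + E_{w_{N-1}} \left\{ (1 + w_{N-1} u_{N-1}) x_{N-1} \right\} \right\} \\
&= x_{N-1} \max_{0 \le u_{N-1} \le 1} \left[ 2 + (\bar{w} - 1) u_{N-1} \right] \\
&= x_{N-1} (1 + \bar{w}),
\end{aligned}$$
where $\mu_{N-1}^*(x_{N-1}) = 1$.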
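Assume that $J_{N-k}(x_{N-k}) = x_{N-k} (1 + \bar{w})^k$. Then
$$\begin{aligned}
J_{N-k-1}(x_{N-k-1}) &= \max_{0 \le u_{N-k-1} \le 1} \left\{ (1 - u_{N-k-1}) x_{N-k-1} + (1 + \bar{w} u_{N-k-1}) (1 + \bar{w})^k x_{N-k-1} \right\} \\
&= x_{N-k-1} \max_{0 \le u_{N-k-1} \le 1} \left\{ 1 + (1 + \bar{w})^k + \left[ (1 + \bar{w})^k \bar{w} - 1 \right] u_{N-k-1} \right\} \\
&= x_{N-k-1} (1 + \bar{w})^{k+1},
\end{aligned}$$
where $\mu_{N-k-1}^*(x_{N-k-1}) = 1$.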
• Case 2: $0 < \bar{w} < 1/N$. Claim:
$$J_{N-k}(x_{N-k}) = (k + 1) x_{N-k}, \quad k = 1, \dots, N, \qquad \mu_0^*(x_0) = \cdots = \mu_{N-1}^*(x_{N-1}) = 0.$$
The proof follows by induction.
$$J_{N-1}(x_{N-1}) = x_{N-1} \max_{0 \le u_{N-1} \le 1} \left[ 2 + (\bar{w} - 1) u_{N-1} \right] = 2 x_{N-1},$$
where $\mu_{N-1}^*(x_{N-1}) = 0$.
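Assume that $J_{N-k}(x_{N-k}) = (k + 1) x_{N-k}$. Then
$$\begin{aligned}
J_{N-k-1}(x_{N-k-1}) &= \max_{0 \le u_{N-k-1} \le 1} \left\{ (1 - u_{N-k-1}) x_{N-k-1} + (k + 1)(1 + \bar{w} u_{N-k-1}) x_{N-k-1} \right\} \\
&= x_{N-k-1} \max_{0 \le u_{N-k-1} \le 1} \left\{ (k + 2) + \left[ (k + 1) \bar{w} - 1 \right] u_{N-k-1} \right\} \\
&= (k + 2) x_{N-k-1},
\end{aligned}$$
where $\mu_{N-k-1}^*(x_{N-k-1}) = 0$, since $(k + 1) \bar{w} - 1 < 0$ whenever $k + 1 \le N$ and $\bar{w} < 1/N$.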
• Case 3: $1/N \le \bar{w} \le 1$. Apply the DP algorithm beginning with stage $N$. Proceed as in
Case 2, setting the control equal to zero until:
$$J_{N-\bar{k}-1}(x_{N-\bar{k}-1}) = x_{N-\bar{k}-1} \max_{0 \le u_{N-\bar{k}-1} \le 1} \left\{ (\bar{k} + 2) + \left[ (\bar{k} + 1) \bar{w} - 1 \right] u_{N-\bar{k}-1} \right\},$$
where $N - \bar{k} - 1$ is the first stage where $\bar{w} > 1/(\bar{k} + 1)$. Since $(\bar{k} + 1) \bar{w} - 1 > 0$, take:
$$\mu_{N-\bar{k}-1}^*(x_{N-\bar{k}-1}) = 1,$$
$$J_{N-\bar{k}-1}(x_{N-\bar{k}-1}) = (\bar{k} + 1)(1 + \bar{w}) x_{N-\bar{k}-1}.$$
From this point, proceed as in Case 1. At each iteration the power of $(1 + \bar{w})$ will be raised
and the control will be set to one.
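Because every cost-to-go function in the three cases is linear, $J_k(x) = a_k x$, the recursion reduces to $a_N = 1$ and $a_k = 1 + a_{k+1} + \max\{0, \bar{w} a_{k+1} - 1\}$, with $u_k = 1$ exactly when $\bar{w} a_{k+1} - 1 > 0$. The short sketch below runs this scalar recursion to illustrate the three cases; the function name `farmer_policy` and the sample values of $\bar{w}$ and $N$ are chosen here for illustration only, and ties are resolved in favor of $u_k = 0$.

```python
from fractions import Fraction

def farmer_policy(w_bar, N):
    """Return ([mu_0, ..., mu_{N-1}], a_0) where J_0(x) = a_0 * x."""
    a = Fraction(1)                    # J_N(x) = x, so a_N = 1
    policy = [0] * N
    for k in range(N - 1, -1, -1):
        slope = w_bar * a - 1          # coefficient of u_k inside the maximization
        policy[k] = 1 if slope > 0 else 0
        a = 1 + a + (slope if slope > 0 else 0)
    return policy, a

print(farmer_policy(Fraction(2), 4))       # case (a): w_bar > 1
print(farmer_policy(Fraction(1, 10), 5))   # case (b): w_bar < 1/N
print(farmer_policy(Fraction(2, 5), 5))    # case (c): 1/3 < w_bar <= 1/2, so k_bar = 2
```

In the third call the last two controls are zero and the earlier ones are one, matching the switch at stage $N - \bar{k}$ described in Case 3.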
5. An unscrupulous innkeeper charges a different rate for a room as the day progresses, depending on
whether he has many or few vacancies. His objective is to maximize his expected total income during
the day. Let x be the number of empty rooms at the start of the day, and let y be the number of
customers that will ask for a room in the course of the day. We assume (somewhat unrealistically)
that the innkeeper knows y with certainty, and upon arrival of a customer, quotes one of m prices ri ,
i = 1, · · · , m, where 0 < r1 ≤ r2 ≤ · · · ≤ rm . A quote of a rate ri is accepted with probability pi and
is rejected with probability 1 − pi , in which case the customer departs, never to return during that
day. Formulate this as a problem with y stages and show that the maximal expected income, as a
function of x and y, satisfies the recursion
$$J(x, y) = \max_{i = 1, \dots, m} \Big[ p_i \big( r_i + J(x - 1, y - 1) \big) + (1 - p_i) J(x, y - 1) \Big]$$
for all $x \ge 1$ and $y \ge 1$, with initial conditions
$$J(x, 0) = J(0, y) = 0 \quad \text{for all } x \text{ and } y.$$
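The recursion can be evaluated directly with memoization. The sketch below is only an illustration: the function name `innkeeper_value` and the two rates with their acceptance probabilities are invented for the example and do not come from the problem.

```python
from functools import lru_cache

def innkeeper_value(rates, accept_probs):
    """Return a function J(x, y) implementing the recursion for the given quotes."""
    @lru_cache(maxsize=None)
    def J(x, y):
        if x == 0 or y == 0:           # initial conditions J(x, 0) = J(0, y) = 0
            return 0.0
        return max(
            p * (r + J(x - 1, y - 1)) + (1 - p) * J(x, y - 1)
            for r, p in zip(rates, accept_probs)
        )
    return J

# Hypothetical data: a low rate that is usually accepted, a high rate that often is not.
J = innkeeper_value(rates=(50, 80), accept_probs=(0.9, 0.5))
print(J(3, 2))   # more rooms than customers
print(J(1, 2))   # one room, two customers
```

With these made-up numbers, the maximizing quote for the first customer is the low rate when rooms are plentiful and the high rate when only one room is left for two customers, which is the behavior the problem statement describes.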
6. An investor observes at the beginning of each period k the price xk of a stock and decides whether to
buy 1 unit, sell 1 unit, or do nothing. There is a transaction cost c for buying or selling. The stock
price can take one of $n$ different values $v^1, \dots, v^n$ and the transition probabilities $p_{ij}^k = P\{x_{k+1} = v^j \mid x_k = v^i\}$
are known. The investor wants to maximize the total worth of his stock at a fixed final
period $N$ minus his investment costs from period 0 to period $N - 1$ (revenue from a sale is viewed as
negative cost). We assume that the function
$$P_k(x) = E\{x_N \mid x_k = x\} - x$$
is monotonically nonincreasing as a function of x; that is, the expected profit from a purchase is a
nonincreasing function of the purchase price. Assume that the investor starts with N or more units of
stock and an unlimited amount of cash, so that a purchase or sale decision is possible at each period
regardless of the past decisions and the current price. For every period $k$, let $\underline{x}_k$ be the largest value
of $x \in \{v^1, \dots, v^n\}$ such that $P_k(x) > c$, and let $\bar{x}_k$ be the smallest value of $x \in \{v^1, \dots, v^n\}$ such
that $P_k(x) < -c$. Show that it is optimal to buy if $x_k \le \underline{x}_k$, sell if $x_k \ge \bar{x}_k$, and do nothing otherwise.
Hint: Formulate the problem as one of maximizing
$$E\left\{ \sum_{k=0}^{N-1} \big( u_k P_k(x_k) - c |u_k| \big) \right\},$$
where $u_k \in \{-1, 0, 1\}$.
Solution: The total net expected profit from the (buy/sell) investment decisions, after transaction costs are deducted, is
$$E\left\{ \sum_{k=0}^{N-1} \big( u_k P_k(x_k) - c |u_k| \big) \right\},$$
where
$$u_k = \begin{cases}
1 & \text{if a unit of stock is bought at the $k$th period,} \\
-1 & \text{if a unit of stock is sold at the $k$th period,} \\
0 & \text{otherwise.}
\end{cases}$$
With a policy that maximizes this expression, we simultaneously maximize the expected total
worth of the stock held at time N minus the investment costs (including sale revenues). The DP
algorithm is given by
$$J_k(x_k) = \max_{u_k \in \{-1, 0, 1\}} \Big[ u_k P_k(x_k) - c |u_k| + E\{ J_{k+1}(x_{k+1}) \mid x_k \} \Big]$$
with
$$J_N(x_N) = 0,$$
where Jk+1 (xk+1 ) is the optimal expected profit when the stock price is xk+1 at time k + 1. Since
uk does not influence xk+1 and E{Jk+1 (xk+1 ) | xk }, a decision uk ∈ {−1, 0, 1} that maximizes
uk Pk (xk ) − c|uk | at time k is optimal. Since Pk (xk ) is monotonically nonincreasing in xk , it
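follows that it is optimal to set
$$u_k = \begin{cases}
1 & \text{if } x_k \le \underline{x}_k, \\
-1 & \text{if } x_k \ge \bar{x}_k, \\
0 & \text{otherwise,}
\end{cases}$$
where $\underline{x}_k$ and $\bar{x}_k$ are as in the problem statement. Note that the optimal expected profit $J_k(x_k)$
is given by
$$J_k(x_k) = E\left\{ \sum_{i=k}^{N-1} \max_{u_i \in \{-1, 0, 1\}} \big( u_i P_i(x_i) - c |u_i| \big) \right\}.$$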
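To make the threshold rule concrete, the sketch below computes $P_k(x) = E\{x_N \mid x_k = x\} - x$ by propagating the expected final price backwards through the transition matrices, then picks the control that maximizes $u_k P_k(x_k) - c|u_k|$ at each price. The function name `threshold_policy`, the three-price chain, and the cost $c = 0.25$ are all hypothetical; the example chain is chosen so that $P_k$ is nonincreasing, as the problem assumes.

```python
def threshold_policy(prices, transitions, c):
    """transitions[k][i][j] = P{x_{k+1} = prices[j] | x_k = prices[i]}, k = 0..N-1.

    Returns actions[k][i] in {1, 0, -1} (buy, hold, sell) at period k, price prices[i].
    """
    N, n = len(transitions), len(prices)
    expected_final = list(prices)            # E{x_N | x_N = v} = v
    actions = [None] * N
    for k in range(N - 1, -1, -1):
        # E{x_N | x_k = v^i} = sum_j p^k_ij * E{x_N | x_{k+1} = v^j}
        expected_final = [
            sum(transitions[k][i][j] * expected_final[j] for j in range(n))
            for i in range(n)
        ]
        profit = [expected_final[i] - prices[i] for i in range(n)]   # P_k(v^i)
        actions[k] = [1 if p > c else (-1 if p < -c else 0) for p in profit]
    return actions

# Hypothetical mean-reverting chain on prices 1, 2, 3 (same matrix at every period).
step = [[0.5, 0.5, 0.0],
        [0.25, 0.5, 0.25],
        [0.0, 0.5, 0.5]]
print(threshold_policy(prices=[1, 2, 3], transitions=[step], c=0.25))
# -> [[1, 0, -1]]  (buy at the low price, hold in the middle, sell at the high price)
```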