Additional notes on game theory SA305: Spring 2013

These notes assume you have read the other notes that have been posted and/or attended class on Friday, April 19.
1 Maximin versus Minimax
Here is an example where the max of a min is not the min of the max. Suppose a batter faces a pitcher. The pitcher is trying to throw a pitch the batter cannot hit, and the batter is trying to get a hit. The pitcher can throw either a fastball, denoted by F, or a curveball, denoted by C. The batter can either guess that a fastball is coming, also denoted by F, or guess that a curveball is coming, also denoted by C.
Note that the pitcher and batter only play pure strategies.
The payoff matrix is

                 pitcher
                 F      C
    batter  F  .300   .200
            C  .100   .400
Here, each entry is the batter's on-base percentage given the strategies played, where on-base percentage is the likelihood that the batter reaches base. Thus, the batter prefers a higher percentage and the pitcher a lower one.
We denote the batter's decisions by $y_F$ and $y_C$ and the pitcher's decisions by $x_F$ and $x_C$. The sets that describe their strategic possibilities are

$$B = \{(y_F, y_C) : y_F, y_C \in \{0, 1\},\ y_F + y_C = 1\} \quad \text{and} \quad P = \{(x_F, x_C) : x_F, x_C \in \{0, 1\},\ x_F + x_C = 1\},$$

which means that each can play only one pure strategy. Note that for a given pair of strategies $(y_F, y_C) \in B$ and $(x_F, x_C) \in P$, the payoff is
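$$y_F x_F(.300) + y_F x_C(.200) + y_C x_F(.100) + y_C x_C(.400) = \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix}.$$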
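A minimal sketch of the payoff formula above, assuming NumPy is available; `payoff_matrix` and `payoff` are illustrative names, not from the notes. Because the pure strategies in $B$ and $P$ are the unit vectors $(1, 0)$ and $(0, 1)$, the product picks out exactly one entry of the matrix.

```python
import numpy as np

# Rows: batter's guess (F, C); columns: pitcher's pitch (F, C).
payoff_matrix = np.array([[0.300, 0.200],
                          [0.100, 0.400]])

def payoff(y, x):
    """y_F x_F(.300) + y_F x_C(.200) + y_C x_F(.100) + y_C x_C(.400) as y^T M x."""
    return float(np.asarray(y) @ payoff_matrix @ np.asarray(x))

# Pure strategies are the unit vectors, so each pair selects one matrix entry.
pure = {"F": (1, 0), "C": (0, 1)}
for b_name, y in pure.items():
    for p_name, x in pure.items():
        print(f"batter {b_name}, pitcher {p_name}: {payoff(y, x):.3f}")
```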
Note that because only pure strategies are considered, exactly one entry from the payoff matrix is chosen. Then, from
the batter’s perspective, the worst-case approach is to consider the following:
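$$\max_{(y_F, y_C) \in B}\ \min_{(x_F, x_C) \in P}\ \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix}.$$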
Remember how this is evaluated: the batter selects a strategy and then the pitcher chooses a strategy. Thus, if the batter selects F, the pitcher would choose C, with a payoff of .200, and if the batter selects C, the pitcher would choose F, with a payoff of .100. Since the batter chooses first, she picks F, the better of these two worst cases, so
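$$\max_{(y_F, y_C) \in B}\ \min_{(x_F, x_C) \in P}\ \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix} = .200.$$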
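The batter's worst-case evaluation can be sketched in a few lines, assuming NumPy; the names are illustrative. With pure strategies only, the pitcher's best response to each batter choice is the row minimum, and the batter takes the largest of those minima.

```python
import numpy as np

# Rows: batter's guess (F, C); columns: pitcher's pitch (F, C).
payoff_matrix = np.array([[0.300, 0.200],
                          [0.100, 0.400]])
labels = ["F", "C"]

# Batter moves first: for each row, the pitcher answers with the column
# that minimizes the batter's payoff, i.e. the row minimum.
worst_case = payoff_matrix.min(axis=1)   # [0.200, 0.100]
best_row = int(worst_case.argmax())      # batter picks the best worst case
maximin = worst_case[best_row]

print(f"batter plays {labels[best_row]}, guaranteed {maximin:.3f}")
# prints: batter plays F, guaranteed 0.200
```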
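Now consider the pitcher's problem

$$\min_{(x_F, x_C) \in P}\ \max_{(y_F, y_C) \in B}\ \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix}.$$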
Note that now the pitcher is first. If the pitcher chooses F, then the batter would choose F and the payoff is .300. If the pitcher chooses C, the batter would choose C and the payoff is .400. The pitcher picks F, the smaller of these two, so
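$$\min_{(x_F, x_C) \in P}\ \max_{(y_F, y_C) \in B}\ \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix} = .300.$$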
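So, in conclusion,

$$\min_{(x_F, x_C) \in P}\ \max_{(y_F, y_C) \in B}\ \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix} = .300 \neq .200 = \max_{(y_F, y_C) \in B}\ \min_{(x_F, x_C) \in P}\ \begin{pmatrix} y_F & y_C \end{pmatrix} \begin{pmatrix} .300 & .200 \\ .100 & .400 \end{pmatrix} \begin{pmatrix} x_F \\ x_C \end{pmatrix}.$$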
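A short sketch, assuming NumPy, that computes both orders of play over pure strategies and confirms the gap. The pitcher-first value is the smallest column maximum; in any matrix game the max-min never exceeds the min-max, and here the inequality is strict.

```python
import numpy as np

# Rows: batter's guess (F, C); columns: pitcher's pitch (F, C).
payoff_matrix = np.array([[0.300, 0.200],
                          [0.100, 0.400]])

# Batter first: largest row minimum.
maximin = payoff_matrix.min(axis=1).max()
# Pitcher first: for each column the batter answers with the row that
# maximizes her payoff; the pitcher picks the smallest column maximum.
minimax = payoff_matrix.max(axis=0).min()

print(f"max min = {maximin:.3f}, min max = {minimax:.3f}")
# prints: max min = 0.200, min max = 0.300
```

The strict gap means this game has no pure-strategy saddle point, so whoever moves second does better.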
2 Rock-paper-scissors
Recall the rock-paper-scissors payoff matrix

$$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix},$$
where the rows and columns are in rock, paper, scissors order. Also, let $y_R, y_P, y_S$ represent the probabilities that the row player plays rock, paper, and scissors, respectively. Analogously, let $x_R, x_P, x_S$ represent the probabilities that the column player plays rock, paper, and scissors, respectively. The row player uses an additional variable $v$ to denote the expected payoff she is guaranteed when she plays the mixed strategy $(y_R, y_P, y_S)$, whatever the column player does. Recall from class that the row player is using the linear program:
$$\begin{aligned}
\max\quad & v \\
\text{s.t.}\quad & v \le y_P - y_S \\
& v \le y_S - y_R \\
& v \le y_R - y_P \\
& y_R + y_P + y_S = 1 \\
& y_R, y_P, y_S \ge 0.
\end{aligned}$$
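As a sketch, the row player's LP can be handed to an LP solver; this assumes SciPy's `scipy.optimize.linprog` with the HiGHS method, which minimizes, so we minimize $-v$. The variable ordering $(v, y_R, y_P, y_S)$ is our choice.

```python
import numpy as np
from scipy.optimize import linprog

# Rock-paper-scissors payoff matrix A (rows/columns in R, P, S order).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

# Variables z = (v, y_R, y_P, y_S); maximize v by minimizing -v.
c = np.array([-1.0, 0.0, 0.0, 0.0])
# v <= sum_i A[i, j] y_i for each column j  <=>  v - (A^T y)_j <= 0.
A_ub = np.hstack([np.ones((3, 1)), -A.T])
b_ub = np.zeros(3)
A_eq = np.array([[0.0, 1.0, 1.0, 1.0]])   # y_R + y_P + y_S = 1
b_eq = np.array([1.0])
bounds = [(None, None)] + [(0, None)] * 3  # v free, y >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
v, y = res.x[0], res.x[1:]
print(f"v = {v:.4f}, y = {np.round(y, 4)}")
```

The solver should return $v = 0$ with the uniform strategy $y_R = y_P = y_S = 1/3$: the three right-hand sides sum to zero, so $v \le 0$, with equality only when all three probabilities are equal.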
The column player uses an additional variable $w$ to denote the largest expected payoff he can be forced to concede to the row player when he plays the mixed strategy $(x_R, x_P, x_S)$. Recall from class the column player's linear program:
$$\begin{aligned}
\min\quad & w \\
\text{s.t.}\quad & w \ge x_S - x_P \\
& w \ge x_R - x_S \\
& w \ge x_P - x_R \\
& x_R + x_P + x_S = 1 \\
& x_R, x_P, x_S \ge 0.
\end{aligned}$$
You were asked to take a dual of each of these linear programs. Recall that a good way to minimize mistakes in taking
duals is to rewrite your linear programs so that all variables are on the left-hand side and all constants are on the
right-hand side. Also, it helps to line up the variables consistently. So, for the row player, a good way to rewrite her
linear program is as follows.
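$$\begin{array}{lrrrrl}
\max & v & & & & \\
\text{s.t.} & v & & -\,y_P & +\,y_S & \le 0 \\
& v & +\,y_R & & -\,y_S & \le 0 \\
& v & -\,y_R & +\,y_P & & \le 0 \\
& & y_R & +\,y_P & +\,y_S & = 1 \\
& & y_R, & y_P, & y_S & \ge 0,
\end{array}$$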
and for the column player, a good way to rewrite his linear program is
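$$\begin{array}{lrrrrl}
\min & w & & & & \\
\text{s.t.} & w & & +\,x_P & -\,x_S & \ge 0 \\
& w & -\,x_R & & +\,x_S & \ge 0 \\
& w & +\,x_R & -\,x_P & & \ge 0 \\
& & x_R & +\,x_P & +\,x_S & = 1 \\
& & x_R, & x_P, & x_S & \ge 0.
\end{array}$$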
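With the variables lined up this way, the dual can be read off by transposing. The sketch below, assuming NumPy (the names `row_coeffs` and `col_coeffs` are illustrative), builds the coefficient blocks of the two rewritten LPs from $A$ and checks that one is the transpose of the other.

```python
import numpy as np

A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])   # rock-paper-scissors, R-P-S order

# Row player's rewritten LP: constraint j reads  v - sum_i A[i, j] y_i <= 0,
# so the y-coefficients (one row per constraint, columns y_R, y_P, y_S) are -A^T.
row_coeffs = -A.T
# Column player's rewritten LP: constraint i reads  w - sum_j A[i, j] x_j >= 0,
# so the x-coefficients (one row per constraint, columns x_R, x_P, x_S) are -A.
col_coeffs = -A

# Each y-variable of the row LP becomes a constraint of the column LP and
# each inequality of the row LP becomes an x-variable: a transpose.
assert np.array_equal(col_coeffs, row_coeffs.T)
print(row_coeffs)
print(col_coeffs)
```

The remaining pieces follow the same bookkeeping: the column of ones multiplying the free variable $v$ becomes the equality $x_R + x_P + x_S = 1$, and the equality $y_R + y_P + y_S = 1$ becomes the free variable $w$ with objective coefficient 1.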
3 The general model
In general, the model of the two-player zero-sum game can be summarized as follows. We have a row player and a col player. The row player uses strategies specified in the set $R$ and the col player uses strategies specified in the set $C$. The following data is also given:
$P_{ij}$ = the payoff the row player receives if row plays strategy $i \in R$ and col plays strategy $j \in C$.
The decision variables are

$y_i$ = the probability row plays strategy $i \in R$,
$x_j$ = the probability col plays strategy $j \in C$.
From the perspective of row, the game strategies are determined by solving the minimax problem:
$$r^* = \max_{y}\ \min_{x}\ \left\{ \sum_{i \in R} \sum_{j \in C} P_{ij}\, y_i x_j \;:\; \sum_{i \in R} y_i = 1 = \sum_{j \in C} x_j,\ y_i \ge 0\ \forall i \in R,\ x_j \ge 0\ \forall j \in C \right\}. \tag{MM-R}$$
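Note that in (MM-R), the row player “moves first” and selects the $y$ strategies, and the col player “responds” by choosing col strategies $x$. For fixed $y$, the inner minimization is of a linear function of $x$ over the probability simplex, so it is attained at a pure strategy $j \in C$ and equals $\min_{j \in C} \sum_{i \in R} P_{ij} y_i$. In class we saw that by introducing a variable $v$ to represent this inner minimization, (MM-R) can be solved by solving the linear program: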
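$$\begin{aligned}
\max\quad & v \\
\text{s.t.}\quad & v \le \sum_{i \in R} P_{ij}\, y_i \quad \text{for } j \in C \\
& \sum_{i \in R} y_i = 1 \\
& y_i \ge 0, \quad i \in R.
\end{aligned} \tag{LP-R}$$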
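The step from (MM-R) to (LP-R) rests on the inner minimum being attained at a pure column strategy. A small numerical sketch, assuming NumPy, checks this on the batter-pitcher matrix from Section 1 with a hypothetical mixed row strategy $y = (0.5, 0.5)$ (Section 1 itself allowed only pure strategies; the mixing here is for illustration).

```python
import numpy as np

# Batter-pitcher payoffs (rows: batter F, C; columns: pitcher F, C).
P = np.array([[0.300, 0.200],
              [0.100, 0.400]])
y = np.array([0.5, 0.5])   # hypothetical mixed strategy for the row player

# Expected payoff against each pure column j: sum_i P[i, j] * y_i.
column_payoffs = y @ P     # approximately [0.2, 0.3]

# Inner minimization of (MM-R) over a fine grid of mixed column strategies.
grid = np.linspace(0.0, 1.0, 1001)
mixed_payoffs = [y @ P @ np.array([t, 1.0 - t]) for t in grid]

# A linear function over the simplex is minimized at a vertex (a pure
# strategy), so the two minima agree; this common value is v in (LP-R).
print(min(mixed_payoffs), column_payoffs.min())
```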
From the perspective of col, the game strategies are determined by solving the minimax¹:

$$c^* = \min_{x}\ \max_{y}\ \left\{ \sum_{i \in R} \sum_{j \in C} P_{ij}\, y_i x_j \;:\; \sum_{i \in R} y_i = 1 = \sum_{j \in C} x_j,\ y_i \ge 0\ \forall i \in R,\ x_j \ge 0\ \forall j \in C \right\}. \tag{MM-C}$$
As in (MM-R), by introducing a variable $w$ to represent the inner maximization, (MM-C) can be solved by solving the linear program:
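$$\begin{aligned}
\min\quad & w \\
\text{s.t.}\quad & w \ge \sum_{j \in C} P_{ij}\, x_j \quad \text{for } i \in R \\
& \sum_{j \in C} x_j = 1 \\
& x_j \ge 0, \quad j \in C.
\end{aligned} \tag{LP-C}$$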
In class we have seen that (LP-C) and (LP-R) are duals of one another. By strong duality their optimal values are equal, which provides a proof of von Neumann's famous minimax theorem:

$$r^* = c^*.$$
Moreover, the duality relationship of (LP-R) and (LP-C) means we need only solve one of them, not both: the dual solution associated with an optimal basic solution of either is an optimal solution to the other. So, for example, if one were to solve (LP-R) to obtain an optimal basic feasible solution $(v^*, y^*)$, then the associated dual solution $(w^*, x^*)$ would be optimal to (LP-C).
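A sketch of "solve one, read off the other", assuming SciPy's `linprog` with the HiGHS method, which reports constraint marginals (derivatives of the objective with respect to each right-hand side) in `res.ineqlin.marginals` and `res.eqlin.marginals` in recent SciPy releases. As a hypothetical example we reuse the batter-pitcher matrix from Section 1, now allowing mixed strategies. Since we minimize $-v$, the dual values of (LP-R) are the negated marginals.

```python
import numpy as np
from scipy.optimize import linprog

# Batter-pitcher payoffs from Section 1, here with mixing allowed.
P = np.array([[0.300, 0.200],
              [0.100, 0.400]])
m, n = P.shape

# (LP-R) with variables (v, y_1, ..., y_m); maximize v by minimizing -v.
c = np.concatenate([[-1.0], np.zeros(m)])
A_ub = np.hstack([np.ones((n, 1)), -P.T])            # v - sum_i P_ij y_i <= 0, j in C
b_ub = np.zeros(n)
A_eq = np.concatenate([[0.0], np.ones(m)])[None, :]  # sum_i y_i = 1
b_eq = np.array([1.0])
bounds = [(None, None)] + [(0, None)] * m            # v free, y >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
v_star, y_star = res.x[0], res.x[1:]

# Dual values of (LP-R): x* from the <= rows, w* from the equality row.
x_star = -res.ineqlin.marginals
w_star = -res.eqlin.marginals[0]

print("r* =", v_star, "y* =", y_star)
print("c* =", w_star, "x* =", x_star)
# x* is optimal for (LP-C): its largest row payoff equals w*.
print("max_i (P x*)_i =", (P @ x_star).max())
```

For this matrix one can check by hand that the optimum is $y^* = (.75, .25)$, $x^* = (.5, .5)$, and $r^* = c^* = .25$, so the solver's dual recovers the column player's strategy without solving (LP-C) separately.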
¹ Although it is a “maximin” problem, both are referred to as minimax, since maximin has a different meaning.