ECE 594D Game Theory and Multiagent Systems
Homework #5 Solutions
1. Consider the following cost sharing problem:
• Player set: N = {1, 2, 3}
• Opportunity costs: c : 2^N → R, where
c({1}) = 9, c({2}) = 8, c({3}) = 9,
c({1, 2}) = 14, c({1, 3}) = 15, c({2, 3}) = 13,
c({1, 2, 3}) = 21,
c(∅) = 0.
(a) Follow the approach presented in lecture.
(b) Follow the approach presented in lecture.
(c) Below are the marginal contributions for player 1. The remaining marginal contributions
for the other players can be computed in a similar fashion.
f^mc(1, {1}) = c({1}) − c(∅) = 9 − 0 = 9
f^mc(1, {1, 2}) = c({1, 2}) − c({2}) = 14 − 8 = 6
f^mc(1, {1, 3}) = c({1, 3}) − c({3}) = 15 − 9 = 6
f^mc(1, {1, 2, 3}) = c({1, 2, 3}) − c({2, 3}) = 21 − 13 = 8
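As a quick sanity check, these marginal contributions can be reproduced with a short script. This is a minimal sketch; the dictionary c and the helper marginal are illustrative names, with the cost data taken from the problem statement.

    # Cost function c from the problem, indexed by frozenset coalitions.
    c = {frozenset(): 0,
         frozenset({1}): 9, frozenset({2}): 8, frozenset({3}): 9,
         frozenset({1, 2}): 14, frozenset({1, 3}): 15, frozenset({2, 3}): 13,
         frozenset({1, 2, 3}): 21}

    def marginal(i, S):
        """Marginal contribution of player i to coalition S (requires i in S)."""
        S = frozenset(S)
        return c[S] - c[S - {i}]

    for S in [{1}, {1, 2}, {1, 3}, {1, 2, 3}]:
        print(sorted(S), marginal(1, S))   # prints 9, 6, 6, 8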
(d) Recall that the general equation for the Shapley value is
Sh(i, S; c) = Σ_{T ⊆ S\{i}} [ |T|! (|S| − |T| − 1)! / |S|! ] (c(T ∪ {i}) − c(T))
Below are the Shapley values for player 1. The remaining Shapley values for the other
players can be computed in a similar fashion.
Sh(1, {1}) = Σ_{T ⊆ ∅} [ |T|! (|S| − |T| − 1)! / |S|! ] (c(T ∪ {1}) − c(T))
           = c({1}) − c(∅) = 9
Sh(1, {1, 2}) = Σ_{T ⊆ {2}} [ |T|! (|S| − |T| − 1)! / |S|! ] (c(T ∪ {1}) − c(T))
           = (1/2)(c({1}) − c(∅)) + (1/2)(c({1, 2}) − c({2})) = 7.5
Sh(1, {1, 3}) = Σ_{T ⊆ {3}} [ |T|! (|S| − |T| − 1)! / |S|! ] (c(T ∪ {1}) − c(T))
           = (1/2)(c({1}) − c(∅)) + (1/2)(c({1, 3}) − c({3})) = 7.5
Sh(1, {1, 2, 3}) = Σ_{T ⊆ {2, 3}} [ |T|! (|S| − |T| − 1)! / |S|! ] (c(T ∪ {1}) − c(T))
           = (1/3)(c({1}) − c(∅)) + (1/6)(c({1, 2}) − c({2})) + (1/6)(c({1, 3}) − c({3})) + (1/3)(c({1, 2, 3}) − c({2, 3}))
           = (1/3)(9) + (1/6)(6) + (1/6)(6) + (1/3)(8) = 7 + 2/3
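These values can also be verified numerically. The sketch below is only illustrative (the dictionary c encodes the cost data from the problem; shapley is an assumed helper name) and sums the formula directly over all subsets T ⊆ S \ {i}.

    from itertools import combinations
    from math import factorial

    # Cost function c from the problem, indexed by frozenset coalitions.
    c = {frozenset(): 0,
         frozenset({1}): 9, frozenset({2}): 8, frozenset({3}): 9,
         frozenset({1, 2}): 14, frozenset({1, 3}): 15, frozenset({2, 3}): 13,
         frozenset({1, 2, 3}): 21}

    def shapley(i, S):
        """Shapley value of player i in coalition S under cost function c."""
        S = frozenset(S)
        others = S - {i}
        total = 0.0
        for r in range(len(others) + 1):
            for T in combinations(others, r):
                T = frozenset(T)
                weight = factorial(len(T)) * factorial(len(S) - len(T) - 1) / factorial(len(S))
                total += weight * (c[T | {i}] - c[T])
        return total

    for S in [{1}, {1, 2}, {1, 3}, {1, 2, 3}]:
        print(sorted(S), shapley(1, S))   # 9.0, 7.5, 7.5, 7.666...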
(e) Here, we will focus on computing the Shapley value for player 1 in the set {1, 2, 3}. The
remaining values can be computed in the same fashion. The orderings of the set {1, 2, 3} and the
corresponding ordering-based marginal contributions are as follows:
1 ← 2 ← 3 ⇒ c({1}) − c(∅) = 9
1 ← 3 ← 2 ⇒ c({1}) − c(∅) = 9
2 ← 1 ← 3 ⇒ c({1, 2}) − c({2}) = 6
3 ← 1 ← 2 ⇒ c({1, 3}) − c({3}) = 6
3 ← 2 ← 1 ⇒ c({1, 2, 3}) − c({2, 3}) = 8
2 ← 3 ← 1 ⇒ c({1, 2, 3}) − c({2, 3}) = 8
The Shapley value is the average of these 6 marginal contributions, i.e.,
Sh(1, {1, 2, 3}) = 1/6(9 + 9 + 6 + 6 + 8 + 8) = 7 + 2/3
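The ordering-based definition can be checked the same way. The sketch below (again with an illustrative cost dictionary c) averages player 1's marginal contribution over all arrival orders of the grand coalition.

    from itertools import permutations

    # Cost function c from the problem, indexed by frozenset coalitions.
    c = {frozenset(): 0,
         frozenset({1}): 9, frozenset({2}): 8, frozenset({3}): 9,
         frozenset({1, 2}): 14, frozenset({1, 3}): 15, frozenset({2, 3}): 13,
         frozenset({1, 2, 3}): 21}

    def shapley_by_orderings(i, S):
        """Average marginal contribution of player i over all arrival orders of S."""
        total, count = 0.0, 0
        for order in permutations(S):
            before = frozenset(order[:order.index(i)])   # players arriving before i
            total += c[before | {i}] - c[before]
            count += 1
        return total / count

    print(shapley_by_orderings(1, {1, 2, 3}))   # 7.666..., matching 7 + 2/3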
(f) Verified.
2. Here our goal is to provide an example of the workings of the VCG mechanism. Basically, this
is just a numerical illustration of the general proof provided in the lecture.
(a) The workings of the VCG mechanism for this problem are the following:
• Bids: Each player i ∈ {x, y, z} provides a reported valuation for each of the three allocations,
e.g., player x reports bx^X, bx^Y, bx^Z.
• Selection: The chosen alternative maximizes the sum of the players' reported valuations, i.e.,
σ(b) = arg max_{q ∈ {X, Y, Z}} (bx^q + by^q + bz^q)
• Monetary transfers: The tax charged to player i given bid profile b is
ti(b) = Σ_{j ≠ i} bj^{σ(b)} − Σ_{j ≠ i} bj^{σ(b−i)}
where
σ(b−i) = arg max_{q ∈ {X, Y, Z}} Σ_{j ≠ i} bj^q
(b) Here, show that truthful bidding, bi = (vi^X, vi^Y, vi^Z), is a dominant strategy. The proof is essentially the same
as the general proof provided in the lecture notes. Verify that the steps in the proof hold
using the numbers given in this problem.
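Because the bid numbers from the problem statement are not reproduced in these notes, the sketch below uses made-up valuations purely to illustrate the mechanics of part (a); the names bids and vcg are assumptions, not part of the problem.

    # Hypothetical reported valuations (assumed numbers, for illustration only):
    # bids[i][q] is player i's reported value for alternative q.
    bids = {"x": {"X": 10, "Y": 2, "Z": 1},
            "y": {"X": 1, "Y": 8, "Z": 3},
            "z": {"X": 2, "Y": 1, "Z": 12}}
    alternatives = ["X", "Y", "Z"]

    def vcg(b):
        """Return the selected alternative sigma(b) and the transfers t_i(b) of part (a)."""
        def best(players):
            return max(alternatives, key=lambda q: sum(b[j][q] for j in players))
        chosen = best(list(b))            # sigma(b): maximizes the sum of reported valuations
        transfers = {}
        for i in b:
            others = [j for j in b if j != i]
            without_i = best(others)      # sigma(b_{-i})
            # Sign convention follows part (a): t_i(b) = sum_{j!=i} b_j^{sigma(b)} - sum_{j!=i} b_j^{sigma(b_{-i})}.
            transfers[i] = (sum(b[j][chosen] for j in others)
                            - sum(b[j][without_i] for j in others))
        return chosen, transfers

    print(vcg(bids))   # with these bids: Z is selected and only player z is pivotal (t_z = -7)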
3. (a) Construct a two-player game that meets the following specifications:
• The better reply graph has a cycle (i.e., it does not have the finite improvement
property).
• The row player has 2 actions and the column player has 3 actions.
• There are no payoff “ties”, i.e., a player is never indifferent between two actions.
• There is a unique pure Nash equilibrium.
Be sure to sketch the better reply graph.
(b) Is this possible when each player only has two moves?
One possibility for part (a) is the game below (row player's payoff listed first):

         L        C        R
T      1, 2    1, −1    −1, 1
B     −1, 2    −1, 1     1, −1

The specifications are not possible for a 2-by-2 game: with no payoff ties, any better reply
cycle in a 2-by-2 game must pass through all four action profiles, so no profile can be a pure
Nash equilibrium.
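A brute-force check of the 2-by-3 game above confirms both required properties. This is a sketch only; the payoff bimatrix U is entered row by row (rows T, B; columns L, C, R), row player's payoff first.

    # Payoff bimatrix for the 2x3 game above: U[row][col] = (row payoff, column payoff).
    U = [[(1, 2), (1, -1), (-1, 1)],
         [(-1, 2), (-1, 1), (1, -1)]]
    rows, cols = range(2), range(3)

    def is_pure_ne(r, c):
        """True if neither player can strictly improve by a unilateral deviation."""
        row_best = all(U[r][c][0] >= U[r2][c][0] for r2 in rows)
        col_best = all(U[r][c][1] >= U[r][c2][1] for c2 in cols)
        return row_best and col_best

    print("pure NE:", [(r, c) for r in rows for c in cols if is_pure_ne(r, c)])  # only (0, 0), i.e. (T, L)

    # One better reply cycle: (T,C) -> (T,R) -> (B,R) -> (B,C) -> (T,C).
    cycle = [(0, 1), (0, 2), (1, 2), (1, 1), (0, 1)]
    for (r1, c1), (r2, c2) in zip(cycle, cycle[1:]):
        if r1 == r2:   # the column player deviates and strictly improves
            assert U[r2][c2][1] > U[r1][c1][1]
        else:          # the row player deviates and strictly improves
            assert U[r2][c2][0] > U[r1][c1][0]
    print("better reply cycle verified")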
4. Complete the payoffs in the following game so that it is a potential game:
         L       C       R
T      1, ?    2, ?    1, ?
M      ?, 4    5, 3    2, ?
B      1, ?    6, ?    7, ?
The idea is to use the potential function property
φ(ai, a−i) − φ(a′i, a−i) = ui(ai, a−i) − ui(a′i, a−i)
to construct a feasible potential function from the information provided. In this case, there
is not a unique solution. This potential function can then be used to complete the unknown
payoffs.
One possible potential function is (rows T, M, B; columns L, C, R):

     0  −3   0
     1   0   1
     0   1   6

where the TR, MC, and BL entries were arbitrarily set to zero.
This potential function forces the unknown payoffs to equal:

         L       C       R
T      1, ?    2, ?    1, ?
M      2, 4    5, 3    2, 4
B      1, ?    6, ?    7, ?
Arbitrarily setting the BL payoff for the column player to zero and the TL payoff for the
column player to 2 results in:

         L       C       R
T      1, 2    2, −1    1, 2
M      2, 4    5, 3     2, 4
B      1, 0    6, 1     7, 6
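The completed table can be checked against the potential function by verifying the exact potential identity for every unilateral deviation. A short sketch under these assumptions: U1 and U2 hold the completed payoffs above and phi the potential function chosen above.

    # Completed payoffs (rows T, M, B; columns L, C, R) and the potential function.
    U1 = [[1, 2, 1],
          [2, 5, 2],
          [1, 6, 7]]
    U2 = [[2, -1, 2],
          [4, 3, 4],
          [0, 1, 6]]
    phi = [[0, -3, 0],
           [1, 0, 1],
           [0, 1, 6]]

    # Exact potential property: payoff differences from unilateral deviations match phi differences.
    for r in range(3):
        for c in range(3):
            for r2 in range(3):   # row player deviates r -> r2, column fixed
                assert U1[r2][c] - U1[r][c] == phi[r2][c] - phi[r][c]
            for c2 in range(3):   # column player deviates c -> c2, row fixed
                assert U2[r][c2] - U2[r][c] == phi[r][c2] - phi[r][c]
    print("exact potential game verified")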
5. Consider the following two player game:
         a2      b2
a1     5, 2    0, 4
b1     0, 4    2, 6
(a) Define the empirical distribution of play for each player i ∈ {1, 2} as
qi^ai(t) = (1/t) Σ_{τ=0}^{t−1} I{ai(τ) = ai}
qi^bi(t) = (1/t) Σ_{τ=0}^{t−1} I{ai(τ) = bi},
where I{ai(τ) = ai} = 1 if ai(τ) = ai and 0 otherwise. At time 1, we have q1^b1(1) = 1
and q2^a2(1) = 1. Accordingly, we have
a1(1) = arg max_{x ∈ {a1, b1}} U1(x, q2(1)) = a1
a2(1) = arg max_{x ∈ {a2, b2}} U2(x, q1(1)) = b2.
Updating the empirical frequencies, we now have q1(2) = q2(2) = (1/2, 1/2). The next
action profile is then
a1(2) = arg max_{x ∈ {a1, b1}} U1(x, q2(2)) = a1
a2(2) = arg max_{x ∈ {a2, b2}} U2(x, q1(2)) = b2.
Updating the empirical frequencies, we now have q1(3) = (2/3, 1/3) and q2(3) = (1/3, 2/3).
The next action profile is then
a1(3) = arg max_{x ∈ {a1, b1}} U1(x, q2(3)) = a1
a2(3) = arg max_{x ∈ {a2, b2}} U2(x, q1(3)) = b2.
Updating the empirical frequencies, we now have q1(4) = (3/4, 1/4) and q2(4) = (1/4, 3/4).
The next action profile is then
a1(4) = arg max_{x ∈ {a1, b1}} U1(x, q2(4)) = b1
a2(4) = arg max_{x ∈ {a2, b2}} U2(x, q1(4)) = b2.
The action profile will stay at (b1, b2) for all future times.
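This trace can be reproduced numerically. The sketch below is illustrative only: U1 and U2 encode the payoffs of the game above (rows a1, b1; columns a2, b2), the counts follow the stated initial history, and the loop runs fictitious play for ten steps, showing play settling at (b1, b2) from time 4 onward.

    # Payoffs for the 2x2 game: rows are player 1's actions (a1, b1), columns player 2's (a2, b2).
    U1 = [[5, 0], [0, 2]]
    U2 = [[2, 4], [4, 6]]

    counts1 = [0, 1]   # player 1's history: one play of b1 (index 1)
    counts2 = [1, 0]   # player 2's history: one play of a2 (index 0)

    for t in range(1, 11):
        q1 = [n / sum(counts1) for n in counts1]   # empirical frequency of player 1's actions
        q2 = [n / sum(counts2) for n in counts2]   # empirical frequency of player 2's actions
        # Each player best responds to the opponent's empirical frequency.
        a1 = max(range(2), key=lambda x: sum(q2[y] * U1[x][y] for y in range(2)))
        a2 = max(range(2), key=lambda y: sum(q1[x] * U2[x][y] for x in range(2)))
        print(t, ("a1", "b1")[a1], ("a2", "b2")[a2])
        counts1[a1] += 1
        counts2[a2] += 1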
(b) Suppose a(t) is a strict Nash equilibrium and this action profile is selected according to
the fictitious play learning rule at time t. Then, for each player i ∈ {1, 2} we have
Ui(ai(t), a−i(t)) > Ui(ai, a−i(t)), ∀ ai ≠ ai(t)
Ui(ai(t), q−i(t)) ≥ Ui(ai, q−i(t)), ∀ ai.
The action choice of each agent i ∈ {1, 2} at time t + 1 would be of the form
ai(t + 1) = arg max_{ai ∈ Ai} Ui(ai, q−i(t + 1))
          = arg max_{ai ∈ Ai} [ (t/(t + 1)) Ui(ai, q−i(t)) + (1/(t + 1)) Ui(ai, a−i(t)) ],
where the second equality comes from expanding out the definition of the empirical frequencies.
Since ai(t) is the argument that maximizes each term in the above sum (strictly so for the
second term), ai(t + 1) = ai(t). Hence, the statement is true.
(c) For the three-player game, the ensuing action profile would be a(1) = (b1, b2, b3), which
is a Nash equilibrium whenever x < 1. However, if x is negative, e.g.,
x = −100, then at the ensuing time a3(2) = a3, which disproves the statement.