ECE 594D Game Theory and Multiagent Systems
Homework #5 Solutions

1. Consider the following cost sharing problem:
• Player set: N = {1, 2, 3}
• Opportunity costs: c : 2^N → R
  c({1}) = 9, c({2}) = 8, c({3}) = 9
  c({1, 2}) = 14, c({1, 3}) = 15, c({2, 3}) = 13
  c({1, 2, 3}) = 21
  c(∅) = 0

(a) Follow the approach presented in lecture.

(b) Follow the approach presented in lecture.

(c) Below are the marginal contributions for player 1. The remaining marginal contributions for the other players can be computed in a similar fashion.
  f^mc(1, {1}) = c({1}) − c(∅) = 9 − 0 = 9
  f^mc(1, {1, 2}) = c({1, 2}) − c({2}) = 14 − 8 = 6
  f^mc(1, {1, 3}) = c({1, 3}) − c({3}) = 15 − 9 = 6
  f^mc(1, {1, 2, 3}) = c({1, 2, 3}) − c({2, 3}) = 21 − 13 = 8

(d) Recall that the general equation for the Shapley value is
  Sh(i, S; c) = Σ_{T ⊆ S\{i}} [ |T|! (|S| − |T| − 1)! / |S|! ] (c(T ∪ {i}) − c(T))
Below are the Shapley values for player 1. The remaining Shapley values for the other players can be computed in a similar fashion.
  Sh(1, {1}) = c({1}) − c(∅) = 9
  Sh(1, {1, 2}) = (1/2)(c({1}) − c(∅)) + (1/2)(c({1, 2}) − c({2})) = (1/2)(9) + (1/2)(6) = 7.5
  Sh(1, {1, 3}) = (1/2)(c({1}) − c(∅)) + (1/2)(c({1, 3}) − c({3})) = (1/2)(9) + (1/2)(6) = 7.5
  Sh(1, {1, 2, 3}) = (1/3)(c({1}) − c(∅)) + (1/6)(c({1, 2}) − c({2}))
                   + (1/6)(c({1, 3}) − c({3})) + (1/3)(c({1, 2, 3}) − c({2, 3}))
                   = (1/3)(9) + (1/6)(6) + (1/6)(6) + (1/3)(8) = 7 + 2/3

(e) Here, we focus on computing the Shapley value of player 1 for the set {1, 2, 3}. The remaining values can be computed in the same fashion.
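The subset formula in part (d) is easy to verify numerically. A minimal Python sketch using the cost data above (the dictionary `c` and the function `shapley` are our own illustrative names, not from the lecture):

```python
from itertools import combinations
from math import factorial

# Opportunity costs from problem 1, keyed by frozenset of players.
c = {frozenset(): 0,
     frozenset({1}): 9, frozenset({2}): 8, frozenset({3}): 9,
     frozenset({1, 2}): 14, frozenset({1, 3}): 15, frozenset({2, 3}): 13,
     frozenset({1, 2, 3}): 21}

def shapley(i, S):
    """Sh(i, S; c): weighted sum over T ⊆ S\\{i} of the marginal
    contribution c(T ∪ {i}) − c(T), weight |T|!(|S|−|T|−1)!/|S|!."""
    others = [j for j in S if j != i]
    n = len(S)
    value = 0.0
    for k in range(len(others) + 1):
        for T in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            value += weight * (c[frozenset(T) | {i}] - c[frozenset(T)])
    return value

print(shapley(1, {1, 2}))      # 7.5
print(shapley(1, {1, 2, 3}))   # ≈ 7.6667 = 7 + 2/3
```

The three Shapley values for the grand coalition also sum to c({1, 2, 3}) = 21, as they should.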
The orderings of the set {1, 2, 3} and the corresponding ordering-based marginal contributions of player 1 are as follows:
  1 ← 2 ← 3 ⇒ c({1}) − c(∅) = 9
  1 ← 3 ← 2 ⇒ c({1}) − c(∅) = 9
  2 ← 1 ← 3 ⇒ c({1, 2}) − c({2}) = 6
  3 ← 1 ← 2 ⇒ c({1, 3}) − c({3}) = 6
  3 ← 2 ← 1 ⇒ c({1, 2, 3}) − c({2, 3}) = 8
  2 ← 3 ← 1 ⇒ c({1, 2, 3}) − c({2, 3}) = 8
The Shapley value is the average of these 6 marginal contributions, i.e.,
  Sh(1, {1, 2, 3}) = (1/6)(9 + 9 + 6 + 6 + 8 + 8) = 7 + 2/3

(f) Verified.

2. Here our goal is to provide an example of the workings of the VCG mechanism. Basically, this is just a numerical illustration of the general proof provided in lecture.

(a) The workings of the VCG mechanism for this problem are the following:
• Bids: Each player i ∈ {x, y, z} provides a reported valuation for each of the three alternatives; e.g., player x reports b_x^X, b_x^Y, b_x^Z (and similarly for players y and z).
• Selection: The chosen alternative maximizes the sum of the players' reported valuations, i.e.,
  σ(b) = arg max_{q ∈ {X, Y, Z}} (b_x^q + b_y^q + b_z^q)
• Monetary transfers: The tax charged to player i given bid profile b is
  t_i(b) = Σ_{j ≠ i} b_j^{σ(b)} − Σ_{j ≠ i} b_j^{σ(b_{−i})}
  where σ(b_{−i}) = arg max_{q ∈ {X, Y, Z}} Σ_{j ≠ i} b_j^q

(b) Here, show that b_i = (v_i^X, v_i^Y, v_i^Z) is a dominant strategy. The proof is essentially the same as the general proof provided in the lecture notes. Verify that the steps in the proof hold using the numbers given in this problem.

3. (a) Construct a two-player game that meets the following specifications:
• The better reply graph has a cycle (i.e., it does not have the finite improvement property).
• The row player has 2 actions and the column player has 3 actions.
• There are no payoff "ties", i.e., a player is never indifferent between two actions.
• There is a unique pure Nash equilibrium.
Be sure to sketch the better reply graph. One possibility is the game below:

   1,2    1,-1   -1,1
  -1,2   -1,1    1,-1

(b) Is this possible when each player only has two moves? No: the specifications are not possible for a 2-by-2 game.

4.
Complete the payoffs in the following game so that it is a potential game:

        L      C      R
  T   1,?    2,?    1,?
  M   ?,4    5,3    2,?
  B   1,?    6,?    7,?

The idea is to use the potential function property
  φ(a_i, a_{−i}) − φ(a_i′, a_{−i}) = u_i(a_i, a_{−i}) − u_i(a_i′, a_{−i})
to construct a feasible potential function from the information provided. In this case, there is not a unique solution. This potential function can then be used to complete the unknown payoffs. One possible potential function is

        L    C    R
  T     0   -3    0
  M     1    0    1
  B     0    1    6

where the TR, MC, and BL elements were arbitrarily set to zero. This potential function forces the unknown payoffs to equal:

        L      C      R
  T   1,?    2,?    1,?
  M   2,4    5,3    2,4
  B   1,?    6,?    7,?

Arbitrarily setting the column player's payoff at BL to 0 and at TL to 2 results in:

        L      C      R
  T   1,2    2,-1   1,2
  M   2,4    5,3    2,4
  B   1,0    6,1    7,6

5. Consider the following two player game:

        a2     b2
  a1   5,2    0,4
  b1   0,4    2,6

(a) Define the empirical distribution of play for each player i ∈ {1, 2} as
  q_i^{a_i}(t) = (1/t) Σ_{τ=0}^{t−1} I{a_i(τ) = a_i}
  q_i^{b_i}(t) = (1/t) Σ_{τ=0}^{t−1} I{a_i(τ) = b_i}
where I{a_i(τ) = a_i} = 1 if a_i(τ) = a_i and 0 otherwise.

At time 1, we have q_1^{b1}(1) = 1 and q_2^{a2}(1) = 1. Accordingly, we have
  a_1(1) = arg max_{x ∈ {a1, b1}} U_1(x, q_2(1)) = a1
  a_2(1) = arg max_{x ∈ {a2, b2}} U_2(x, q_1(1)) = b2.
Updating the empirical frequencies, we now have q_1(2) = q_2(2) = (1/2, 1/2). The next action profile is then
  a_1(2) = arg max_{x ∈ {a1, b1}} U_1(x, q_2(2)) = a1
  a_2(2) = arg max_{x ∈ {a2, b2}} U_2(x, q_1(2)) = b2.
Updating the empirical frequencies, we now have q_1(3) = (2/3, 1/3) and q_2(3) = (1/3, 2/3). The next action profile is then
  a_1(3) = arg max_{x ∈ {a1, b1}} U_1(x, q_2(3)) = a1
  a_2(3) = arg max_{x ∈ {a2, b2}} U_2(x, q_1(3)) = b2.
Updating the empirical frequencies, we now have q_1(4) = (3/4, 1/4) and q_2(4) = (1/4, 3/4). The next action profile is then
  a_1(4) = arg max_{x ∈ {a1, b1}} U_1(x, q_2(4)) = b1
  a_2(4) = arg max_{x ∈ {a2, b2}} U_2(x, q_1(4)) = b2.
The action profile will stay at (b1, b2) for all future times.
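The iteration above can be reproduced with a short simulation. A minimal Python sketch (indices 0 and 1 stand for a_i and b_i; the variable and function names are ours):

```python
# Fictitious play on the problem-5 game; U1[r][c] is the row player's
# payoff and U2[r][c] the column player's, with 0 = a_i and 1 = b_i.
U1 = [[5, 0], [0, 2]]
U2 = [[2, 4], [4, 6]]

# counts[i][x] = number of times player i has played action x so far.
# Initial history: player 1 played b1 and player 2 played a2, matching
# q_1^{b1}(1) = 1 and q_2^{a2}(1) = 1 in the solution.
counts = [[0, 1], [1, 0]]

def best_reply(U, opp_counts, row_player):
    """Best reply against the opponent's empirical frequencies."""
    total = sum(opp_counts)
    def expected(x):
        if row_player:
            return sum(U[x][y] * opp_counts[y] for y in range(2)) / total
        return sum(U[y][x] * opp_counts[y] for y in range(2)) / total
    return max(range(2), key=expected)

history = []
for t in range(1, 21):
    x1 = best_reply(U1, counts[1], row_player=True)
    x2 = best_reply(U2, counts[0], row_player=False)
    history.append((x1, x2))
    counts[0][x1] += 1
    counts[1][x2] += 1

# First five profiles: (a1,b2) three times, then play locks at (b1,b2).
print(history[:5])  # [(0, 1), (0, 1), (0, 1), (1, 1), (1, 1)]
```

The switch at the fourth step matches the hand computation: a_1(4) = b1 once q_2(4) = (1/4, 3/4).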
(b) Suppose a(t) is a strict Nash equilibrium and this action profile is selected according to the fictitious play learning rule at time t. Then, for each player i ∈ {1, 2} we have
  U_i(a_i(t), a_{−i}(t)) > U_i(a_i, a_{−i}(t)), ∀ a_i ≠ a_i(t)
  U_i(a_i(t), q_{−i}(t)) ≥ U_i(a_i, q_{−i}(t)), ∀ a_i
The action choice of each agent i ∈ {1, 2} at time t + 1 would be of the form
  a_i(t + 1) = arg max_{a_i ∈ A_i} U_i(a_i, q_{−i}(t + 1))
             = arg max_{a_i ∈ A_i} [ (t/(t + 1)) U_i(a_i, q_{−i}(t)) + (1/(t + 1)) U_i(a_i, a_{−i}(t)) ]
where the second equality comes from expanding the definition of the empirical frequencies. Since a_i(t) is the argument that maximizes each term in the above sum, a_i(t + 1) = a_i(t). Hence, the statement is true.

(c) For the three-player game, the ensuing action profile would be a(1) = (b1, b2, b3), which is a Nash equilibrium under the situation where x < 1. However, if x is negative, e.g., x = −100, then at the ensuing time a_3(2) = a3, which disproves the statement.
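The claim in (b) can also be checked numerically on the two-player game from part (a). A minimal sketch, assuming the same payoff matrices and a history that starts at the strict equilibrium (b1, b2), i.e., index (1, 1):

```python
# Check of 5(b): starting fictitious play at the strict Nash
# equilibrium (b1, b2), play should never leave it.
U1 = [[5, 0], [0, 2]]
U2 = [[2, 4], [4, 6]]
counts = [[0, 1], [0, 1]]  # initial history: one play of (b1, b2)

def best_reply(U, opp_counts, row_player):
    """Best reply against the opponent's empirical frequencies."""
    total = sum(opp_counts)
    def expected(x):
        if row_player:
            return sum(U[x][y] * opp_counts[y] for y in range(2)) / total
        return sum(U[y][x] * opp_counts[y] for y in range(2)) / total
    return max(range(2), key=expected)

stayed = True
for t in range(100):
    x1 = best_reply(U1, counts[1], row_player=True)
    x2 = best_reply(U2, counts[0], row_player=False)
    stayed = stayed and (x1, x2) == (1, 1)
    counts[0][x1] += 1
    counts[1][x2] += 1

print(stayed)  # True: the strict equilibrium is absorbing
```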