Best-Reply Mechanisms
Noam Nisan, Michael Schapira and Aviv Zohar

On The Agenda
• Best-Reply Dynamics
• Convergence issues – Max-Solvable Games
• Strategic issues – Universally Max-Solvable Games
• Best Reply as a Mechanism
• Examples – Single Item Auction, Matching, Congestion Control

Best-Reply Dynamics
• Repeatedly:
  – Fix the strategies of all players but one.
  – Set that player’s strategy to be a best reply to the others.
• Greedy, myopic.
• A natural naïve approach for computing a pure Nash equilibrium.
• Often used as an actual strategy (Internet protocols, markets…).
• Does it make sense to use best-reply in such settings?

Example: Battle of the Sexes
(each cell lists the row player’s payoff, then the column player’s payoff)

                Column Player
Row Player      2,1    0,0
                0,0    1,2

Three Desirable Properties
• An equilibrium point – pure Nash
  – At some point in time everything settles down.
  – Does not have to exist (e.g. rock-paper-scissors).
• (Fast) convergence to equilibrium
  – Polynomial in the size of the strategy spaces.
• Incentive compatibility
  – Players will want to follow the prescribed strategy.

Potential Games
• Defined using better-reply dynamics [Monderer & Shapley].
• Potential games = all games for which better reply always converges.
• Convergence may take exponential time.
  – It is PLS-complete to find a pure Nash equilibrium [Fabrikant, Papadimitriou, Talwar].
• Not incentive compatible (an example later).

Max Dominated Strategies
• Definition: A strategy is max-dominated if it is not a best reply to any strategy profile of the other players.
  – Any strictly dominated strategy is max-dominated.
  – Ties can be handled too. (Not in this talk.)
[Figure: payoff table for one player with entries 1, 3, 2, 2, 1, 0, 3, 2, -1 (in extraction order); one strategy is labeled “Max Dominated Strategy”.]

Max Solvable Games
• Definition: A max-solvable game is a game in which iterated elimination of max-dominated strategies leaves only one strategy for each player.
[Figure: example game with payoff pairs 1,2  3,5  5,1  4,1  2,0  1,2 (in extraction order).]

Convergence
• Theorem: Max-solvable games have a unique pure Nash equilibrium.
• Theorem: In max-solvable games with n players, any (round-robin) best-reply dynamics converges within $n \cdot \sum_i m_i$ steps.
  – $m_i$ is the size of the strategy space of player i.

Asynchronous Convergence
• Players do not have to act one at a time.
• Best-reply relies on the current actions of the others. What if these messages get delayed?

                Column Player
Row Player      2,1    0,0
                0,0    1,2

Asynchronous Convergence
• Theorem: Best-reply dynamics in max-solvable games converges under any asynchronous timing that
  – does not delay any player’s activation indefinitely.
  – does not delay messages indefinitely.

Incentive Compatibility
• Prescribed behavior: best-reply.
  – Will you follow it?
• Notice: this is not a fully observable setting. A player does not always know the utilities of others.
  – To play best-reply, a player only needs to know his own utility and the actions of others.
• Max-solvability alone is not enough to guarantee incentive compatibility.

Example: Not Incentive Compatible
(each cell lists the row player’s payoff, then the column player’s payoff)

                Column Player
Row Player      5,3    0,0
                10,1   4,4

Universally Max Dominated Strategies
• Definition: A set of strategies for some player is universally-max-dominated if the best payoff achievable within the set is strictly worse than every payoff of the player’s other strategies.
[Figure: payoff table with entries 8, 7, 9, 5, 6, 8, 3, 2, 1, 3, 4, 0 (in extraction order); one part is labeled “Not universally-max-dominated” and another “Universally-max-dominated”.]

Universally Max Solvable Games
• Definition: A game is universally-max-solvable if repeated elimination of universally-max-dominated strategies leaves only one strategy profile.
• Every universally-max-solvable game is also max-solvable.

Universally-Max-Solvable Games
• Theorem: The pure Nash equilibrium in universally-max-solvable games is collusion-proof.
  – No group of players can change strategies without hurting at least one member.
• Corollary: The pure Nash equilibrium is also Pareto optimal.

Best Reply Mechanisms
• Players have hidden utility functions (their types).
• For simplicity, we assume a central mechanism that queries them about best replies.
• The goal: to decide on a strategy profile for them to play, hopefully a pure Nash equilibrium.
• Needed: a penalty that the mechanism can activate to punish players that did not converge.
  – Natural in our examples.
  – Needs to be worse than the equilibrium outcome.

Best Reply Mechanisms
• The mechanism:
  – Start with some strategy profile.
  – Go over the players in round-robin order and repeatedly update their best replies.
  – If in some round no one changes strategy, stop and output the strategy profile.
  – If a certain (polynomial) number of rounds has passed and the players still have not converged, invoke the penalty.

Best Reply Mechanisms
Theorem: For a universally-max-solvable game, the given mechanism is incentive compatible in ex-post Nash equilibrium.
Meaning:
• When queried, you will always report your best reply, and not some other strategy.
• The result of the mechanism will be the pure Nash equilibrium of the game.
• “Ex-post” means that you would not act differently even if you knew the specific utility functions of all others.
• All you assume: they also play best-reply.

Examples of Universally-Max-Solvable Games

Single Item Auction
• A single item is being auctioned.
• Each player has a private value in {1, 2, …, k}.
• Players announce what they are willing to pay.
• The highest bidder gets the item for his bid. (Ties are broken in some predefined way.)

Single Item Auction
• Utility of a player:
  – 0 if he did not win.
  – Valuation minus payment if he did win.
Best-reply strategy:
• If bid > valuation, decrease the bid to the valuation. (This involves tie breaking.)
• If not the highest bidder and bid < valuation, increase the bid by 1.

The Mechanism
• Start at any initial bids (not necessarily 0).
• Query players in order and ask if they want to change their bid.
• When no one wants to change, allocate the item.
• If there is no convergence after $k \cdot n^2$ rounds, give the item to no one.
• Notice:
  – We do not force ascending bids.
  – We do not have to start at 0.

Single Item Auction
• Theorem: The single item auction is universally-max-solvable (after tie breaking).
• Therefore:
  – A unique pure Nash equilibrium exists.
  – We converge to it quickly if everyone is truthful.
  – The mechanism we suggested is incentive compatible.
• Note that this is just English-auction behavior (but with less strict rules).

Congestion Control
The setting:
• A simplified model of packets flowing through a computer network.
• Assume a network graph with capacities on the edges (as in a flow problem).
[Figure: network graph with edge capacities 3, 3, 2, 1, 4, 1, 2.]

Congestion Control
• Each flow has a single fixed path.
• Vertices that receive more flow than they can send out must drop some of it.
[Figure: the network graph with edge capacities 3, 3, 2, 1, 4, 1, 2 and endpoints S1, T1, S2, T2.]

Congestion Control
Policy of the vertices:
• Distribute the capacity of an edge equally between flows.
• If some flow does not use its full share, distribute the remainder evenly among the others.
• Similar to the fair-queuing strategy in the Internet.
• Maximizes the minimal flow.

Congestion Control Game
• Each flow is a player.
• Utility of a player: how much he manages to send through.
• Each player decides alone how much to send through the network.
• Players do not know the structure of the network.
• They only know how much of their flow goes through, or whether there is free capacity.

Congestion Control
Best-reply strategy:
  – If there is free capacity, increase your flow.
  – If you lose some of your flow, decrease your flow.
• (This is tie breaking between outcomes with equal payoff.)
• Theorem: Congestion control is universally-max-solvable.
• Natural penalty: everyone sends full flows.
• We thus have:
  – A Pareto optimal pure Nash equilibrium that maximizes the minimal flow.
  – Fast convergence.
  – Incentive compatibility of following best-reply.

Stable Roommates
• A set of college students needs to be paired up to share dorm rooms.
• Each student has strict preferences over the other students (these are private).
• We allow students to announce a single person they want to pair up with.

Stable Roommates Game
• A player gets the utility associated with the roommate he selected if:
  – that roommate selected him
  – that roommate would prefer him over his current selection.
• Nash equilibria in this game are stable matchings.
• There may be several.

Stable Roommates
• The mechanism:
  – Allow students to iteratively update their selections.
  – Stop once students no longer change their selections.
  – If after a while the players do not stop, match no one.

Preference Cycles
• Definition: A preference cycle is a cycle of players such that each player prefers the next player in the cycle to the previous one.

Stable Roommates
• Theorem: A roommate matching game that has no preference cycle is universally-max-solvable.
• Example with no preference cycle: bipartite graphs with an agreed preference (med. students and hospitals).
• Therefore, for cases with no preference cycle:
  – There is a unique stable matching.
  – Best-reply converges to it (asynchronously) and quickly.
  – The mechanism we offered is incentive compatible.

Thanks!
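
Sketch: Round-Robin Best-Reply Mechanism
The mechanism described in the slides (start from some profile, update best replies in round-robin order, stop when a full round brings no change, otherwise invoke the penalty) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the talk: the helper names `best_reply`, `best_reply_mechanism` and `bimatrix`, the tie-breaking rule (keep the current strategy unless another is strictly better), the default cutoff of n · Σ m_i rounds, and the use of `None` to stand for the penalty. The first example game is the 2×2 game from the "Not Incentive Compatible" slide, read row by row; matching pennies is added only as a game without a pure Nash equilibrium.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Profile = Tuple[int, ...]
Utility = Callable[[Profile], float]


def best_reply(player: int, profile: Profile,
               n_strategies: Sequence[int],
               utilities: Sequence[Utility]) -> int:
    """Best reply of `player` to the others' strategies in `profile`.

    Assumed tie-breaking: keep the current strategy unless some other
    strategy is strictly better (the talk leaves ties out of scope).
    """
    best = profile[player]
    best_value = utilities[player](profile)
    for s in range(n_strategies[player]):
        candidate = profile[:player] + (s,) + profile[player + 1:]
        value = utilities[player](candidate)
        if value > best_value:
            best, best_value = s, value
    return best


def best_reply_mechanism(start: Sequence[int],
                         n_strategies: Sequence[int],
                         utilities: Sequence[Utility],
                         max_rounds: Optional[int] = None) -> Optional[Profile]:
    """Run round-robin best-reply; return the final profile or None (penalty)."""
    n = len(n_strategies)
    if max_rounds is None:
        # Assumed polynomial cutoff, chosen generously for this sketch.
        max_rounds = n * sum(n_strategies)
    profile = tuple(start)
    for _ in range(max_rounds):
        changed = False
        for player in range(n):
            reply = best_reply(player, profile, n_strategies, utilities)
            if reply != profile[player]:
                profile = profile[:player] + (reply,) + profile[player + 1:]
                changed = True
        if not changed:
            return profile  # a full round with no change: output the profile
    return None  # no convergence within the cutoff: invoke the penalty


def bimatrix(row_payoffs: List[List[float]],
             col_payoffs: List[List[float]]) -> List[Utility]:
    """Wrap two payoff matrices as utility functions for a two-player game."""
    return [lambda p: row_payoffs[p[0]][p[1]],
            lambda p: col_payoffs[p[0]][p[1]]]


# 2x2 game from the "Not Incentive Compatible" slide (row payoff, column payoff).
slide_game = bimatrix([[5, 0], [10, 4]], [[3, 0], [1, 4]])
outcome = best_reply_mechanism((0, 0), [2, 2], slide_game)

# Matching pennies has no pure Nash equilibrium, so best-reply keeps cycling.
pennies = bimatrix([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
no_outcome = best_reply_mechanism((0, 0), [2, 2], pennies)

print(outcome, no_outcome)
```

Under these assumptions, honest best replies on the slide's 2×2 game settle on the profile with payoffs (4,4), while matching pennies never settles and ends in the penalty. The sketch only simulates truthful play; it does not model a player who deviates from best-reply.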