Best-Reply Mechanisms
Noam Nisan, Michael Schapira and Aviv Zohar

On The Agenda
• Best-Reply Dynamics
• Convergence issues – Max-Solvable Games
• Strategic issues – Universally-Max-Solvable Games
• Best Reply as a Mechanism
• Examples
– Single Item Auction, Matching, Congestion Control.
Best-Reply Dynamics
• Repeatedly:
– Fix the strategies of all players but one.
– Set that player’s strategy to be a best reply to the others.
• Greedy, myopic.
• A natural naïve approach for computing pure Nash.
• Often used as an actual strategy (Internet protocols,
markets…)
• Does it make sense to use best-reply in such settings?
Example: Battle of the Sexes
The Row Player picks the row and the Column Player picks the column.
Payoffs are listed as (Row Player, Column Player):
  2,1   0,0
  0,0   1,2
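The dynamics from the previous slide can be tried on this game. The following is a minimal sketch, assuming strategies are indexed 0 and 1, the row player moves first in each round, and ties go to the lower index; the function names are illustrative and not from the talk.

```python
from typing import Tuple

# Battle of the Sexes; each cell is (row payoff, column payoff).
PAYOFFS = [
    [(2, 1), (0, 0)],
    [(0, 0), (1, 2)],
]

def best_reply(player: int, profile: Tuple[int, int]) -> int:
    """Strategy index that maximizes `player`'s payoff against the other's current strategy."""
    row, col = profile
    if player == 0:
        options = [PAYOFFS[r][col][0] for r in range(2)]
    else:
        options = [PAYOFFS[row][c][1] for c in range(2)]
    return options.index(max(options))  # assumption: ties go to the lowest index

def round_robin(profile: Tuple[int, int], max_rounds: int = 10) -> Tuple[int, int]:
    """Row player, then column player, switch to a best reply until nobody moves."""
    for _ in range(max_rounds):
        previous = profile
        for player in (0, 1):
            choice = best_reply(player, profile)
            if player == 0:
                profile = (choice, profile[1])
            else:
                profile = (profile[0], choice)
        if profile == previous:
            break
    return profile

print(round_robin((0, 1)))  # (1, 1)
print(round_robin((1, 0)))  # (0, 0)
```

Both runs settle on a pure Nash equilibrium, but which one depends on the starting profile and on who moves first.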
Three Desirable Properties
• An equilibrium point – pure Nash
– At some point in time everything settles down.
– Does not have to exist (e.g. rock-paper-scissors).
• (Fast) convergence to equilibrium
– Polynomial in the size of strategy spaces.
• Incentive Compatibility
– Players will want to follow the prescribed strategy.
Potential Games
• Defined using better-reply dynamics [Monderer & Shapley].
• Potential games = all games for which better-reply always converges.
• Convergence may take exponential time.
– Finding a pure Nash equilibrium is PLS-complete [Fabrikant, Papadimitriou, Talwar].
• Not incentive compatible (an example later).
Max Dominated Strategies
• Definition: A strategy is max-dominated
if it is not a best-reply to any strategy-profile of
the other players.
– Any strictly-dominated strategy is max-dominated.
– Ties can be handled too. (Not in this talk.)
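[Figure: payoff table for one player with one strategy marked "Max Dominated Strategy". Payoff values in extraction order: 1, 3, 2, 2, 1, 0, 3, 2, -1.]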
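The definition can be checked mechanically for one player. Below is a minimal sketch; the 3×3 table is an assumed row-by-row reading of the nine values in the slide's figure (the original layout is not recoverable), and a tie for the maximum is counted as a best reply, which is one possible convention since the talk leaves ties aside.

```python
from typing import List

def max_dominated_rows(payoff: List[List[int]]) -> List[int]:
    """Indices of rows that are not a best reply to any column.

    payoff[r][c] is the row player's payoff when playing r against column c.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_somewhere = set()
    for c in range(n_cols):
        column_best = max(payoff[r][c] for r in range(n_rows))
        for r in range(n_rows):
            if payoff[r][c] == column_best:  # assumption: ties count as best replies
                best_somewhere.add(r)
    return [r for r in range(n_rows) if r not in best_somewhere]

# Assumed layout of the figure's values (row-major).
ROW_PAYOFFS = [
    [1, 3, 2],
    [2, 1, 0],
    [3, 2, -1],
]

print(max_dominated_rows(ROW_PAYOFFS))  # [1]
```

Under this reading the middle row is max-dominated even though no single row strictly dominates it: it beats the first row in the first column and the last row in the third column.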
Max Solvable Games
• Definition: A max-solvable game is a game in
which iterated elimination of max-dominated
strategies leaves only one strategy for each
player.
[Figure: example two-player game with payoff cells, in extraction order: 1,2; 3,5; 5,1; 4,1; 2,0; 1,2.]
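Iterated elimination for a two-player game can be sketched as follows. The 2×3 layout of the example is an assumption (the six cells of the figure read row by row, with the row player's payoff first), and ties are again counted as best replies; the function names are illustrative.

```python
from typing import List, Set, Tuple

def eliminate(row_pay: List[List[int]], col_pay: List[List[int]]) -> Tuple[Set[int], Set[int]]:
    """Repeatedly drop strategies that are not a best reply to any surviving opponent strategy."""
    rows = set(range(len(row_pay)))
    cols = set(range(len(row_pay[0])))
    changed = True
    while changed:
        changed = False
        # Rows that are a best reply to at least one surviving column.
        live_rows = {r for c in cols for r in rows
                     if row_pay[r][c] == max(row_pay[x][c] for x in rows)}
        if live_rows != rows:
            rows, changed = live_rows, True
        # Columns that are a best reply to at least one surviving row.
        live_cols = {c for r in rows for c in cols
                     if col_pay[r][c] == max(col_pay[r][x] for x in cols)}
        if live_cols != cols:
            cols, changed = live_cols, True
    return rows, cols

def is_max_solvable(row_pay: List[List[int]], col_pay: List[List[int]]) -> bool:
    rows, cols = eliminate(row_pay, col_pay)
    return len(rows) == 1 and len(cols) == 1

# Assumed 2x3 reading of the figure: cells (1,2) (3,5) (5,1) / (4,1) (2,0) (1,2).
ROW_PAY = [[1, 3, 5], [4, 2, 1]]
COL_PAY = [[2, 5, 1], [1, 0, 2]]

print(eliminate(ROW_PAY, COL_PAY))  # ({0}, {1})
```

Under the assumed layout the first column goes first, then the second row, then the third column, leaving the single profile with payoffs 3,5. Battle of the Sexes, by contrast, loses no strategy at all.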
Convergence
• Theorem: max-solvable games have a unique
pure Nash equilibrium.
• Theorem: in max-solvable games with n players, any (round-robin) best-reply dynamics converges within $n \cdot \sum_i m_i$ steps.
– $m_i$ is the size of the strategy space of player i.
Asynchronous Convergence
• Asynchronous Convergence
– Players do not have to act one at a time.
– Best-reply relies on the current action of others.
What if these messages get delayed?
Payoffs (Row Player, Column Player):
  2,1   0,0
  0,0   1,2
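One way to see the problem is to let both players update at the same moment, each replying to the other's previous action. This is a minimal sketch of that timing on the payoff table above; simultaneous updating is just one illustrative asynchronous schedule, and the names are assumptions.

```python
from typing import List, Tuple

# Battle of the Sexes, split into one table per player, indexed [row][col].
ROW_PAY = [[2, 0], [0, 1]]
COL_PAY = [[1, 0], [0, 2]]

def simultaneous_step(profile: Tuple[int, int]) -> Tuple[int, int]:
    """Both players best-reply to the other's *previous* action."""
    row, col = profile
    new_row = max(range(2), key=lambda r: ROW_PAY[r][col])
    new_col = max(range(2), key=lambda c: COL_PAY[row][c])
    return new_row, new_col

def trajectory(start: Tuple[int, int], steps: int) -> List[Tuple[int, int]]:
    path = [start]
    for _ in range(steps):
        path.append(simultaneous_step(path[-1]))
    return path

print(trajectory((0, 1), 4))
# [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1)]
```

Starting from a miscoordinated profile, the players swap forever and never reach either equilibrium, even though one-at-a-time updates would settle immediately.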
Asynchronous Convergence
• Theorem: Max-solvable games converge in any
asynchronous timing that
– does not delay any player’s activation indefinitely.
– does not delay messages indefinitely.
Incentive Compatibility
• Prescribed behavior: Best-Reply.
– Will you follow it?
• Notice: not a fully observable setting.
A player does not always know the utilities of
others.
– To play best-reply a player only needs to know his
own utility and the actions of others.
• Max solvable games are not enough to guarantee
incentive compatibility.
Example: Not Incentive Compatible
The Row Player picks the row and the Column Player picks the column.
Payoffs are listed as (Row Player, Column Player):
  5,3    0,0
  10,1   4,4
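The incentive problem can be reproduced with a short simulation. The sketch below assumes the four cells are laid out row by row as (5,3), (0,0) on top and (10,1), (4,4) below; the two row-player policies are hypothetical names introduced for illustration.

```python
from typing import Callable, Tuple

# Assumed layout, indexed [row][col].
ROW_PAY = [[5, 0], [10, 4]]
COL_PAY = [[3, 0], [1, 4]]

def honest_row(col: int) -> int:
    """Row player follows best-reply."""
    return max(range(2), key=lambda r: ROW_PAY[r][col])

def stubborn_row(col: int) -> int:
    """Row player ignores best-reply and always plays the top row."""
    return 0

def play(row_policy: Callable[[int], int], col: int = 0, rounds: int = 5) -> Tuple[int, int]:
    """Alternate: row player acts by its policy, column player best-replies."""
    row = 0
    for _ in range(rounds):
        row = row_policy(col)
        col = max(range(2), key=lambda c: COL_PAY[row][c])
    return row, col

for policy in (honest_row, stubborn_row):
    r, c = play(policy)
    print(policy.__name__, (r, c), "row payoff:", ROW_PAY[r][c])
# honest_row (1, 1) row payoff: 4
# stubborn_row (0, 0) row payoff: 5
```

Under this reading the game is max-solvable with unique equilibrium payoffs 4,4, yet the row player earns 5 by refusing to best-reply, because the column player's best reply to the top row is the left column.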
Universally Max Dominated Strategies
• Definition: A set of strategies for some player is
universally-max-dominated if its best payoff is
strictly worse than every payoff of that player's
other strategies.
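[Figure: payoff tables for one player, with parts labeled "Not universally-max-dominated" and "Universally-max-dominated". Payoff values in extraction order: 8, 7, 9, 5, 6, 8, 3, 2, 1, 3, 4, 0.]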
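The definition compares the best payoff inside the set with the worst payoff outside it. Here is a minimal sketch of that test for a single player; the payoff table is hypothetical (it is not a reconstruction of the figure), and `payoff[s][t]` is assumed to hold the player's payoff for own strategy `s` against the opponents' profile `t`.

```python
from typing import List, Set

def is_universally_max_dominated(payoff: List[List[int]], subset: Set[int]) -> bool:
    """True if the best payoff within `subset` is strictly below every payoff outside it."""
    others = [s for s in range(len(payoff)) if s not in subset]
    if not subset or not others:
        return False  # assumption: the empty set and the full set never qualify
    best_inside = max(max(payoff[s]) for s in subset)
    worst_outside = min(min(payoff[s]) for s in others)
    return best_inside < worst_outside

# Hypothetical payoffs for one player: three own strategies, three opponent profiles.
PAYOFF = [
    [8, 7, 9],
    [5, 6, 8],
    [3, 4, 0],
]

print(is_universally_max_dominated(PAYOFF, {2}))  # True: best is 4, everything else is at least 5
print(is_universally_max_dominated(PAYOFF, {1}))  # False: 8 is not below 0
```

The condition is much stronger than max-domination: the player would rather be at any outcome outside the set than at the best outcome inside it.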
Universally Max Solvable Games
• Definition: A game is universally max-solvable if
repeated elimination of universally-max-dominated
strategies leaves only one strategy profile.
• Every universally-max-solvable game is also
max-solvable.
Universally-Max-Solvable Games
• Theorem: The pure-Nash equilibrium in
universally-max-solvable games is collusion-proof.
– No group of players can change strategies without
hurting at least one member.
• Corollary: The pure-Nash is also Pareto optimal.
Best Reply Mechanisms
• Players have hidden utility functions (Their
types)
• For simplicity, we assume a central mechanism
that queries them about best-replies.
• The goal: to decide on a strategy profile for them
to play that is hopefully a pure Nash.
• Needed: A penalty that the mechanism can
activate to punish players that did not converge.
– Natural in our examples.
– Needs to be worse than the equilibrium outcome.
Best Reply Mechanisms
• The mechanism:
– Start with some strategy profile.
– Go over the players in round-robin order and
repeatedly update their best-reply.
– If in some round no one changes strategy, stop and
output the strategy profile.
– If a certain (polynomial) number of rounds have
passed and players still did not converge, invoke the
penalty.
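The steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the authors' implementation: each player is modeled as a callable that answers a best-reply query (and is free to lie), the penalty is signaled by returning `None`, and `truthful` plus the example payoff tables are hypothetical helpers.

```python
from typing import Callable, List, Optional

Profile = List[int]
Player = Callable[[Profile], int]  # answers "what is your best reply to this profile?"

def best_reply_mechanism(players: List[Player], start: Profile,
                         max_rounds: int) -> Optional[Profile]:
    """Round-robin best-reply queries; None means the penalty is invoked."""
    profile = list(start)
    for _ in range(max_rounds):
        changed = False
        for i, player in enumerate(players):
            answer = player(profile)
            if answer != profile[i]:
                profile[i] = answer
                changed = True
        if not changed:
            return profile  # a full round with no change: output the profile
    return None  # no convergence within the round limit

def truthful(index: int, payoff: List[List[int]]) -> Player:
    """Hypothetical helper: a two-player participant who reports true best replies."""
    def answer(profile: Profile) -> int:
        if index == 0:
            return max(range(len(payoff)), key=lambda r: payoff[r][profile[1]])
        return max(range(len(payoff[0])), key=lambda c: payoff[profile[0]][c])
    return answer

# A 2x2 game with a unique equilibrium, and matching pennies (no pure Nash).
solvable = [truthful(0, [[5, 0], [10, 4]]), truthful(1, [[3, 0], [1, 4]])]
pennies = [truthful(0, [[1, -1], [-1, 1]]), truthful(1, [[-1, 1], [1, -1]])]

print(best_reply_mechanism(solvable, [0, 0], max_rounds=8))  # [1, 1]
print(best_reply_mechanism(pennies, [0, 0], max_rounds=8))   # None
```

The mechanism never inspects utilities; it only sees the answers, which is why the incentive question on the next slide matters.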
Best Reply Mechanisms
Theorem: For a universally-max-solvable game the given
mechanism is incentive compatible in ex-post Nash
equilibrium.
Meaning:
• when queried you will always report your best-reply,
and not some other strategy.
• The result of the mechanism will be the pure-Nash
equilibrium of the game.
• “Ex-post” means that you will not act differently even if
you knew the specific utility functions of all others.
• All you assume: they also play best-reply.
Examples of
Universally-Max-Solvable
Games
Single Item Auction
• A single item is being auctioned.
• Each player has a private value in {1, 2, …, k}.
• Players announce what they are willing to pay.
• The highest bidder gets the item and pays his bid.
(Ties are broken in some predefined way.)
Single Item Auction
• Utility of a player:
– 0 if he did not win.
– Valuation minus payment if he did win.
Best-reply strategy:
• If bid > valuation, decrease the bid to the valuation.
(This involves tie breaking.)
• If not the highest bidder and bid < valuation,
increase the bid by 1.
The Mechanism
• Start at any initial bids (Not necessarily 0)
• Query players in order and ask if they want to
change their bid
• When no one wants to change, allocate the item.
• If there is no convergence after $k \cdot n^2$ rounds, give
the item to no one.
• Notice:
– We do not force ascending bids.
– Do not have to start at 0
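A small simulation of this mechanism, using the bid-update rule from the previous slide, might look as follows. It is a sketch under assumptions: ties go to the lowest-indexed bidder (one possible predefined rule), the values 4 and 7 are made up for the example, and returning `None` stands for giving the item to no one.

```python
from typing import List, Optional, Tuple

def winner(bids: List[int]) -> int:
    """Highest bidder; assumption: ties go to the lowest index."""
    return bids.index(max(bids))

def best_reply_bid(i: int, bids: List[int], values: List[int]) -> int:
    """The bid-update rule stated on the slides."""
    if bids[i] > values[i]:
        return values[i]              # overbidding: drop to the valuation
    if winner(bids) != i and bids[i] < values[i]:
        return bids[i] + 1            # losing with room to spare: raise by 1
    return bids[i]

def run_auction(values: List[int], start_bids: List[int], k: int) -> Optional[Tuple[int, int]]:
    """Return (winner, payment), or None if the round limit k * n^2 is hit."""
    bids = list(start_bids)
    n = len(values)
    for _ in range(k * n * n):
        changed = False
        for i in range(n):
            new_bid = best_reply_bid(i, bids, values)
            if new_bid != bids[i]:
                bids[i] = new_bid
                changed = True
        if not changed:
            w = winner(bids)
            return w, bids[w]
    return None

print(run_auction(values=[4, 7], start_bids=[0, 0], k=7))  # (1, 5)
print(run_auction(values=[4, 7], start_bids=[6, 2], k=7))  # (1, 5)
```

In both runs the higher-value bidder wins and pays one more than the other bidder's value, whether bidding starts at zero or from an overbid.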
Single Item Auction
• Theorem: The single item auction is universally-max-solvable (after tie breaking).
• Therefore:
– A unique pure Nash exists.
– We converge to it quickly if everyone is truthful
– The mechanism we suggested is incentive compatible
• Note that this is just the English auction behavior
(but with rules that are less strict).
Congestion Control
The setting:
• A simplified model of packets flowing through a
computer network.
• Assume a network graph with capacities on the
edges (Like a flow problem).
[Figure: example network graph with edge capacities 3, 3, 2, 1, 4, 1, 2.]
Congestion Control
• Flows have a fixed unchangeable single path.
• Vertices that get more flow than they can send
out must dump some.
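[Figure: network graph with edge capacities 3, 3, 2, 1, 4, 1, 2 and two flows whose endpoints are labeled S1, T1 and S2, T2.]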
Congestion Control
Policy of the vertices:
• Distribute the capacity of an edge equally
between flows.
• If some flow does not use its full share, distribute
it evenly among the others.
• Similar to the fair-queuing strategy in the Internet.
• Maximizes the minimal flow.
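For a single edge, the sharing policy above amounts to a water-filling computation. The following is a minimal sketch for one edge only, assuming each flow's incoming rate is known at the vertex; the function name and the numbers are illustrative.

```python
from typing import List

def fair_share(capacity: float, demands: List[float]) -> List[float]:
    """Split one edge's capacity equally, passing unused share on to the other flows."""
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Serve flows from smallest to largest demand so leftovers roll forward.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, i in enumerate(order):
        share = remaining / (len(order) - position)  # equal split of what is left
        allocation[i] = min(float(demands[i]), share)
        remaining -= allocation[i]
    return allocation

print(fair_share(10, [2, 8, 8]))  # [2.0, 4.0, 4.0]
print(fair_share(6, [1, 1]))      # [1.0, 1.0]
```

In the first call the small flow needs less than its third of the capacity, so the remainder is split evenly between the two larger flows. In the second call the edge is left with free capacity.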
Congestion Control Game
• Each flow is a player.
• Utility of a player: How much he manages to
send through.
• Decides alone how much to send through the
network.
• Players do not know the structure of the network.
• Only know how much of their flow goes
through, or if there is free capacity.
Congestion Control
Best-reply strategy:
– If there is free capacity increase your flow.
– If you lose some of your flow decrease your flow.
• (This is tie breaking between outcomes with equal payoff)
• Theorem: congestion control is universally-max-solvable.
• Natural Penalty: Everyone sends full flows.
• We thus have:
– A Pareto optimal pure Nash that maximizes min flow.
– Fast Convergence.
– Incentive compatibility of following best-reply.
Stable Roommates
• A set of college students needs to be paired up to
share dorm rooms.
• Each student has strict preferences over the other
students (these are private).
• We allow students to announce a single person
they want to pair up with.
Stable Roommates Game
• A player gets the utility associated with the
roommate he selected if:
– that roommate selected him
– that roommate would prefer him over his current
selection.
• Nash equilibria in this
game are stable matchings
• There may be several.
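Stability of a proposed pairing can be checked by searching for a blocking pair, which is the textbook definition rather than something spelled out on the slide. The sketch below assumes complete strict preference lists and that being unmatched is worse than any roommate; the student names and preferences are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def blocking_pairs(prefs: Dict[str, List[str]],
                   partner: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Pairs who are not matched together but both prefer each other to their current roommate."""
    def prefers(p: str, candidate: str, current: Optional[str]) -> bool:
        if current is None:
            return True  # assumption: any roommate beats being unmatched
        return prefs[p].index(candidate) < prefs[p].index(current)

    players = sorted(prefs)
    return [
        (p, q)
        for idx, p in enumerate(players)
        for q in players[idx + 1:]
        if partner.get(p) != q
        and prefers(p, q, partner.get(p))
        and prefers(q, p, partner.get(q))
    ]

# Hypothetical students who all agree on the ranking A > B > C > D.
PREFS = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}

stable = {"A": "B", "B": "A", "C": "D", "D": "C"}
unstable = {"A": "C", "C": "A", "B": "D", "D": "B"}

print(blocking_pairs(PREFS, stable))    # []
print(blocking_pairs(PREFS, unstable))  # [('A', 'B')]
```

A matching is stable exactly when the returned list is empty.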
Stable Roommates
• The mechanism:
– Allow students to iteratively update their selection
– Stop after students no longer change
– If after a while players do not stop, match no one.
Preference Cycles
• Definition: A preference cycle is a cycle of
players in which each player prefers the next
player in the cycle to the previous one.
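For small instances the definition can be tested by brute force. This is a sketch assuming complete strict preference lists and cycles of at least three players (with two players, next and previous coincide); it enumerates all candidate cycles, so it is only practical for a handful of students, and the example preferences are hypothetical.

```python
from itertools import permutations
from typing import Dict, List

def has_preference_cycle(prefs: Dict[str, List[str]]) -> bool:
    """True if some cycle exists in which everyone prefers the next player to the previous one."""
    # rank[p][q] is q's position in p's list; lower means more preferred.
    rank = {p: {q: i for i, q in enumerate(order)} for p, order in prefs.items()}
    players = list(prefs)
    for length in range(3, len(players) + 1):
        for cycle in permutations(players, length):
            if all(
                rank[cycle[i]][cycle[(i + 1) % length]] < rank[cycle[i]][cycle[i - 1]]
                for i in range(length)
            ):
                return True
    return False

# Hypothetical rotating preferences: A likes B best, B likes C best, C likes A best.
ROTATING = {"A": ["B", "C"], "B": ["C", "A"], "C": ["A", "B"]}
# Hypothetical agreed ranking A > B > C > D.
AGREED = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}

print(has_preference_cycle(ROTATING))  # True
print(has_preference_cycle(AGREED))    # False
```

With an agreed ranking no cycle can exist: the player right after the top-ranked member of any cycle would have to prefer someone else to that top-ranked member.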
Stable Roommates
• Theorem: A roommate matching game that has no
preference cycle is a universally-max-solvable game.
• Example of no-preference-cycle: bipartite graphs with
an agreed preference. (Med. students and hospitals)
• Therefore for no-preference-cycle cases:
– There is a unique stable matching.
– Best-reply converges to it (asynchronously) and quickly
– The mechanism we offered is incentive compatible.
Thanks!