Leading dynamics to good behavior

advertisement
Connections between Learning
Theory, Game Theory, and
Optimization
Lecture 14, October 7th 2010
Maria Florina (Nina) Balcan
Improved Equilibria via Public
Service Advertising
Good equilibria, Bad equilibria
Many games have both bad and good equilibria.
•
In some places, everyone drives their own car. In some,
everybody uses and pays for good public transit.
Fair cost-sharing
Fair cost-sharing: n players in weighted directed graph G.
Player i wants to get from si to ti, and they share cost
of edges they use with others.
G
Fair cost-sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti.
•
All players share cost of edges they use with others.
•
Each player wants to minimize his own cost.
•
s
Good equilibrium: all use edge of cost 1.
1
n
(paying 1/n each)
Bad equilibrium: all use edge of cost n.
t
(paying 1 each)
Inefficiency of equilibria, PoA and PoS
Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT.
[Koutsoupias-Papadimitriou’99]
Price of Stability (PoS): ratio of best Nash equilibrium to OPT.
[Anshelevich et. al, 2004]
E.g., for fair cost-sharing, PoS is log(n), whereas PoA is n.
Significant effort spent on understanding these in CS.
“Algorithmic Game Theory”, Nisan, Roughgarden, Tardos, Vazirani
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
PoA is n; PoS is log(n).
PoA is O(n):in any Nash no player pays more than OPT
s
PoA is (n):
s
1
n
t
1
n
t
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
PoA is n; PoS is log(n).
PoS is (log(n)):
0
0
s1
1
0
0
…
1/2
t
0
sn
1/n-1
1+²
1/n
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
PoA is n; PoS is log(n).
PoS is (log(n)):
0
0
s1
1
0
0
…
1/2
t
0
sn
1/n-1
1+²
1/n
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
PoA is n; PoS is log(n).
PoS is O(log(n)): potential function argument
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
is
Social cost of
where
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
is
Social cost of
where
A player moves, change in player’s cost = change in potential
Proof: player i moves, get from S to S’; let A be the edges in S
but not in S’, and B the edges in S’ but not in S.
Its change in cost:
Change in the potential
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
is
Social cost of
where
Fair Cost Sharing
•
n players in directed graph G, each edge e costs ce.
•
Player i wants to get from si to ti, and minimize its cost.
•
all players share cost of edges they use with others.
PoS is O(log(n)): potential function argument
•
Iterate best-response dynamics starting from an optimal
solution [i.e, while there is a player that can improve, pick an
arbitrary such player and let him to best response].
•
•
Potential always decreases, finite # of states, so
reach a pure Nash.
The potential does not increase & reach a pure Nash of
cost · H(n) ¢ OPT.
Congestion games more generally
Game defined by n players and m resources.
• Cost of a resource j is a function fj(nj) of the number nj of
players using it.
• Each player i chooses a set of resources (e.g., a path) from
collection Si of allowable sets of resources (e.g., paths from si to ti).
• Cost incurred by player i is the sum, over all resources being
used, of the cost of the resource.
• Generic potential function:
Best-response dynamics always gives an equilibrium.
Congestion games more generally
• Nice general class of games with many players.
• Always have a pure-strategy equilibrium.
• Have a potential function s.t. whenever a player
switches, potential drops by exactly that player’s
improvement.
– Best-response dynamics always gives an equilibrium.
• But maybe a large gap between the quality of the best
and the worst equilibrium.
• Lots of work on understanding properties of these games
and quality of their equilibria.
Good equilibria, Bad equilibria
Many games have both bad and good equilibria.
•
In some places, everyone drives their own car. In some,
everybody uses and pays for good public transit.
Guiding from Bad to Good
Can a helpful authority encourage (guide) behavior to
move from a bad state to a good state?
Standard motivation for PoS:
If a central authority could suggest a low-cost Nash (ride
public transit), and everyone followed the suggestion, then
this would be stable.
Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT.
Price of Stability (PoS): ratio of best Nash equilibrium to OPT.
Guiding from Bad to Good
What if only some  fraction
will pay attention?
[Balcan-Blum-Mansour, SODA 2009]
•
Can the authority guide behavior to a good state?
•
Will it just snap back? How does this depend on ?
Main Model
0. n players initially playing some arbitrary equilibrium.
1. Authority launches advertising, proposing joint action sad.
0 0
s1
1
1 1
0
…
1
sn
k
Main Model
0. n players initially playing some arbitrary equilibrium.
1. Authority launches advertising, proposing joint action sad.
Each player i follows with probability .
Call players that follow receptive players
0 0
s1
1
1 1
0
…
1
sn
k
Main Model
0. n players initially playing some arbitrary equilibrium.
1. Authority launches advertising, proposing joint action sad.
Each player i follows with probability .
Call players that follow receptive players
2. Remaining (non-receptive) players fall to some arbitrary
equilibrium for themselves, given play of receptive players.
3. All players follow best-response dynamics to an overall Nash
equilibrium.
Notes:
potential games, pure Nash eqs.
social cost:
Main Results
(PoS = log(n), PoA = n)
•
If only a constant fraction  of the players follow the
advice, then we can still get within O(1/) of the PoS.
•
Extend to cost-sharing + linear delays.
(PoS = 1, PoA = (n2))
• Threshold behavior: for  > ½, can get ratio O(1), but for
 < ½, ratio stays (n2). (assume degrees (log n)).
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Note: this is best you can hope for. E.g., k =2n.
0 0
s1
1
1 1
t
0
…
1
sn
k
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Advertiser proposes OPT (any apx also works)
Phase 1:
random vars
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Cost of non-receptive players at the end of Phase 2
- In any NE a non-receptive player i, can’t improve by switching to
his path PiOPT in OPT.
- Moreover, this option is guaranteed to be at least as
good as if other NR players didn’t exist.
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Cost of non-receptive players at the end of Phase 2
- In any NE a non-receptive player i, can’t improve by switching to
his path PiOPT in OPT.
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Cost of non-receptive players at the end of Phase 2
- In any NE a non-receptive player i, can’t improve by switching to
his path PiOPT in OPT.
- Calculate total cost of these guaranteed options.
Rearrange sum...
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Cost of non-receptive players at the end of Phase 2
Cost of receptive players at the end of Phase 2
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Cost of non-receptive players at the end of Phase 2
Cost of receptive players at the end of Phase 2
Use: X ~ Bi(n,p)
Fair Cost Sharing
(PoS = log(n), PoA = n)
If only a constant fraction  of the players follow the
advice, then we get within O(1/) of the PoS.
Expected total cost at the end of Phase 2: O(OPT/).
In Phase 3, potential argument shows behavior cannot get
worse by more than an additional log(n) factor.
Cost Sharing, Extension
+ linear delays:
- Still get same guarantee, but proof is trickier
Problem: can’t argue as if remaining NR players didn’t
exist since they add to delays
Cost Sharing, Extension
+ linear delays:
- Still get same guarantee, but proof is trickier
- Shadow game wrt non-receptieve players: pure linear latency
fns. Offset defined by equilib. at end of phase 2.
- This game has good PoA (5/2) .
# users on e at end of phase 2
Cost Sharing, Extension
+ linear delays:
- Still get same guarantee, but proof is trickier
- Shadow game: pure linear latency fns
- Behavior of NR at end of phase 2 is equilib for this game too.
- Show
Cost of the of nonreceptive players at the end of step 2: O(OPT/).
Cost Sharing, Extension
+ linear delays:
- Still get same guarantee, but proof is trickier
Cost of the of nonreceptive players at the end of step 2: O(OPT/).
Need to still argue about the cost of the receptive players.
Edge by edge charging:
- more receptive players, loose a factor of two compared to OPT
- more non-receptive players, already paid for, loose a factor of two
Party affiliation games
• Given graph G, each edge labeled + or -.
• Vertices have two actions: RED or BLUE. +
+
+
Pay 1 for each + edge with endpoints of different
color, and each – edge with endpoints of same color.
• Special cases:
• All + edges is consensus game.
• All – edges is cut-game.
-
Party affiliation games
OPT is an equilibrium so PoS = 1.
But even for consensus games, PoA = (n2)
Clique with perfect
matching removed
all edges labeled plus
Party affiliation games
(PoS = 1, PoA = (n2))
- Threshold behavior: for  > ½, can get ratio O(1), but for
 < ½, ratio stays (n2). (assume degrees (log n)).
(lower bound)
- Same example as for consensus PoA, but sparser across cut.
Degree ° n/8 across cut,
°=1/2-®
Party affiliation games
(PoS = 1, PoA = (n2))
- Threshold behavior: for  > ½, can get ratio O(1), but for
 < ½, ratio stays (n2). (assume degrees (log n)).
(lower bound)
- Same example as for consensus PoA, but sparser across cut.
Degree ° n/8 across cut,
°=1/2-®
• For large n, whp all nodes
have at most a 1/2-°/2
fraction on neighbs in R
• Initially, each node has a
°/4 fraction on nodes of the
other color.
• So, players “locked” into
place
Party affiliation games
(PoS = 1, PoA = (n2))
- Threshold behavior: for  > ½, can get ratio O(1), but for
 < ½, ratio stays (n2). (assume degrees (log n)).
(upper bound, consensus games)
- Advertising strategy = follow OPT, e.g. all red.
- By Hoeffding, all nodes with degree log n/(®-1/2)2 have more
than half of their neighbors in the set R, with prob. 1-1/n.
- At the end of step two, all nodes are red.
Note: for general cut games, OPT might not have zero cost
for each player.
Party affiliation games
(PoS = 1, PoA = (n2))
- Threshold behavior: for  > ½, can get ratio O(1), but for
 < ½, ratio stays (n2). (assume degrees (log n)).
(upper bound, general party affiliation games)
- Advertising strategy = follow OPT.
- Split nodes into those incurring low-cost vs those incurring
high-cost under OPT.
- Show that low-cost will switch to behavior in OPT. For
high-cost, don’t care.
- Cost only improves in final best-response process.
Party affiliation games
(PoS = 1, PoA = (n2))
- Threshold behavior: for  > ½, can get ratio O(1), but for
 < ½, ratio stays (n2). (assume degrees (log n)).
(upper bound, general party affiliation games)
• S is a ¯-dominating if every vertex not in S has more than a ½+¯
fraction of neighbs in S.
• If ® > ½+2¯, then set R of receptive players is ¯-dominating whp
• Split nodes into those incurring low-cost (less than a ¯-fraction
of incident edges incur a cost in OPT) vs those incurring highcost under OPT.
• Low-cost will switch to behavior in OPT. For high-cost, can
only incur a cost of only 1/¯ more their cost in OPT.
Summary
Analyze ability of a central authority to guide behavior
to a good equilibrium even if only ® fraction of players
are paying attention.
Influencing Dynamics
A more adaptive model [Balcan Blum Mansour, ICS 2010]
Each player has a few abstract actions.
Uses a learning, experts based alg. to decide which one to use
Expert 1
Expert 2
Play Best
Response
Play the Advertised
Behavior
[no rigid separation between receptive vs non-receptive players]
Open Questions
Get around problem of natural dynamics converging
to poor equilibrium without central authority by
giving players more information about the game?
Download