Connections between Learning Theory, Game Theory, and Optimization
Lecture 14, October 7th 2010
Maria Florina (Nina) Balcan

Improved Equilibria via Public Service Advertising

Good equilibria, Bad equilibria
Many games have both bad and good equilibria.
• In some places, everyone drives their own car. In others, everybody uses and pays for good public transit.

Fair cost-sharing
• n players in a weighted directed graph G; each edge e costs c_e.
• Player i wants to get from s_i to t_i.
• All players share the cost of the edges they use with others.
• Each player wants to minimize his own cost.
[Figure: two parallel edges from s to t, one of cost 1 and one of cost n]
• Good equilibrium: all use the edge of cost 1 (paying 1/n each).
• Bad equilibrium: all use the edge of cost n (paying 1 each).

Inefficiency of equilibria, PoA and PoS
Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT. [Koutsoupias-Papadimitriou '99]
Price of Stability (PoS): ratio of best Nash equilibrium to OPT. [Anshelevich et al., 2004]
E.g., for fair cost-sharing, PoS is log(n), whereas PoA is n.
Significant effort has been spent on understanding these in CS: "Algorithmic Game Theory", Nisan, Roughgarden, Tardos, Vazirani.

Fair Cost Sharing
• n players in directed graph G, each edge e costs c_e.
• Player i wants to get from s_i to t_i, and minimize its cost.
• All players share the cost of the edges they use with others.
PoA is n; PoS is log(n).
PoA is O(n): in any Nash equilibrium no player pays more than OPT (otherwise he could switch to his path in OPT and pay at most its full cost).
PoA is Ω(n):
[Figure: two parallel edges from s to t, one of cost 1 and one of cost n]

Fair Cost Sharing
PoA is n; PoS is log(n).
PoS is Ω(log(n)):
[Figure: sources s_1, ..., s_n and a common sink t; s_i has a private edge to t of cost 1/i (costs 1, 1/2, ..., 1/(n-1), 1/n) and a zero-cost edge to a shared edge of cost 1+ε into t]

Fair Cost Sharing
• n players in directed graph G, each edge e costs c_e.
• Player i wants to get from s_i to t_i, and minimize its cost.
• All players share the cost of the edges they use with others.
PoA is n; PoS is log(n).
PoS is O(log(n)): potential function argument.

Fair Cost Sharing
$\Phi(S) = \sum_{e} \sum_{k=1}^{n_e} c_e / k$ is a potential function, where $n_e$ is the number of players using edge $e$ in state $S$.
Social cost of $S$: $\mathrm{cost}(S) = \sum_{e : n_e \ge 1} c_e$, where $\mathrm{cost}(S) \le \Phi(S) \le H(n) \cdot \mathrm{cost}(S)$.

When a player moves, change in the player's cost = change in potential.
Proof: player i moves, taking the state from S to S'; let A be the edges on i's path in S but not in S', and B the edges on i's path in S' but not in S. With $n_e$ counted in S:
Its change in cost: $\sum_{e \in B} \frac{c_e}{n_e + 1} - \sum_{e \in A} \frac{c_e}{n_e}$.
Change in the potential: $\sum_{e \in B} \frac{c_e}{n_e + 1} - \sum_{e \in A} \frac{c_e}{n_e}$, since each edge in B gains the term $c_e/(n_e+1)$ and each edge in A loses the term $c_e/n_e$.

Fair Cost Sharing
PoS is O(log(n)): potential function argument
• Iterate best-response dynamics starting from an optimal solution [i.e., while there is a player that can improve, pick an arbitrary such player and let him play a best response].
• The potential always decreases and there is a finite number of states, so we reach a pure Nash.
• The potential does not increase, so the pure Nash reached has cost ≤ Φ(Nash) ≤ Φ(OPT solution) ≤ H(n) · OPT.

Congestion games more generally
Game defined by n players and m resources.
• Cost of a resource j is a function f_j(n_j) of the number n_j of players using it.
• Each player i chooses a set of resources (e.g., a path) from a collection S_i of allowable sets of resources (e.g., paths from s_i to t_i).
• Cost incurred by player i is the sum, over all resources being used, of the cost of the resource.
• Generic potential function: $\Phi = \sum_{j} \sum_{k=1}^{n_j} f_j(k)$.
• Best-response dynamics always gives an equilibrium.

Congestion games more generally
• Nice general class of games with many players.
• Always have a pure-strategy equilibrium.
• Have a potential function s.t. whenever a player switches, the potential drops by exactly that player's improvement.
  - Best-response dynamics always gives an equilibrium.
• But there may be a large gap between the quality of the best and the worst equilibrium.
• Lots of work on understanding properties of these games and the quality of their equilibria.

Guiding from Bad to Good
Can a helpful authority encourage (guide) behavior to move from a bad state to a good state?
Standard motivation for PoS: if a central authority could suggest a low-cost Nash (ride public transit), and everyone followed the suggestion, then this would be stable.
Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT.
Price of Stability (PoS): ratio of best Nash equilibrium to OPT.

Guiding from Bad to Good
What if only some fraction α will pay attention? [Balcan-Blum-Mansour, SODA 2009]
• Can the authority guide behavior to a good state?
• Will it just snap back? How does this depend on α?

Main Model
[Figure: example network with sources s_1, ..., s_n, edges of cost 0 and 1, and one edge of cost k]
0. n players initially playing some arbitrary equilibrium.
1. Authority launches advertising, proposing joint action s_ad. Each player i follows with probability α.
Call players that follow receptive players.
2. Remaining (non-receptive) players fall to some arbitrary equilibrium for themselves, given the play of the receptive players.
3. All players follow best-response dynamics to an overall Nash equilibrium.
Notes: the games considered are potential games with pure Nash equilibria; quality is measured by social cost.

Main Results
Fair cost sharing (PoS = log(n), PoA = n)
• If only a constant fraction α of the players follow the advice, then we can still get within O(1/α) of the PoS.
• Extends to cost-sharing + linear delays.
Party affiliation games (PoS = 1, PoA = Ω(n^2))
• Threshold behavior: for α > ½, can get ratio O(1), but for α < ½, the ratio stays Ω(n^2) (assume degrees ω(log n)).

Fair Cost Sharing (PoS = log(n), PoA = n)
If only a constant fraction α of the players follow the advice, then we get within O(1/α) of the PoS.
Note: this is the best you can hope for. E.g., k =2n.
[Figure: sources s_1, ..., s_n and sink t, edges of cost 0 and 1, and one edge of cost k]

Fair Cost Sharing (PoS = log(n), PoA = n)
Advertiser proposes OPT (any approximation also works).
Phase 1: random variables (each player is receptive independently with probability α).
Cost of non-receptive players at the end of Phase 2:
- In any NE, a non-receptive player i can't improve by switching to his path P_i^OPT in OPT.
- Moreover, this option is guaranteed to be at least as good as if the other non-receptive players didn't exist.
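The three steps of the Main Model can be sketched on a small example. This is a minimal simulation under stated assumptions, not the lecture's own construction: each player is assumed to choose between a private edge of cost 1 and a single shared edge of cost k > 1 (one reading of the figure), the non-receptive players are assumed to reach their Phase 2 equilibrium by best-response moves from the initial state, and all function names are hypothetical. The receptive set is passed in explicitly so a run is reproducible; `sample_receptive` draws it with probability `alpha` per player.

```python
import random


def social_cost(on_shared, n, k):
    """Total cost: 1 per private edge in use, plus k if the shared edge is used."""
    m = sum(on_shared)
    return (n - m) + (k if m > 0 else 0.0)


def best_response_dynamics(on_shared, movers, k):
    """Let players in `movers` make strictly improving moves until none exists."""
    improved = True
    while improved:
        improved = False
        for i in movers:
            m = sum(on_shared)  # current number of players on the shared edge
            if on_shared[i]:
                # Player i pays k/m now; the private edge would cost 1.
                if k / m > 1:
                    on_shared[i] = False
                    improved = True
            else:
                # Player i pays 1 now; joining the shared edge would cost k/(m+1).
                if k / (m + 1) < 1:
                    on_shared[i] = True
                    improved = True
    return on_shared


def run_phases(n, k, receptive):
    """Run the model and return the social cost of the final equilibrium."""
    on_shared = [False] * n            # Step 0: bad equilibrium, everyone private
    for i in receptive:                # Step 1: receptive players follow the ad
        on_shared[i] = True
    non_receptive = [i for i in range(n) if i not in receptive]
    best_response_dynamics(on_shared, non_receptive, k)  # Step 2 (assumed dynamics)
    best_response_dynamics(on_shared, range(n), k)       # Step 3: everyone moves
    return social_cost(on_shared, n, k)


def sample_receptive(n, alpha, seed=0):
    """Each player is receptive independently with probability alpha."""
    rng = random.Random(seed)
    return {i for i in range(n) if rng.random() < alpha}
```

With 5 receptive players out of 10, `run_phases(10, 4.0, set(range(5)))` ends with everyone on the shared edge (total cost 4), while `run_phases(10, 6.5, set(range(5)))` snaps back to the all-private state (total cost 10): the advertised state only survives when enough players follow it.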
Fair Cost Sharing (PoS = log(n), PoA = n)
If only a constant fraction α of the players follow the advice, then we get within O(1/α) of the PoS.
Cost of non-receptive players at the end of Phase 2:
- In any NE, a non-receptive player i can't improve by switching to his path P_i^OPT in OPT.
- Calculate the total cost of these guaranteed options. Rearrange the sum...
Cost of receptive players at the end of Phase 2.
Use: X ~ Bi(n, p).
Expected total cost at the end of Phase 2: O(OPT/α).
In Phase 3, a potential argument shows behavior cannot get worse by more than an additional log(n) factor.

Cost Sharing, Extension
+ linear delays:
- Still get the same guarantee, but the proof is trickier.
- Problem: can't argue as if the remaining non-receptive players didn't exist, since they add to delays.
- Shadow game w.r.t. non-receptive players: pure linear latency functions, with an offset defined by the equilibrium at the end of Phase 2 (the number of users on e at the end of Phase 2).
- This game has good PoA (5/2).
- Behavior of the non-receptive players at the end of Phase 2 is an equilibrium for this game too.
- Show: cost of the non-receptive players at the end of step 2 is O(OPT/α).
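For the fair cost-sharing bound on the non-receptive players, the step "Use: X ~ Bi(n,p)" can be made concrete as follows. This is one possible reconstruction, not necessarily the slide's own formula. Assume an edge e is used by n_e players in OPT and X of them are receptive, so X ~ Bi(n_e, α). Each of the n_e - X non-receptive players could use e at a share of at most c_e/(X+1), so the guaranteed options charge at most c_e · (n_e - X)/(X+1) to edge e. The identity E[1/(X+1)] = (1 - (1-α)^(n_e+1)) / ((n_e+1)α) then gives E[(n_e - X)/(X+1)] ≤ 1/α - 1, and summing over edges gives O(OPT/α). The sketch below checks this numerically; the function names are hypothetical.

```python
from math import comb


def expected_nr_charge(n_e, alpha):
    """Exact E[(n_e - X) / (X + 1)] for X ~ Bi(n_e, alpha), by direct summation."""
    return sum(
        comb(n_e, x) * alpha**x * (1 - alpha) ** (n_e - x) * (n_e - x) / (x + 1)
        for x in range(n_e + 1)
    )


def closed_form(n_e, alpha):
    """Same expectation via E[1/(X+1)] = (1 - (1-alpha)^(n_e+1)) / ((n_e+1) alpha)."""
    return (1 - (1 - alpha) ** (n_e + 1)) / alpha - 1
```

For every edge the expected charge is therefore below c_e/α, independent of n_e, which is where the 1/α factor in the guarantee comes from under this reading.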
Cost Sharing, Extension
+ linear delays:
- Still get the same guarantee, but the proof is trickier.
- Cost of the non-receptive players at the end of step 2: O(OPT/α).
- Still need to argue about the cost of the receptive players.
Edge-by-edge charging:
- more receptive players: lose a factor of two compared to OPT
- more non-receptive players: already paid for, lose a factor of two

Party affiliation games
• Given graph G, each edge labeled + or -.
• Vertices have two actions: RED or BLUE.
• Pay 1 for each + edge with endpoints of different color, and each - edge with endpoints of the same color.
• Special cases:
  - All + edges is the consensus game.
  - All - edges is the cut game.
[Figure: small example graph with edges labeled + and -]

Party affiliation games
OPT is an equilibrium, so PoS = 1.
But even for consensus games, PoA = Ω(n^2).
[Figure: clique with a perfect matching removed, all edges labeled +]

Party affiliation games (PoS = 1, PoA = Ω(n^2))
Threshold behavior: for α > ½, can get ratio O(1), but for α < ½, the ratio stays Ω(n^2) (assume degrees ω(log n)).
(lower bound)
- Same example as for consensus PoA, but sparser across the cut: degree γn/8 across the cut, γ = 1/2 - α.
• For large n, whp all nodes have at most a 1/2 - γ/2 fraction of neighbors in R.
• Initially, each node has a γ/4 fraction of neighbors of the other color.
• So, players are "locked" into place.

Party affiliation games (PoS = 1, PoA = Ω(n^2))
Threshold behavior: for α > ½, can get ratio O(1), but for α < ½, the ratio stays Ω(n^2) (assume degrees ω(log n)).
(upper bound, consensus games)
- Advertising strategy = follow OPT, e.g., all red.
- By Hoeffding, all nodes with degree ≫ log n/(α - 1/2)^2 have more than half of their neighbors in the set R, with prob. 1 - 1/n.
- At the end of step two, all nodes are red.
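The consensus-game upper bound can be illustrated with a small sketch of step two. The assumptions here are not from the lecture: the graph is given as an adjacency dictionary, the advertised action is all red, non-receptive players are assumed to start blue (a pessimistic start), and they make strictly improving moves while receptive players stay red. The function name is hypothetical.

```python
def phase_two_consensus(adj, receptive):
    """Non-receptive nodes best-respond in a consensus game; receptive nodes stay red.

    adj maps each node to a list of its neighbors (all edges are '+' edges).
    A node pays 1 for every neighbor whose color differs from its own.
    """
    color = {v: ("red" if v in receptive else "blue") for v in adj}
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in receptive:
                continue  # receptive players keep the advertised action
            red = sum(1 for u in adj[v] if color[u] == "red")
            blue = len(adj[v]) - red
            cost_now = red if color[v] == "blue" else blue
            cost_other = blue if color[v] == "blue" else red
            if cost_other < cost_now:  # strictly improving switch only
                color[v] = "red" if color[v] == "blue" else "blue"
                changed = True
    return color
```

On the complete graph with 6 nodes, a receptive set of 4 nodes gives every other node a red majority among its neighbors, so all nodes end red; a receptive set of 1 node leaves the other five blue, matching the threshold intuition.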
Note: for general cut games, OPT might not have zero cost for each player.

Party affiliation games (PoS = 1, PoA = Ω(n^2))
Threshold behavior: for α > ½, can get ratio O(1), but for α < ½, the ratio stays Ω(n^2) (assume degrees ω(log n)).
(upper bound, general party affiliation games)
- Advertising strategy = follow OPT.
- Split nodes into those incurring low cost vs. those incurring high cost under OPT.
- Show that low-cost nodes will switch to their behavior in OPT. For high-cost nodes, don't care.
- Cost only improves in the final best-response process.

Party affiliation games (PoS = 1, PoA = Ω(n^2))
(upper bound, general party affiliation games)
• S is β-dominating if every vertex not in S has more than a ½ + β fraction of its neighbors in S.
• If α > ½ + 2β, then the set R of receptive players is β-dominating whp.
• Split nodes into those incurring low cost (less than a β-fraction of incident edges incur a cost in OPT) vs. those incurring high cost under OPT.
• Low-cost nodes will switch to their behavior in OPT. High-cost nodes can incur at most a 1/β factor more than their cost in OPT.

Summary
Analyze the ability of a central authority to guide behavior to a good equilibrium even if only an α fraction of players are paying attention.

Influencing Dynamics
A more adaptive model [Balcan-Blum-Mansour, ICS 2010]
Each player has a few abstract actions and uses an experts-based learning algorithm to decide which one to use.
Experts (Expert 1, Expert 2): Play Best Response; Play the Advertised Behavior.
[no rigid separation between receptive vs. non-receptive players]

Open Questions
Can we get around the problem of natural dynamics converging to a poor equilibrium, without a central authority, by giving players more information about the game?