Game Theory


Game Theory
• Developed to explain the optimal strategy in two-person interactions.

• Initially, von Neumann and Morgenstern – zero-sum games
• John Nash – nonzero-sum games
• Harsanyi, Selten – games of incomplete information

An example: Big Monkey and Little Monkey

Big Monkey moves first, choosing to wait (w) or climb (c); Little Monkey then chooses w or c. Payoffs are (Big Monkey, Little Monkey):
  – (w, w): 0,0
  – (w, c): 9,1
  – (c, w): 4,4
  – (c, c): 5,3

What should Big Monkey do?

• If BM waits, LM will climb – BM gets 9.
• If BM climbs, LM will wait – BM gets 4.
• BM should wait.

• What about LM?

• The opposite of whatever BM does: climb if BM waits, wait if BM climbs (even though we’ll never get to the right side of the tree)

An example: Big Monkey and Little Monkey
• These strategies (w for Big Monkey, and cw – climb if BM waits, wait if BM climbs – for Little Monkey) are called best responses.
  – Given what the other player is doing, this is the best thing to do.

• A solution where everyone is playing a best response is called a Nash equilibrium.
  – No one can unilaterally change and improve things.
• This representation of a game is called extensive form.

An example: Big Monkey and Little Monkey
• What if the monkeys have to decide simultaneously?

Now Little Monkey has to choose before he sees Big Monkey move. Payoffs (Big Monkey, Little Monkey):

                 Little Monkey
                 w           c
Big Monkey  w    0,0         9,1
            c    6-2, 4      7-2, 3     (i.e. 4,4 and 5,3)

• Two Nash equilibria: (c,w) and (w,c).
• Also a third Nash equilibrium: Big Monkey chooses between c and w with probability 0.5 (a mixed strategy).
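As a quick check of this mixed equilibrium, here is a minimal Python sketch. Note one assumption: Little Monkey must also mix 0.5/0.5 (that is what makes Big Monkey indifferent); the slide only states Big Monkey’s mix.

# Payoffs (Big Monkey, Little Monkey) for the simultaneous game above.
payoff = {
    ('w', 'w'): (0, 0), ('w', 'c'): (9, 1),
    ('c', 'w'): (4, 4), ('c', 'c'): (5, 3),
}

def expected_payoffs(p_bm_w, p_lm_w):
    """Expected payoff of each pure action against the opponent's mix."""
    bm = {a: p_lm_w * payoff[(a, 'w')][0] + (1 - p_lm_w) * payoff[(a, 'c')][0]
          for a in ('w', 'c')}
    lm = {a: p_bm_w * payoff[('w', a)][1] + (1 - p_bm_w) * payoff[('c', a)][1]
          for a in ('w', 'c')}
    return bm, lm

bm, lm = expected_payoffs(0.5, 0.5)
print(bm)   # {'w': 4.5, 'c': 4.5} -- Big Monkey is indifferent
print(lm)   # {'w': 2.0, 'c': 2.0} -- Little Monkey is indifferent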

An example: Big Monkey and Little Monkey
• It can often be easier to analyze a game through a different representation, called normal form:

                 Little Monkey
                 c        w
Big Monkey  c    5,3      4,4
            w    9,1      0,0
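To make the representation concrete, here is a minimal Python sketch of the normal form above; the dictionary encoding and helper names are just for illustration.

# The normal form above as a dictionary of (Big Monkey, Little Monkey) payoffs.
normal_form = {
    ('c', 'c'): (5, 3), ('c', 'w'): (4, 4),
    ('w', 'c'): (9, 1), ('w', 'w'): (0, 0),
}
ACTIONS = ('c', 'w')

def best_response_bm(lm_action):
    """Big Monkey's best response to a fixed Little Monkey action."""
    return max(ACTIONS, key=lambda a: normal_form[(a, lm_action)][0])

def best_response_lm(bm_action):
    """Little Monkey's best response to a fixed Big Monkey action."""
    return max(ACTIONS, key=lambda a: normal_form[(bm_action, a)][1])

print(best_response_bm('c'))   # 'w' -- if LM climbs, BM prefers to wait (9 > 5)
print(best_response_lm('w'))   # 'c' -- if BM waits, LM prefers to climb (1 > 0)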

Choosing Strategies
• In the simultaneous game, it’s harder to see what each monkey should do.
  – A mixed strategy is optimal.
• Trick: how can a monkey maximize its payoff, given that it knows the other monkey will play a Nash strategy?

• Oftentimes, other techniques can be used to prune the number of possible actions.

Eliminating Dominated Strategies
• The first step is to eliminate actions that are worse than another action, no matter what the opponent does.

• If Big Monkey waits, Little Monkey will never choose w (payoff 0 vs. 1 for climbing), so that branch gives 9,1.
• If Big Monkey climbs, Little Monkey will never choose c (payoff 3 vs. 4 for waiting), so that branch gives 4,4.
• We can see that Big Monkey will always choose w (9 > 4).
• So the tree reduces to: 9,1
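The same pruning can be written as a small backward-induction routine. This is a minimal sketch; the tree encoding (player label plus a dictionary of subtrees, payoffs as leaves) is an assumption of the sketch.

# Extensive-form tree: an internal node is (player, {action: subtree}),
# a leaf is a payoff tuple (Big Monkey, Little Monkey).
tree = ('BM', {
    'w': ('LM', {'w': (0, 0), 'c': (9, 1)}),
    'c': ('LM', {'w': (4, 4), 'c': (5, 3)}),
})

def backward_induction(node):
    """Return (payoffs, play path) chosen by rational players, working bottom-up."""
    if isinstance(node[1], dict):                 # internal decision node
        player, children = node
        idx = 0 if player == 'BM' else 1          # which payoff the mover cares about
        best = None
        for action, child in children.items():
            payoffs, path = backward_induction(child)
            if best is None or payoffs[idx] > best[0][idx]:
                best = (payoffs, [(player, action)] + path)
        return best
    return node, []                               # leaf: just the payoffs

print(backward_induction(tree))
# -> ((9, 1), [('BM', 'w'), ('LM', 'c')])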

Eliminating Dominated Strategies
• We can also use this technique in normal-form games:

            Column
            a        b
Row   a     9,1      5,3
      b     4,4      0,0

• For any column action, row will prefer a (9 > 4 and 5 > 0), so b is dominated for the row player.
• Given that row will pick a, column will pick b (3 > 1).
• (a,b) is the unique Nash equilibrium.
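A minimal sketch of iterated elimination of strictly dominated strategies over a two-player game in normal form (the two-matrix encoding and function names are assumptions of the sketch):

def strictly_dominated(payoff, own, others):
    """Return an action in `own` that is strictly dominated by another, or None.
    payoff(i, j) = this player's payoff for own action i against opponent action j."""
    for s in own:
        for t in own:
            if t != s and all(payoff(t, j) > payoff(s, j) for j in others):
                return s
    return None

def iterated_elimination(row_payoffs, col_payoffs):
    """Iterated elimination of strictly dominated strategies on a bimatrix game."""
    rows = list(range(len(row_payoffs)))
    cols = list(range(len(row_payoffs[0])))
    changed = True
    while changed:
        changed = False
        r = strictly_dominated(lambda i, j: row_payoffs[i][j], rows, cols)
        if r is not None:
            rows.remove(r)
            changed = True
        c = strictly_dominated(lambda j, i: col_payoffs[i][j], cols, rows)
        if c is not None:
            cols.remove(c)
            changed = True
    return rows, cols

# The 2x2 game above (actions a,b mapped to indices 0,1):
row_p = [[9, 5], [4, 0]]   # row player's payoffs
col_p = [[1, 3], [4, 0]]   # column player's payoffs
print(iterated_elimination(row_p, col_p))   # ([0], [1]) -- only (a, b) survives

Run on the 3x3 example later in these notes, the same routine leaves only (c,b).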

Prisoner’s Dilemma
• Each player can cooperate or defect.

                  Column
                  cooperate     defect
Row   cooperate   -1,-1         -10,0
      defect      0,-10         -8,-8

• Defecting is a dominant strategy for row.
• Defecting is also a dominant strategy for column.
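A quick dominance check on these payoffs (a minimal sketch that just compares defect against cooperate cell by cell):

# Payoffs (row, column), indexed by (row action, column action).
pd = {('C', 'C'): (-1, -1), ('C', 'D'): (-10, 0),
      ('D', 'C'): (0, -10), ('D', 'D'): (-8, -8)}

# Defect strictly dominates cooperate for the row player...
print(all(pd[('D', c)][0] > pd[('C', c)][0] for c in 'CD'))   # True
# ...and likewise for the column player.
print(all(pd[(r, 'D')][1] > pd[(r, 'C')][1] for r in 'CD'))   # True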

Prisoner’s Dilemma
• Even though both players would be better off cooperating, defection is each player’s dominant strategy, so mutual defection is the outcome.
• What drives this?
  – One-shot game
  – Inability to trust your opponent
  – Perfect rationality

Prisoner’s Dilemma
• Relevant to:
  – Arms negotiations
  – Online payment
  – Product descriptions
  – Workplace relations
• How do players escape this dilemma?
  – Play repeatedly
  – Find a way to ‘guarantee’ cooperation
  – Change the payment structure

Definition of Nash Equilibrium
• A game has n players.
• Each player i has a strategy set S_i.
  – This is his possible actions.
• Each player has a payoff function p_i : S → R.
• A strategy t_i in S_i is a best response if there is no other strategy in S_i that produces a higher payoff, given the opponents’ strategies.

Definition of Nash Equilibrium
• A strategy profile is a list (s_1, s_2, …, s_n) of the strategies each player is using.
• If each strategy is a best response given the other strategies in the profile, the profile is a Nash equilibrium.

• Why is this important?

– If we assume players are rational, they will play Nash strategies.

– Even less-than-rational play will often converge to Nash in repeated settings.

An Example of a Nash Equilibrium

            Column
            a        b
Row   a     0,1      1,0
      b     1,2      2,1

(b,a) is a Nash equilibrium. To prove this:
  Given that column is playing a, row’s best response is b (1 > 0).
  Given that row is playing b, column’s best response is a (2 > 1).
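A minimal brute-force sketch that finds every pure-strategy Nash equilibrium of a two-player normal-form game by checking best responses (the function and matrix encoding are illustrative assumptions):

from itertools import product

def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return all (row, col) pairs where neither player gains by deviating alone."""
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        row_ok = all(row_payoffs[i][j] >= row_payoffs[k][j] for k in range(n_rows))
        col_ok = all(col_payoffs[i][j] >= col_payoffs[i][k] for k in range(n_cols))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria

# The example above, with actions a,b mapped to indices 0,1:
row_p = [[0, 1], [1, 2]]   # row player's payoffs
col_p = [[1, 0], [2, 1]]   # column player's payoffs
print(pure_nash_equilibria(row_p, col_p))   # [(1, 0)], i.e. (b, a)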

Finding Nash Equilibria – Dominated Strategies
• What to do when it’s not obvious what the equilibrium is?
• In some cases, we can eliminate dominated strategies.
  – These are strategies that are inferior for every opponent action.
• In the previous example, row = a is dominated.

Example
• A 3x3 example:

            Column
            a         b         c
Row   a     73,25     57,42     66,32
      b     80,26     35,12     32,54
      c     28,27     63,31     54,29

• c dominates a for the column player, so column a is eliminated.
• b is then dominated by both a and c for the row player.
• Given this, b dominates c for the column player – the column player will always play b.
• Since column is playing b, row will prefer c.
• We verify that (c,b) is a Nash equilibrium by observation:
  – If row plays c, b is the best response for column.
  – If column plays b, c is the best response for row.

Example #2
• You try this one:

            Column
            a        b        c
Row   a     2,2      1,1      4,0
      b     1,2      4,1      3,5

Coordination Games
• Consider the following problem:
  – A supplier and a buyer need to decide whether to adopt a new purchasing system.

                 Buyer
                 new        old
Supplier  new    20,20      0,0
          old    0,0        5,5

No dominated strategies!

Coordination Games
• This game has two Nash equilibria: (new,new) and (old,old).
• Real-life examples: Beta vs VHS, Mac vs Windows vs Linux, others?
• Each player wants to do what the other does
  – which may be different than what they say they’ll do.
• How to choose a strategy? Nothing is dominated.

Solving Coordination Games
• Coordination games turn out to be an important real-life problem.
  – Technology/policy/strategy adoption, delegation of authority, synchronization
• Human agents tend to use “focal points”.
  – Solutions that seem to make “natural sense”, e.g. pick a number between 1 and 10.
• Social norms/rules are also used.
  – Driving on the right/left side of the road.
• These strategies change the structure of the game.

Price-matching Example
• Two sellers are offering the same book for sale.

• This book costs each seller $25.

• The lowest price gets all the customers; if they match, profits are split.

• What is the Nash Equilibrium strategy?

Mixed strategies
• Unfortunately, not every game has a pure-strategy equilibrium.
  – Rock-paper-scissors
• However, every game has a mixed-strategy Nash equilibrium.
• Each action is assigned a probability of play.
• The player is indifferent between actions, given these probabilities.

Mixed Strategies
• In many games (such as coordination games) a player might not have a pure strategy.
• Instead, optimizing payoff might require a randomized strategy (also called a mixed strategy).

                    Wife
                    football     shopping
Husband  football   2,1          0,0
         shopping   0,0          1,2

Strategy Selection
• If we limit to pure strategies (each treating the other’s choice as 50/50):
  – Husband: U(football) = 0.5*2 + 0.5*0 = 1; U(shopping) = 0.5*0 + 0.5*1 = 1/2
  – Wife: U(shopping) = 1, U(football) = 1/2
• Problem: this won’t lead to coordination!

Mixed strategy
• Instead, each player selects a probability associated with each action.
  – Goal: the utility of each action is equal.
  – Players are indifferent between choices at these probabilities.
• a = probability the husband chooses football
• b = probability the wife chooses shopping
• Since payoffs must be equal, for the husband:
  – b*1 = (1-b)*2, so b = 2/3
• For the wife:
  – a*1 = (1-a)*2, so a = 2/3
• In each case, the expected payoff is 2/3.
  – 2/9 of the time they go to football, 2/9 shopping, 5/9 they miscoordinate.
• If they could synchronize ahead of time they could do better.
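A small sketch verifying the indifference probabilities and the coordination rates (plain Python; the expressions follow directly from the matrix above):

# b = probability the wife chooses shopping; the husband is indifferent when
#   U(football) = 2*(1-b) equals U(shopping) = 1*b, i.e. b = 2/3.
b = 2 / 3
print(2 * (1 - b), 1 * b)         # both 0.666... -- husband indifferent

# a = probability the husband chooses football; the wife is indifferent when
#   U(football) = 1*a equals U(shopping) = 2*(1-a), i.e. a = 2/3.
a = 2 / 3
print(1 * a, 2 * (1 - a))         # both 0.666... -- wife indifferent

# Joint outcome probabilities under this mixed equilibrium:
p_football = a * (1 - b)          # both at football: 2/3 * 1/3 = 2/9
p_shopping = (1 - a) * b          # both shopping:    1/3 * 2/3 = 2/9
print(p_football, p_shopping, 1 - p_football - p_shopping)   # 2/9, 2/9, 5/9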

Example: Rock paper scissors

                 Column
                 rock       paper      scissors
Row   rock       0,0        -1,1       1,-1
      paper      1,-1       0,0        -1,1
      scissors   -1,1       1,-1       0,0

Setup
• Player 1 plays rock with probability p_r, scissors with probability p_s, and paper with probability 1 - p_r - p_s.
• Player 2’s expected utility for each pure strategy against this mix:
  – U(rock) = 0*p_r + 1*p_s - 1*(1 - p_r - p_s) = 2*p_s + p_r - 1
  – U(scissors) = -1*p_r + 0*p_s + 1*(1 - p_r - p_s) = 1 - 2*p_r - p_s
  – U(paper) = 1*p_r - 1*p_s + 0*(1 - p_r - p_s) = p_r - p_s
• Player 2 wants to choose a probability for each strategy so that the expected payoff for each strategy is the same.
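Setting these three utilities equal and solving gives the familiar equal-probability mix; a minimal check:

from fractions import Fraction

# Setting U(rock) = U(scissors) = U(paper):
#   2*p_s + p_r - 1 = p_r - p_s      ->  p_s = 1/3
#   p_r - p_s = 1 - 2*p_r - p_s      ->  p_r = 1/3
p_r = p_s = Fraction(1, 3)
p_paper = 1 - p_r - p_s

u_rock     = 2*p_s + p_r - 1
u_scissors = 1 - 2*p_r - p_s
u_paper    = p_r - p_s
print(p_r, p_s, p_paper)               # 1/3 1/3 1/3
print(u_rock, u_scissors, u_paper)     # 0 0 0 -- Player 2 is indifferent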

Repeated games
• Many games get played repeatedly.
• A common strategy for the husband-wife problem is to alternate.
  – This leads to a payoff stream of 1, 2, 1, 2, … – an average of 1.5 per week.

• Requires initial synchronization, plus trust that partner will go along.

• Difference in formulation: we are now thinking of the game as a repeated set of interactions, rather than as a one-shot exchange.

Repeated vs Stage Games
• There are two types of multiple-action games:
  – Stage games: players take a number of actions and then receive a payoff.
    • Checkers, chess, bidding in an ascending auction
  – Repeated games: players repeatedly play a shorter game, receiving payoffs along the way.
    • Poker, blackjack, rock-paper-scissors, etc.

Analyzing Stage Games
• Analyzing stage games requires backward induction.
• We start at the last action, determine what should happen there, and work backwards.
  – Just like a game tree in extensive form.
• Strange things can happen here:
  – Centipede game
    • Players alternate – each can either cooperate and get $1 from nature, or defect and steal $2 from the opponent.
    • The game ends when one player has $100 or one player defects.
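A minimal backward-induction sketch of this centipede variant. The state encoding and the exact "steal $2" rule are assumptions based on the description above, not the slide’s own formulation.

from functools import lru_cache

TARGET = 100   # the game ends when a player reaches this amount

@lru_cache(maxsize=None)
def solve(m1, m2, mover):
    """Backward induction: return (payoff1, payoff2, mover's action)."""
    # Defect: take $2 from the opponent, and the game ends.
    defect = (m1 + 2, m2 - 2) if mover == 1 else (m1 - 2, m2 + 2)

    # Cooperate: take $1 from nature; play continues unless the target is reached.
    c1, c2 = (m1 + 1, m2) if mover == 1 else (m1, m2 + 1)
    if max(c1, c2) >= TARGET:
        cooperate = (c1, c2)
    else:
        cooperate = solve(c1, c2, 2 if mover == 1 else 1)[:2]

    own = 0 if mover == 1 else 1                  # the payoff the mover cares about
    if defect[own] >= cooperate[own]:
        return (*defect, 'defect')
    return (*cooperate, 'cooperate')

print(solve(0, 0, 1))   # -> (2, -2, 'defect'): player 1 defects immediately

Backward induction says the first mover defects immediately, even though both players would do far better by cooperating all the way – the "strange" result referred to above.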

Analyzing Repeated Games
• Analyzing repeated games requires us to examine the expected utility of different actions.
• Assumption: the game is played “infinitely often”.
  – Weird endgame effects go away.
• Prisoner’s Dilemma again:
  – In this case, tit-for-tat outperforms defection.

• Collusion can also be explained this way.

– Short-term cost of undercutting is less than long-run gains from avoiding competition.
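To illustrate the tit-for-tat point, here is a small round-robin sketch of the repeated Prisoner’s Dilemma, using the payoffs from the matrix earlier; the 200-round horizon and the particular set of opponent strategies are assumptions for illustration.

# Repeated Prisoner's Dilemma: tit-for-tat vs. always-defect vs. always-cooperate.
PAYOFF = {('C', 'C'): (-1, -1), ('C', 'D'): (-10, 0),
          ('D', 'C'): (0, -10), ('D', 'D'): (-8, -8)}

def tit_for_tat(opp_history):
    return 'C' if not opp_history else opp_history[-1]   # copy opponent's last move

def always_defect(opp_history):
    return 'D'

def always_cooperate(opp_history):
    return 'C'

def play(s1, s2, rounds=200):
    """Play two strategies against each other; return their total payoffs."""
    h1, h2, t1, t2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = s1(h2), s2(h1)          # each strategy sees the opponent's history
        p1, p2 = PAYOFF[(a1, a2)]
        t1, t2 = t1 + p1, t2 + p2
        h1.append(a1)
        h2.append(a2)
    return t1, t2

strategies = {'tit-for-tat': tit_for_tat,
              'always-defect': always_defect,
              'always-cooperate': always_cooperate}
totals = {name: 0 for name in strategies}
for name, s1 in strategies.items():
    for s2 in strategies.values():
        totals[name] += play(s1, s2)[0]
print(totals)   # tit-for-tat ends with the highest total across the round-robin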
