IE616 Notes (4)

INTRODUCTION TO GAME THEORY
K.S. MALLIKARJUNA RAO
Abstract. This set of notes on "Game Theory" is based on lectures given on multiple
occasions.
Contents

Part 1. Combinatorial Games
  § 1.  Combinatorial Games
  § 2.  Take Away Game
  § 3.  Game of Nim
    § 3.1. Positions
    § 3.2. Nimber Arithmetic
    § 3.3. Solution of the Nim Game
  § 4.  Zermelo Theorem
  § 5.  Game of Hex

Part 2. Non-Cooperative Games
  § 6.  Oligopoly
    § 6.1. Monopoly
    § 6.2. Cournot's Duopoly
    § 6.3. Bertrand's Duopoly
  § 7.  Matching Pennies
  § 8.  Rock-Paper-Scissors Game
  § 9.  Prisoner's Dilemma
  § 10. BoS
  § 11. Matrix Games
  § 12. Continuous Games
  § 13. Nonzero-sum Bimatrix Games
  § 14. Nonzero-sum Continuous Games
  § 15. Lemke-Howson Algorithm
  § 16. Correlated Equilibria
  § 17. Congestion and Potential Games
  § 18. Evolutionary Game Theory
  § 19. Replicator Dynamics
  § 20. Fictitious Play
  § 21. Cooperative Games
  § 22. Nucleolus
  § 23. Utility Under Certainty
    § 23.1. Preference Relations and Utility Representation
References

Industrial Engineering & Operations Research, IIT-Bombay, Powai, Mumbai 400 076, India.
Email: mallik.rao@iitb.ac.in
URL: http://www.ieor.iitb.ac.in/~mallik
Part 1. Combinatorial Games
§ 1. Combinatorial Games
(1.1). Definition. A combinatorial game is a game in which two players take turns making
moves; both of them have complete information about what has happened in the game so far
and what each player’s options are from each position.
In particular, combinatorial games have the following properties:
  • There are two players who make moves alternately.
  • There is a finite set of positions in the game.
  • The rules specify how each player can move to some position from the current position.
  • The game ends when a player cannot move.
  • The game ends eventually.
(1.2). Definition. If the set of available moves depends only on the position and not on which
of the two players is moving, then the game is called an impartial game. Otherwise it is called
a partisan game.
Tic-Tac-Toe is an example of an impartial game, whereas Chess is a partisan game.
§ 2. Take Away Game
There is a pile of 21 sticks. In each turn, a player can take one, two, or three sticks. The player
who takes the last stick is the winner (Normal Play). To understand the optimal strategy, suppose the
number of sticks is n ≤ 3. Obviously, the first player can remove all the sticks and win the
game. Suppose there are four sticks; then the second player wins. More generally, if there are n
sticks, then the second player wins if n is a multiple of 4; otherwise the first player wins. Thus with 21
sticks, the first player wins.
To prove this, we use mathematical induction. From the above, we know that the claim is
true for n ∈ {1, 2, 3, 4}. So we assume n ≥ 5 and that the claim is true for all k < n. Now,
applying the division algorithm, we write n = 4q + r with 0 ≤ r ≤ 3. Suppose r ≠ 0. In the first turn,
Player 1 removes r sticks; then the number of sticks available is 4q. Now the induction
hypothesis (since 4q < n) shows that Player 1 will win. Next assume that r = 0. In the first
turn, Player 1 has to pick s ∈ {1, 2, 3} sticks; Player 2 can then pick 4 − s sticks, so that the number
of remaining sticks is 4(q − 1) < n, again a multiple of 4. Now the induction hypothesis completes the proof.
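The induction above can also be checked by brute force; a minimal sketch in Python (the function names are my own):

```python
from functools import lru_cache

# Winner of the take-away game (remove 1-3 sticks per turn; the player
# taking the last stick wins): the first player wins iff n mod 4 != 0.
def first_player_wins(n):
    return n % 4 != 0

def optimal_move(n):
    """Sticks to remove now, or None if every move loses against perfect play."""
    r = n % 4
    return r if r != 0 else None

# Brute-force verification of the induction: a position is winning iff
# some legal move leads to a position that is losing for the opponent.
@lru_cache(maxsize=None)
def wins_by_search(n):
    return any(not wins_by_search(n - k) for k in (1, 2, 3) if k <= n)

assert all(first_player_wins(n) == wins_by_search(n) for n in range(1, 100))
assert first_player_wins(21) and optimal_move(21) == 1
```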
§ 3. Game of Nim
There are three (or more) piles, or nim-heaps, of stones. Players alternately remove any
positive number of stones from a single pile until there are no stones left. There are two variations to decide
on the winner.
  • Normal Play - The player to remove the last stone wins.
  • Misère Play - The player that is forced to take the last stone loses.
§ 3.1. Positions.
(3.1). Definition. In any impartial game, the position of the game is said to be a P-position
if it secures a win for the previous player (the one who just made his move). The position is an
N-position if it secures a win for the next player (the one to make a move).
In normal play of the Nim game with three heaps, (0, 0, 1) is an N-position and (1, 1, 0) is a P-position. To find, in
general, whether a Nim position is P or N, we work backwards (using backward induction).
  • Terminal positions are P-positions.
  • Every position that can reach a P-position is an N-position.
  • Positions that can only move to N-positions are P-positions.
  • Repeat the above procedure until all positions are labelled.
§ 3.2. Nimber Arithmetic. To add two numbers, we first write them in binary form and then take
the exclusive or (XOR) of the corresponding digits, without carrying.
As an example, take 3 and 5:

      3 =   011
    ⊕ 5 = ⊕ 101
      6     110

Note that in the XOR operation, 1 ⊕ 1 = 0 = 0 ⊕ 0 and 1 ⊕ 0 = 0 ⊕ 1 = 1. The easiest way to look
at this is that if we are adding an odd number of ones, then the answer is 1; an even number of
ones gives the answer 0.
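In code, nimber addition is just the bitwise XOR operator; a small sketch:

```python
# Nimber addition: write the numbers in binary and add corresponding
# digits without carrying, i.e., bitwise XOR.
def nim_add(*xs):
    s = 0
    for x in xs:
        s ^= x
    return s

assert nim_add(3, 5) == 6       # 011 XOR 101 = 110
assert nim_add(7, 7) == 0       # x XOR x = 0 for any x
assert nim_add(1, 2, 3) == 0    # (1, 2, 3) is a balanced Nim position
```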
§ 3.3. Solution of the Nim Game. The nim-sum of all the heaps plays the key role in the solution.
Note that the nim-sum at the end of the game is zero. There are other positions where the nim-sum is zero,
e.g., x ⊕ x = 0 for any x.
(3.2). Theorem. The winning strategy in normal play Nim is to finish every move with a nim-sum
of 0.
The proof of this theorem is split into a couple of lemmas.
(3.3). Lemma. If the nim-sum is zero after a player's turn, then the next move must change it to
non-zero.
Proof. Let the nim-heaps be (x_1, x_2, …, x_n) and s = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_n.
Let t = y_1 ⊕ y_2 ⊕ ⋯ ⊕ y_n be the nim-sum of the heaps after the move. Note that x_i = y_i for all i
except for one, say k.
Now

    t = 0 ⊕ t = s ⊕ s ⊕ t = s ⊕ (x_1 ⊕ y_1) ⊕ ⋯ ⊕ (x_n ⊕ y_n) = s ⊕ (x_k ⊕ y_k).

Clearly, if s = 0 then t ≠ 0 (since x_k ≠ y_k), completing the proof of the lemma.
□
When the nim-sum is zero, we think of the position as balanced. From a balanced position, we
can only move to an unbalanced position. But from an unbalanced position, it is possible to move
either to another unbalanced position or to a balanced one.
(3.4). Lemma. It is always possible to make the nim-sum zero on your turn if it wasn't already
zero at the beginning of your turn.
Proof. Let d be the position of the most significant bit in s. Choose a heap x_k whose binary
expansion has a 1 in position d (such a heap exists, for otherwise bit d of s would be 0).
Now choose to make the new value of the heap y_k = s ⊕ x_k by removing x_k − y_k stones from
the heap; this is a legal move, since y_k < x_k (bit d gets cleared). The new nim-sum is

    t = s ⊕ x_k ⊕ y_k = s ⊕ x_k ⊕ x_k ⊕ s = 0,

completing the proof.
□
Proof of the theorem. If you start off by making your first move so that the nim-sum is zero, then
on each turn your opponent will disturb the sum and you will in turn set it back to zero.
By Lemma (3.3), the opponent has no choice but to disturb the nim-sum, and by Lemma (3.4),
you can set it back to zero. Since the total number of stones strictly decreases, eventually the
position with no stones left (nim-sum zero) is reached after one of your moves, i.e., you take the
last stone and win.
□
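Lemma (3.4) is constructive; the resulting strategy can be sketched as follows (function name and encoding are my own):

```python
# Constructive winning move from Lemma (3.4): if the nim-sum s of the
# heaps is non-zero, reduce a suitable heap x_k to y_k = s XOR x_k.
def nim_move(heaps):
    """Return (heap index k, new heap size y_k) making the nim-sum zero,
    or None if the position is already balanced."""
    s = 0
    for x in heaps:
        s ^= x
    if s == 0:
        return None                  # balanced: every move unbalances (Lemma 3.3)
    for k, x in enumerate(heaps):
        y = s ^ x
        if y < x:                    # heap k has a 1 in the top bit of s
            return k, y

assert nim_move([1, 1]) is None      # balanced position
k, y = nim_move([3, 5, 7])           # nim-sum is 3 ^ 5 ^ 7 = 1
new_heaps = [3, 5, 7]
new_heaps[k] = y
assert new_heaps[0] ^ new_heaps[1] ^ new_heaps[2] == 0
```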
(3.5). Exercise. Find a strategy for the Nim game with misère play.
(3.6). Remark. The following points will help us while playing the Nim game.
  • Whenever possible, reduce to two heaps containing the same number of stones each. Then
    mimic your opponent's moves.
  • Visualising the binary arithmetic for large numbers is hard for us. An easy way to make
    the nim-sum zero is to always leave an even number of subpiles of each power of two, starting with the
    largest subpile possible, where a subpile is a group of stones within a nim-heap.
§ 4. Zermelo Theorem
(4.1). Theorem. Every finite two-player game of perfect information is determined, i.e., either
  • the first player has a winning strategy, or
  • the second player has a winning strategy, or
  • both of them have strategies such that the game ends in a draw.
Proof. We use mathematical induction on the depth of the game tree. If the depth is zero, the
win/loss/draw is clearly determined. So assume that every game of depth < n is determined, and
consider a game of depth n. At the root, one of the players makes a decision among k possible
moves (say). Each of these k decisions leads to a sub-game, giving k sub-games T_1, T_2, …, T_k.
The depth of each of these sub-games is clearly < n, and hence the win/loss/draw is determined
for these sub-games. The player at the root simply picks the sub-game whose determined outcome
is best for him, which determines the win/loss/draw for the game of depth n that we started with.
We now provide another argument using De Morgan's laws. Note that
  • the first player has a winning strategy ⟺ ∃x_1 ∀y_1 ∃x_2 ∀y_2 ⋯ such that the first player wins;
  • the second player has a winning strategy ⟺ ∀x_1 ∃y_1 ∀x_2 ∃y_2 ⋯ such that the second player
    wins;
  • or the negation of (∃x_1 ∀y_1 ∃x_2 ∀y_2 ⋯) ∪ (∀x_1 ∃y_1 ∀x_2 ∃y_2 ⋯) happens. By De Morgan's
    laws, this is equivalent to

    (∀x_1 ∃y_1 ∀x_2 ∃y_2 ⋯ : Player 1 loses or draws) ∩ (∃x_1 ∀y_1 ∃x_2 ∀y_2 ⋯ : Player 2 loses or draws).

Clearly the third bullet implies that the game ends in a draw, proving the theorem.
□
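The induction in the first proof is exactly backward induction on the game tree; a minimal sketch (the tree encoding and names are my own):

```python
# Backward induction on a game tree, mirroring the induction in the proof.
# A tree is either a leaf outcome ("WIN1", "WIN2", "DRAW") or a pair
# (player_to_move, list_of_subtrees).
ORDER = {1: ("WIN1", "DRAW", "WIN2"), 2: ("WIN2", "DRAW", "WIN1")}

def value(tree):
    if isinstance(tree, str):                  # depth 0: already determined
        return tree
    player, subgames = tree
    outcomes = {value(t) for t in subgames}    # each sub-game has depth < n
    for preferred in ORDER[player]:            # the mover picks his best outcome
        if preferred in outcomes:
            return preferred

# Player 1 chooses between a branch where player 2 wins outright and a
# sub-game that ends in a draw, so the whole game is determined as a draw.
assert value((1, ["WIN2", (2, ["DRAW", "DRAW"])])) == "DRAW"
```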
§ 5. Game of Hex
A hex board is an array of regular hexagons arranged into a diamond shape in such a way
that there are the same number of hexagons along each side of the board.
Two players play alternately, as in tic-tac-toe. Each player, in his turn, places his symbol
(blue or red coloured, according to the player) on one of the hexagons. The winner is the player
who first obtains a connected path of adjacent hexagons stretching between the two sides of the board
carrying that player's label. The game was invented by Piet Hein, a Danish scientist, mathematician,
writer, and poet, in 1942. It was rediscovered by John Nash at Princeton in 1948, and became popular there.
We want to address the following questions.
(1) How do we solve this game?
(2) Does the game have a winning strategy for either player?
(3) Can there be a draw?
(5.1). Theorem. There can be no draw in the Game of Hex.
Proof. Imagine the playing board of Hex is made out of paper. Whenever red moves, he colours
the hexagon of his choice. Whenever blue moves, he cuts out the hexagon of his choice. At the end
of the game, pick up the board by its two red edges: either it holds together in a single piece
(the coloured hexagons contain a chain connecting the two red sides, so red has won), or it falls into
at least two pieces along the cuts (the removed hexagons contain a chain connecting the two blue
sides, so blue has won). This completes the proof.
□
The above proof is quite intuitive, but it uses one of the most important theorems, known
as the Jordan curve theorem. In fact, one can show that the Game of Hex result in turn proves
the Jordan curve theorem.
(5.2). Theorem. The first player has a winning strategy.
Proof. Suppose the second player has a winning strategy. Because moves by the players are symmetric, it is possible for the first player to adopt the second player’s winning strategy as follows:
The first player, on his first move, just colors in an arbitrarily chosen hexagon. Subsequently,
for each move by the other player, the first player responds with the appropriate move dictated
by second player’s winning strategy. This is called “stealing the strategy” and is used by Nash
in his proof.
If the strategy requires that first player move in the spot that he chose in his first turn and
there are empty hexagons left, he just picks another arbitrary spot and moves there instead.
Having an extra hexagon on the board can never hurt the first player - it can only help him.
In this way, the first player, too, is guaranteed to win, implying that both players have winning
strategies, a contradiction.
□
Why is this interesting, in particular for this course? The no-draw property of Hex provides a
proof of the Brouwer fixed point theorem, a very important result across many disciplines of
mathematics.
Part 2. Non-Cooperative Games
§ 6. Oligopoly
§ 6.1. Monopoly. Consider a firm which sells its product in the market. The demand for the
product is at most q0, and the price of the product decreases linearly with the quantity
available in the market. In particular, we assume that the price is given by

    p(q) = p0 (1 - q/q0)   if q ≤ q0,
           0               if q > q0,

where p0 is the maximum price the product can have. If the quantity produced is more than the
market can demand, i.e., q0, then the price will be zero.
The firm incurs a production cost c per unit. Then, the profit of the firm, when it produces
q units, is given by

    Π(q) := q p(q) - c q   if q ≤ q0,
            0              if q > q0.
If p0 ≤ c, the firm will not produce, and hence we assume c < p0. The firm's objective is to
choose q to maximize its profit. Since c < p0, we can find q ∈ (0, q0) such that Π(q) > 0.
Therefore the optimal q will be interior to (0, q0). Also note that the profit function is concave
on (0, q0). Therefore the optimal quantity qm is characterised by

    ∂Π/∂q (qm) = 0.
Solving this yields

    qm = (q0/2) (1 - c/p0)

and the optimal profit is given by

    Π(qm) = (q0 p0 / 4) (1 - c/p0)².
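The closed forms can be sanity-checked against a brute-force grid search; a sketch with arbitrary illustrative parameter values:

```python
# Check q_m = (q0/2)(1 - c/p0) and Pi(q_m) = (q0*p0/4)(1 - c/p0)^2
# against a numerical maximisation of Pi(q) = q*p(q) - c*q on [0, q0].
q0, p0, c = 100.0, 10.0, 4.0        # illustrative values with c < p0

def profit(q):                      # valid for 0 <= q <= q0
    price = p0 * (1 - q / q0)
    return q * price - c * q

qm = (q0 / 2) * (1 - c / p0)                    # closed-form optimum (= 30 here)
grid = [i * q0 / 10000 for i in range(10001)]   # fine grid on [0, q0]
q_best = max(grid, key=profit)

assert abs(q_best - qm) < 1e-9
assert abs(profit(qm) - (q0 * p0 / 4) * (1 - c / p0) ** 2) < 1e-9
```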
§ 6.2. Cournot's Duopoly. A duopoly is a situation in which two firms control the market
for a certain commodity. When there are more firms, it is an oligopoly. The duopoly problem is to
decide how the firms adjust their production to maximize their profits. The duopoly problem was
studied by Cournot (1838). His work can be seen as a precursor to the Nash equilibrium; due to this,
some authors use "Cournot-Nash equilibrium" instead of "Nash equilibrium" in the case of Cournot's
oligopoly problem.
Two firms, 1 and 2, produce and sell a product on the same market. The price of the product decreases
proportionally to the supply. Let qi be the number of items produced by firm i, i = 1, 2, and let q0 and
p0 be the highest reasonable production level and the highest possible price. The price, when the total
quantity produced is q = q1 + q2, is

    p(q) = p0 (1 - q/q0)   if q < q0,
           0               if q ≥ q0.

The marginal cost of production is c for both firms; p0 ≤ c is meaningless (no profit), so we assume
p0 > c.
The strategies of the firms are the quantities q1 and q2, both taken from the interval [0, q0]. The
payoffs are given by

    Πi(q1, q2) = qi p(q) - c qi.
What is the quantity to be produced by each company to have maximum profits possible?
Given a strategy q2 of firm 2, what is the best response of firm 1? It is the quantity q̂1(q2) which
maximizes the profit for firm 1. Using the arguments in the monopoly situation, we get

    q̂1(q2) = (q0/2) (1 - q2/q0 - c/p0).
Similarly, the best response of firm 2 to a given strategy q1 of firm 1 is given by

    q̂2(q1) = (q0/2) (1 - q1/q0 - c/p0).
The solution for the problem is to choose a pair (q1*, q2*) such that q1* is a best response to q2* and
vice versa. This is called a "Cournot-Nash Equilibrium". This amounts to solving the system of
equations

    q1* = (q0/2) (1 - q2*/q0 - c/p0)
    q2* = (q0/2) (1 - q1*/q0 - c/p0).

This system has a unique solution, given by

    q1* = q2* = q* = (q0/3) (1 - c/p0).
The equilibrium payoff is given by

    Π1(q1*, q2*) = Π2(q1*, q2*) = (q0 p0 / 9) (1 - c/p0)².
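Iterating the two best-response maps converges to the Cournot-Nash equilibrium; a small sketch with arbitrary parameter values:

```python
# Best-response dynamics for the Cournot duopoly: repeatedly apply
# q_i <- (q0/2)(1 - q_j/q0 - c/p0) and watch the pair settle down.
q0, p0, c = 100.0, 10.0, 4.0        # illustrative values with c < p0

def best_response(q_other):
    return max(0.0, (q0 / 2) * (1 - q_other / q0 - c / p0))

q1 = q2 = 0.0
for _ in range(200):                 # the map is a contraction, so this converges
    q1, q2 = best_response(q2), best_response(q1)

q_star = (q0 / 3) * (1 - c / p0)     # closed-form equilibrium quantity (= 20 here)
assert abs(q1 - q_star) < 1e-9 and abs(q2 - q_star) < 1e-9
```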
We will now compare the situation with the monopolistic market. Assume that the two firms
form a cartel and agree to produce the same amount of product. In this case each firm will
produce qm/2 units, and they share the monopoly profit (q0 p0 / 4)(1 - c/p0)² equally, so each
earns (q0 p0 / 8)(1 - c/p0)². This profit is clearly higher than the profit each gets in the duopoly
setting.
The firms' decisions of whether to go with duopoly or monopoly (cartel) can be visualized as a
two-player game with the payoff matrix below, where every entry carries the common factor
(1 - c/p0)²:

                              Player 2
                    Duopoly                  Cartel
          Duopoly   q0p0/9 , q0p0/9          5q0p0/36 , 5q0p0/48
 Player 1
          Cartel    5q0p0/48 , 5q0p0/36      q0p0/8 , q0p0/8
From this table, we can see that the Nash equilibrium gives a worse payoff than what can be achieved
by cooperation. However, the cooperation is not stable. This illustrates the Prisoner's dilemma.
Note that the total quantity (2q0/3)(1 - c/p0) available in the market under the Nash equilibrium
is strictly higher than the corresponding quantity (q0/2)(1 - c/p0) under the cooperative outcome.
Correspondingly, the price of the product under the Nash equilibrium is lower than the price
under the cooperative equilibrium. From the consumers' perspective, the Nash equilibrium is
better than the cooperative equilibrium. Thus, the firms' objectives and the consumers' interests
are at odds.
§ 6.3. Bertrand's Duopoly. The situation is the same as the previous one, but the strategies are now
the prices and not the quantities. The firm with the lower price captures the market: this firm sells
the whole product and the other one sells nothing. In case of equal prices, the firms share the market
equally. The demand function q(p) is given by

    q(p) = q0 (1 - p/p0)

for p < p0. The payoffs are given by
    Π1(p1, p2) = (p1 - c) q(p1)     if p1 < p2,
                 (p1 - c) q(p1)/2   if p1 = p2,
                 0                  if p1 > p2,

    Π2(p1, p2) = (p2 - c) q(p2)     if p2 < p1,
                 (p2 - c) q(p2)/2   if p2 = p1,
                 0                  if p2 > p1.
Here p0 > c denotes the highest reasonable price for the product. It is a symmetric game with strategy
spaces P1 = P2 = [c, p0].
  • p1 > p2 cannot be a best response to p2 whenever p2 > c. Similarly, p2 > p1 cannot be a
    best response to p1 whenever p1 > c.
  • c = p2 < p1 ≤ p0: in this case p2 is not the best response to p1, as choosing any price
    between c and p1 yields a better payoff for firm 2. So this case cannot give a Nash
    equilibrium.
  • Similarly, the case c = p1 < p2 ≤ p0 cannot give a Nash equilibrium.
  • c < p1 = p2 ≤ p0: a slight decrease in pi gives firm i a better payoff than pi, so this case also
    cannot give a Nash equilibrium.
  • The remaining case is c = p1 = p2. In this case the payoff to both firms is zero, and this is the
    Nash equilibrium.
In the monopolist case, pm = (p0 + c)/2 is the optimal price and the optimal profit is

    Πm(pm) = (pm - c) q0 (1 - pm/p0).
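The case analysis above can be verified by brute force on a discretized price grid (a sketch with arbitrary parameters; a grid is only an approximation of the continuum of prices):

```python
# Brute-force search for pure Nash equilibria of the Bertrand game on a
# price grid in [c, p0] (illustrative parameter values).
q0, p0, c = 100.0, 10.0, 4.0
step = (p0 - c) / 60
prices = [c + k * step for k in range(61)]

def payoff(p_own, p_other):
    q = q0 * (1 - p_own / p0)
    if p_own < p_other:
        return (p_own - c) * q       # lower price captures the whole market
    if p_own == p_other:
        return (p_own - c) * q / 2   # equal prices share the market
    return 0.0

def is_nash(p1, p2):
    return all(payoff(d, p2) <= payoff(p1, p2) + 1e-12 for d in prices) and \
           all(payoff(d, p1) <= payoff(p2, p1) + 1e-12 for d in prices)

equilibria = [(p1, p2) for p1 in prices for p2 in prices if is_nash(p1, p2)]

# (c, c) survives; the only other survivor is both firms pricing one grid
# step above cost, a discretization artifact that shrinks with the grid.
assert (c, c) in equilibria
assert all(abs(p1 - c) <= step + 1e-9 and abs(p2 - c) <= step + 1e-9
           for p1, p2 in equilibria)
```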
§ 7. Matching Pennies
There are two players, each with a coin. Both of them place their coins on the table simultaneously.
If the two coins match, player 1 wins and gets a rupee from player 2; otherwise player 2 wins
and gets a rupee from player 1. This game can be represented in the following table.
                    Player 2
                 Head      Tail
          Head   1, -1    -1, 1
 Player 1
          Tail  -1, 1      1, -1
What should the players choose? Suppose the first player chooses x ∈ {H, T} and the second
player chooses y ∈ {H, T}. Let π1(x, y) be the payoff that the first player receives and let
π2(x, y) be the payoff that the second player receives. Note that π1(x, y) + π2(x, y) = 0 for all x
and y. Such games are called zero-sum games. Since maximizing π2 over the choices/strategies
of the second player is the same as minimizing π1 over the choices/strategies of the second player, we can
view this problem as follows: the first player maximizes π1 (hereafter denoted by π) over his/her
choices/strategies while the second player minimizes π over his/her choices/strategies.
The first player's thought process goes as follows: if I choose x, the second player will choose y so as to
minimize π(x, y); therefore, the best I can do is to choose the x which maximizes min_y π(x, y).
Thus Player 1 considers the (max-min) optimization problem

    max_x min_y π(x, y).

Let this value be denoted by v⁻. Any choice/strategy x which guarantees the first player a value of at
least v⁻ is called a security strategy of the first player.
In a similar fashion, the second player considers the (min-max) optimization problem

    min_y max_x π(x, y).

We denote this value by v⁺. Any choice/strategy y which guarantees that the second player pays no
more than v⁺ is called a security strategy of the second player.
The security strategies of the players can be interpreted as worst-case strategies. They are
also called maxmin (or minmax) strategies of the respective players. A simple computation shows
that

    v⁻ = -1 and v⁺ = 1,

implying that v⁻ < v⁺. In fact, v⁻ ≤ v⁺ in any example. To see this, first note that

    min_y π(x, y) ≤ π(x, y′) ≤ max_{x′} π(x′, y′)

for all x, y′. Therefore

    max_x min_y π(x, y) ≤ min_y max_{x′} π(x′, y),

implying that v⁻ ≤ v⁺.
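A quick sketch of this computation in Python (the ½-½ mixture at the end anticipates the equalizer strategy discussed next):

```python
# Payoff matrix pi(x, y) for the maximizing row player in matching pennies.
payoff = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
moves = ("H", "T")

v_minus = max(min(payoff[x, y] for y in moves) for x in moves)  # max-min
v_plus = min(max(payoff[x, y] for x in moves) for y in moves)   # min-max

assert v_minus == -1 and v_plus == 1 and v_minus <= v_plus

# The mixed strategy x* = (1/2, 1/2) yields expected payoff 0 against
# every pure reply of the opponent.
assert all(sum(0.5 * payoff[x, y] for x in moves) == 0 for y in moves)
```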
Now, suppose the first player has a random device which suggests the choice he has to make.
Let us say he follows the suggested strategy x*, which chooses H with probability 1/2 and T with
probability 1/2. In such a case his payoff will be an expectation, given by

    π(x*, ·) = (1/2) π(H, ·) + (1/2) π(T, ·).

It is easy to check that π(x*, H) = π(x*, T) = 0. Even if the second player chooses a strategy
y suggested by a random device (of his own), it is still true that π(x*, y) = 0. Such
a strategy x* is called an equalizer strategy of Player 1. Similarly, the strategy of Player 2 which
suggests H with probability 1/2 and T with probability 1/2 is an equalizing strategy of Player 2.
The strategies just discussed are called mixed (or randomized) strategies. Let Δ1 and Δ2
denote, respectively, the sets of mixed strategies of the two players. The earlier arguments can
be extended to show that

    max_{x∈Δ1} min_{y∈Δ2} π(x, y) ≤ min_{y∈Δ2} max_{x∈Δ1} π(x, y).

In fact, we can show that these two are equal. With an abuse of notation, we use v⁻ to denote
the term on the left side and v⁺ to denote the term on the right side. These are known as the lower
and upper values in the class of mixed strategies, respectively. We will now show that v⁻ = v⁺.
Note that

    max_{x∈Δ1} min_{y∈Δ2} π(x, y) ≥ min_{y∈Δ2} π(x*, y) = 0

and

    min_{y∈Δ2} max_{x∈Δ1} π(x, y) ≤ max_{x∈Δ1} π(x, y*) = 0.

This clearly implies that v⁻ = v⁺ = 0.
Another property (exercise) of the pair (x*, y*) of equalizer strategies is that

    π(x, y*) ≤ π(x*, y*) ≤ π(x*, y)

for all x ∈ Δ1 and y ∈ Δ2. Any pair of strategies (x*, y*) ∈ Δ1 × Δ2 satisfying this property is
called a saddle point equilibrium of the game. To show that v⁻ = v⁺, it is enough to have the
existence of a saddle point equilibrium. We now prove this.
Let (x*, y*) ∈ Δ1 × Δ2 be a saddle point equilibrium. Since

    π(x, y*) ≤ π(x*, y*)

for all x ∈ Δ1, we have

    max_{x∈Δ1} π(x, y*) ≤ π(x*, y*),

which, in turn, implies

    min_{y∈Δ2} max_{x∈Δ1} π(x, y) ≤ max_{x∈Δ1} π(x, y*) ≤ π(x*, y*),

and hence v⁺ ≤ π(x*, y*). From the inequality

    π(x*, y*) ≤ π(x*, y)

for all y ∈ Δ2, we have

    π(x*, y*) ≤ min_{y∈Δ2} π(x*, y) ≤ max_{x∈Δ1} min_{y∈Δ2} π(x, y),

proving π(x*, y*) ≤ v⁻. Hence v⁻ = v⁺.
In view of these observations, it is natural to ask: if v⁻ = v⁺, does there exist a saddle point
equilibrium? Indeed, this is true provided the sets of strategies are compact. Note that, in our
example,

    Δ1 = {x = (x1, x2) ∈ ℝ²₊ : x1 + x2 = 1}

and

    Δ2 = {y = (y1, y2) ∈ ℝ²₊ : y1 + y2 = 1}.

Furthermore, both strategy spaces are compact (closed and bounded) and convex. Due to
the compactness, we can choose x* ∈ Δ1, y* ∈ Δ2 such that

    max_{x∈Δ1} min_{y∈Δ2} π(x, y) = min_{y∈Δ2} π(x*, y)

and

    min_{y∈Δ2} max_{x∈Δ1} π(x, y) = max_{x∈Δ1} π(x, y*).

Now

    π(x*, y*) ≥ min_{y∈Δ2} π(x*, y) = max_{x∈Δ1} min_{y∈Δ2} π(x, y)

and

    π(x*, y*) ≤ max_{x∈Δ1} π(x, y*) = min_{y∈Δ2} max_{x∈Δ1} π(x, y).

From these two inequalities and the hypothesis that v⁻ = v⁺, we obtain v⁻ = v⁺ = π(x*, y*).
It is now not hard to verify that

    π(x, y*) ≤ π(x*, y*) ≤ π(x*, y)

for all x ∈ Δ1 and y ∈ Δ2. Thus, the existence of a saddle point equilibrium is equivalent
to the fact that the lower and upper values are the same. The main question (in zero-sum games) is to
prove the existence of the value, i.e., that the lower and upper values coincide. This result was
established by von Neumann and is known as the von Neumann minimax theorem.
§ 8. Rock-Paper-Scissors Game
This is a famous children's game, played by two players. They simultaneously display
their hands in one of three shapes denoting a rock, a paper, or scissors. Rock wins over
scissors (rock shatters scissors), scissors win over paper (scissors cut paper), and paper
wins over rock (paper covers rock). The winner takes a rupee from the opponent. If
both display the same shape, the game is drawn. This game can be tabulated as
                     Player 2
                 R        S        P
           R   0, 0     1, -1    -1, 1
 Player 1  S  -1, 1     0, 0      1, -1
           P   1, -1   -1, 1      0, 0
§ 9. Prisoner’s Dilemma
The Prisoner's dilemma was framed by Merrill Flood and Melvin Dresher, working at RAND in 1950,
and Albert W. Tucker formalized the version used now. Two individuals who have committed a
serious crime are apprehended. Lacking incriminating evidence, the prosecution can obtain an
indictment only by persuading one (or both) of the prisoners to confess to the crime; the
witnesses available suffice only to convict them of a minor offense. If neither confesses, both will be
charged with the minor offense and pay a moderate fine. They are put in separate cells and asked
to fink on the other. Finking corresponds to a strategy D (Defect) and not finking corresponds
to C (Cooperate with the other prisoner). Each of them is told that if he finks on the other,
he will be released with no fine.
The above situation defines a two-player strategic-form game in which each player has two
strategies: D, which stands for defection, betraying your fellow criminal by confessing, and C,
which stands for cooperation, cooperating with your fellow criminal and not confessing to the crime.
This situation is represented in the following table
                     Player 2
                  D          C
           D   -6, -6     0, -10
 Player 1
           C   -10, 0     -1, -1
Now the question is: what should the two criminals do? Let us start with the thought process
of Player 1.
  • Suppose Player 2 decides to play C. Then I will benefit by playing D.
  • Even if Player 2 decides to play D, I will benefit by playing D.
  • Therefore I should play D.
A similar thought process suggests choosing D for Player 2 also. Thus the pair (D, D) is the
outcome of this game and is called the Nash equilibrium of the game.
This result can also be seen from a domination argument. The choice D strictly
dominates the choice C for both players, in the sense that the payoff that a player receives
under the choice D is higher than that under the choice C irrespective of the choice of the other
player. Thus, the "rationality" of the players forces them to choose D, and hence (D, D) is the
outcome of the game.
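The dominance argument can be checked mechanically; a small sketch over the table above:

```python
# Payoffs (player 1, player 2) from the prisoner's dilemma table.
payoffs = {
    ("D", "D"): (-6, -6), ("D", "C"): (0, -10),
    ("C", "D"): (-10, 0), ("C", "C"): (-1, -1),
}

# D strictly dominates C for player 1: better against every choice of player 2.
assert all(payoffs["D", y][0] > payoffs["C", y][0] for y in ("D", "C"))
# By symmetry, the same holds for player 2.
assert all(payoffs[x, "D"][1] > payoffs[x, "C"][1] for x in ("D", "C"))

# Yet mutual cooperation beats mutual defection for both players,
# which is exactly the dilemma.
assert payoffs["C", "C"][0] > payoffs["D", "D"][0]
assert payoffs["C", "C"][1] > payoffs["D", "D"][1]
```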
If both players refrain from defecting, they are better off, with imprisonment for only one
year. However, (C, C) is not stable: if one player chooses C, then the other player will switch to
D, whereby he gets no imprisonment. Is there a way to ensure cooperation
among the players? This is a big question, and people have been working on it. The situation
of the Prisoner's dilemma appears naturally in various settings. One such situation appears in
oligopoly. Some other examples include
  • Arms races between superpowers or rival nations.
  • Students sharing a room need to clean the room. Each student would prefer that his
    roommate clean the room. In the end, nobody puts in the effort and the room does not get
    cleaned.
  • Competition between two firms selling similar products. Take the example of Coca-Cola
    and Pepsi. Each must decide on a pricing strategy. If both of them charge a higher
    price, they can exploit their joint market and make a good profit. However, if one of
    them sets a competitively low price, it attracts more customers from the rival, and its
    profit rises higher than before.
The Prisoner's dilemma can be viewed from several angles.
  • Moral issue: taking the high road (not defecting on the other) leads to the best overall
    outcome.
  • Community issue: how can we persuade people to do what is best for the group, instead
    of what is best for themselves (and not best for the group)?
  • Truth and falsehood issue: the above story does not say whether the evidence given by
    the prisoners is true or false. If one person gives evidence, it convicts the other; that's
    all.
  • Communication issue: if the two prisoners were allowed to communicate with each other,
    they could mutually decide not to attack the other and get the best overall outcome.
A Story from Childhood Days
  • A king, while performing a yagna, decides to make his praja (subjects) participate, so that they
    too will attain the punya (merit).
  • For this, he asks all his praja to bring some milk and pour it into a large vessel.
  • The result: the vessel contains only water. Each subject, expecting everyone else to bring
    milk, brought water instead.
  • This story is a multiplayer version of the "Donation Game", a variant of the Prisoner's dilemma.
The Donation Game
  • The payoffs in the donation game for the two-player case are given below:

                        Player 2
                     D            C
              D    0, 0         b, -c
    Player 1
              C    -c, b      b - c, b - c

    Here C refers to cooperation, which benefits the other player, whereas D refers to defecting
    on the other player: b is the benefit received from the other player's cooperation and c is the
    cost of cooperating (with b > c).
  • A natural question is the following: can the prisoners extricate themselves from the
    dilemma and sustain cooperation when each of them has a powerful incentive to cheat?
    If so, how?
    - The idea is to consider repeated play of the same game. The competition between Coke
      and Pepsi takes place every day; if they keep cheating, the gains will be smaller
      than under cooperation, so we can expect them to cooperate in the long run.
    - Can Gandhigiri be explained through the iterated play of the Prisoner's dilemma?
  • The cheater's reward comes at once, while the loss from punishment lies in the future. If the
    future payoffs are heavily discounted, then the loss may be insufficient to deter cheating.
    Thus cooperation is harder to sustain among very impatient players (governments, for
    example).
  • Punishment will not work unless cheating can be detected and punished. If the actions
    of companies are not easily detected, then the companies will be tempted to defect.
  • Punishment can be made automatic by following strategies like "tit for tat", popularised
    by Robert Axelrod: you cheat if your rival cheated you in the previous round.
  • A fixed, finite number of repetitions is logically inadequate to yield cooperation.
  • Cooperation can also arise if the group has a large leader who stands to lose a lot from
    outright competition and therefore exercises restraint, even though he knows the small
    players will cheat. Saudi Arabia's role of "swing producer" (a supplier of a commodity
    controlling large deposits and possessing large spare production capacity) in the OPEC
    (Organization of the Petroleum Exporting Countries) cartel is an instance of this. A swing
    producer is able to increase or decrease commodity supply at minimal additional internal
    cost, and is thus able to influence prices and balance the markets, providing downside
    protection in the short to middle term.
§ 10. BoS
Two people wish to go out together. They have two options: watching a movie (M) or going for dinner (D). One prefers the movie and the other prefers dinner. If they go to different things, each of them is unhappy, having no company. This situation is described in the following table.

                 Player 2
                 M        D
    Player 1  M  2, 1     0, 0
              D  0, 0     1, 2
§ 11. Matrix Games
A matrix game is described by a single matrix. There are two players, the row player (player 1) and the column player (player 2). The row player chooses a row and the column player chooses a column, independently of each other. The corresponding entry is the payoff received by the row player from the column player. Generally we assume that the row player is the maximizer and the column player is the minimizer: the row player chooses a row to maximize his payoff, whereas the column player chooses a column to minimize the amount he pays. A pair of optimal choices of row and column is called a pure saddle point equilibrium. In general, a pure saddle point equilibrium need not exist. In a seminal work, which can be considered the starting point of game theory, von Neumann proved the minimax theorem, which establishes the existence of saddle point equilibrium in mixed strategies. We will now discuss this result.
Let $A$ be an $m \times n$ matrix, meaning that player 1 has $m$ pure strategies and player 2 has $n$ pure strategies. A mixed strategy for player 1 is a probability vector $x = (x_1, x_2, \cdots, x_m)' \in \mathbb{R}^m$, i.e.,
$$\sum_{i=1}^m x_i = 1 \quad \text{and} \quad x_i \ge 0, \; i = 1, 2, \cdots, m.$$
Here $x_i$ represents the probability with which player 1 picks row $i$. Similarly, a mixed strategy for player 2 is a probability vector $y = (y_1, y_2, \cdots, y_n)' \in \mathbb{R}^n$, i.e.,
$$\sum_{j=1}^n y_j = 1 \quad \text{and} \quad y_j \ge 0, \; j = 1, 2, \cdots, n.$$
As earlier, $y_j$ represents the probability with which player 2 picks column $j$. Let $\Delta_m$ denote the set of all mixed strategies for player 1 and $\Delta_n$ the set of all mixed strategies for player 2. Note that both $\Delta_m$ and $\Delta_n$ are convex and compact subsets of the respective euclidean spaces, and that the vectors $x$ and $y$ are understood as column vectors.
If player 1 chooses a mixed strategy $x$ and player 2 chooses a mixed strategy $y$, then the (expected) payoff received by player 1 from player 2 is given by
$$\pi(x, y) = \sum_{i=1}^m \sum_{j=1}^n x_i y_j a_{ij} = x'Ay.$$
The meaning of this payoff function is self-explanatory: row $i$ is picked with probability $x_i$ and column $j$ with probability $y_j$, independently, so player 1 receives $a_{ij}$ with probability $x_i y_j$, and the expected payoff is the sum of all these contributions.
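Numerically, the expected payoff $x'Ay$ is a single matrix product. A quick sketch (matrix and strategies chosen arbitrarily for illustration):

```python
import numpy as np

# An arbitrary 2x3 payoff matrix for the row player (illustrative values).
A = np.array([[1.0, -2.0, 3.0],
              [0.0,  4.0, -1.0]])

x = np.array([0.5, 0.5])        # mixed strategy of player 1 (rows)
y = np.array([0.2, 0.3, 0.5])   # mixed strategy of player 2 (columns)

# Expected payoff pi(x, y) = sum_i sum_j x_i y_j a_ij = x' A y
pi = x @ A @ y
print(pi)   # 0.9
```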
Let us consider the situation of player 1. Since the players are assumed to be rational, he will reason as follows: if I choose a mixed strategy $x$, then player 2 will choose a mixed strategy which minimizes the quantity $x'Ay$ over $y \in \Delta_n$. So player 1's best choice is a mixed strategy which maximizes this minimum value. Thus player 1, playing best, will secure at least
$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x'Ay.$$
This value, denoted by $V^-(A)$, is called the security level of player 1, and any strategy that secures him at least this value is called an optimal (prudent) strategy for player 1.
In a similar fashion, player 2 will choose his strategy so as not to pay more than
$$\min_{y \in \Delta_n} \max_{x \in \Delta_m} x'Ay.$$
This value, denoted by $V^+(A)$, is called the security level of player 2, and any strategy which guarantees he pays no more than this value is called an optimal (prudent) strategy for player 2. The values $V^-(A)$ and $V^+(A)$ are also called the lower and upper values of the game respectively.
Note that we always have
$$V^-(A) \le V^+(A).$$
When these two values are equal, the common value is called the value of the game and is denoted by $V(A)$. The first major theorem of game theory (proved by von Neumann) shows that the upper and lower values, when mixed strategies are used, are always equal.
(11.1). Theorem (Minmax Theorem, von Neumann). Every finite zero-sum game admits a value.
Before proceeding with the proof we recall two results.
(11.2). Proposition. Let $C$ be a compact convex subset of a euclidean space $\mathbb{R}^m$ with $0 \notin C$. Then there exists a vector $z \in \mathbb{R}^m$ such that
$$z \cdot x > 0 \quad \text{for all } x \in C.$$
Proof. Since $C$ is compact and convex, there exists a unique point $z \in C$ of minimal norm, i.e.,
$$\|z\|^2 \le \|x\|^2 \quad \text{for every } x \in C,$$
and $z \ne 0$ since $0 \notin C$. Now for any $x \in C$, we have $(1-\alpha)z + \alpha x \in C$ for all $\alpha \in (0, 1)$. Therefore
$$\|z\|^2 \le \|(1-\alpha)z + \alpha x\|^2 = (1-\alpha)^2\|z\|^2 + 2\alpha(1-\alpha)\, z \cdot x + \alpha^2\|x\|^2.$$
Therefore,
$$0 \le \alpha(\alpha - 2)\|z\|^2 + 2\alpha(1-\alpha)\, z \cdot x + \alpha^2\|x\|^2.$$
Dividing by $\alpha$, we have
$$0 \le (\alpha - 2)\|z\|^2 + 2(1-\alpha)\, z \cdot x + \alpha\|x\|^2.$$
Letting $\alpha \to 0$, we have
$$0 \le -2\|z\|^2 + 2\, z \cdot x,$$
which gives the required inequality
$$z \cdot x \ge \|z\|^2 > 0. \qquad \square$$
(11.3). Proposition. Let $A$ be any matrix of order $m \times n$. Then either
(1) there exists $x \in \mathbb{R}^m$, $x \ne 0$, $x \ge 0$ such that $x'A \ge 0$; or
(2) there exists $y \in \mathbb{R}^n$, $y \ne 0$, $y \ge 0$ such that $Ay \le 0$.
Proof. Let $e_1, e_2, \cdots, e_n$ be the unit vectors in $\mathbb{R}^n$ and let the rows of $A$ be denoted by $a_1, a_2, \cdots, a_m \in \mathbb{R}^n$. Let $C$ be the convex hull of $-e_1, -e_2, \cdots, -e_n$ and $a_1, a_2, \cdots, a_m$; then $C$ is a compact convex subset of $\mathbb{R}^n$. Two cases arise: $0 \in C$ or $0 \notin C$.
Case $0 \in C$: In this case, there exist non-negative real numbers $x_1, \cdots, x_m, \eta_1, \cdots, \eta_n$ such that
$$x_1 a_1 + x_2 a_2 + \cdots + x_m a_m - \eta_1 e_1 - \eta_2 e_2 - \cdots - \eta_n e_n = 0$$
and $x_1 + \cdots + x_m + \eta_1 + \cdots + \eta_n = 1$. Clearly not all of $x_1, \cdots, x_m$ can be zero. Indeed, if $x_1 = \cdots = x_m = 0$, then we must have
$$\eta_1 e_1 + \eta_2 e_2 + \cdots + \eta_n e_n = 0, \quad \eta_1 + \eta_2 + \cdots + \eta_n = 1,$$
which contradicts the linear independence of the vectors $e_1, \cdots, e_n$. Thus we have non-negative real numbers $x_1, \cdots, x_m$, not all of them zero, such that
$$x_1 a_1 + x_2 a_2 + \cdots + x_m a_m = \eta,$$
where $\eta = (\eta_1, \cdots, \eta_n) \in \mathbb{R}^n$. Note that $\eta \ge 0$. In other words,
$$x'A = \eta' \ge 0,$$
where $x = (x_1, \cdots, x_m)' \in \mathbb{R}^m$, $x \ne 0$, $x \ge 0$. This proves (1).
Case $0 \notin C$: By the previous proposition, there is a hyperplane separating $0$ and $C$; in other words, there exists $z \in \mathbb{R}^n$ such that
$$x \cdot z > 0 \quad \text{for every } x \in C.$$
Since $-e_i \in C$, we must have $z_i < 0$, and hence $z \ne 0$, $z \le 0$. Also $a_i \in C$ and hence $a_i \cdot z > 0$ for every $i = 1, 2, \cdots, m$; thus $Az > 0$. Now taking $y = -z$, we obtain $y \ge 0$, $y \ne 0$ and $Ay < 0$, which proves (2). $\square$
With these two propositions in hand, we are now ready to prove the minmax theorem.
Proof. (Minmax Theorem)
By the previous result, one of two cases holds: either there exists $x \ge 0$ in $\mathbb{R}^m$, $x \ne 0$, such that $x'A \ge 0$, or there exists $y \ge 0$ in $\mathbb{R}^n$, $y \ne 0$, such that $Ay \le 0$. Letting $\bar{x} = x / \sum_i x_i$ and $\bar{y} = y / \sum_j y_j$, we note that $\bar{x} \in \Delta_m$, $\bar{y} \in \Delta_n$, and either $\bar{x}'A \ge 0$ or $A\bar{y} \le 0$.
The first case means that $\bar{x}'Ay \ge 0$ for every $y \in \Delta_n$, so the lower value of the game satisfies
$$V^-(A) = \max_{x \in \Delta_m} \min_{y \in \Delta_n} x'Ay \ge 0.$$
The second case means that $x'A\bar{y} \le 0$ for every $x \in \Delta_m$, which gives
$$V^+(A) = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x'Ay \le 0.$$
Thus we always have either $V^-(A) \ge 0$ or $V^+(A) \le 0$. Now apply this to the shifted matrix $B = ((a_{ij} - c))$, where $c \in \mathbb{R}$. Note that $V^-(B) = V^-(A) - c$ and $V^+(B) = V^+(A) - c$. Thus for every $c \in \mathbb{R}$ we must have
$$V^-(A) \ge c \quad \text{or} \quad V^+(A) \le c.$$
If $V^-(A) < V^+(A)$, any $c$ strictly between them would violate both alternatives. Hence $V^-(A) = V^+(A)$, which completes the proof of the minmax theorem.
$\square$
(11.4). Remark. We can view Farkas' lemma as the minmax theorem in disguise. Farkas' lemma says that either $\max_x \min_y \langle x, Ay \rangle \ge 0$ or $\min_y \max_x \langle x, Ay \rangle \le 0$. In fact, the two results can be shown to be equivalent.
We now recall the definition of saddle point equilibrium. A pair of mixed strategies $(x^*, y^*) \in \Delta_m \times \Delta_n$ is said to be a saddle point equilibrium provided
$$x'Ay^* \le x^{*\prime}Ay^* \le x^{*\prime}Ay$$
for every $x \in \Delta_m$ and $y \in \Delta_n$.
We now show that the existence of saddle point equilibrium and the existence of value are
equivalent.
(11.5). Theorem. A game admits a value if and only if it has a saddle point equilibrium.
Proof. Exercise. $\square$
We now discuss several properties of zero-sum games.
(11.6). Proposition. Suppose $(x^*, y^*)$ and $(\hat{x}, \hat{y})$ are two saddle point equilibria of the game $A$. Then
$$x^{*\prime}Ay^* = \hat{x}'A\hat{y},$$
and both $(x^*, \hat{y})$ and $(\hat{x}, y^*)$ are also saddle points.
Proof. Exercise. $\square$
For a given mixed strategy $x$ of player 1, let $BR^2(x)$ denote the set of all mixed strategies $\hat{y}$ of player 2 such that
$$x'A\hat{y} = \min_{y \in \Delta_n} x'Ay.$$
Any strategy in $BR^2(x)$ is called a best response of player 2 to the mixed strategy $x$ of player 1. Similarly, the set $BR^1(y)$ of best responses of player 1 to the mixed strategy $y$ of player 2 is given by
$$BR^1(y) = \left\{ \hat{x} \in \Delta_m : \hat{x}'Ay = \max_{x \in \Delta_m} x'Ay \right\}.$$
Set $BR(x, y) = BR^1(y) \times BR^2(x) \subseteq \Delta_m \times \Delta_n$; then $BR$ defines a set-valued map from $\Delta_m \times \Delta_n$ to itself, denoted $BR : \Delta_m \times \Delta_n \rightrightarrows \Delta_m \times \Delta_n$.
It is easy to observe the following:
(11.7). Proposition. A point $(x^*, y^*) \in \Delta_m \times \Delta_n$ is a saddle point equilibrium if and only if $(x^*, y^*) \in BR(x^*, y^*)$; in other words, $x^* \in BR^1(y^*)$ and $y^* \in BR^2(x^*)$.
Proof. Exercise. $\square$
(11.8). Remark. A point $(x^*, y^*)$ such that $(x^*, y^*) \in BR(x^*, y^*)$ is called a fixed point of the best response map. Thus proving the existence of a saddle point equilibrium amounts to showing the existence of a fixed point of the best response map. This can be done using the Kakutani fixed point theorem. We will not give the details here, as we will use this idea later while studying nonzero-sum games.
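Since an expected payoff is linear in each player's strategy, best responses can be read off from the pure strategies. A small numerical sketch (illustrative matrix, not from the notes) computing the pure best responses to given mixed strategies:

```python
import numpy as np

# Illustrative payoff matrix (row player maximizes, column player minimizes).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

def br2_pure(x, A):
    """Indices of player 2's pure best responses: columns minimizing x'A."""
    v = x @ A
    return np.flatnonzero(np.isclose(v, v.min()))

def br1_pure(y, A):
    """Indices of player 1's pure best responses: rows maximizing Ay."""
    v = A @ y
    return np.flatnonzero(np.isclose(v, v.max()))

x = np.array([0.5, 0.5])
y = np.array([0.25, 0.75])
print(br2_pure(x, A))   # [0 1]: both columns attain the minimum of x'A
print(br1_pure(y, A))   # [0 1]: both rows attain the maximum of Ay
```

For this particular $(x, y)$ every pure strategy of each player is a best response, consistent with $(x, y)$ being the saddle point of this matrix (value $3/2$).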
For a fixed mixed strategy $x$ of player 1, note that
$$\min_{y \in \Delta_n} x'Ay = \min_{1 \le j \le n} x'Ae_j.$$
Here $e_1, e_2, \cdots, e_n$ are the pure strategies of player 2; the minimum of a linear function over the simplex is attained at a vertex. Thus finding an optimal strategy $x$ for player 1 amounts to the following optimization problem:
$$\text{maximize } \min_{1 \le j \le n} x'Ae_j \quad \text{s.t. } x_i \ge 0, \; \sum_{i=1}^m x_i = 1.$$
Let $t = \min_{1 \le j \le n} x'Ae_j$; then $t \le x'Ae_j$ for every $j$. Using this fact in the above optimization problem, we get
$$\text{maximize } t \quad \text{s.t. } t \le x'Ae_j \text{ for all } j; \quad x_i \ge 0; \quad \sum_{i=1}^m x_i = 1.$$
This is a linear program, and its value equals the lower value of the game. Similarly, we can write another linear program from the second player's perspective. We now give these details.
When player 2 fixes his strategy $y$, we have
$$\max_{x \in \Delta_m} x'Ay = \max_{1 \le i \le m} e_i'Ay.$$
Thus the corresponding linear program is
$$\text{minimize } s \quad \text{s.t. } s \ge e_i'Ay \text{ for all } i; \quad y_j \ge 0; \quad \sum_{j=1}^n y_j = 1.$$
The value of this linear program is the upper value of the game. It is not difficult to see that the two linear programs are dual to each other. Since both are feasible finite linear programs, LP duality gives that their values are equal, which provides yet another proof of the minmax theorem.
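The player-1 linear program above can be handed directly to an LP solver. A sketch using SciPy (assuming `scipy` is available; the matrix is a matching-pennies example, not from the notes):

```python
import numpy as np
from scipy.optimize import linprog

def solve_game(A):
    """Row player's optimal mixed strategy and the game value, via the LP:
    maximize t  s.t.  t <= x'Ae_j for all j,  x in the simplex."""
    m, n = A.shape
    # Variables: (x_1, ..., x_m, t); linprog minimizes, so use objective -t.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Inequalities: t - (A^T x)_j <= 0 for every column j.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Equality: sum_i x_i = 1 (t has coefficient 0).
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]   # x >= 0, t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, v = solve_game(A)
print(x, v)
```

By duality, solving the column player's LP (or reading off the dual variables) gives the other optimal strategy and the same value.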
§ 12. Continuous Games
Let S1 and S2 be the action spaces of player 1 and player 2 respectively. Let f : S1 ˆ S2 Ñ R
be a continuous function. Player 1 chooses an action x from S1 and player 2 chooses y from S2 .
As a result player 1 receives $f(x, y)$ from player 2. The objective of player 1 is to choose his action to maximize $f$, while player 2 chooses his action to minimize $f$.
If player 1 chooses an action $x \in S_1$, then the best he can guarantee is $\min_y f(x, y)$. Hence he chooses an $x^*$ which maximizes this minimum value, i.e.,
$$\max_x \min_y f(x, y) = \min_y f(x^*, y).$$
This maxmin value is called the security level of player 1. In a similar way, the security level of player 2 can be introduced and is given by the minmax value $\min_y \max_x f(x, y)$. A pair of strategies $(x^*, y^*)$ chosen as above are called minmax (or maxmin) strategies of the players.
(12.1). Proposition. The security level of player 1 is always less than or equal to the security level of player 2, i.e.,
$$\max_x \min_y f(x, y) \le \min_y \max_x f(x, y).$$
One can find numerous examples showing that the above inequality can be strict. When the security levels of the two players coincide, the pair of minmax strategies plays a crucial role in the study of zero-sum games; minmax strategies can be shown to be equivalent to saddle point equilibria. We now introduce the definition of saddle point equilibrium.
(12.2). Definition (Saddle Point Equilibrium). A pair of strategies $(x^*, y^*) \in S_1 \times S_2$ is called a saddle point equilibrium if
$$f(x, y^*) \le f(x^*, y^*) \le f(x^*, y)$$
for all $(x, y) \in S_1 \times S_2$.
We now list some properties of saddle point equilibria.
(12.3). Proposition. If $(x^*, y^*)$ is a saddle point equilibrium, then
$$f(x^*, y^*) = \max_x \min_y f(x, y) = \min_y \max_x f(x, y).$$
(12.4). Proposition. If
$$\max_x \min_y f(x, y) = \min_y \max_x f(x, y)$$
and the outer maximizer $x^*$ and outer minimizer $y^*$ exist, then $(x^*, y^*)$ is a saddle point equilibrium. In other words, minmax strategies and saddle point equilibria coincide.
(12.5). Theorem (Minmax Theorem). Let each $S_i$ be a compact convex subset of a euclidean space and let $f$ be a concave-convex function (i.e., $f$ is concave in the $x$-variable when $y$ is fixed and convex in the $y$-variable when $x$ is fixed). Then a saddle point equilibrium exists.
Proof. We first assume that $f$ is strictly concave in the $x$-variable and strictly convex in the $y$-variable when the other variable is fixed.
By strict convexity, for each $x$ there is a unique $y(x)$ such that
$$f(x, y(x)) = \min_y f(x, y) =: m(x).$$
Since $f$ is uniformly continuous (being continuous on a compact set), $y(x)$ is continuous as a function of $x$. Also $m(x)$ is concave, being the minimum of a family of concave functions. Note that the minimum of a family of concave functions need not be continuous in general; however, our $m(x)$ is continuous (Verify!). Let $x^*$ be such that
$$m(x^*) = \max_x m(x) = \max_x \min_y f(x, y).$$
Our aim is to show that $(x^*, y(x^*))$ is a saddle point equilibrium. Note that
$$m(x^*) = f(x^*, y(x^*)) \le f(x^*, y) \quad \text{for all } y \in S_2 \tag{12.6}$$
from the definition of $y(x)$. Also,
$$f(x^*, y(x^*)) = m(x^*) \ge m(x) \quad \text{for all } x \in S_1 \tag{12.7}$$
from the definition of $x^*$. Fix $x \in S_1$ and $t \in (0, 1)$, and let $\tilde{y} = y((1-t)x^* + tx)$. By concavity of $f(\cdot, \tilde{y})$,
$$m((1-t)x^* + tx) = f((1-t)x^* + tx, \tilde{y}) \ge (1-t)f(x^*, \tilde{y}) + t f(x, \tilde{y}). \tag{12.8}$$
Using (12.6), (12.7) and (12.8), we now get
$$m(x^*) \ge m((1-t)x^* + tx) \ge (1-t)f(x^*, \tilde{y}) + t f(x, \tilde{y}) \ge (1-t)m(x^*) + t f(x, \tilde{y}),$$
which further implies that
$$m(x^*) \ge f(x, \tilde{y}).$$
Now letting $t \to 0$, we have $\tilde{y} = y((1-t)x^* + tx) \to y(x^*)$ by continuity of $y(\cdot)$, and we obtain
$$m(x^*) \ge f(x, y(x^*))$$
for all $x \in S_1$. Combining this with (12.6), we get
$$f(x, y(x^*)) \le f(x^*, y(x^*)) \le f(x^*, y)$$
for all $(x, y) \in S_1 \times S_2$. Thus $(x^*, y(x^*))$ is a saddle point equilibrium. This proves the theorem in the strictly concave/convex case. We now prove the general case.
Let
$$f^\epsilon(x, y) = f(x, y) - \epsilon|x|^2 + \epsilon|y|^2$$
for $\epsilon > 0$. Then $f^\epsilon$ is strictly concave in $x$ and strictly convex in $y$ when the other variable is fixed. Thus, by the above, there is a pair $(x^\epsilon, y^\epsilon) \in S_1 \times S_2$ such that
$$f^\epsilon(x, y^\epsilon) \le f^\epsilon(x^\epsilon, y^\epsilon) \le f^\epsilon(x^\epsilon, y). \tag{12.9}$$
Since $S_1, S_2$ are compact, we can extract a convergent subsequence; with an abuse of notation, we denote it by $\{(x^\epsilon, y^\epsilon)\}$ itself, with $(x^\epsilon, y^\epsilon) \to (x^*, y^*)$. Letting $\epsilon \to 0$ in (12.9), we obtain
$$f(x, y^*) \le f(x^*, y^*) \le f(x^*, y)$$
for all $(x, y) \in S_1 \times S_2$, which completes the proof of the theorem.
$\square$
The proof above is due to Karlin [7]. The usual way to prove this theorem is via fixed point theorems, but the above proof avoids them, relying instead on strict convexity/concavity. We now state the general minmax theorem.
(12.10). Theorem (Minmax Theorem). Let $X$ and $Y$ be compact convex subsets of topological vector spaces and let $f : X \times Y \to \mathbb{R}$ be a continuous function. Assume that $f$ is concave in the $x$-variable when $y$ is fixed and convex in the $y$-variable when $x$ is fixed. Then there exists a point $(x^*, y^*) \in X \times Y$ such that
$$f(x, y^*) \le f(x^*, y^*) \le f(x^*, y).$$
In other words, there is a saddle point equilibrium.
Proof. The above proof due to Karlin can be replicated, with suitable modifications, when $X$ and $Y$ are compact convex subsets of normed linear spaces. For general topological vector spaces one uses fixed point theorems; we omit the details. $\square$
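A small numerical sanity check of the minmax equality for a concave-convex function (an illustrative quadratic on a coarse grid, not from the notes):

```python
import numpy as np

# f(x, y) = -x^2 + xy + y^2 is concave in x and convex in y;
# on [-1, 1]^2 its saddle point is (0, 0) with value 0.
def f(x, y):
    return -x**2 + x * y + y**2

grid = np.linspace(-1.0, 1.0, 201)          # includes 0 exactly
F = f(grid[:, None], grid[None, :])         # F[i, j] = f(x_i, y_j)
maxmin = F.min(axis=1).max()                # max over x of min over y
minmax = F.max(axis=0).min()                # min over y of max over x
print(maxmin, minmax)                       # both 0, attained at (0, 0)
```

Both iterated optima agree at the saddle value, as Theorem (12.5) predicts for a concave-convex $f$.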
(12.11). Exercise. Consider the following extension of the matching pennies game in which each player has countably infinitely many pure strategies: the payoff matrix is the infinite "checkerboard" whose $(i, j)$ entry is $(1, -1)$ when $i + j$ is even and $(-1, 1)$ when $i + j$ is odd,
$$\begin{pmatrix} 1,-1 & -1,1 & 1,-1 & -1,1 & \cdots \\ -1,1 & 1,-1 & -1,1 & 1,-1 & \cdots \\ 1,-1 & -1,1 & 1,-1 & -1,1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
Describe the set of all equilibria with positive probability on each of the choices available.
§ 13. Nonzero-sum Bimatrix Games
Let $(A, B)$ be a bimatrix game. Our aim is to prove the following theorem due to Nash.
(13.1). Theorem (Nash). There exists a Nash equilibrium.
Proof. Recall that a pair of mixed strategies $(x^*, y^*)$ is a Nash equilibrium if and only if
$$x^{*\prime}Ay^* \ge e_i'Ay^* \quad \text{and} \quad x^{*\prime}By^* \ge x^{*\prime}Be_j$$
for every $i$ and $j$.
Define $f : \Delta_m \times \Delta_n \to \Delta_m \times \Delta_n$ by $f(x, y) = (x', y')$, where
$$c_i(x, y) = \max(0,\ e_i'Ay - x'Ay), \qquad d_j(x, y) = \max(0,\ x'Be_j - x'By)$$
and
$$x_i' = \frac{x_i + c_i(x, y)}{1 + \sum_{l=1}^m c_l(x, y)}, \qquad y_j' = \frac{y_j + d_j(x, y)}{1 + \sum_{k=1}^n d_k(x, y)}.$$
Clearly $f$ is a continuous map of the compact convex set $\Delta_m \times \Delta_n$ into itself, so by the Brouwer fixed point theorem it has a fixed point. Let $(x^*, y^*)$ be a fixed point of $f$. Then
$$x_i^* = \frac{x_i^* + c_i(x^*, y^*)}{1 + \sum_{l=1}^m c_l(x^*, y^*)} \quad \text{and} \quad y_j^* = \frac{y_j^* + d_j(x^*, y^*)}{1 + \sum_{k=1}^n d_k(x^*, y^*)}. \tag{13.2}$$
We now claim that $\sum_{l=1}^m c_l(x^*, y^*) = 0$ and $\sum_{k=1}^n d_k(x^*, y^*) = 0$. Suppose $\sum_{l=1}^m c_l(x^*, y^*) \ne 0$; then $\sum_{l=1}^m c_l(x^*, y^*) > 0$ (since the $c_i$'s are non-negative).
Let $I = \{i : c_i(x^*, y^*) > 0\}$. Note that
$$x_i^* \sum_{l=1}^m c_l(x^*, y^*) = c_i(x^*, y^*)$$
from (13.2). If $i \notin I$, this equality implies that $x_i^* = 0$; hence $\sum_{i \in I} x_i^* = 1$, and for $i \in I$ we have $x_i^* > 0$ and $e_i'Ay^* > x^{*\prime}Ay^*$. Now
$$x^{*\prime}Ay^* = \sum_{i=1}^m x_i^*\, e_i'Ay^* = \sum_{i \in I} x_i^*\, e_i'Ay^* > \sum_{i \in I} x_i^*\, x^{*\prime}Ay^* = x^{*\prime}Ay^* \sum_{i \in I} x_i^* = x^{*\prime}Ay^*,$$
which is a contradiction. Therefore the claim $\sum_{l=1}^m c_l(x^*, y^*) = 0$ holds. Similarly $\sum_{k=1}^n d_k(x^*, y^*) = 0$, and hence $c_i(x^*, y^*) = 0$ as well as $d_j(x^*, y^*) = 0$. Thus
$$e_i'Ay^* \le x^{*\prime}Ay^* \quad \text{and} \quad x^{*\prime}Be_j \le x^{*\prime}By^*$$
for each $i = 1, 2, \cdots, m$ and $j = 1, 2, \cdots, n$, proving that $(x^*, y^*)$ is a Nash equilibrium. $\square$
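The map $f$ from the proof is easy to implement, and one can check numerically that a Nash equilibrium is a fixed point. A sketch (matching pennies written as a bimatrix game, for illustration):

```python
import numpy as np

def nash_map(x, y, A, B):
    """One application of the improvement map f from the proof."""
    c = np.maximum(0.0, A @ y - x @ A @ y)   # c_i = max(0, e_i'Ay - x'Ay)
    d = np.maximum(0.0, x @ B - x @ B @ y)   # d_j = max(0, x'Be_j - x'By)
    return (x + c) / (1 + c.sum()), (y + d) / (1 + d.sum())

# Matching pennies as a bimatrix game (B = -A).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x = np.array([0.5, 0.5])
y = np.array([0.5, 0.5])     # the unique Nash equilibrium
x1, y1 = nash_map(x, y, A, B)
print(x1, y1)                # unchanged: the equilibrium is a fixed point of f
```

Note that iterating $f$ from an arbitrary starting point need not converge to an equilibrium; the proof only uses the existence of a fixed point, not convergence of the iteration.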
Digression: Brouwer Fixed Point Theorem
The Brouwer fixed point theorem is one of the central results in various disciplines of mathematics. The standard proofs use either algebraic topology or degree theory. Here we provide a proof due to Lax [8] using the change of variables formula.
(13.3). Proposition. Let $\phi : \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable function such that $\phi(x) = x$ for all $|x| \ge 1$. Then $\phi$ is onto.
Proof. Suppose $\phi$ is not onto. Then there exists $y_0$ with $|y_0| < 1$ such that $\phi(x) \ne y_0$ for every $x \in \mathbb{R}^n$. Since $\phi$ maps the closed unit ball onto a closed set (Exercise!), there must be a neighborhood of $y_0$ with no preimages. Let $\epsilon > 0$ be such that $B(y_0, \epsilon) \cap \mathrm{Range}(\phi) = \emptyset$. Choose any continuous function $f$ with support contained in $B(y_0, \epsilon)$ and $\int f(y)\,dy \ne 0$. Using the change of variables formula (valid here since $\phi$ is the identity outside the unit ball), we have
$$0 \ne \int f(y)\,dy = \int f(\phi(x)) J(x)\,dx = 0,$$
since the support of $f$ does not meet the range of $\phi$, so $f \circ \phi \equiv 0$. Here $J$ is the determinant of the Jacobian of $\phi$. Thus we have a contradiction, which completes the proof. $\square$
(13.4). Proposition. Let $\phi$ be a continuous map of the unit ball into $\mathbb{R}^n$ that is the identity on the boundary. Then the image of $\phi$ covers every point of the unit ball.
Proof. Extend $\phi$ outside the unit ball by $\phi(x) = x$; then $\phi$ is clearly continuous. Now approximate $\phi$ by smooth functions which are the identity outside the unit ball (Exercise!). By the previous proposition, each such smooth function covers the unit ball. By continuity and compactness arguments, $\phi$ also covers the unit ball. $\square$
The above proposition actually proves the famous result known as the no retraction theorem: there cannot be a continuous map from the unit ball onto the unit sphere which is the identity on the unit sphere. Once we have this result, we can easily prove the Brouwer fixed point theorem. In fact, it is known that the Brouwer fixed point theorem and the no retraction theorem are equivalent.
(13.5). Theorem (Brouwer Fixed Point Theorem). Any continuous function $f$ from the unit ball to itself has a fixed point, i.e., there exists a point $x$ such that $f(x) = x$.
Proof. Suppose $f$ has no fixed point. Let $\phi(x)$ denote the point where the ray starting at $f(x)$ and passing through $x$ meets the boundary of the unit ball. Then $\phi$ is a continuous function which maps the unit ball into the unit sphere and is the identity on the unit sphere. Thus it satisfies the hypothesis of the previous proposition but not the conclusion, and the theorem is proved. $\square$
(13.6). Theorem (Kakutani Fixed Point Theorem). Let $X \subseteq \mathbb{R}^n$ be compact and convex. For each $x \in X$, let $F(x)$ be a nonempty convex subset of $X$. Assume that the graph of the set-valued map $F$ is closed in $X \times X$. Then there is a point $x^* \in X$ such that $x^* \in F(x^*)$.
§ 14. Nonzero-sum Continuous Games
Let $N$ denote the set of players; with an abuse of notation, we also use $N$ for the number of players. Let $S_i$, a subset of a metric space, be the action space of player $i$, $i = 1, 2, \cdots, N$, and denote $S = S_1 \times S_2 \times \cdots \times S_N$. Let $f_i : S \to \mathbb{R}$ be the payoff function of the $i$th player. The objective of each player is to maximize his payoff $f_i$. The central concept of non-cooperative game theory is the equilibrium concept introduced by J. Nash. We first introduce the notation: for $s \in S$ and $s_i^* \in S_i$, $(s_{-i}; s_i^*)$ denotes the strategy profile $(s_1, \cdots, s_{i-1}, s_i^*, s_{i+1}, \cdots, s_N)$.
(14.1). Definition (Nash Equilibrium). A strategy profile $s^* = (s_1^*, s_2^*, \cdots, s_N^*)$ is called a Nash equilibrium if
$$f_i(s^*) \ge f_i(s_{-i}^*; s_i)$$
for all $s_i \in S_i$ and $i = 1, 2, \cdots, N$.
(14.2). Remark. The interesting feature of Nash equilibrium is that a unilateral deviation by a player, when the others stick to the equilibrium, cannot yield him a higher payoff.
Our aim is to establish the existence of Nash equilibrium. We restrict attention to the two-player case; the proof extends to the multiplayer case without much difficulty.
Let $S_1, S_2$ be the action spaces of players 1 and 2 respectively, with payoff $f_i : S_1 \times S_2 \to \mathbb{R}$ for player $i$. We assume that $S_1$ and $S_2$ are compact and convex subsets of some euclidean space, and that each $f_i$ is concave in player $i$'s own variable when the other variable is fixed. In this case, the definition of Nash equilibrium takes the following form: a pair $(x^*, y^*) \in S_1 \times S_2$ is a Nash equilibrium if
$$f_1(x, y^*) \le f_1(x^*, y^*) \quad \text{and} \quad f_2(x^*, y) \le f_2(x^*, y^*)$$
for all $x \in S_1$ and $y \in S_2$.
The following proof is due to Geanakoplos.
Assume $f_1$ is concave in $x$ and $f_2$ is concave in $y$. Also assume that both $f_1$ and $f_2$ are jointly continuous.
For $(\bar{x}, \bar{y}) \in S_1 \times S_2$, define $\phi = (\phi_1, \phi_2) : S_1 \times S_2 \to S_1 \times S_2$ by
$$\phi_1(\bar{x}, \bar{y}) = \arg\max_{x \in S_1} \{ f_1(x, \bar{y}) - |x - \bar{x}|^2 \}$$
and
$$\phi_2(\bar{x}, \bar{y}) = \arg\max_{y \in S_2} \{ f_2(\bar{x}, y) - |y - \bar{y}|^2 \}.$$
The maximands are strictly concave, so $\phi_1$ and $\phi_2$ are single valued and $\phi : S_1 \times S_2 \to S_1 \times S_2$ defines a continuous function. Now the Brouwer fixed point theorem guarantees the existence of $(x^*, y^*) \in S_1 \times S_2$ such that $\phi(x^*, y^*) = (x^*, y^*)$. Therefore,
$$x^* = \arg\max_{x \in S_1} \{ f_1(x, y^*) - |x - x^*|^2 \}$$
and
$$y^* = \arg\max_{y \in S_2} \{ f_2(x^*, y) - |y - y^*|^2 \}.$$
Fix $x \in S_1$, let $\lambda \in (0, 1)$, and consider $\lambda x + (1-\lambda)x^* \in S_1$. By the fixed point property of $x^*$,
$$f_1(x^*, y^*) \ge f_1(\lambda x + (1-\lambda)x^*, y^*) - |\lambda x + (1-\lambda)x^* - x^*|^2$$
$$= f_1(\lambda x + (1-\lambda)x^*, y^*) - \lambda^2 |x - x^*|^2$$
$$\ge \lambda f_1(x, y^*) + (1-\lambda) f_1(x^*, y^*) - \lambda^2 |x - x^*|^2,$$
using the concavity of $f_1$ in $x$. Rearranging, we get
$$\lambda f_1(x^*, y^*) \ge \lambda f_1(x, y^*) - \lambda^2 |x - x^*|^2.$$
Now dividing both sides by $\lambda > 0$ and then letting $\lambda \to 0$, we obtain
$$f_1(x^*, y^*) \ge f_1(x, y^*).$$
Since $x \in S_1$ is arbitrary, we get the optimality of $x^*$. In a similar fashion, we can show the optimality of $y^*$. Therefore $(x^*, y^*)$ is a Nash equilibrium. $\square$
§ 15. Lemke-Howson Algorithm
(15.1). Proposition. Let $(A, B)$ be a bimatrix game. A mixed strategy $x^* \in \Delta_1$ is a best response to $y^* \in \Delta_2$ if and only if, for all $i = 1, 2, \cdots, m$,
$$x_i^* > 0 \Rightarrow (Ay^*)_i = \max\{ (Ay^*)_k : k = 1, 2, \cdots, m \}.$$
Proof. Let $u = \langle x^*, Ay^* \rangle$ and $v = \max\{ (Ay^*)_k : k = 1, 2, \cdots, m \}$. For any $x \in \Delta_1$,
$$\langle x, Ay^* \rangle = \sum_{k=1}^m x_k (Ay^*)_k \le v,$$
with equality exactly when $x$ is supported on rows attaining $v$. If every $i$ with $x_i^* > 0$ attains the maximum, then $u = v \ge \langle x, Ay^* \rangle$ for all $x \in \Delta_1$, so $x^*$ is a best response. Conversely, if $(Ay^*)_i < v$ for some $i$ with $x_i^* > 0$, then
$$u = \sum_{k : x_k^* > 0} x_k^* (Ay^*)_k < v,$$
and moving the weight $x_i^*$ to a row attaining $v$ strictly increases the payoff, so $x^*$ is not a best response. $\square$
As a corollary, we have
(15.2). Corollary. A pair of mixed strategies $(x^*, y^*)$ is a Nash equilibrium if and only if
$$Ay^* \le v e \quad \text{and} \quad B'x^* \le u e$$
for some $u$ and $v$, together with $\langle x^*, Ay^* - v e \rangle = 0$ and $\langle y^*, B'x^* - u e \rangle = 0$.
Proof. Left as an exercise. $\square$
Define
$$P = \Big\{ (u, x) \in \mathbb{R} \times \mathbb{R}^m : x \ge 0, \ \sum_i x_i = 1, \ B^T x \le u e \Big\},$$
$$Q = \Big\{ (v, y) \in \mathbb{R} \times \mathbb{R}^n : y \ge 0, \ \sum_j y_j = 1, \ Ay \le v e \Big\},$$
$$\bar{P} = \{ x \in \mathbb{R}^m : x \ge 0, \ B^T x \le e \}, \qquad \bar{Q} = \{ y \in \mathbb{R}^n : y \ge 0, \ Ay \le e \}.$$
Note that $\bar{P}$ represents a set of "artificial" strategies for Player 1, in the sense that any normalized nonzero vector of $\bar{P}$ is a mixed strategy of Player 1. In fact, for each $(u, x) \in P$ with $u > 0$, taking $\tilde{x} = \frac{1}{u}x$ we see that $\tilde{x} \in \bar{P}$, while for $x (\ne 0) \in \bar{P}$ we have $\big( \frac{1}{\sum_i x_i}, \frac{1}{\sum_i x_i} x \big) \in P$. Similarly $\bar{Q}$ represents the set of "artificial" strategies for Player 2: any normalized nonzero vector of $\bar{Q}$ is a mixed strategy of Player 2, and the points of $Q$ and $\bar{Q}$ are connected in the same way. From Corollary (15.2), we know that there is a one-to-one correspondence between the set of Nash equilibria and certain pairs of extreme points of the polyhedra $P$ and $Q$.
For notational convenience, we write $y \in \mathbb{R}^n$ as $y = (y_{m+1}, y_{m+2}, \cdots, y_{m+n})$, and set $M = \{1, 2, \cdots, m\}$ and $N = \{m+1, m+2, \cdots, m+n\}$. For $x \in \bar{P}$ and $y \in \bar{Q}$, we define the label sets of $x$ and $y$ respectively by
$$L(x) = \{ i \in M : x_i = 0 \} \cup \{ j \in N : (x^T B)_j = 1 \},$$
$$L(y) = \{ j \in N : y_j = 0 \} \cup \{ i \in M : (Ay)_i = 1 \}.$$
We now assume that the bimatrix game is nondegenerate, i.e., for any $x \in \Delta_1$ the number of pure best responses to $x$ is at most $|S(x)|$, the size of the support of $x$, and likewise for any $y \in \Delta_2$ the number of pure best responses to $y$ is at most $|S(y)|$. Under the non-degeneracy assumption, $|L(x)| \le m$ and $|L(y)| \le n$.
(15.3). Theorem. A pair $(x, y)$ corresponds to a Nash equilibrium if and only if it is completely labelled: $L(x) \cup L(y) = M \cup N$.
Proof. Suppose $L(x) \cup L(y) = M \cup N$. Let $M_1 = \{i : x_i = 0\}$, $M_2 = \{i : (Ay)_i = 1\}$, $N_1 = \{j : y_j = 0\}$, $N_2 = \{j : (x^T B)_j = 1\}$. Since $|L(x)| \le m$, $|L(y)| \le n$ and $L(x) \cup L(y) = M \cup N$, we must have $M_1 \cup M_2 = M$ and $N_1 \cup N_2 = N$. Now $x_i > 0$ implies $i \in M_2$, i.e., $e_i$ is a best response to $y$, which implies that $x$ is a best response to $y$. Similarly $y$ is a best response to $x$, and hence $(x, y)$ corresponds to a Nash equilibrium.
Conversely, assume that $(x, y)$ corresponds to a Nash equilibrium. For $i \notin S(x)$, $x_i = 0$, and hence $M \setminus S(x) \subseteq L(x)$. Now let $j \in S(y)$; then $e_j$ is a best response to $x$ and hence $(x^T B)_j = 1$, implying that $S(y) \subseteq L(x)$. Thus $(M \setminus S(x)) \cup S(y) \subseteq L(x)$. Since the game is non-degenerate, $|S(x)| = |S(y)|$. Therefore $L(x)$ contains exactly $m$ elements, implying that $L(x) = (M \setminus S(x)) \cup S(y)$. Similarly, $L(y) = (N \setminus S(y)) \cup S(x)$. Hence $L(x) \cup L(y) = M \cup N$. $\square$
We now introduce graphs G1 and G2 with vertices given by the extreme points of P̄ and Q̄
respectively. Two extreme points x and x̃ of P̄ are connected by an edge in G1 if and only if
they are adjacent extreme points. Similarly the edges of G2 are defined.
We now introduce the product graph $G = G_1 \times G_2$, whose vertex set is $V(G_1) \times V(G_2)$. There is an edge between $z = (x, y)$ and $z' = (x', y')$ if and only if $(x, x') \in E(G_1)$ and $y = y'$, or $x = x'$ and $(y, y') \in E(G_2)$. For each vertex $z = (x, y)$, the label set of $z$ is given by $L(z) = L(x) \cup L(y)$. For $k \in M \cup N$, define
$$U_k = \{ z \in V(G) : L(z) \supseteq (M \cup N) \setminus \{k\} \}.$$
Vertices in $U_k$ are called "$k$-almost" completely labelled vertices. Note that for $k \ne l$, $U_k \cap U_l$ consists exactly of the completely labelled vertices.
(15.4). Theorem. For any $k \in M \cup N$:
(1) $(0, 0)$ and all Nash equilibrium points belong to $U_k$; furthermore, their degree in the graph induced by $U_k$ is exactly one.
(2) The degree of every other vertex of $U_k$ in the graph induced by $U_k$ is two.
Proof. Since the label set of $(0, 0)$ and of any Nash equilibrium is exactly $M \cup N$, all these points belong to $U_k$ for each $k \in M \cup N$. Let $z = (x, y)$ be one such point. Without loss of generality, suppose $k \in L(x)$. In the graph $G_1$, among all edges incident to $x$, there is exactly one leading to a vertex $x'$ without label $k$ (obtained by relaxing the binding constraint corresponding to label $k$). It is easy to see that $z$ and $(x', y)$ share an edge, showing that $(x', y)$ is the only neighbour of $z$ in the graph induced by $U_k$.
Let $z = (x, y)$ be any other point of $U_k$. Then there must be a duplicate label $l$ with $l \in L(x) \cap L(y)$. Now $z$ has one neighbour through the graph $G_1$ (dropping $l$ from $L(x)$) and another through the graph $G_2$ (dropping $l$ from $L(y)$), implying that its degree is two. $\square$
Thus, in a non-degenerate bimatrix game, the set of $k$-almost completely labelled vertices in $G$ together with the induced edges consists of disjoint paths and cycles. The end points of the paths correspond to $(0, 0)$ and the Nash equilibria of the game.
(15.5). Corollary. A non-degenerate bimatrix game has an odd number of Nash equilibria.
Based on this theorem, we have the following Lemke-Howson algorithm for computing Nash equilibria of non-degenerate games.
(1) Input: non-degenerate bimatrix game $(A, B)$.
(2) Choose $k \in M \cup N$.
(3) Start with $z = (x, y) = (0, 0) \in V(G)$. Drop label $k$ from $z$.
(4) Let $z$ be the current vertex and let $l$ be the label picked up by the move. If $l = k$, then $z$ corresponds to a Nash equilibrium; stop. If $l \ne k$, drop $l$ from $z$ and repeat.
The Lemke-Howson algorithm starts from the origin and follows a path in $U_k$. This path cannot be a cycle, since $(0, 0)$ has degree 1; it must therefore end at a Nash equilibrium.
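The oddness corollary is easy to check on a small game. The sketch below does not implement the pivoting itself; it finds all equilibria of a nondegenerate 2×2 bimatrix game by support enumeration, which is enough to count them (BoS payoffs used for illustration):

```python
import numpy as np

def equilibria_2x2(A, B, tol=1e-9):
    """All Nash equilibria of a nondegenerate 2x2 bimatrix game."""
    eqs = []
    # Pure profiles: (i, j) is an equilibrium iff neither player gains by deviating.
    for i in range(2):
        for j in range(2):
            if A[i, j] >= A[1 - i, j] - tol and B[i, j] >= B[i, 1 - j] - tol:
                eqs.append((np.eye(2)[i], np.eye(2)[j]))
    # Full-support equilibrium: each player mixes to make the other indifferent.
    # Player 1's (p, 1-p) solves x'B e_1 = x'B e_2; player 2's (q, 1-q) solves
    # (Ay)_1 = (Ay)_2.
    db = (B[0, 0] - B[0, 1]) + (B[1, 1] - B[1, 0])
    da = (A[0, 0] - A[1, 0]) + (A[1, 1] - A[0, 1])
    if abs(da) > tol and abs(db) > tol:
        p = (B[1, 1] - B[1, 0]) / db        # probability of row 1
        q = (A[1, 1] - A[0, 1]) / da        # probability of column 1
        if tol < p < 1 - tol and tol < q < 1 - tol:
            eqs.append((np.array([p, 1 - p]), np.array([q, 1 - q])))
    return eqs

# BoS from Section 10: two pure equilibria plus one mixed -- an odd number.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])
eqs = equilibria_2x2(A, B)
print(len(eqs))   # 3
```

For BoS the mixed equilibrium found this way is $((\frac{2}{3}, \frac{1}{3}), (\frac{1}{3}, \frac{2}{3}))$, and the count 3 is odd, as Corollary (15.5) requires.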
§ 16. Correlated Equilibria
We start with a few examples.
(16.1). Example. Consider the battle of the sexes game from § 10:

                 Player 2
                 M        D
    Player 1  M  2, 1     0, 0
              D  0, 0     1, 2

The game has two pure Nash equilibria, in which the players receive either $(2, 1)$ or $(1, 2)$. There is also a mixed Nash equilibrium $\big( (\frac{2}{3}, \frac{1}{3}), (\frac{1}{3}, \frac{2}{3}) \big)$, where each player receives a reward of $\frac{2}{3}$. Note that this reward is lower than the worse reward either player gets under a pure Nash equilibrium.
Now consider the following situation: a third party advises their choices according to a coin toss which is observed by both of them: choose M if the coin lands heads and choose D if it lands tails. If they follow this advice, each player receives $\frac{3}{2}$. The interesting fact is that no player has an incentive to deviate from this strategy. Another interesting point to note here is that the payoff they receive is higher than in the mixed Nash equilibrium.
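A quick numeric check of the coin-toss device (BoS payoffs as above): conditional on each recommendation, obeying is a best response, and each player's expected payoff is 3/2:

```python
import numpy as np

# BoS payoffs, rows/cols indexed (M, D) = (0, 1).
A = np.array([[2.0, 0.0], [0.0, 1.0]])   # player 1
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # player 2
mu = np.array([[0.5, 0.0], [0.0, 0.5]])  # coin toss: (M,M) and (D,D) w.p. 1/2 each

def is_correlated_eq(mu, A, B, tol=1e-9):
    # Player 1: conditional on being told row s, obeying must beat
    # every deviation t (sum over the opponent's recommendations).
    for s in range(2):
        for t in range(2):
            if mu[s] @ A[s] < mu[s] @ A[t] - tol:
                return False
    # Player 2: symmetric check over columns.
    for s in range(2):
        for t in range(2):
            if mu[:, s] @ B[:, s] < mu[:, s] @ B[:, t] - tol:
                return False
    return True

print(is_correlated_eq(mu, A, B))        # True
print((mu * A).sum(), (mu * B).sum())    # expected payoffs: 1.5 each
```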
(16.2). Example. We now consider another game, called the game of chicken:

                 Player 2
                 Y        D
    Player 1  Y  3, 3     0, 5
              D  5, 0     -4, -4

The story behind this game is the following: two drivers arrive at the same time at an intersection. Each would like to drive on (strategy D) rather than yield (strategy Y), but if both drive they risk damaging their cars; if both yield they may waste time, but there is no risk of damage. The game has three Nash equilibria: $(Y, D)$, $(D, Y)$, and a mixed equilibrium where each player drives with probability $\frac{1}{3}$. The players' expected payoffs under these equilibria are $(0, 5)$, $(5, 0)$ and $(2, 2)$ respectively.
Suppose we install a traffic light which instructs each player whether to yield or drive. For example, the light could choose uniformly at random between $(Y, D)$ and $(D, Y)$. In this case the players' payoffs are $(2.5, 2.5)$, and neither has an incentive to deviate, assuming the other obeys the light.
We can also consider the situation where the light chooses from tpY, Dq, pD, Y q, pY, Y qu where
the third one is chosen with probability p and the first and second are chosen with probability
1´p
2 .
Given that a player is instructed to yield, the player knows that the other player has been told to yield with conditional probability
\[
p_Y = \frac{p}{p + (1-p)/2}
\]
and to drive with conditional probability
\[
p_D = \frac{(1-p)/2}{p + (1-p)/2}.
\]
Therefore the player's utility for yielding is $3 p_Y$, while the utility for driving is $5 p_Y - 4 p_D$. Hence the player will not deviate from the instruction as long as $3 p_Y \ge 5 p_Y - 4 p_D$. Simplifying, we get $p \le 1/2$.
Each player's utility is $3p + 5(1-p)/2$. Choosing $p = 1/2$, the players' utilities are $(2.75, 2.75)$, which exceed those in any Nash equilibrium.
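The incentive constraint and the resulting utilities can be verified directly; here is a small illustrative check (ours, not part of the notes), again with exact rational arithmetic:

```python
from fractions import Fraction as F

def conditional_probs(p):
    # Signal distribution: (Y,Y) w.p. p, (Y,D) and (D,Y) w.p. (1-p)/2 each.
    # Given the instruction Y, the opponent was told Y or D with:
    norm = p + (1 - p) / 2
    return p / norm, ((1 - p) / 2) / norm  # (p_Y, p_D)

def obeys(p):
    pY, pD = conditional_probs(p)
    return 3 * pY >= 5 * pY - 4 * pD  # yielding at least as good as driving

# The incentive constraint holds exactly when p <= 1/2.
assert obeys(F(1, 2)) and obeys(F(1, 4)) and not obeys(F(3, 4))

# Each player's expected utility is 3p + 5(1-p)/2; at p = 1/2 it is 2.75.
utility = lambda p: 3 * p + F(5) * (1 - p) / 2
print(utility(F(1, 2)))  # 11/4
```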
We will now introduce the above equilibrium concept formally.
(16.3). Definition. Let $G = (S_1, S_2, u_1, u_2)$ be a two-player finite game. A distribution $\mu \in \wp(S_1 \times S_2)$ is called a correlated strategy.
A correlated strategy $\mu$ is said to be a correlated equilibrium if for every player $i$ and every $s_i, t_i \in S_i$, it holds that
\[
\sum_{s_{-i} \in S_{-i}} \mu(s_{-i}; s_i) \, u_i(s_{-i}; s_i) \;\ge\; \sum_{s_{-i} \in S_{-i}} \mu(s_{-i}; s_i) \, u_i(s_{-i}; t_i).
\]
Note that player $i$'s expected utility under a correlated equilibrium $\mu$ is $\sum_{s \in S} u_i(s) \mu(s)$.
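The defining inequalities can be checked mechanically for any candidate distribution. The sketch below (ours, not from the notes) implements the check for a two-player game and applies it to the public-coin distribution of Example (16.1):

```python
from fractions import Fraction as F

def is_correlated_eq(mu, u1, u2):
    """Check Definition (16.3): for each player and each pair (s_i, t_i),
    sum over the opponent's actions of mu * (payoff of obeying - payoff of
    deviating) must be non-negative."""
    A = range(len(u1))      # player 1's strategies
    B = range(len(u1[0]))   # player 2's strategies
    for s in A:             # player 1: recommended s, deviation t
        for t in A:
            if sum(mu[(s, b)] * (u1[s][b] - u1[t][b]) for b in B) < 0:
                return False
    for s in B:             # player 2: recommended s, deviation t
        for t in B:
            if sum(mu[(a, s)] * (u2[a][s] - u2[a][t]) for a in A) < 0:
                return False
    return True

# Battle of Sexes with the public-coin distribution on (M, M) and (D, D).
u1 = [[F(2), F(0)], [F(0), F(1)]]
u2 = [[F(1), F(0)], [F(0), F(2)]]
mu = {(a, b): (F(1, 2) if a == b else F(0)) for a in range(2) for b in range(2)}
print(is_correlated_eq(mu, u1, u2))  # True
```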
Several interesting consequences follow from the definition of correlated equilibrium.
(16.4). Proposition. The set of correlated equilibria is closed, and hence compact. Moreover, it is convex.
(16.5). Proposition. Every Nash equilibrium is a correlated equilibrium.
Incidentally, the above proposition also proves the existence of a correlated equilibrium. Note that correlated equilibria are defined in terms of linear inequalities. It is therefore natural to ask whether existence can be proved without using Nash equilibrium. The answer is yes, and such a proof is provided by Hart and Mas-Colell.
(16.6). Example. This example is taken from https://economics.stackexchange.com/questions/21583/example-of-a-game-with-no-nash-equilibria-but-at-least-one-correlated-equilibr
Consider a game with three players. Players 1 and 2 have strategy space $[0,1] \times \mathbb{N}$, while Player 3's strategy space is $[0,1]$. We denote the generic strategies of these players respectively as $(x, m)$, $(y, n)$ and $x'$.
The payoff of Player 3 is $1$ if $x' = x$ and $0$ otherwise. The payoff of Player 1 as well as Player 2 is $2$ if $y = x \ne x'$, and both of them get $-2$ if $x = x'$. If $y \ne x \ne x'$, then the player with the higher number in the second coordinate gets a payoff of $1$ and the other gets $-1$. If they pick the same number, then both get $0$.
This game has no Nash equilibrium, but it has a correlated equilibrium.
§ 17. Congestion and Potential Games
(17.1). Definition. Let $(n, S_1, S_2, \dots, S_n, u_1, u_2, \dots, u_n)$ be a normal form game and let $S = S_1 \times S_2 \times \dots \times S_n$. The game is said to be a potential game if there exists a function $f : S \to \mathbb{R}$ such that
\[
u_i(s) - u_i(s'_i; s_{-i}) = f(s) - f(s'_i; s_{-i})
\]
for all $s \in S$, $s'_i \in S_i$ and $i = 1, 2, \dots, n$.
(17.2). Example. Consider the game Battle of Sexes
\[
\begin{pmatrix} 2, 1 & 0, 0 \\ 0, 0 & 1, 2 \end{pmatrix}.
\]
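Battle of Sexes is in fact a potential game. The candidate potential below is our own computation (it is not given in the notes); the code checks that every unilateral deviation changes the deviator's payoff by exactly the change in the potential:

```python
# Exact-potential check for Battle of Sexes. A function f on strategy
# profiles is a potential if, for every unilateral deviation, the change
# in the deviator's payoff equals the change in f.
u1 = [[2, 0], [0, 1]]
u2 = [[1, 0], [0, 2]]
f  = [[2, 1], [0, 2]]  # candidate potential (our own computation)

for a in range(2):
    for b in range(2):
        for a2 in range(2):  # player 1 deviates a -> a2
            assert u1[a][b] - u1[a2][b] == f[a][b] - f[a2][b]
        for b2 in range(2):  # player 2 deviates b -> b2
            assert u2[a][b] - u2[a][b2] == f[a][b] - f[a][b2]
print("f is an exact potential for Battle of Sexes")
```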
§ 18. Evolutionary Game Theory
Consider a symmetric game given by a payoff matrix $A$. Recall that such a symmetric game is a bimatrix game in which $A$ and $A^T$ are respectively the payoff matrices of players 1 and 2. In particular, both players have the same set of pure strategies $S = \{e^1, e^2, \dots, e^k\}$ and mixed strategies
\[
\Delta = \{ x \in \mathbb{R}^k : x_1 + \dots + x_k = 1, \ x_i \ge 0, \ 1 \le i \le k \}.
\]
When a strategy $x$ is played against $y$, then $x$ receives the payoff
\[
f(x, y) := \langle x, Ay \rangle = x^T A y = \sum_{i,j=1}^k a_{ij} x_i y_j,
\]
and $y$ receives $f(y, x)$.
Consider a large population of individuals who pair off randomly and play the symmetric game with payoff matrix $A$. Let the incumbent strategy (of all individuals) be $x$. Suppose that an $\epsilon$ fraction of individuals become mutants and start playing $y$. This gives rise to a population in which a $1 - \epsilon$ proportion plays $x$ and the remaining $\epsilon$ proportion plays $y$. In such a population, the average fitness (or payoff) of $x$-individuals is $f(x, \epsilon y + (1-\epsilon)x)$, and that of $y$-individuals is $f(y, \epsilon y + (1-\epsilon)x)$.
Biological intuition suggests that evolutionary forces select against the mutant strategy $y$ if and only if its post-entry payoff is lower than that of the incumbent strategy $x$; that is,
\[
f(x, \epsilon y + (1-\epsilon)x) > f(y, \epsilon y + (1-\epsilon)x).
\]
(18.1). Definition. A strategy $x \in \Delta$ is an evolutionarily stable strategy (ESS, for short) if for every $y \ne x$, there exists $\bar\epsilon \in (0, 1)$ such that
\[
f(x, \epsilon y + (1-\epsilon)x) > f(y, \epsilon y + (1-\epsilon)x), \qquad 0 < \epsilon < \bar\epsilon. \tag{18.2}
\]
The largest fraction $\bar\epsilon$ satisfying (18.2) is called the invasion barrier of the incumbent strategy $x$ against the mutant strategy $y$. So an ESS may be understood as a strategy having a positive invasion barrier against all mutations.
Let us now rewrite (18.2) as
\[
\epsilon \big[ f(x, y) - f(y, y) \big] + (1-\epsilon) \big[ f(x, x) - f(y, x) \big] > 0, \qquad 0 < \epsilon < \bar\epsilon. \tag{18.3}
\]
A careful observation of this yields the next theorem whose proof is left as an exercise.
(18.4). Theorem. For $x \in \Delta$, the following two statements are equivalent:
(i). $x$ is an ESS.
(ii). $x$ is a symmetric NE and, for any other best response $y$ (against $x$), $f(x, y) > f(y, y)$.
An immediate consequence of this theorem is that every strict symmetric NE is an ESS. It also follows that an ESS is necessarily a symmetric Nash equilibrium. The converses of these two statements may fail, as the next two examples illustrate.
(18.5). Example (Hawk-Dove game). Consider a species of birds pairing off at random and competing for resources. Each bird is programmed to behave like a 'Hawk' or a 'Dove'. When two Hawks compete, the fight continues till one gets seriously injured; the injury is very costly compared with the reward of success. When two Doves compete, both keep displaying till one retreats; there is no injury in this case. When a Dove faces a Hawk, the Dove retreats immediately without injury. Such a Hawk-Dove game may be represented by the matrix
\[
A = \begin{pmatrix} -1 & 2 \\ 0 & 1 \end{pmatrix}.
\]
Consider the mixed strategy $x = (1/2, 1/2)$ and note that
\[
f(y, x) = -\frac{y_1}{2} + y_1 + \frac{1 - y_1}{2} = \frac12,
\]
for every mixed strategy $y$. Also note that
\[
f(x, y) - f(y, y) = 2 \Big( y_1 - \frac12 \Big)^2 \ge 0.
\]
Here equality holds only if $y = x$. Therefore, by Theorem (18.4), $x$ is an ESS.
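Both identities are easy to confirm numerically; a brief illustrative check (ours, not from the notes) over a grid of mixed strategies:

```python
from fractions import Fraction as F

A = [[F(-1), F(2)], [F(0), F(1)]]   # Hawk-Dove payoff matrix

def f(x, y):  # payoff to strategy x against strategy y
    return sum(A[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

x = [F(1, 2), F(1, 2)]
for k in range(11):                  # sample mixed strategies y
    y = [F(k, 10), 1 - F(k, 10)]
    assert f(y, x) == F(1, 2)        # every y earns exactly 1/2 against x
    assert f(x, y) - f(y, y) == 2 * (y[0] - F(1, 2)) ** 2  # ESS margin
print("x = (1/2, 1/2) satisfies the ESS conditions of Theorem (18.4)")
```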
(18.6). Example (Rock-Scissors-Paper game). This game has three pure strategies: Rock, Scissors, Paper. Rock beats Scissors, Scissors beats Paper, Paper beats Rock. The payoff matrix is
\[
A = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}.
\]
It can easily be shown that $x = (1/3, 1/3, 1/3)$ is the only symmetric NE. Clearly all strategies are best responses against $x$. However, $f(e^1, e^1) = 0 = f(x, e^1)$. Therefore, by Theorem (18.4), $x$ is not an ESS.
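The failure of the strict inequality in Theorem (18.4)(ii) can be seen directly; an illustrative check (ours):

```python
from fractions import Fraction as F

A = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]   # Rock-Scissors-Paper

def f(x, y):
    return sum(F(A[i][j]) * x[i] * y[j] for i in range(3) for j in range(3))

x = [F(1, 3)] * 3
e1 = [F(1), F(0), F(0)]
# Every pure strategy earns 0 against x, so all are best responses ...
assert all(f([F(int(i == k)) for k in range(3)], x) == 0 for i in range(3))
# ... but the ESS condition f(x, e1) > f(e1, e1) fails with equality:
assert f(x, e1) == 0 == f(e1, e1)
print("x = (1/3, 1/3, 1/3) is a symmetric NE but not an ESS")
```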
The next theorem provides some insight into the geometry of the set of all evolutionarily stable strategies in a game. Recall that the support $S(x)$ of a strategy $x$ is the set of all pure strategies $e^i$ with $x_i > 0$. Also note that the convex hull $\Lambda(x)$ of $S(x)$ is the face of $\Delta$ generated by (i.e., the smallest face containing) $x$.
(18.7). Theorem. If $x \in \Delta$ is an ESS, then there is no other symmetric NE in $\Lambda(x)$.
Proof. Let $x$ be an ESS. By Theorem (18.4), $x$ is a symmetric NE, and hence all pure strategies in $S(x)$ are best responses to $x$. This implies that $\Lambda(x) \subset BR(x)$. Consequently every $y \in \Lambda(x)$ with $y \ne x$ is a best response to $x$, so, by Theorem (18.4), $f(x, y) > f(y, y)$; in particular, $y \notin BR(y)$. That is, no strategy in $\Lambda(x)$ other than $x$ is a symmetric NE.
□
(18.8). Remark. In view of Theorems (18.4) and (18.7), one also infers that if $x$ is an ESS, then there is no other ESS in the face generated by $x$. In particular, it follows that if there is an ESS in the interior of $\Delta$, then it is the only ESS of the game.
To obtain another useful characterization of an ESS, note that the inequality (18.2) is equivalent to the inequality
\[
f(x, \epsilon y + (1-\epsilon)x) > f(\epsilon y + (1-\epsilon)x, \ \epsilon y + (1-\epsilon)x). \tag{18.9}
\]
This motivates the next theorem, and the proof is left as an exercise to the reader.
(18.10). Theorem. For $x \in \Delta$, the following two statements are equivalent:
(i). $x$ is an ESS.
(ii). There exists a neighbourhood $U$ (relative to $\Delta$) of $x$ such that $f(x, y) > f(y, y)$ for every $y \in U$, $y \ne x$.
(18.11). Remark. If x satisfies the conditions in Theorem (18.10)(ii), then some authors define
it as a locally superior strategy. So the above theorem says that a strategy is an ESS if and only
if it is locally superior. This result will be helpful in deriving important dynamic properties of
ESS in the next section.
§ 19. Replicator Dynamics
The notion of ESS relies upon implicit dynamical considerations. In certain situations, the
underlying dynamics can be modeled by a system of ordinary differential equations on the simplex
∆.
Consider a large population of individuals, each of whom is programmed to adopt a certain pure strategy from $\{e^1, e^2, \dots, e^k\}$ in a symmetric game with payoff matrix $A$. Let $n_i(t)$ be the number of individuals adopting $e^i$ at time $t$. Then $n(t) = \sum_{i=1}^k n_i(t)$ is the total population size at time $t$. The associated population state $x(t)$ at time $t$ is the transpose of the vector $(x_1(t), x_2(t), \dots, x_k(t))$, where
\[
x_i(t) = \frac{n_i(t)}{n(t)}, \qquad 1 \le i \le k.
\]
Clearly $x_i(t)$ is the proportion of individuals programmed to play $e^i$ at time $t$. In this way, a population state $x(t)$ can be considered as a mixed strategy in the simplex $\Delta$.
We also note that, at time $t$, the average payoff to an individual adopting $e^i$, in a random match, is $f(e^i, x(t))$. The population average payoff is $f(x(t), x(t))$. Let
\[
\sigma(e^i, x) = f(e^i, x) - f(x, x).
\]
The relative rate of change $\dot x_i(t)/x_i(t)$ is a measure of the evolutionary success of $e^i$-strategists. Following the basic tenet of Darwinism, we may express this success as the fitness difference $\sigma(e^i, x(t))$. Thus we obtain the replicator dynamics:
\[
\dot x_i = x_i \, \sigma(e^i, x), \qquad i = 1, 2, \dots, k. \tag{19.1}
\]
Since $\sum_{i=1}^k \dot x_i = 0$, the mixed strategy simplex $\Delta$ is invariant under the replicator dynamics (19.1). This, together with the fact that the R.H.S. of (19.1) is a polynomial of degree at most three, yields existence and uniqueness for the replicator dynamics. That is, for each initial state $x(0) \in \Delta$, the replicator dynamics admits a unique global solution $x(t)$ which remains in $\Delta$ for all time. In addition, by the very structure of the replicator dynamics, each face of $\Delta$ (as well as its boundary and interior) is invariant.
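As an illustration (ours, not part of the notes), a forward-Euler discretization of (19.1) for the Hawk-Dove game of Example (18.5) shows an interior trajectory approaching the ESS $(1/2, 1/2)$; note that the increments sum to zero, so the simplex is preserved exactly:

```python
# Euler discretization of the replicator dynamics (19.1) for Hawk-Dove.
A = [[-1.0, 2.0], [0.0, 1.0]]

def payoff(x, y):
    return sum(A[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

x = [0.9, 0.1]                     # interior initial state
dt = 0.01
for _ in range(20000):
    fx = payoff(x, x)              # population average payoff
    growth = [x[i] * (sum(A[i][j] * x[j] for j in range(2)) - fx)
              for i in range(2)]   # x_i * sigma(e^i, x); sums to zero
    x = [x[i] + dt * growth[i] for i in range(2)]

print(x)  # approximately [0.5, 0.5], the ESS of Example (18.5)
```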
Clearly each vertex (pure strategy) of $\Delta$ is a stationary point of (19.1). But there can be other interesting stationary points. Denote by $\Delta^0$ the set of all stationary points in $\Delta$; that is,
\[
\Delta^0 = \{ x \in \Delta : \sigma(e^i, x) = 0 \ \text{for all } i \in S(x) \}.
\]
The next result relates the stationary points of (19.1) to the symmetric NE of the associated game.
(19.2). Theorem. Let $x \in \Delta$. If $x$ is a symmetric NE, then it is a stationary point of (19.1). The converse is true when any of the following conditions holds:
(c1). $x$ is in the interior of $\Delta$.
(c2). $x$ is a limit state of a trajectory lying in the interior of $\Delta$.
(c3). $x$ is a Lyapunov stable state of (19.1).
Proof. If $x$ is a symmetric NE, then clearly
\[
f(e^i, x) = f(x, x) \qquad \text{for all } e^i \in S(x).
\]
This implies that $\sigma(e^i, x) = 0$ whenever $x_i > 0$. Therefore $x$ is a stationary point of (19.1).
We now prove the converse under (c1). If $x$ is an interior stationary point, then $\sigma(e^i, x) = f(e^i, x) - f(x, x) = 0$ for $i = 1, 2, \dots, k$. This implies that $BR(x) = \Delta$; in particular $x \in BR(x)$, and hence $x$ is a symmetric NE.
To prove the converse under (c2), assume that the stationary point $x$ is a limit state; that is,
\[
x = \lim_{t \to \infty} x(t), \tag{19.3}
\]
where $x(t)$ satisfies (19.1) and lies in the interior of $\Delta$ for all $t \ge 0$. Note that
\[
x_i(t) = x_i(t_0) \, e^{\int_{t_0}^t \sigma(e^i, x(s)) \, ds}, \qquad 1 \le i \le k. \tag{19.4}
\]
If possible, suppose $x$ is not a symmetric NE. Then there would be a pure strategy, say $e^j$, satisfying $f(e^j, x) > f(x, x)$. This implies that
\[
2\delta := \sigma(e^j, x) > 0,
\]
and hence (by the continuity of $f$) that there exists a neighbourhood $U$ of $x$ (relative to $\Delta$) such that
\[
\sigma(e^j, y) \ge \delta \qquad \text{for all } y \in U. \tag{19.5}
\]
In view of (19.3), there is $t_0$ (large enough) such that $x(t) \in U$ whenever $t \ge t_0$. This and (19.5) yield
\[
\sigma(e^j, x(t)) \ge \delta \qquad \text{for all } t \ge t_0.
\]
Substituting this into (19.4), we obtain
\[
x_j(t) \ge x_j(t_0) \, e^{\delta (t - t_0)}, \qquad \forall t \ge t_0. \tag{19.6}
\]
Since $x_j(t) \le 1$ for all $t$, (19.6) is a contradiction, and hence $x$ has to be a symmetric NE.
It remains to prove the converse under (c3). To this end, take a Lyapunov stable stationary point $x$ of (19.1). If $x$ is not a symmetric NE, then, arguing as above, there exist a pure strategy $e^j$ and $t_0$ such that (19.6) is satisfied. This contradicts the Lyapunov stability of $x$.
□
From the above theorem, it follows that all Lyapunov stable states of the replicator dynamics are symmetric NE. The next theorem shows that every ESS is an asymptotically stable state of the replicator dynamics (19.1).
(19.7). Theorem. If $x$ is an ESS, then it is an asymptotically stable state of (19.1).
Proof. Let $x \in \Delta$ be an ESS. By Theorem (18.10), there exists a neighbourhood $U$ of $x$ (relative to $\Delta$) such that
\[
\sigma(x, y) := f(x, y) - f(y, y) > 0 \qquad \forall y \in U \setminus \{x\}. \tag{19.8}
\]
We use Lyapunov's direct method to prove that $x$ is asymptotically stable. Consider the relative neighbourhood $O$ of $x$, where
\[
O = \{ y \in \Delta : S(x) \subset S(y) \}.
\]
Define $V : O \to \mathbb{R}$ by
\[
V(y) = \sum_{i \in S(x)} x_i \log\Big( \frac{x_i}{y_i} \Big).
\]
Clearly $V$ is continuous and $V(x) = 0$. Now, for $y \in O \setminus \{x\}$,
\[
V(y) = - \sum_{i \in S(x)} x_i \log\Big( \frac{y_i}{x_i} \Big)
\;>\; - \sum_{i \in S(x)} x_i \Big( \frac{y_i}{x_i} - 1 \Big)
\;=\; 1 - \sum_{i \in S(x)} y_i \;\ge\; 0,
\]
using $\log r < r - 1$ for all $r \ne 1$.
Therefore the proof will be complete once we show that the time derivative of $V(y(t))$ along the trajectories of (19.1) is strictly negative. We compute
\[
\frac{d}{dt} V(y(t))
= \sum_{i=1}^k \frac{\partial V}{\partial y_i}(y(t)) \, \dot y_i(t)
= \sum_{i \in S(x)} \frac{-x_i}{y_i(t)} \, y_i(t) \, \sigma(e^i, y(t))
= - \sigma(x, y(t)).
\]
In view of (19.8), the time derivative of the Lyapunov function $V$ along the trajectories of the replicator dynamics is therefore strictly negative.
□
(19.9). Remark. If $x$ is an interior ESS, then, as in the previous theorem, it can be shown to be globally asymptotically stable; that is, $\mathrm{int}(\Delta)$ is its basin of attraction.
The following example shows that the converse is not true, in general. That is, an asymptotically stable state of the replicator dynamics may fail to be an ESS.
(19.10). Example. Consider the symmetric game with payoff matrix
\[
A = \begin{pmatrix} 1 & 5 & 0 \\ 0 & 1 & 5 \\ 5 & 0 & 4 \end{pmatrix}.
\]
It is clear that the pure strategies $e^1, e^2, e^3$ are not symmetric NE. Furthermore, for any mixed strategy $y$,
\[
f(e^1, y) = y_1 + 5y_2, \qquad f(e^2, y) = y_2 + 5y_3, \qquad f(e^3, y) = 5y_1 + 4y_3.
\]
From this, it follows that
\[
f(e^1, y) = f(e^2, y) \iff 6y_1 + 9y_2 = 5, \qquad
f(e^2, y) = f(e^3, y) \iff 6y_1 = 1, \qquad
f(e^1, y) = f(e^3, y) \iff 9y_2 = 4.
\]
It follows that the game has only one symmetric NE, namely $x = (\tfrac16, \tfrac49, \tfrac{7}{18})$. Note also that all mixed strategies are best responses to $x$. We observe that
\[
f(x, e^3) = 5x_2 + 4x_3 = \tfrac{34}{9}.
\]
But
\[
f(e^3, e^3) = 4 > \tfrac{34}{9},
\]
and so $x$ is not an ESS. Nevertheless, we can show that $x$ is an asymptotically stable state of the associated replicator dynamics:
\[
\begin{cases}
\dot y_1 = -y_1 \big( y_1^2 + y_2^2 + 4y_3^2 + 5y_1y_2 + 5y_2y_3 + 5y_3y_1 - y_1 - 5y_2 \big), \\
\dot y_2 = -y_2 \big( y_1^2 + y_2^2 + 4y_3^2 + 5y_1y_2 + 5y_2y_3 + 5y_3y_1 - y_2 - 5y_3 \big), \\
\dot y_3 = -y_3 \big( y_1^2 + y_2^2 + 4y_3^2 + 5y_1y_2 + 5y_2y_3 + 5y_3y_1 - 5y_1 - 4y_3 \big).
\end{cases} \tag{19.11}
\]
The R.H.S. of (19.11) is a map from $\mathbb{R}^3$ to $\mathbb{R}^3$, and its gradient matrix at $x$ is
\[
\begin{pmatrix} -7/12 & 2/9 & -37/36 \\ -2 & -32/27 & -14/27 \\ 7/36 & -77/54 & -91/108 \end{pmatrix}.
\]
The eigenvalues of this matrix have strictly negative real parts, and hence $x$ is an asymptotically stable stationary point of (19.11).
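The claims of this example are easy to verify exactly; an illustrative check (ours, not from the notes) confirms that $x$ is stationary for the replicator dynamics but fails the ESS condition:

```python
from fractions import Fraction as F

A = [[1, 5, 0], [0, 1, 5], [5, 0, 4]]
x = [F(1, 6), F(4, 9), F(7, 18)]

def f(p, q):
    return sum(F(A[i][j]) * p[i] * q[j] for i in range(3) for j in range(3))

# x is a symmetric NE: every pure strategy earns the same payoff against x,
# so sigma(e^i, x) = 0 for all i and x is stationary for (19.11).
payoffs = [sum(F(A[i][j]) * x[j] for j in range(3)) for i in range(3)]
assert payoffs[0] == payoffs[1] == payoffs[2] == f(x, x)

# x is not an ESS: e^3 is a best response to x, yet f(e^3, e^3) > f(x, e^3).
e3 = [F(0), F(0), F(1)]
assert f(e3, e3) == 4 > f(x, e3)
print("x is stationary for the replicator dynamics but not an ESS")
```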
§ 20. Fictitious Play
Fictitious play is a simple iterative procedure introduced by Brown [2]; the proof of convergence was given by Robinson [14]. In this method, both players play the same game repeatedly. At each stage, a player, after observing the other player's actions and the history of play, chooses a pure strategy which is a best response to the empirical strategy of the other player. In suitable situations, these empirical strategies converge to optimal strategies. We now describe the procedure.
(1) Player 1 chooses a pure strategy $\alpha_1$. Then $x^1 = \alpha_1$.
(2) Player 2 chooses a best response $\beta_1$ to $x^1$ in pure strategies. Then $y^1 = \beta_1$.
(3) Player 1 chooses a best response $\alpha_2$ in pure strategies to $y^1$. Then $x^2 = \tfrac12 \alpha_1 + \tfrac12 \alpha_2$.
(4) Player 2 chooses a best response $\beta_2$ in pure strategies to $x^2$. Then $y^2 = \tfrac12 \beta_1 + \tfrac12 \beta_2$.
(5) Player 1 chooses a best response $\alpha_3$ in pure strategies to $y^2$. Then $x^3 = \tfrac13 \alpha_1 + \tfrac13 \alpha_2 + \tfrac13 \alpha_3$.
(6) Repeat steps 4 and 5 with $x^k, y^k$, $k \ge 3$.
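The steps above can be sketched in code. The following illustrative implementation (ours, with first-index tie-breaking) runs fictitious play on the zero-sum matching-pennies game $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, whose value is $0$ with optimal strategies $(1/2, 1/2)$; the empirical strategies are maintained as action counts:

```python
# Fictitious play for matching pennies (player 1 maximizes, player 2 minimizes).
A = [[1, -1], [-1, 1]]
c1 = [0, 0]   # player 1's action counts
c2 = [0, 0]   # player 2's action counts

c1[0] += 1    # step (1): player 1 starts with an arbitrary pure strategy
for t in range(1, 20000):
    x = [c / sum(c1) for c in c1]   # empirical strategy of player 1
    # player 2 best-responds (minimizes) against x in pure strategies
    col_payoffs = [sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
    c2[col_payoffs.index(min(col_payoffs))] += 1
    y = [c / sum(c2) for c in c2]   # empirical strategy of player 2
    # player 1 best-responds (maximizes) against y in pure strategies
    row_payoffs = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
    c1[row_payoffs.index(max(row_payoffs))] += 1

x = [c / sum(c1) for c in c1]
y = [c / sum(c2) for c in c2]
print(x, y)  # both close to [0.5, 0.5], the optimal strategies
```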
In the following we provide a proof of this result, which closely follows [7]. Let $A = ((a_{ij}))$ be an $m \times n$ matrix giving the payoffs of a zero-sum game, and let $v$ be the value of the game (which exists by the von Neumann minimax theorem). Let $\Delta_m$ and $\Delta_n$ denote the mixed strategy spaces of the two players. Let $a_i$ denote the $i$th row of $A$ and $b_j$ its $j$th column.
(20.1). Definition. Consider sequences of vectors $c(t)$ and $r(t)$, of dimensions $m$ and $n$ respectively, defined by
(1) $\max_i c_i(0) = \min_j r_j(0)$;
(2) $r(t+1) = r(t) + a_l$, where $l \in \arg\max_i c_i(t)$ and $a_l$ is the $l$th row of $A$;
(3) $c(t) = c(t-1) + b_k$, where $k \in \arg\min_j r_j(t)$ and $b_k$ is the $k$th column of $A$.
Such a pair of sequences is called a vector system.
Let $n_i$, $m_j$ denote the number of times that $a_i$, $b_j$ are added to $r(0)$, $c(0)$ in forming $r(t)$, $c(t)$ respectively. Then $\sum_i n_i = \sum_j m_j = t$. Let $x_i = n_i/t$ and $y_j = m_j/t$; then $x$ and $y$ are respectively mixed strategies of players 1 and 2. Without loss of generality we assume both players have the same number of pure strategies. Now note that
\[
c_i(t) = c_i(0) + t \sum_j a_{ij} y_j \qquad \text{and} \qquad r_j(t) = r_j(0) + t \sum_i a_{ij} x_i.
\]
It is easy to see that
\[
\min_j \frac{r_j(t)}{t} \le \mathrm{value}(A) \le \max_i \frac{c_i(t)}{t}.
\]
Consider
\[
c_i(t) - r_j(t) = c_i(0) - r_j(0) + t \sum_k \{ y_k a_{ik} - x_k a_{kj} \}.
\]
We now assume that the matrix game $A$ is symmetric, i.e., $A = -A^T$. Then
\[
c_i(t) - r_i(t) = c_i(0) - r_i(0) + t \sum_k a_{ik} (y_k + x_k).
\]
Now take $z = \tfrac12 (x + y)$; then
\[
c_i(t) - r_i(t) = c_i(0) - r_i(0) + 2t \sum_k a_{ik} z_k.
\]
Without loss of generality, we may assume that all the entries of $A$ are non-negative. To estimate the difference $c_i(t) - r_j(t)$, consider the game with payoff matrix
\[
B = \begin{pmatrix} 0 & -A^T \\ A & 0 \end{pmatrix}.
\]
Note that $z = \tfrac12 (x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n)$ is a valid mixed strategy of this game. If the maximizing player chooses this mixed strategy $z$ and the minimizing player chooses rows $i$ and $j$ with equal probability, then the resulting payoff is $\tfrac14 \sum_k (y_k a_{ik} - x_k a_{kj})$. Since $B$ is skew-symmetric, the value of $B$ is zero. Thus
\[
\min_{i,j} \frac{c_i(t) - r_j(t)}{t} \to 0 \qquad \text{as } t \to \infty.
\]
Consequently, both $\max_i c_i(t)/t$ and $\min_j r_j(t)/t$ converge to the same value, namely the value of $A$. Note also that
\[
c_i(t) - r_i(t) = c_i(0) - r_i(0) + t \sum_k \{ y_k a_{ik} - x_k a_{ki} \}.
\]
Thus, for a symmetric game, $c_i(t) = r_i(t)$.
§ 21. Cooperative Games
A transferable utility game (TU game) is specified by pN, vq, where N is the set of players
and v : 2N Ñ R. It is also called game in characteristic form. The value vpSq for a subset S Ď N
is called the worth of coalition S. Needless to say, the subsets of N are called coalitions.
§ 22. Nucleolus
Nucleolus is another solution concept for TU games. It is manifestation of Rawlsian social
welfare, wherein the the welfare of the worst-off player is maximized. The welfare, here, is
measured in terms of the excess function. Now consider a TU game pN, vq and consider a
coalition C of N . The excess epC, xq of a coalition at an allocation x is defined by
epC, xq “ vpCq ´ xpCq.
ř
Note that xpCq “ iPC xi . Whenever epC, xq ą 0, we can see this as the amount of dissatisfaction
or complaint of the players of C from the allocation x. Using this notation, we can see that
CorepN, vq “ tx P Rn |x is an imputation and epC, xq ď 0, C Ď N u.
Let e˚ pxq be the arrangement of the values tepC, xq, C Ď N u in the decreasing order. Thus e˚ pxq
N
can be seen as a vector R2 ´1 . Here we excluded the excess corresponding to empty coaltion,
which is zero always.
Let $\succeq_{\mathrm{lex}}$ denote the lexicographic ordering of $\mathbb{R}^{2^{|N|}-1}$: for $u, v \in \mathbb{R}^{2^{|N|}-1}$, $u \succ_{\mathrm{lex}} v$ if and only if there is $t$ such that $u_i = v_i$ for $1 \le i < t$ and $u_t > v_t$; and $u \succeq_{\mathrm{lex}} v$ if either $u = v$ or $u \succ_{\mathrm{lex}} v$.
We say that an allocation $y$ is better than $x$ provided $e^*(x) \succeq_{\mathrm{lex}} e^*(y)$. The nucleolus $Nu(N, v)$ is then the set
\[
Nu(N, v) = \{ x \in \mathbb{R}^n : x \text{ is an imputation and } e^*(y) \succeq_{\mathrm{lex}} e^*(x) \text{ for each imputation } y \}.
\]
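As an illustration (a toy game of our own, not from the notes), the sorted excess vectors of two allocations in a symmetric three-player game can be compared lexicographically; the equal split has the lexicographically smaller vector and is therefore the better allocation:

```python
from fractions import Fraction as F
from itertools import combinations

# Toy 3-player TU game: v(N) = 1, v of any pair = 1/2, v of singletons = 0.
N = (1, 2, 3)
v = {}
for r in range(1, 4):
    for C in combinations(N, r):
        v[C] = F(1) if len(C) == 3 else (F(1, 2) if len(C) == 2 else F(0))

def sorted_excesses(x):
    """e*(x): excesses v(C) - x(C) over nonempty coalitions, decreasing."""
    exc = [v[C] - sum(x[i - 1] for i in C)
           for r in range(1, 4) for C in combinations(N, r)]
    return sorted(exc, reverse=True)

equal  = [F(1, 3)] * 3
skewed = [F(1, 2), F(1, 4), F(1, 4)]
# Python list comparison is lexicographic, matching the ordering above:
assert sorted_excesses(equal) < sorted_excesses(skewed)
print(sorted_excesses(equal))
```

For this symmetric game the equal split is in fact the nucleolus (a standard symmetry argument, stated here as an aside).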
§ 23. Utility Under Certainty
§ 23.1. Preference Relations and Utility Representation. Let $X$ be a nonempty set, interpreted as the set of alternatives available to a decision maker. A preference relation $\preceq$ on $X$ is a binary relation satisfying
• Completeness: for all $x, y \in X$, either $x \preceq y$ or $y \preceq x$;
• Transitivity: for all $x, y, z \in X$, if $x \preceq y$ and $y \preceq z$, then $x \preceq z$.
Such a preference relation is said to be rational.
Any order-preserving map $u : X \to \mathbb{R}$ is called a utility function. Note that if there is a utility function, then the preference relation is necessarily rational.
If $x \preceq y$ and $y \preceq x$, then we write $x \sim y$ and say that $x$ and $y$ are indifferent. We write $x \prec y$ to mean that $x \preceq y$ but not $y \preceq x$; in this case $y$ is strictly preferred to $x$. The sets of alternatives strictly worse and strictly better than $y \in X$ are denoted respectively by
\[
W(y) = \{ x \in X : x \prec y \} \qquad \text{and} \qquad B(y) = \{ x \in X : x \succ y \}.
\]
Given a topology on X, we say that the preference relation is continuous if for each y, the sets
W pyq and Bpyq are open. The preference relation is upper semicontinuous if W pyq is open for
each y. Lower semicontinuity of the preference relation is defined analogously.
The set $X$ is called Jaffray order separable if there is a countable subset $D \subset X$ such that for all $x, y \in X$,
\[
x \prec y \implies \exists \, d, d' \in D : x \preceq d \prec d' \preceq y.
\]
One can verify that there is a utility function if and only if $X$ is Jaffray order separable.
(23.1). Proposition. If $X$ is countable, then there always exists a utility function.
Proof. Let $X = \{x_1, x_2, \dots\}$ and let $\{a_i\}$ be a sequence of positive numbers with $\sum_i a_i < \infty$. Set
\[
u(x) = \sum_{x_i \preceq x} a_i;
\]
then $u$ is a utility function.
□
Alternative proof. Let $X = \{x_1, x_2, \dots\}$ and define $u$ inductively as follows:
• $u(x_1) = 0$;
• \[
u(x_2) = \begin{cases} u(x_1) & \text{if } x_1 \sim x_2, \\[2pt] \dfrac{u(x_1) - 1}{2} & \text{if } x_2 \prec x_1, \\[4pt] \dfrac{u(x_1) + 1}{2} & \text{if } x_1 \prec x_2; \end{cases}
\]
• in general, $u(x_{n+1})$ equals $u(x_i)$ if $x_{n+1} \sim x_i$ for some $i \le n$, and otherwise is placed strictly between the values of the nearest strict neighbours of $x_{n+1}$ among $x_1, \dots, x_n$ (or beyond the largest or smallest value, as in the second step).
□
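The construction in the first proof is easy to carry out concretely; the sketch below (a hypothetical finite example of ours, with the preference encoded by a rank function in which ties model indifference) uses $a_i = 2^{-i}$:

```python
# Utility via u(x) = sum of a_i over alternatives x_i preceq x, a_i = 2^{-i}.
rank = {"x1": 2, "x2": 0, "x3": 1, "x4": 2, "x5": 3}   # higher = better

def u(x):
    return sum(2.0 ** -(i + 1)
               for i, xi in enumerate(sorted(rank))    # enumerate X
               if rank[xi] <= rank[x])                  # x_i preceq x

# u represents the preference: order and indifference are both preserved.
for a in rank:
    for b in rank:
        assert (rank[a] <= rank[b]) == (u(a) <= u(b))
print(u("x2"), u("x5"))
```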
(23.2). Definition (Lexicographic Preference). Let $X$ and $Y$ be two sets with preference relations $\preceq_1$ and $\preceq_2$ respectively. On $X \times Y$, we define the preference relation $\preceq$ as follows:
\[
(x, y) \preceq (x', y') \quad \text{if } x \prec_1 x', \text{ or } x \sim_1 x' \text{ and } y \preceq_2 y'.
\]
The preference relation $\preceq$ is called the lexicographic preference.
(23.3). Proposition. Let $X = Y = [0, 1]$. On $X \times Y$, consider the lexicographic order induced by the usual order on $[0, 1]$. There is no utility function which represents this order on $X \times Y$.
Proof. Suppose $u$ is a utility function representing the lexicographic order on $[0,1] \times [0,1]$. For each $a \in [0, 1]$, note that $(a, 0) \prec (a, 1)$, so we can choose a rational number $q(a) \in (u(a, 0), u(a, 1))$. If $a < a'$, then $(a, 1) \prec (a', 0)$, so $u(a, 1) < u(a', 0)$ and the chosen intervals are disjoint; hence $q : [0, 1] \to \mathbb{Q}$ is one-to-one. This is a contradiction, since $[0, 1]$ is uncountable and $\mathbb{Q}$ is countable. Thus there cannot be any utility function representing the lexicographic order on $[0,1] \times [0,1]$.
□
(23.4). Exercise. Show that lexicographic preference is not continuous.
(23.5). Definition. A preference relation on $X$ is said to be separable if there is a countable set $D \subseteq X$ such that
\[
x \prec y \implies x \preceq z \preceq y \ \text{for some } z \in D.
\]
(23.6). Theorem. Let $X$ be uncountable. The preference relation is representable by a utility function if and only if it is separable.
Proof. The existence of a utility function clearly implies the separability of the preference relation, so we prove only the converse.
Let $D$ be a countable set witnessing separability, and let $\mu$ be a probability distribution on $D$ with $\mu_y > 0$ for every $y \in D$. For any $x \in X$, let
\[
u(x) = \sum_{y \in W(x) \cap D} \mu_y \ - \sum_{y \in B(x) \cap D} \mu_y.
\]
Then $u$ satisfies the desired properties.
□
References
[1] T. Börgers and R. Sarin, Learning through reinforcement and replicator dynamics, J. Econ. Theory, 77 (1997), 1–14.
[2] G. W. Brown, Iterative solutions of games by fictitious play, in Koopmans (ed.), 374–376, 1951.
[3] G. W. Brown and J. von Neumann, Solutions of games by differential equations, in Kuhn and Tucker (eds.), 73–79, 1950.
[4] T. Fujimoto, An extension of Tarski's fixed point theorem and its application to isotone complementarity problems, Math. Programming, 28 (1984), 116–118.
[5] J. Gait, Stability in the gaming equation, Bull. Aust. Math. Soc., 21 (1980), 207–210.
[6] J. Geanakoplos, Nash and Walras equilibrium via Brouwer, Economic Theory, 21 (2003), 585–603.
[7] S. Karlin, Mathematical Methods and Theory in Games, Programming, and Economics, Dover, 1992.
[8] P. D. Lax, Change of variables in multiple integrals, Amer. Math. Monthly, 106 (1999), 497–501.
[9] R. D. Luce and H. Raiffa, Games and Decisions, Dover, 1957.
[10] A. Mas-Colell, M. D. Whinston, and J. R. Green, Microeconomic Theory, Oxford University Press, New York, 1995.
[11] J.-F. Mertens, S. Sorin, and S. Zamir, Repeated Games, Econometric Society Monographs 55, Cambridge University Press, New York, 2015.
[12] H. Nikaidô, Stability of equilibrium by the Brown–von Neumann differential equation, Econometrica, 27 (1959), 654–671.
[13] T. E. S. Raghavan, Completely mixed strategies in bimatrix games, J. London Math. Soc. (2), 2 (1970), 709–712.
[14] J. Robinson, An iterative method of solving a game, Annals of Mathematics, 54 (1951), 296–301.
[15] A. Tarski, A lattice-theoretical fixpoint theorem and its applications, Pacific J. Math., 5 (1955), 285–309.