Sherlock Holmes, Criminal Interrogations and Aspects of Non-cooperative Game Theory

Math Models, Spring 2000

Brandi Ahlers, Jennifer Lohmann, Madoka Miyata, Soo-Bong Park, Rae-San Ryu, Jill Schlosser

Game Theory

Game theory provides economists with a systematic way of analyzing problems of strategic behavior, in which one player's best action depends essentially on what the other players may do. As early as 1838, Cournot made clear that problems of economic optimization are greatly simplified when either no other players are present or there are unboundedly many (Eatwell, 1989). Game theory has developed over several decades; this paper concentrates on the period in which John von Neumann was the most influential figure. It reflects on concepts of game theory and on applications of these principles: the Holmes/Moriarty Paradox, the Prisoner's Dilemma, and the use of game theory in research on the F-scale.

Concepts of Game Theory

Game theory is the mathematical analysis of situations of conflict and cooperation. A game has four elements: players, strategies, outcomes, and payoffs. We assume that each player acts individually, which means that there is no communication between players. Each player uses rational play to decide how to reach a favorable outcome; a player's best strategy is the one that earns the highest payoff, maximizing gains and minimizing losses.

Several concepts are intrinsic to game theory. Two of the most important are payoffs and saddle points.

Payoff. A payoff is the reward a player earns from a given play of a game. Usually the reward matrix shows the row player's payoffs; in a zero-sum game, the column player's payoffs are the negatives of these (Straffin, 1993).

Saddle point.
A saddle point is an outcome that is simultaneously the smallest entry in its row and the largest entry in its column. It identifies a pair of strategies, one for each player, toward which the game will evolve when each player uses rational play, and so it gives the optimal strategy for both players. There are two ways to locate a saddle point: the maximin and minimax principles, and the movement diagram.

Under the maximin and minimax principles, the row player follows the maximin strategy: first find the smallest entry in each row, then take the largest of these numbers. The column player follows the minimax strategy: first find the largest entry in each column, then take the smallest of these numbers. When the two numbers are equal, that common value is the saddle point.

For example, suppose Rose and Colin play a coin game in which each independently chooses heads (H) or tails (T), and Rose's payoffs are shown in the matrix below.

Figure 1. Rose's payoffs in the coin game.

                            Colin's options
                            H        T        Row minimum
Rose's options     H       -6        1        -6
                   T        3        1         1
Column maximum              3        1

Saddle point = 1

In this reward matrix (Figure 1), Rose is the "row player," since her options are listed in the rows, and Colin is the "column player," since his options are listed in the columns. Following the maximin principle, Rose's smallest entries in the two rows are -6 and 1, and the maximum of these is 1. Following the minimax principle, Colin's largest entries in the two columns are 3 and 1, and the minimum of these is 1. Since the two values agree, 1 is the saddle point of this game, reached when both players choose tails (Straffin, 1993).

Another way to think about this process is that each player plays for himself or herself from the maximin point of view. Since the matrix could be written in terms of either player, each player could rationally write the matrix with his or her own payoffs as the row payoffs and then follow this strategy.
In this sense, either player can use either principle, provided the matrix is rewritten (entries negated, rows and columns exchanged) to show that player's own payoffs in the rows.

The movement diagram is a more visual method for finding a saddle point. We first consider the row player's point of view and then the column player's point of view. For example, consider the following version of the Rose/Colin coin game, again showing Rose's payoffs.

Figure 2. Movement diagram.

                   Colin
                   H        T
Rose      H        3       -6
          T        2        1

Rose's arrows (vertical): in column H, from 2 up to 3; in column T, from -6 down to 1.
Colin's arrows (horizontal): in row H, from 3 toward -6; in row T, from 2 toward 1.

To explain the movement diagram, we first examine the game from Rose's point of view. If Rose knows or guesses that Colin will choose heads, she would want to play heads, because that gives her the highest payoff in that column, 3. Thus we draw Rose's arrow upward from 2 to 3. Similarly, in the tails column her arrow is drawn downward from -6 to 1 (Straffin, 1993).

We may also examine the game from Colin's point of view. Since Rose's payoffs are Colin's losses, Colin prefers smaller entries. If Colin believes Rose will choose heads, he is better off choosing tails, because his payoff is then 6, the best available to him. So we draw Colin's arrow to the right, toward -6. Similarly, in the tails row his arrow is drawn from 2 toward 1, completing the movement diagram (Straffin, 1993).

In a movement diagram, if all arrows point toward an entry and no arrow points away from it, that entry is a saddle point; here it is the entry 1, where both players choose tails. If the value of the saddle point is zero, the game is fair. If the value is not zero, the game is biased: the saddle point still recommends optimal strategies for each player, but it leaves one player with the advantage of earning a higher payoff over time (Straffin, 1993). A game that has a saddle point is called a determinate game. Its saddle point indicates a clear set of strategies that the players should use to attain the best payoff they can guarantee in the long run.
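Both methods for finding a saddle point can be sketched in a few lines of Python. This is an illustrative sketch, not code from the sources: the function names are hypothetical, each matrix is a list of rows holding Rose's payoffs, and Colin's payoff is assumed to be the negative of Rose's, as in a zero-sum game. The movement-diagram function looks for a cell that no arrow leaves, which is the same as a cell that is the largest in its column and the smallest in its row.

```python
def maximin_minimax(matrix):
    """Return (maximin, minimax) for a matrix of the row player's payoffs."""
    maximin = max(min(row) for row in matrix)          # Rose: best of the row minima
    minimax = min(max(col) for col in zip(*matrix))    # Colin: least of the column maxima
    return maximin, minimax

def movement_saddle_points(matrix):
    """Return (row, col) cells that no movement-diagram arrow leaves.

    Rose's vertical arrows point to the largest entry in each column;
    Colin's horizontal arrows point to the smallest entry in each row,
    because his payoff is the negative of Rose's.
    """
    saddles = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            column = [r[j] for r in matrix]
            if value == max(column) and value == min(row):
                saddles.append((i, j))
    return saddles

# Rows: Rose H, T.  Columns: Colin H, T.
figure1 = [[-6, 1],
           [3, 1]]
figure2 = [[3, -6],
           [2, 1]]

print(maximin_minimax(figure1))         # (1, 1)
print(movement_saddle_points(figure1))  # [(1, 1)]
print(movement_saddle_points(figure2))  # [(1, 1)]
```

In both figures the only saddle point is cell (1, 1), Rose tails and Colin tails, with value 1; when maximin and minimax differ, the game has no saddle point.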
A game tree, commonly used when applying game theory to a problem, shows the progression of moves among the players. It includes decision nodes and information sets. A decision node is a point in the game at which a player must act. An information set groups decision nodes that a player cannot tell apart: when the player makes a choice, he or she knows that the game has reached some node in that information set, but not which one (Eatwell, 1989).

Adventures in Sherlock Holmes

Most people can recall hearing stories of the world's most famous sleuth, Sherlock Holmes. Consider the following story.

In Sherlock Holmes' latest pursuit of truth, he has unraveled a mystery concerning the evil Professor Moriarty. The Professor is aware of Holmes' astute deductions and must do everything in his power to eliminate Holmes. Holmes knows that his only way to survive is to escape Moriarty by reaching the Continent alive. He can either ride the train to Dover or leave it at Canterbury, the only intermediate station. As the train pulls out, Holmes observes Professor Moriarty on the platform. Holmes assumes, and we believe him fully justified, that his enemy might secure a special train and overtake him. Holmes' adversary, whose intelligence is assumed adequate to visualize the possibilities Holmes is considering, faces the same choice. Both opponents must choose where to leave the train in ignorance of the other's decision. If, as a result of these measures, they find themselves on the same platform, the Professor will certainly kill Sherlock Holmes. If Holmes reaches Dover unharmed, he can certainly make his escape (Case, 2000).

In attempting to understand the paradox, we might notice that in the story Holmes does eventually reach the Continent, by leaving the train at Canterbury while Moriarty proceeds directly to Dover.
Yet the odds appear to favor Moriarty: as the train departs Victoria Station, Holmes seems quite unlikely to reach the Continent alive (Case, 2000). Oskar Morgenstern was the first to recognize this famous paradox, and he began questioning a host of learned men about it. He was dissatisfied with the responses he received until he met John von Neumann, who provided what Morgenstern believed was an appropriate explanation based on game theory (Case, 2000).

Like all games, the Holmes/Moriarty Paradox can be put into a reward matrix, which here shows the rewards of the game from Holmes' point of view. Each entry is the probability of Holmes' survival for the corresponding pair of moves. Holmes will be killed if he and Moriarty choose the same station, either Dover or Canterbury, so Holmes' payoff is zero in both of those cases. Should Holmes go on to Dover while Moriarty leaves the train at Canterbury, Holmes appears to have more than a fighting chance of survival; should Holmes leave at Canterbury while Moriarty goes on to Dover, Holmes has merely a fighting chance. One might label these chances of survival P and p respectively, where 0 <= p <= P <= 1 (Case, 2000).

Table 3. Payoffs in the Holmes/Moriarty Paradox (Holmes' probability of survival).

                    Moriarty
                    C        D
Holmes     C        0        p
           D        P        0

To analyze the Holmes/Moriarty Paradox fully, we need further concepts. Unlike the determinate games described earlier, which have saddle points, this paradox, in which Holmes and Moriarty have strictly opposed interests, has no saddle point. In such a game a player can use randomization to choose a strategy. This is called a mixed strategy: the choice is based on probabilities. Von Neumann introduced the mixed strategy so that neither player can take advantage of the other while each looks out for his or her own best interest (Case, 2000).
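The mixed-strategy idea can be sketched for Table 3 with the principle of equalizing expectations, which the numerical analysis of this game applies by hand. The sketch is illustrative and not taken from the sources: the function name `equalizing_mix_2x2` is hypothetical, it assumes the 2x2 game has no saddle point, and it uses exact fractions so the results can be compared with a hand calculation. The example plugs in p = 2/3 and P = 1; for general chances 0 < p <= P the same formulas give Holmes a probability P/(p + P) of leaving at Canterbury and Moriarty a probability p/(p + P).

```python
from fractions import Fraction

def equalizing_mix_2x2(a, b, c, d):
    """Optimal mixed strategies for a 2x2 zero-sum game with no saddle point.

    [[a, b], [c, d]] holds the row player's payoffs (here, Holmes' survival).
    Returns (x, y, value): x = probability the row player picks row 1,
    y = probability the column player picks column 1, value = expected payoff.
    Assumes no saddle point, so the denominator is nonzero and 0 < x, y < 1.
    """
    denom = a - b - c + d
    # Row player equalizes his expectation across the two columns:
    #   x*a + (1 - x)*c == x*b + (1 - x)*d
    x = Fraction(d - c) / denom
    # Column player equalizes the row player's expectation across the two rows:
    #   y*a + (1 - y)*b == y*c + (1 - y)*d
    y = Fraction(d - b) / denom
    value = x * a + (1 - x) * c
    return x, y, value

# Rows: Holmes C, D.  Columns: Moriarty C, D.  Assumed values p = 2/3, P = 1.
p, P = Fraction(2, 3), Fraction(1)
x, y, value = equalizing_mix_2x2(0, p, P, 0)
print(f"Holmes: {x} C + {1 - x} D")      # Holmes: 3/5 C + 2/5 D
print(f"Moriarty: {y} C + {1 - y} D")    # Moriarty: 2/5 C + 3/5 D
print(f"Holmes survives with probability {value}")  # 2/5
```

Against either pure choice by Moriarty, Holmes' 3/5 C + 2/5 D mix yields the same survival probability, 2/5, which is what "equalizing" means here.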
In a determinate game, the saddle point gives a clear set of strategies that each player should use to attain the highest payoff. The Holmes/Moriarty Paradox, on the other hand, is an indeterminate game: a game that does not contain a saddle point. In this kind of game, the arrows of the movement diagram go around in a circle.

To find the mixed strategies, the Holmes/Moriarty reward matrix can be put into numerical form. It is impossible to know the precise probabilities of survival that p and P stand for. To analyze the problem, however, one may choose probability values that satisfy the inequality 0 <= p <= P <= 1. Thus p can be replaced by 2/3 and P by 1. The value 1 replaces P because this outcome gives Holmes a larger probability of survival than p. The small p receives the smaller value because in that outcome Holmes has only a fighting chance to escape from Moriarty.

                    Moriarty's options
                    C        D
Holmes's     C      0        2/3
options      D      1        0

Now that the reward matrix is in numerical form, we can find the mixed strategies using the principle of equalizing expectations. In general, let q_1, q_2, ..., q_k be the payoffs a player can receive and p_1, p_2, ..., p_k the probabilities with which they occur (these p_i are unrelated to the survival chance p above). By the principle of mathematical expectation, the player's expected payoff is

$$E = p_1 q_1 + p_2 q_2 + \cdots + p_k q_k$$

Suppose Holmes goes to Canterbury with probability x and to Dover with probability 1 - x. If Moriarty goes to Canterbury, Holmes' expected survival is 0·x + 1·(1 - x) = 1 - x; if Moriarty goes to Dover, it is (2/3)·x + 0·(1 - x) = (2/3)x. Equalizing the two expectations gives 1 - x = (2/3)x, so x = 3/5. Likewise, if Moriarty goes to Canterbury with probability y, Holmes' expectation is (2/3)(1 - y) when Holmes chooses Canterbury and y when he chooses Dover; setting these equal gives y = 2/5. Hence Holmes' optimal strategy is 3/5 C + 2/5 D, and Moriarty's optimal strategy is 2/5 C + 3/5 D. This means that Holmes ought to go to Canterbury 3/5 of the time and to Dover 2/5 of the time, with the opposite being true for Moriarty. Together with the reward matrix, the strategies are:

Holmes's strategy:   3/5 C + 2/5 D
Moriarty's strategy: 2/5 C + 3/5 D

These strategies, found using mathematical expectation, are the optimal strategies. Each player's strategy makes the expected outcome the same whichever option the opponent chooses, which limits the opponent's ability to cause harm: Moriarty cannot push Holmes' survival probability below 2/5, and Holmes cannot guarantee more than 2/5. It is interesting to note that von Neumann's mixed strategy favors the same outcome as the original story, in which Holmes leaves the train at Canterbury and Moriarty continues to Dover, although the author presumably had no knowledge of game theory.

The Prisoner's Dilemma

The Prisoner's Dilemma is the best-known application of game theory. Imagine that two players, called "you" and "your partner," have been arrested. They have been placed in separate rooms at the police station, where the district attorney questions each of them. The district attorney tells each player that if one confesses while the other does not, the confessor will get a reward and the partner will receive a heavy sentence. He also says that if both confess, each will receive a light sentence. There is good reason to believe that if neither confesses, both will go free (Straffin, 1993). The numerical payoffs, heavy sentence (-2), light sentence (-1), go free (0), and reward (1), can be placed in a reward matrix for each possible outcome. Since this is a non-zero-sum game and the players are not strictly opposed, the payoffs are shown in pairs, where the first number is the row player's payoff and the second is the column player's (Straffin, 1993).

Nash Equilibrium and Non-Cooperative Solutions

In a non-zero-sum game, the players' payoffs need not add to zero, so each player's payoffs must be listed separately.
Unlike in a zero-sum game, the interests of the players in a non-zero-sum game are neither strictly opposed nor strictly coincident. In a non-zero-sum game, therefore, it matters what assumptions we make. To understand non-zero-sum games better, we assume that they are non-cooperative: there is no communication between players, and each player does what is in his or her own best interest. We will introduce three kinds of two-person non-cooperative games, which will lead us to the solution of the Prisoner's Dilemma later in this paper.

1. Games without equilibrium

                Colin
                H          T
Rose     H      (2, 4)     (1, 0)
         T      (3, 1)     (0, 4)

The first example has no equilibrium outcome. An equilibrium outcome in a non-zero-sum game, an outcome from which neither player can gain by changing strategy alone, plays the same role as a saddle point in a zero-sum game. Since there is no pure-strategy equilibrium, we can solve this game using mixed strategies, as we did for the Holmes/Moriarty Paradox. The difference is that each player now chooses a mix that equalizes the opponent's expected payoffs, since the opponent's payoffs are no longer simply the negatives of one's own.

2. Games with two different equilibria

In 1950 John Nash proved that every two-person game with finitely many strategies has at least one equilibrium, in either pure or mixed strategies. This might suggest that non-zero-sum games are not much more difficult to solve than zero-sum games. However, there are games with two different equilibria, and these raise problems that games with no equilibrium do not.

                Colin
                H          T
Rose     H      (1, 1)     (2, 5)
         T      (5, 2)     (-1, -1)

In a zero-sum game, multiple saddle points are always equivalent and interchangeable, so if both players choose saddle-point strategies, the result is always a saddle point, which is optimal for both players. In a non-zero-sum game this need not hold. The Rose/Colin game above has two equilibria, (2, 5) and (5, 2); if each player plays for his or her own favorite equilibrium (Rose choosing T, Colin choosing T), they end up at their worst outcome, (-1, -1).

3. Prisoner's Dilemma

The final non-zero-sum game we will introduce is the Prisoner's Dilemma.
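The equilibrium checks used for these games can be sketched in Python. This is an illustrative sketch with hypothetical function names: each game is a list of rows of (Rose, Colin) payoff pairs, and the mixed-equilibrium function assumes a 2x2 game with a fully mixed equilibrium, found by having each player equalize the opponent's payoffs. The Prisoner's Dilemma entries use the payoffs described earlier (heavy sentence -2, light sentence -1, go free 0, reward 1), with A = do not confess and B = confess, matching the matrix that follows.

```python
from fractions import Fraction

def pure_equilibria(game):
    """Cells (row, col) where neither player gains by changing strategy alone.

    game[i][j] = (rose_payoff, colin_payoff).
    """
    found = []
    for i, row in enumerate(game):
        for j, (rose, colin) in enumerate(row):
            rose_ok = all(other[j][0] <= rose for other in game)  # best in its column for Rose
            colin_ok = all(cell[1] <= colin for cell in row)      # best in its row for Colin
            if rose_ok and colin_ok:
                found.append((i, j))
    return found

def mixed_equilibrium_2x2(game):
    """Mixed equilibrium of a 2x2 game: each player equalizes the OPPONENT's payoffs.

    Returns (x, y): x = P(Rose plays row 0), y = P(Colin plays column 0).
    Assumes a fully mixed equilibrium exists (nonzero denominators, 0 < x, y < 1).
    """
    (a1, b1), (a2, b2) = game[0]
    (a3, b3), (a4, b4) = game[1]
    # Rose's x makes Colin indifferent: x*b1 + (1-x)*b3 == x*b2 + (1-x)*b4
    x = Fraction(b4 - b3, b1 - b3 - b2 + b4)
    # Colin's y makes Rose indifferent: y*a1 + (1-y)*a2 == y*a3 + (1-y)*a4
    y = Fraction(a4 - a2, a1 - a2 - a3 + a4)
    return x, y

# Rows: Rose H, T (or A, B).  Columns: Colin H, T (or A, B).
game1 = [[(2, 4), (1, 0)], [(3, 1), (0, 4)]]
game2 = [[(1, 1), (2, 5)], [(5, 2), (-1, -1)]]
prisoners = [[(0, 0), (-2, 1)], [(1, -2), (-1, -1)]]

print(pure_equilibria(game1))      # []
print(pure_equilibria(game2))      # [(0, 1), (1, 0)]
print(pure_equilibria(prisoners))  # [(1, 1)]
x, y = mixed_equilibrium_2x2(game1)
print(f"Rose: {x} H, Colin: {y} H")  # Rose: 3/7 H, Colin: 1/2 H
```

For the first game this sketch gives Rose 3/7 H + 4/7 T and Colin 1/2 H + 1/2 T; the second game shows the two equilibria (2, 5) and (5, 2), and the Prisoner's Dilemma shows its single equilibrium BB.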
In 1950, Merrill Flood and Melvin Dresher at the RAND Corporation showed that a non-zero-sum game can have a unique equilibrium outcome that fails to be Pareto optimal.

                                Colin
                                A (do not confess)    B (confess)
Rose    A (do not confess)      (0, 0)                (-2, 1)
        B (confess)             (1, -2)               (-1, -1)

Later, at a seminar at Stanford University, Albert W. Tucker introduced the Prisoner's Dilemma. It is now the most widely studied and used game in social science. In this game B, "confess," is a dominant strategy for both players, and the unique equilibrium is BB. However, if both players rationally follow their own best interest to this equilibrium, the result is unfortunate for both of them. To understand this better, we introduce the Pareto principle. An outcome of a game is non-Pareto optimal if there is another outcome that would give both players higher payoffs, or would give one player the same payoff and the other a higher payoff. An outcome is Pareto optimal if there is no such other outcome. By this definition, the Prisoner's Dilemma has a unique equilibrium, BB with payoffs (-1, -1), that is not Pareto optimal, since AA gives both players the higher payoff 0.

In conclusion, in non-zero-sum games the equilibrium is not always an optimal outcome, as the Prisoner's Dilemma shows. An equilibrium outcome is nevertheless desirable because of its stability, and a two-person non-zero-sum game can be considered solved when it has at least one equilibrium that is Pareto optimal and, if there are several such equilibria, they are equivalent and interchangeable.

After examining the Prisoner's Dilemma, one may wonder what would be a rational way to choose a strategy in this non-cooperative game. Two different approaches are repeated-play theory and the metagame argument (Straffin, 1993). Repeated-play theory suggests that the game be modeled as if it were played over and over.
Players may then be more willing to cooperate, at least at the beginning, which gives them the Pareto optimal outcome. However, if the players know which game is the last one, logic says they will choose what is best for themselves in that game. So, when taking a formal approach to the problem, we model it as though neither player knows when the last game will be played. The other major assumption of repeated-play theory is that a player's opponent begins by cooperating and continues to cooperate until the player defects. Let p, a number between 0 and 1, be the probability that another play will occur. Let R be a player's payoff when both cooperate, T the payoff for defecting while the other cooperates, and U the payoff when both defect. With these assumptions we can derive two expressions.

Expression 1, the payoff when there is always cooperation:

$$R + pR + p^2 R + p^3 R + \cdots = \frac{R}{1 - p}$$

Expression 2, the payoff when you choose to defect at game m (counting the first game as game 0):

$$\frac{R(1 - p^m)}{1 - p} + p^m T + \frac{p^{m+1} U}{1 - p}$$

It makes sense never to defect if Expression 1 is larger than Expression 2. So we set up the inequality

$$\frac{R}{1 - p} > \frac{R - p^m R + p^m T - p^{m+1} T + p^{m+1} U}{1 - p}$$

This inequality can be reduced to show when a player should cooperate: if p is greater than some threshold value, he or she should cooperate. Multiplying both sides by 1 - p, cancelling R, and dividing by p^m reduces the inequality to

$$p > \frac{T - R}{T - U}$$

Now we can return to the reward matrix of the Prisoner's Dilemma and substitute the values T = 1, R = 0, and U = -1. This gives

$$p > \frac{1 - 0}{1 - (-1)} = \frac{1}{2}$$

We can conclude that if the probability p is greater than the threshold value 1/2, then under these assumptions it makes sense for both players to cooperate, giving the Pareto optimal outcome (Straffin, 1993).

Metagame Argument

Traditionally, the Prisoner's Dilemma is played just once, so one may look at the metagame argument to see whether it can be used to argue that a player should choose A (cooperate).
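Before setting up the metagame, the one-shot and repeated-play claims above can be checked numerically. This is an illustrative sketch with hypothetical function names: the payoff table is the Prisoner's Dilemma matrix from the text, `defect_value` encodes Expression 2 under the stated assumption that the opponent cooperates until you defect and both defect afterwards, and exact fractions avoid rounding.

```python
from fractions import Fraction

# Prisoner's Dilemma payoffs (row, column); A = do not confess, B = confess.
PD = {("A", "A"): (0, 0), ("A", "B"): (-2, 1),
      ("B", "A"): (1, -2), ("B", "B"): (-1, -1)}
MOVES = ("A", "B")

def is_dominant(move, player):
    """True if `move` is at least as good for `player` (0 = row, 1 = column)
    as every alternative, whatever the opponent plays."""
    def payoff(mine, theirs):
        cell = (mine, theirs) if player == 0 else (theirs, mine)
        return PD[cell][player]
    return all(payoff(move, t) >= payoff(alt, t) for t in MOVES for alt in MOVES)

def is_pareto_optimal(cell):
    """False if another outcome is at least as good for both and better for one."""
    u = PD[cell]
    return not any(v[0] >= u[0] and v[1] >= u[1] and v != u for v in PD.values())

def cooperate_value(R, p):
    """Expression 1: R + pR + p^2 R + ... = R / (1 - p)."""
    return Fraction(R) / (1 - p)

def defect_value(R, T, U, p, m):
    """Expression 2: cooperate in games 0..m-1, defect at game m, mutual defection after."""
    return Fraction(R) * (1 - p**m) / (1 - p) + p**m * T + p**(m + 1) * Fraction(U) / (1 - p)

def threshold(R, T, U):
    """For 0 < p < 1, cooperation beats defecting at any game m exactly when p > (T - R) / (T - U)."""
    return Fraction(T - R, T - U)

R, T, U = 0, 1, -1   # go free, reward for confessing alone, light sentence
print(is_dominant("B", 0), is_dominant("B", 1))                      # True True
print(is_pareto_optimal(("B", "B")), is_pareto_optimal(("A", "A")))  # False True
print(threshold(R, T, U))                                            # 1/2
for p in (Fraction(2, 5), Fraction(3, 5)):
    print(p, cooperate_value(R, p) > defect_value(R, T, U, p, m=3))
```

The loop prints `2/5 False` and then `3/5 True`: below the threshold of 1/2 defecting pays, above it cooperating pays.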
When the metagame argument is set up, each player must do a bit of mind reading. For example, say your partner may want to cooperate with you but fears that you won't cooperate too. So your partner has four strategies based on what he believes you will choose. In each strategy's label, the first letter is your partner's choice if you choose A and the second is his choice if you choose B:

I (AA): choose A regardless
II (AB): choose the same as the opponent
III (BA): choose the opposite of the opponent
IV (BB): choose B regardless

These strategies give the following matrix, with payoffs listed as (you, partner).

              I: AA       II: AB      III: BA     IV: BB
You    A      (0, 0)      (0, 0)      (-2, 1)     (-2, 1)
       B      (1, -2)     (-1, -1)    (1, -2)     (-1, -1)

Looking at the matrix, you should notice that your partner's payoff is greatest in column IV. Knowing this, you should choose row B, because you would rather have a payoff of -1 than -2. This leads to the payoff (-1, -1). This is not the Pareto optimal outcome we are looking for, so we take the game a step further. Now it is your turn to choose strategies based on what you believe your partner's strategy is. Each of your strategies is a four-letter label giving your choice against your partner's strategies I, II, III, and IV in turn. This leads to a 16 x 4 reward matrix.

                    I: AA       II: AB      III: BA     IV: BB
I:      AAAA        (0, 0)      (0, 0)      (-2, 1)     (-2, 1)
II:     AAAB        (0, 0)      (0, 0)      (-2, 1)     (-1, -1)
III:    AABA        (0, 0)      (0, 0)      (1, -2)     (-2, 1)
IV:     AABB        (0, 0)      (0, 0)      (1, -2)     (-1, -1)
V:      ABAA        (0, 0)      (-1, -1)    (-2, 1)     (-2, 1)
VI:     ABAB        (0, 0)      (-1, -1)    (-2, 1)     (-1, -1)
VII:    ABBA        (0, 0)      (-1, -1)    (1, -2)     (-2, 1)
VIII:   ABBB        (0, 0)      (-1, -1)    (1, -2)     (-1, -1)
IX:     BAAA        (1, -2)     (0, 0)      (-2, 1)     (-2, 1)
X:      BAAB        (1, -2)     (0, 0)      (-2, 1)     (-1, -1)
XI:     BABA        (1, -2)     (0, 0)      (1, -2)     (-2, 1)
XII:    BABB        (1, -2)     (0, 0)      (1, -2)     (-1, -1)
XIII:   BBAA        (1, -2)     (-1, -1)    (-2, 1)     (-2, 1)
XIV:    BBAB        (1, -2)     (-1, -1)    (-2, 1)     (-1, -1)
XV:     BBBA        (1, -2)     (-1, -1)    (1, -2)     (-2, 1)
XVI:    BBBB        (1, -2)     (-1, -1)    (1, -2)     (-1, -1)

After carefully analyzing all of your outcomes, you will find that row XII gives you the best payoff in every column. Knowing that row XII is best for you, your partner should then choose column II, because this column gives him the best payoff in row XII. We end up with the payoff (0, 0), which is our Pareto optimal outcome.
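The 16 x 4 metagame matrix can be generated and checked mechanically rather than by hand. This is an illustrative sketch with hypothetical names: strategies are encoded as letter strings exactly as in the table labels, `itertools.product` lists your 16 strategies in the same order as rows I to XVI, and the one-shot payoffs are the Prisoner's Dilemma matrix from the text.

```python
from itertools import product

# One-shot Prisoner's Dilemma payoffs (you, partner); A = do not confess, B = confess.
PD = {("A", "A"): (0, 0), ("A", "B"): (-2, 1),
      ("B", "A"): (1, -2), ("B", "B"): (-1, -1)}

# Partner's strategies I-IV: letter 0 = reply if you play A, letter 1 = reply if you play B.
PARTNER = ["AA", "AB", "BA", "BB"]
# Your strategies I-XVI: one move against each of the partner's strategies I-IV, in order.
YOU = ["".join(s) for s in product("AB", repeat=4)]

def outcome(you, partner):
    """Payoffs (you, partner) when your 4-letter strategy meets a 2-letter partner strategy."""
    my_move = you[PARTNER.index(partner)]
    partner_move = partner[0] if my_move == "A" else partner[1]
    return PD[(my_move, partner_move)]

def dominant_rows():
    """Your strategies that give you the best payoff in every partner column."""
    best_in_column = {col: max(outcome(y, col)[0] for y in YOU) for col in PARTNER}
    return [y for y in YOU
            if all(outcome(y, col)[0] == best_in_column[col] for col in PARTNER)]

row = dominant_rows()[0]
reply = max(PARTNER, key=lambda col: outcome(row, col)[1])  # partner's best column
print(row, YOU.index(row) + 1)    # BABB 12
print(reply, outcome(row, reply))  # AB (0, 0)
```

Row 12 is row XII (BABB), and the partner's best reply is column II (AB), reproducing the (0, 0) outcome found above.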
Although this strategy seems to work, a great deal of mind reading between you and your partner is needed to make it work (Straffin, 1993).

Aside from its applications to both simple and complex games, game theory has made important contributions to research. Many surveys issued in research today use a seven-point scale, which was piloted by T. W. Adorno in his research testing the personality variables underlying susceptibility to authoritarianism (measured by the F-scale), described in his book The Authoritarian Personality. This research was controversial from the beginning, and further research has been done to verify its results. Morton Deutsch was the first to do this, when he examined the commonsense concepts of suspicion and untrustworthiness together with scores on the F-scale. In his experiment, subjects took the F-scale questionnaire and then played the Prisoner's Dilemma. He found correlations among these variables; furthermore, participants who scored high on the F-scale played the Prisoner's Dilemma differently (Straffin, 1993). There is some disagreement about how to interpret these results. However, researchers agree that experimental games give psychologists a way to make previously vague concepts precise and operational, and that they provide measurable results about the connections between concepts. As an illustration of the prevalence of game theory in research, since the 1980s over 1000 experiments have employed the Prisoner's Dilemma (Straffin, 1993).

In conclusion, game theory has many uses, from zero-sum and non-zero-sum games to cooperative and non-cooperative games. Its applications include economics, the Prisoner's Dilemma, and social psychology research, as well as any situation in which rational thinking is used to solve a game. It is also important to note why game theory is a successful model.
As well as providing a wide variety of applications, game theory provides a concrete map of the rules of the game, how the game is played, and what each player knows at any given moment. This allows investigators to analyze complex problems involving rational thinking successfully.

References

Case, James. Paradoxes involving conflicts of interest. Mathematical Association of America, pp. 33-38, January 2000.

Eatwell, John, Murray Milgate, and Peter Newman (eds.). The New Palgrave: Game Theory. New York, NY: W. W. Norton & Company, 1989.

Straffin, Philip D. Game Theory and Strategy. Mathematical Association of America, 1993.