Sherlock Holmes, Criminal Interrogations and Aspects of Non-cooperative Game Theory

Math Models
Spring 2000
Brandi Ahlers
Jennifer Lohmann
Madoka Miyata
Soo-Bong Park
Rae-San Ryu
Jill Schlosser
Game Theory
Game theory provides economists with a systematic way of analyzing problems of
strategic behavior, where one player’s actions depend essentially on what other players may
do. As early as 1838, Cournot had made clear that problems of economic optimization are
greatly simplified when either no other players are present or there are unboundedly many
(Eatwell, 1989). Game theory has developed over several decades; the period in which
John von Neumann was most influential is the one described here.
This paper will reflect on concepts of game theory, including applications of these
principles, the Holmes/Moriarty Paradox, the Prisoner’s Dilemma, and F-scale applications
of game theory.
Concepts of Game Theory
Game theory is the mathematical analysis of situations of conflict and cooperation. A
game has four elements: players, strategies, outcomes, and payoffs. We assume that each
player plays individually, which means that there is no communication between
players. Players choose their strategies by rational play, aiming at the outcomes most
favorable to them. A player’s best strategy is the one that earns the highest payoff: it
maximizes gains and minimizes losses.
Several concepts are intrinsic to game theory. Some of the most important of these
include concepts of payoffs and saddle points.
Payoff.
A payoff is the reward that a player earns from a given play of a game. In a zero sum
game, usually only the row player’s payoffs are shown in the reward matrix, since the
column player’s payoffs are the negatives of these (Straffin, 1993).
Saddlepoint.
A saddle point is a pair of strategies, one for each player, that is optimal for both; it
is the outcome toward which the game will evolve when each player uses rational play.
There are two ways to locate the saddle point: the maximin and minimax principles, and a
movement diagram.
When considering the maximin and minimax principles, the row player should follow
the Maximin strategy. First, the player finds the smallest entry in each row, then takes the
maximum of these numbers. On the other hand, a column player should follow the Minimax
strategy. First, the player finds the largest column entry in each column, then takes the
minimum of these numbers.
For example, suppose Rose and Colin play a coin game in which each independently chooses
heads or tails, and Rose’s payoffs are as shown in the matrix below.

Figure 1. Rose’s payoffs in the coin game.

                     Colin’s Options
Rose’s Options       H       T       Row Minimum
H                   -6       1       -6
T                    3       1        1
Column Maximum       3       1       Saddle point = 1
In this reward matrix (Figure 1), Rose is considered the “row player”, since her
options are found in the rows, while Colin is considered the “column player”, since his
options are found in the columns. Thus, by the maximin principle, Rose should follow the
maximin strategy. The smallest entries in her rows are -6 and 1, and she takes the maximum
of these payoffs, which is 1. Colin should follow the minimax strategy. The largest entries
in his columns are 3 and 1, and he takes the minimum of these numbers, which is 1. Because
the maximin and the minimax are equal, 1 is the saddle point of this game (Straffin, 1993).
Another way to think about this process is that each player plays for himself or herself
from the maximin point of view. Since the matrix could be written in terms of either
player, each player could write the matrix with his or her own payoffs as the row payoffs
and then follow the maximin strategy. In this sense, both players can use the same
principle, provided the matrix is rewritten from the other player’s point of view
(transposed, with the signs of the payoffs reversed).
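The maximin and minimax steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `saddle_point` is hypothetical, and the matrix is Rose’s payoffs as they appear in Figure 1.

```python
def saddle_point(matrix):
    """Return (maximin, minimax, saddle cells) for the row player's payoffs."""
    # Row player: smallest entry of each row, then the largest of those.
    row_minima = [min(row) for row in matrix]
    # Column player: largest entry of each column, then the smallest of those.
    col_maxima = [max(col) for col in zip(*matrix)]
    maximin = max(row_minima)
    minimax = min(col_maxima)
    # A saddle cell is the minimum of its row and the maximum of its column.
    cells = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == row_minima[i] and value == col_maxima[j]
    ]
    return maximin, minimax, cells


# Rose's payoffs from Figure 1 (rows: Rose H, T; columns: Colin H, T).
rose = [[-6, 1],
        [3, 1]]
print(saddle_point(rose))  # (1, 1, [(1, 1)])
```

The game has a saddle point exactly when the two returned values agree; the cell (1, 1) is Rose playing tails against Colin playing tails.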
The movement diagram is a more visual method of finding the saddle point. We first
consider the row player’s point of view, and then the column player’s point of view.
For a Rose/Colin coin game with the payoffs shown in Figure 2, the following movement
diagram results.

Figure 2. Movement diagram.

               Colin
Rose       H        T
H          3       -6
T          2        1
To explain the movement diagram, we first examine the game from
Rose’s point of view. If Rose knows or guesses that Colin will choose heads, she would
want to play heads because that gives her the highest payoff, which is three. Thus, we draw
Rose’s arrow upward from two to three. Similarly, if Colin chooses tails, her arrow is
drawn downward from negative six to one (Straffin, 1993).
Next we examine Colin’s point of view. If Colin believes Rose will
choose heads, he is better off choosing tails because his payoff is then six, which is
his best payoff. So we draw Colin’s arrow to the right, toward the negative six. Similarly,
if Rose chooses tails, his arrow is drawn from the two toward the one, which completes the
movement diagram (Straffin, 1993).
In a movement diagram, if all of the arrows point toward a payoff and no
arrow leaves that payoff, then it is a saddle point. If the value of the saddle point
is equal to zero, the game is fair. If the value of the saddle
point is not equal to zero, the game is biased: a non-zero
saddle point still recommends an optimal strategy to each player, but it leaves one player
with the advantage of earning a higher payoff over time (Straffin, 1993).
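The arrow rule can also be checked mechanically. The sketch below assumes the payoffs of Figure 2 and uses hypothetical helper names (`stable_cells`, `classify`); a cell counts as a saddle point when neither Rose’s vertical arrow nor Colin’s horizontal arrow leaves it.

```python
def stable_cells(matrix):
    """Cells of the row player's payoff matrix that no arrow leaves."""
    rows, cols = len(matrix), len(matrix[0])
    cells = []
    for i in range(rows):
        for j in range(cols):
            # Rose moves within a column toward a larger entry.
            rose_stays = all(matrix[i][j] >= matrix[k][j] for k in range(rows))
            # Colin moves within a row toward a smaller entry.
            colin_stays = all(matrix[i][j] <= matrix[i][k] for k in range(cols))
            if rose_stays and colin_stays:
                cells.append((i, j))
    return cells


def classify(matrix):
    """Label a game as indeterminate, fair, or biased, following the text."""
    cells = stable_cells(matrix)
    if not cells:
        return "indeterminate"
    i, j = cells[0]
    return "fair" if matrix[i][j] == 0 else "biased"


# Payoffs to Rose from Figure 2 (rows: Rose H, T; columns: Colin H, T).
figure_2 = [[3, -6],
            [2, 1]]
print(stable_cells(figure_2), classify(figure_2))  # [(1, 1)] biased
```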
If the game has a saddle point, it is called a determinate game. The saddle point
indicates that there is a clear set of strategies that the players should use to attain the
highest payoff in the long run.
A game tree, often used when applying game theory to a problem, shows the
progression of moves in the game among players. It includes decision nodes and information
sets. A decision node is a moment in the game at which a player must act. An information
set groups the decision nodes that a player cannot tell apart: when making a choice, the
player knows that he or she is at some node in a particular information set, but does not
know which node (Eatwell, 1989).
Adventures in Sherlock Holmes
Most people can recall hearing stories of the world’s most famous sleuth, Sherlock
Holmes. Consider the following story…
In Sherlock Holmes’ latest pursuit of truth, he has unraveled a mystery concerning the most evil
Professor Moriarty. The professor is aware of Holmes’ astute deduction and must do everything in his
power to eliminate Holmes. And so, Sherlock Holmes knows his only possible way to survive is to
escape Moriarty, reaching the continent alive. He is faced with the alternative of going to Dover or
leaving the train at Canterbury, the only intermediate station. As the train pulls out, Holmes observes
Professor Moriarty on the platform. Holmes assumes, and we believe him to be fully justified, that his
enemy might secure a special train and overtake him. Holmes’ adversary, whose intelligence is
assumed to be adequate to visualize these possibilities which Holmes is considering has the same
choice. Both opponents must choose their detrainment in ignorance of the other’s corresponding
decision. If, as a result of these measures, they should find themselves, in the end, on the same
platform, the Professor will certainly kill Sherlock Holmes. If Sherlock Holmes reaches Dover
unharmed, he can certainly make his escape (Case, 2000).
In attempting to understand the paradox, we might notice that Holmes
eventually does reach the Continent, by detraining at Canterbury while Moriarty proceeds
directly to Dover. Yet the odds appear to favor Moriarty: as the train departs Victoria
station, Holmes seems quite unlikely to reach the Continent alive (Case, 2000).
Oskar Morgenstern was the first to recognize this famous paradox, and he
began questioning a host of learned men about it. However, he was constantly dissatisfied
with the responses he received, until he met John von Neumann, who provided
what Morgenstern believed was an appropriate explanation based on game theory (Case,
2000).
Like those of all games, the rewards or payoffs for the Holmes/Moriarty Paradox can be put
into a reward matrix, which shows the rewards of the game from Holmes’ point of view. The
reward matrix of this game displays the probability of Holmes’ survival for each possible
outcome or set of moves. Holmes will be killed if both he and Moriarty choose the
same alternative, either Dover or Canterbury, so Holmes’ payoff is zero in both cases.
Holmes appears to have more than a fighting chance of survival should he go to
Canterbury and Moriarty go to Dover, and merely a fighting chance of survival should he go
to Dover and Moriarty to Canterbury. One might label these chances of survival P and p,
where 0 <= p <= P <= 1 (Case, 2000).
Table 3. Payoffs in the Holmes/Moriarty Paradox.

               Moriarty
Holmes       C       D
C            0       P
D            p       0
To analyze the Holmes/Moriarty Paradox fully, it is necessary to
understand some further concepts of games. In a game like this
paradox, where there is no saddle point, a player can choose a strategy by randomization.
This is called a mixed strategy: the choice is based on probability. John
von Neumann introduced the mixed strategy so that
neither player can take advantage of the other while looking out for his or her own best
interest (Case, 2000).
As stated previously, determinate games have saddle points. The saddle point is a
clear set of strategies that each player should use to attain the highest payoff. The
Holmes/Moriarty Paradox, on the other hand, is an indeterminate game. An
indeterminate game is any game that does not contain a saddle point. In this kind of
game, the movement diagram keeps going in circles.
To find the mixed strategy, the Holmes/Moriarty reward matrix can be put into
numerical form. It is impossible to know the precise probabilities of survival
that p and P stand for. However, to analyze the problem,
one may choose probabilistic values that satisfy the inequality 0 <= p <= P <= 1. Thus, p
can be replaced by 2/3 and P by 1. The 1 replaces P because it is a larger probability of
survival for Holmes than the 2/3 that replaces p. The small p has the smaller probability
because it is the case in which Holmes has only a fighting chance of escaping from Moriarty.
                        Moriarty’s Options
Holmes’s Options        C        D
C                       0        1
D                       2/3      0
Since the reward matrix is now in numerical form, we can find the mixed strategy
using the Principle of Equalized Expectation. In general, one might use q to represent the
payoffs attached to a player’s options and p to represent the probabilities with which the
player chooses each option (i.e. his or her strategy). Thus [q1, q2, …, qk] represents the
payoffs of the player’s options, and [p1, p2, …, pk] represents the associated
probabilities. Using the principles of mathematical expectation, the general form for the
expected outcome of the player is: E = p1q1 + p2q2 + … + pkqk.
Replacing the known values and doing some algebra, it is found that Holmes’ optimal
strategy is 2/5C+3/5D. Moreover, the same process leads to Moriarty’s optimal strategy of
3/5C+2/5D. This means that Holmes ought to go to Canterbury 2/5 of the time, and to Dover
3/5 of the time – with the opposite being true for Moriarty in order to maximize each of their
expectations.
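The algebra referred to above can be sketched as follows. The sketch rests on an assumption about the layout of the numerical matrix: rows are Holmes’s options (C, D), columns are Moriarty’s options (C, D), with payoff 1 when Holmes leaves at Canterbury and Moriarty goes to Dover, and 2/3 in the reverse case, as the text assigns P and p. The function name `solve_2x2` is hypothetical, and the closed-form solution only applies to a 2 x 2 game without a saddle point.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d):
    """Equalize expectations in the game [[a, b], [c, d]] (row player's payoffs).

    Returns (x, y, value): x is the probability the row player uses the first
    row, y the probability the column player uses the first column.
    Assumes the game has no saddle point.
    """
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("expectations cannot be equalized for this matrix")
    # Row player: a*x + c*(1 - x) == b*x + d*(1 - x)
    x = (d - c) / denom
    # Column player: a*y + b*(1 - y) == c*y + d*(1 - y)
    y = (d - b) / denom
    value = (a * d - b * c) / denom
    return x, y, value


# Holmes (rows C, D) against Moriarty (columns C, D), with P = 1 and p = 2/3.
holmes_c, moriarty_c, value = solve_2x2(0, 1, Fraction(2, 3), 0)
print(holmes_c, moriarty_c, value)  # 2/5 3/5 2/5
```

Under these assumptions Holmes chooses Canterbury with probability 2/5 and Moriarty with probability 3/5, and Holmes’s expected chance of survival is 2/5 whichever station Moriarty picks.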
Thus, Holmes’s and Moriarty’s strategies go into the reward matrix as
shown below.

                        Moriarty’s Options
Holmes’s Options        C        D
C                       0        1
D                       2/3      0

Holmes’s strategy:      2/5 C + 3/5 D
Moriarty’s strategy:    3/5 C + 2/5 D
The strategies found using mathematical expectation are considered optimal
strategies. They are the strategies that make the opponent’s expected payoff the same
whichever option the opponent chooses, and thus limit the opponent’s ability to cause harm.
The arrow in the matrix points to the cell where the saddle point for this game would be.
It is interesting to note that von Neumann’s mixed strategy results in the same
outcome for Holmes as the original story, although the author presumably had no
knowledge of game theory.
The Prisoner’s Dilemma
The Prisoner’s Dilemma is perhaps the best-known application of game theory. Imagine that
two players, called “you” and “your partner”, have been arrested. The players have been
placed in separate rooms in the police station, where the district attorney is questioning
each. The district attorney tells each player that if one player confesses while the other
does not, the confessor will get a reward and his/her partner will receive a heavy
sentence. He also tells each player that if both players confess, each will receive a
light sentence. There is good reason to believe that if neither player confesses, both
will go free (Straffin, 1993).
The numerical values of the payoffs can be placed in a reward matrix, where the
payoffs of heavy sentence (-2), light sentence (-1), go free (0), and reward (1) are shown
for each possible outcome. Since this is a non-zero sum game and the players are not
strictly opposed, the payoffs are shown in pairs, where the first number is the payoff of
the row player and the second is that of the column player (Straffin, 1993).
Nash Equilibrium and Non-Cooperative Solutions
In a non-zero sum game, the players’ payoffs do not add to zero, which means each
player has his/her own payoffs. Unlike in a zero sum game, the interests of the players in
a non-zero sum game are neither strictly opposed nor strictly coincident. In a non-zero
sum game it is important what kind of assumptions we make. To understand non-zero sum
games better, we assume that they are non-cooperative and that there is no
communication between players. We also assume each player does what is in his/her own
best interest.
We will introduce three different kinds of two-person games without cooperation
between players, which will lead us to solve the Prisoner’s Dilemma later in this paper.
1. Games without Equilibrium

               Colin
Rose       H          T
H          (2, 4)     (1, 0)
T          (3, 1)     (0, 4)
The first example, shown above, contains no pure strategy equilibrium outcome. An
equilibrium outcome in a non-zero sum game is the counterpart of a saddle point in a zero
sum game, so this game resembles a zero sum game with no saddle point. Since there is no
pure strategy equilibrium, we can solve this game by applying mixed strategies, as we
explained for the Holmes/Moriarty Paradox.
2. Games with Two Different Equilibria
In 1950 John Nash proved that every two-person game has at least one
equilibrium in either pure strategies or mixed strategies. This might suggest that
non-zero sum games would not be much more difficult to solve than zero sum games.
However, there are games that have two different equilibria, and these raise a
different problem from games with no pure strategy equilibrium.
               Colin
Rose       H          T
H          (1, 1)     (2, 5)
T          (5, 2)     (-1, -1)
In a zero sum game, multiple saddle points are always equivalent and
interchangeable. Thus, if both players always choose saddle point strategies, the result
of the game is always a saddle point, which is optimal for both players. However, the
Rose/Colin game shown above illustrates that this can fail in a non-zero sum game. Rose
prefers the equilibrium (5, 2) and so plays T, while Colin prefers the equilibrium (2, 5)
and so also plays T. If both players play for their own favorable
equilibrium, they will end up at their worst outcome, which is (-1, -1).
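The equilibrium claims for the two games above can be checked by brute force. The following is a minimal sketch with a hypothetical function name, `pure_equilibria`; it reports every cell from which neither player gains by switching strategy alone.

```python
def pure_equilibria(game):
    """Pure strategy equilibria of a two-person game.

    game[i][j] is the pair (row player's payoff, column player's payoff).
    """
    rows, cols = len(game), len(game[0])
    found = []
    for i in range(rows):
        for j in range(cols):
            row_pay, col_pay = game[i][j]
            # Rose cannot do better by changing rows in this column.
            row_ok = all(row_pay >= game[k][j][0] for k in range(rows))
            # Colin cannot do better by changing columns in this row.
            col_ok = all(col_pay >= game[i][k][1] for k in range(cols))
            if row_ok and col_ok:
                found.append((i, j))
    return found


# Rows: Rose H, T; columns: Colin H, T.
game_1 = [[(2, 4), (1, 0)],
          [(3, 1), (0, 4)]]
game_2 = [[(1, 1), (2, 5)],
          [(5, 2), (-1, -1)]]
print(pure_equilibria(game_1))  # []
print(pure_equilibria(game_2))  # [(0, 1), (1, 0)]
```

In the second game the two equilibria are (H, T) with payoffs (2, 5) and (T, H) with payoffs (5, 2); the pair (T, T), where each player aims for a different equilibrium, is not among them.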
3. Prisoner’s Dilemma
The final non-zero sum game we will introduce is the Prisoner’s Dilemma. In the 1950s
Melvin Dresher and Merrill Flood at the RAND Corporation showed that a non-zero sum
game could have a unique equilibrium outcome that fails to be Pareto optimal.
                                 Colin
Rose                    A (do not confess)    B (confess)
A (do not confess)      (0, 0)                (-2, 1)
B (confess)             (1, -2)               (-1, -1)
Later, at a seminar at Stanford University, Albert W. Tucker introduced the game as the
Prisoner’s Dilemma. It is now the most widely studied and used game in social science. This
game shows that B, which means “confess”, is always dominant for both players, with a
unique equilibrium at BB. However, if both players rationally follow their equilibrium
strategy, which is in their own best interest, then the result of this game is unfortunate
for both players.
To understand this difficulty better, we introduce the Pareto principle. An
outcome of a game is non-Pareto optimal if there is another outcome that would give
both players higher payoffs, or would give one player the same payoff but the other
player a higher payoff. An outcome is Pareto optimal if there is no such other outcome.
From this definition we can see that the Prisoner’s Dilemma has a unique equilibrium, BB,
which is not Pareto optimal: the outcome AA gives both players a higher payoff, (0, 0)
instead of (-1, -1).
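The Pareto definition can be applied directly to the four outcomes of the Prisoner’s Dilemma. This is a small sketch with a hypothetical function name, `pareto_optimal`, using the payoff pairs from the matrix above.

```python
def pareto_optimal(game):
    """Cells of a two-person game whose outcomes are Pareto optimal.

    game[i][j] is the pair (row player's payoff, column player's payoff).
    """
    cells = [(i, j) for i in range(len(game)) for j in range(len(game[0]))]

    def improved_on(cell):
        mine = game[cell[0]][cell[1]]
        # Another outcome improves on this one if nobody is worse off and
        # the payoff pair is different (so at least one player is better off).
        return any(
            other[0] >= mine[0] and other[1] >= mine[1] and other != mine
            for other in (game[i][j] for i, j in cells)
        )

    return [cell for cell in cells if not improved_on(cell)]


# Rows: Rose A (do not confess), B (confess); columns: Colin A, B.
dilemma = [[(0, 0), (-2, 1)],
           [(1, -2), (-1, -1)]]
print(pareto_optimal(dilemma))  # [(0, 0), (0, 1), (1, 0)]
```

Every outcome except BB, the cell (1, 1), is Pareto optimal, which is exactly what makes the unique equilibrium troubling.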
In conclusion, there are non-zero sum games in which the equilibrium is not
always the optimal outcome, as we have seen from the Prisoner’s Dilemma. However, an
equilibrium outcome is certainly desirable because of its stability, and a two-person game
is solvable in a Pareto optimal sense when it has either a unique equilibrium point that is
Pareto optimal or multiple Pareto optimal equilibrium points that are equivalent and
interchangeable.
After examining the Prisoner’s Dilemma, one may wonder what would be a rational
way to choose a strategy in this non-cooperative game. Two different
approaches are repeated play theory and the metagame argument (Straffin, 1993).
Repeated play theory suggests that the game be modeled as if it is being played over
and over. Players may be more willing to cooperate in the beginning, giving them the Pareto
optimal solution. If the players know which game is the last game, logically they will
choose what is best for themselves in that game. So, in taking a formal approach to the
problem, we model it as though neither player knows when the last game will be played. The
other big assumption made in repeated play theory is that a player must assume his
or her opponent will begin by cooperating, will continue to cooperate until the player
defects, and will defect from then on. With this assumption it is possible to deduce two
expressions, where p is the probability that the next play will occur, with a value
between 0 and 1.
Expression 1: the payoff when there is always cooperation, where R is the payoff each
player receives when both cooperate.

$$R + pR + p^2 R + p^3 R + \cdots = \frac{R}{1 - p}$$
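Expression 2: the payoff when you choose to defect at game m, where T is the payoff for
defecting against a cooperating opponent and U is the payoff when both players defect.

$$\frac{R - p^m R}{1 - p} + p^m T + \frac{p^{m+1} U}{1 - p}$$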
Looking at these two expressions, it makes sense to conclude that one should never defect
if the payoff in Expression 1 is larger than that in Expression 2. So we set up our
inequality:

$$\frac{R}{1 - p} > \frac{R - p^m R + p^m T - p^{m+1} T + p^{m+1} U}{1 - p}$$
This inequality can be reduced to find when a player should cooperate. If
p is greater than some threshold value, then he or she should cooperate. After some
algebra, the inequality reduces to:

$$p > \frac{T - R}{T - U}$$
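The algebra behind this threshold can be sketched as follows. The intermediate steps are a reconstruction rather than part of the source; they assume 0 < p < 1, so that both 1 − p and p^m are positive, and T > U, as in the Prisoner’s Dilemma payoffs.

```latex
\begin{align*}
\frac{R}{1-p} &> \frac{R - p^m R + p^m T - p^{m+1} T + p^{m+1} U}{1-p} \\
R &> R - p^m R + p^m T - p^{m+1} T + p^{m+1} U
  && \text{(multiply by } 1 - p > 0\text{)} \\
0 &> p^m \left( T - R - pT + pU \right)
  && \text{(subtract } R \text{ and factor out } p^m\text{)} \\
0 &> (T - R) - p\,(T - U)
  && \text{(divide by } p^m > 0\text{)} \\
p &> \frac{T - R}{T - U}
  && \text{(divide by } T - U > 0\text{)}
\end{align*}
```

The game index m drops out, so under these assumptions the threshold is the same no matter when the defection is planned.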
Now we can go back to the reward matrix of the Prisoner’s Dilemma and substitute in the
values T = 1, R = 0, and U = -1. This leads to:

$$p > \frac{1 - 0}{1 - (-1)} = \frac{1}{2}$$
It is possible to conclude that, if the probability p is greater than the threshold value
of 1/2, then under these assumptions it makes sense for both players to cooperate, which
gives a Pareto optimal solution (Straffin, 1993).
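The threshold can be checked numerically against the two payoff expressions. This is an illustrative sketch: the function names are hypothetical, and the payoffs T = 1, R = 0, U = -1 are the Prisoner’s Dilemma values used above. Exact fractions avoid rounding issues in the comparison.

```python
from fractions import Fraction


def cooperation_threshold(T, R, U):
    """Smallest continuation probability above which cooperating pays."""
    return Fraction(T - R, T - U)


def always_cooperate(R, p):
    """Expression 1: R + pR + p^2 R + ... = R / (1 - p)."""
    return R / (1 - p)


def defect_at(m, T, R, U, p):
    """Expression 2: cooperate until the play weighted by p^m, then defect."""
    return (R - p**m * R) / (1 - p) + p**m * T + p**(m + 1) * U / (1 - p)


T, R, U = 1, 0, -1
print(cooperation_threshold(T, R, U))  # 1/2

for p in (Fraction(1, 4), Fraction(3, 4)):
    cooperate = always_cooperate(R, p)
    defect = defect_at(0, T, R, U, p)
    # Cooperation pays only when p is above the threshold.
    print(p, cooperate > defect)
```

With p = 1/4 the comparison prints False (defecting pays more), and with p = 3/4 it prints True, in line with the threshold of 1/2.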
Metagame Argument:
Traditionally the Prisoner’s Dilemma is played just once, so one may look at the
metagame argument to see whether it can be used to argue that a player should
choose A (do not confess).
When the metagame argument is set up, each player must do a bit of mind reading.
For example, say your partner may want to cooperate with you, but he fears that you won’t
cooperate in return. So your partner has four strategies, based on what he believes you
will choose. These are listed below and set up in matrix form.
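Your partner’s strategies (the first letter is his reply if you choose A, the second his
reply if you choose B):
I: AA, choose A regardless
II: AB, choose the same as opponent
III: BA, choose the opposite of opponent
IV: BB, choose B regardless

               Your partner
You      I: AA      II: AB     III: BA    IV: BB
A        (0,0)      (0,0)      (-2,1)     (-2,1)
B        (1,-2)     (-1,-1)    (1,-2)     (-1,-1)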
Looking at the matrix, you should notice that the payoff for your partner is
greatest in column IV. Knowing this, you should choose row B, because you would rather
have a payoff of -1 than -2. This leads to the payoff of (-1,-1). This is not the Pareto
optimal solution we are looking for, so we take the game a step further. Now it’s your turn
to choose strategies based on what you believe your partner’s strategies are. This leads us
to a 16 x 4 reward matrix.
                     Your partner
You            I: AA      II: AB     III: BA    IV: BB
I: AAAA        (0,0)      (0,0)      (-2,1)     (-2,1)
II: AAAB       (0,0)      (0,0)      (-2,1)     (-1,-1)
III: AABA      (0,0)      (0,0)      (1,-2)     (-2,1)
IV: AABB       (0,0)      (0,0)      (1,-2)     (-1,-1)
V: ABAA        (0,0)      (-1,-1)    (-2,1)     (-2,1)
VI: ABAB       (0,0)      (-1,-1)    (-2,1)     (-1,-1)
VII: ABBA      (0,0)      (-1,-1)    (1,-2)     (-2,1)
VIII: ABBB     (0,0)      (-1,-1)    (1,-2)     (-1,-1)
IX: BAAA       (1,-2)     (0,0)      (-2,1)     (-2,1)
X: BAAB        (1,-2)     (0,0)      (-2,1)     (-1,-1)
XI: BABA       (1,-2)     (0,0)      (1,-2)     (-2,1)
XII: BABB      (1,-2)     (0,0)      (1,-2)     (-1,-1)
XIII: BBAA     (1,-2)     (-1,-1)    (-2,1)     (-2,1)
XIV: BBAB      (1,-2)     (-1,-1)    (-2,1)     (-1,-1)
XV: BBBA       (1,-2)     (-1,-1)    (1,-2)     (-2,1)
XVI: BBBB      (1,-2)     (-1,-1)    (1,-2)     (-1,-1)
After carefully analyzing all of your outcomes, you will find that row XII has the best
payoffs for you. Knowing that row XII is best for you, your partner should then
choose column II, because this column has the best payoff for him in row XII. Now we end
up with a payoff of (0,0). This is our Pareto optimal solution. Although this strategy
seems to work, a great amount of mind reading between you and your partner is needed to
make it work (Straffin, 1993).
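The 16 x 4 metagame can be generated rather than written out by hand. The sketch below is an illustration under stated assumptions: each of your strategies is read as four letters, one reply to each of your partner’s strategies I to IV in order, and each of your partner’s strategies as two letters, his reply to your A and to your B. All names in the code are hypothetical.

```python
from itertools import product

# Base payoffs (you, partner), keyed by (your move, partner's move).
BASE = {("A", "A"): (0, 0), ("A", "B"): (-2, 1),
        ("B", "A"): (1, -2), ("B", "B"): (-1, -1)}

# Partner's strategies I-IV: AA, AB, BA, BB.
PARTNER = ["".join(s) for s in product("AB", repeat=2)]
# Your strategies I-XVI: AAAA, AAAB, ..., BBBB.
YOU = ["".join(s) for s in product("AB", repeat=4)]


def outcome(you, partner):
    """Payoff pair when your metastrategy meets your partner's strategy."""
    my_move = you[PARTNER.index(partner)]
    partner_move = partner["AB".index(my_move)]
    return BASE[(my_move, partner_move)]


def dominant_row():
    """Your strategy that earns your best payoff in every column, if any."""
    for you in YOU:
        if all(
            outcome(you, col)[0] == max(outcome(r, col)[0] for r in YOU)
            for col in PARTNER
        ):
            return you
    return None


best_row = dominant_row()
# Partner's best reply once he expects that row.
best_col = max(PARTNER, key=lambda col: outcome(best_row, col)[1])
print(best_row, YOU.index(best_row) + 1)   # BABB 12
print(best_col, outcome(best_row, best_col))  # AB (0, 0)
```

Under these assumptions the dominant row is the twelfth strategy, BABB, and your partner’s best reply to it is column II (AB), which reproduces the (0,0) outcome described above.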
Aside from its applications to both simple and complex games, game
theory has made important contributions to research. Many surveys issued in
research today use a seven-point scale, which was piloted by T.W. Adorno in his
research testing the personality variables underlying susceptibility to authoritarianism,
explained in his book, The Authoritarian Personality. From the beginning, this research was
controversial, and thus further research has been done to verify its results. Morton
Deutsch was the first to do this, when he examined the commonsense concepts of suspicion
and untrustworthiness together with scores on the F-scale. In his experiment, he had
subjects take the F-scale questionnaire and then play the Prisoner’s Dilemma. He found
that there was a correlation between each of these variables, and furthermore that
participants who scored high on the F-scale played the Prisoner’s Dilemma differently
(Straffin, 1993).
There is some disagreement over the interpretation of these results. However,
researchers agree that experimental games give psychologists a way to make previously
vague concepts precise and operational, as well as providing measurable results about
connections between concepts. As an illustration of the prevalence of game theory in
research, one might note that since the 1980s over 1000 experiments have employed the
Prisoner’s Dilemma (Straffin, 1993).
In conclusion, game theory has many uses, from zero sum and non-zero sum
games to cooperative and non-cooperative games. Applications of this theory include
economics, the Prisoner’s Dilemma, social psychology research, and
any situation in which rational thinking is used to solve a game.
It is also important to note why game theory is a successful model. As well as
having a wide variety of applications, game theory provides a concrete map of the rules
of the game, how the game is played, and the knowledge of each player at any given moment.
This allows investigators to analyze complex problems involving rational
thinking.
References
Eatwell, Milgate, Newman. The New Palgrave: Game Theory. W.W. Norton & Company Inc.;
New York, NY, 1989.
Case, James. Paradoxes Involving Conflicts of Interest. Mathematical Association of America;
33-38, January 2000.
Straffin, Philip D. Game Theory and Strategy. The Mathematical Association of America;
1993.