
The Psychological Basis of Rationality: Examples from Games and Paradoxes
Richard M. Shiffrin
• Claim:
• Rationality is a cognitive, not an axiomatic, concept, and is defined both individually and socially, in the context of particular problems and decisions.
• What is rational is, in the end, determined by a sufficiently large consensus of thinkers judged to be sufficiently clear reasoners.
Backward Induction
• Example: Game of Nim
20 marbles are on the table. Players A (first) and B take turns taking one or two marbles. The player taking the last marble(s) wins.
How should A play? It is hard to reason forward: e.g. A takes 1, B takes 2, A takes 2, etc., giving many possibilities.
Use backward induction:
• Start at end: Player facing 1 or 2 wins.
But player facing 3 loses. So A should try
to see that B faces 3. But the same
applies if B faces 6, because whatever B
does, A can give B 3. Similarly for 9, and
all multiples of 3. Hence if A can give B a
multiple of 3, A will win. Hence A takes 2
at start, giving B 18. Each time A gives B
the next lower multiple of 3, until A wins.
• This is an example of
backward induction.
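• A minimal sketch of this computation (my own Python, not part of the original slides): label each marble count as a win or a loss for the player about to move.

```python
# Backward induction for the Nim variant described above:
# players remove 1 or 2 marbles; the player who takes the last marble wins.
# A position (marbles facing you) is a WIN if some move leaves the opponent a LOSS.

def winning_positions(n_marbles=20):
    win = {0: False}  # facing 0 marbles: the opponent just took the last marble, you lost
    for n in range(1, n_marbles + 1):
        win[n] = any(not win[n - take] for take in (1, 2) if take <= n)
    return win

win = winning_positions(20)
losing = [n for n, w in win.items() if n > 0 and not w]
print(losing)    # [3, 6, 9, 12, 15, 18] -- the multiples of 3
print(win[20])   # True: the first player wins by taking 2, leaving 18
```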
But backward induction can fail: the Centipede Game
• Two players take turns, either ‘stopping’ or
‘playing’. ‘STOP’ ends the game for both
players. ‘PLAY’ causes the player to lose $1 (to
the bank), the other player to get $10 (from the
bank), and the turn moves to the other player.
There are 20 turns, if no one stops first.
• Goal is to maximize personal profit, not beat the
other player.
• Seems nice– both players gain a lot by playing
many turns.
• BUT, assume purely ‘rational’ players. The
player with the last turn will surely STOP,
because PLAY loses $1, and the game
ends. Knowing this, the player on trial 19
will STOP, because playing loses $1, with
no gain following. Knowing this, the player
on trial 18 will STOP, etc. Hence by
backward induction, the player with the
first move will STOP, and end the game
with both players getting nothing.
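• The same argument can be checked mechanically. A sketch (mine, in Python, not from the slides) of backward induction on the 20-turn game described above; under the 'purely rational' assumption it returns STOP at every turn, including the first.

```python
from functools import lru_cache

N_TURNS = 20

@lru_cache(maxsize=None)
def value(turn, pay0, pay1):
    """Return (final_pay0, final_pay1, action) under backward induction,
    assuming each player maximizes only their own final payoff."""
    mover = (turn - 1) % 2          # player 0 moves on turns 1, 3, 5, ...
    stop_outcome = (pay0, pay1, 'STOP')
    # If the mover PLAYs: the mover loses $1, the other player gains $10.
    next_pay = (pay0 - 1, pay1 + 10) if mover == 0 else (pay0 + 10, pay1 - 1)
    if turn == N_TURNS:
        play_outcome = (next_pay[0], next_pay[1], 'PLAY')
    else:
        cont = value(turn + 1, *next_pay)
        play_outcome = (cont[0], cont[1], 'PLAY')
    # The mover compares only their own component of the two outcomes.
    return play_outcome if play_outcome[mover] > stop_outcome[mover] else stop_outcome

print(value(1, 0, 0))   # (0, 0, 'STOP'): backward induction says stop immediately
```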
• This may seem as irrational to you as it
does to me. Both players have a lot to gain
by playing as many trials as possible, so
how can it be rational to not play at all?
Even if one of the players should stop near
the end, both would have received much
money by then.
• Of course there are problems with the
reasoning. The player on trial n STOPS
because that player is sure the other will
STOP on trial n+1. But the only way the
game could have reached trial n is if the
other player PLAYED on the preceding
trial. Hence the player cannot be sure the
other player will STOP on the next trial.
• This helps justify PLAY on early trials, but
doesn’t give a way to play when the end
approaches.
• So consider a two-trial game. A gets the
first turn, B the second. If B gets a turn,
the game then ends.
[Game tree for the two-turn game:]
– A STOPs at turn 1: A and B get $0.
– A PLAYs, then B STOPs at turn 2: A loses $1, B gains $10.
– A PLAYs, then B PLAYs: A gains $9, B gains $9.
B will STOP at turn 2, to keep all $10. This will cause A to
lose $1, so A will STOP at turn 1, and A and B get $0.
Is this rational? Note that both can get $9 if both play.
• Game theory, economic theory, subjective
expected utility theory, and many more, all
say A should STOP. Is this rational?
• I argue that there is no axiomatic definition
of rationality, that one must decide what is
rational in the context of a given game.
• When two rational players decide what to do, they must decide not only their moves, but also what is ‘rational’ in that game setting.
• Here I suggest it is rational for both
players to play. Why? There are only three
strategies to consider: {STOP}, {PLAY,
STOP}, and {PLAY, PLAY}. But {PLAY,
STOP} cannot be the rational strategy,
because then A would not PLAY.
• This leaves only {STOP} and {PLAY,
PLAY}. {STOP} gives both players $0;
{PLAY, PLAY} gives both players $9.
Clearly the second is the rational choice.
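• A tiny enumeration (my own sketch, not from the slides) of the payoffs for the three strategy profiles just listed:

```python
# Payoffs in the two-turn centipede game for the three profiles discussed:
# A's move at turn 1, and B's (conditional) move at turn 2.
def payoff(a_move, b_move):
    if a_move == 'STOP':
        return (0, 0)          # game ends immediately
    # A PLAYs: A pays $1, B receives $10
    if b_move == 'STOP':
        return (-1, 10)
    return (9, 9)              # B also PLAYs: B pays $1, A receives $10

for profile in [('STOP', None), ('PLAY', 'STOP'), ('PLAY', 'PLAY')]:
    print(profile, payoff(*profile))
# ('STOP', None)   -> (0, 0)
# ('PLAY', 'STOP') -> (-1, 10)
# ('PLAY', 'PLAY') -> (9, 9)
```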
• Some people can’t accept this. They argue
that B, at turn 2, will ‘defect’ and stop. But
if this is rational, then A knows it, and A
will not play at turn 1. Hence B should
PLAY at turn two, if both players are
rational.
• I know some of you will not like this
reasoning, because ‘defection’ seems very
seductive. But we are discussing rational
decision making, not emotional feelings.
• The problem with traditional reasoning is the failure to take into account that the two decisions are correlated. Imagine that the correlation were perfect, in the sense that Player 1's decision to play always occurred together with Player 2's decision to play (of course, if Player 1 stops, Player 2's decision is irrelevant). In this case everyone would play when in the Player 1 role.
• But why should one assume correlation? I will provide a few lines of argument.
• Suppose the players must lay out their
irrevocable strategies (not known to the
other player) before knowing who will be
the first player. The first player will be
determined by a later coin flip.
• A could choose to defect on trial 2, if A is
the second player, and stop on trial 1, if
the first player. Is this rational? If so, it is
likely both players will choose this, and
both will get $0.
• Note that this reasoning violates an
‘axiom’ of game theory, that one plays on
the last trial whatever is best at that point,
regardless of how that point is reached. I
say this is wrong: In a one trial game, B
would of course STOP, to save $1. In a
two trial game at the same point B PLAYS,
because this allows B to get $9 rather than
$0.
• Why does the strategy change? Because
what B does affects how A plays. A knows
B’s ‘rational’ decision before the first play,
just as does B.
• Perhaps you remain unconvinced, so let
me give what could be a scenario that is
more personally convincing.
• Suppose there are N players, each playing one trial of the centipede game against him or herself. They do this under the drug Midazolam, which leaves reasoning intact but prevents memory of the decision made at the first decision point. Each player is told to adopt a Player 1 strategy that maximizes the number of times their Player 1 payoff is better than the other players' Player 1 payoffs, and similarly a Player 2 strategy that maximizes the number of times their Player 2 payoff is better than the other players' Player 2 payoffs.
• IMPORTANT: When you are Player 2, you must decide what
you will do before you know whether you will get the
opportunity to play--You must make a conditional decision: If
I get to play what will I do? It could well be that you would
decide not to play when Player 1, so you might never get a
chance to give a Player 2 decision.
• So what strategies do you choose? When Player 1, you can get -1, 0, or 9. If you play when Player 1, and also cooperate when Player 2, then you will get 9 and at least tie for best.
• However, if you think you will decide to
defect when Player 2 you would be best
off not playing at all (-1 would tie for
worst).
• When Player 2 you can get 0, 9, or 10. If you get a chance to play you can cooperate and get 9, or defect and get 10. If you cooperate you will lose to those who play at step 1 and defect at step 2.
• But which other players would do that? Players
who think defection is rational will not play at
step 1. Thus how can you lose by cooperating?
• You can reason this before playing as Player 1,
and hence can confidently play, and be
reasonably certain you will later cooperate.
• What is it about playing one’s self that makes cooperation at step 2 seem rational? The key is the correlation between the decisions made at both steps. You are able to assume that whatever you decide before play 1 about what you will later decide as Player 2 will in fact come to pass.
• The assumption of rational players makes this
correlation even stronger—if a set of decisions is
rational all rational players will adopt them.
• It is of course just this correlation between
decisions of multiple players that is
ignored when arriving at Nash equilibria,
and other seemingly irrational decisions.
• Thus when we say all players are
assumed rational (not very satisfactory if
we have not defined rationality), we are
more importantly saying that the decisions
of the players are positively correlated.
• There are many reasons why decisions could and should be correlated: social norms, (playing one’s self), group consensus, reliance on experts, and so on.
• Interestingly, if the expert community
defines rationality in a way that makes
defection rational, then assuming the
opponents to be rational would lead to
defection, not cooperation. Perhaps this
occurred in the early days of game theory.
Another lesson to take home:
• Rationality is ‘locally defined’ and ‘context
bound’
• Further, there is room for disagreement
– Colman in his recent BBS article discusses the Centipede game and concludes there is no resolution and no rational way to play.
– Others would not like the resolution I suggested. Still others argue an initial STOP is the rational play in the centipede game.
Newcomb’s Paradox
• (My version). A is given two envelopes and told
they contain the same amount of money, either
(1,1) or (1000,1000). A’s goal is to maximize
gain, and A can choose either: ONE envelope,
or BOTH envelopes.
• Before the game a psychologist predicts whether
A will choose ONE or BOTH. If the prediction is
ONE, 1000 is placed in both. If the prediction is
BOTH, 1 is placed in both.
• Strangely the prediction is quite accurate.
How is this possible? A actually plays the
game twice, but is given the drug
Midazolam, which leaves reasoning intact,
but prevents learning. Once an interruption
occurs, what has occurred before is
completely forgotten (dentists use this
stuff).
• Whatever A chooses the first time, the
psychologist predicts A will do again. A is
told about Midazolam, about both trials,
and is told that exactly the same
information is provided on both trials.
• Trial 1: The envelopes contain either {1, 1} or {1000, 1000}; A chooses BOTH or ONE.
• A forgets.
• Trial 2: The envelopes contain either {1, 1} or {1000, 1000}. Should A choose BOTH or ONE?
• P tells A both times that much testing shows that
the prediction is 95% accurate in the general
population: When people choose BOTH, 95% of
the time the envelopes contain 1; when people
choose ONE, 95% of the time, the envelopes
contain 1000 – i.e. people tend to make the
same decision in the same situation.
• A complains that A doesn’t know if this is the first
or second test; if it is the first, he should choose
ONE so there will be 1000 in the envelopes on
test two.
• However, P says the goal is to maximize gain
only on the present trial, however many trials
may have preceded it.
What should A choose?
• Argument 1: Whatever is in the envelopes, it is too late
to do anything about it, so it only makes sense to choose
BOTH and double the gain.
• Argument 2: A should choose ONE, because A tends
95% of the time to make the same decision. Choosing
ONE makes it likely that 1000 is in both envelopes.
• Rejoinder: How can what one chooses now affect what
is in the envelopes? They have already been filled!
• Rejoinder: That is a good way to get $2, and lose
$1000. Obviously the ONE choosers tend to get $1000.
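• The two arguments can be put side by side numerically. A minimal sketch (mine, not from the slides), using the psychologist's 95% figure:

```python
# Expected payoff of each choice, using the stated 95% consistency figure:
# if A chooses ONE, the envelopes contain 1000 each with probability .95;
# if A chooses BOTH, they contain 1 each with probability .95.
p = 0.95

ev_one  = p * 1000 + (1 - p) * 1                # take one envelope
ev_both = p * (2 * 1) + (1 - p) * (2 * 1000)    # take both envelopes

print(f"E[ONE]  = {ev_one:.2f}")    # 950.05
print(f"E[BOTH] = {ev_both:.2f}")   # 101.90
```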
• The best human minds cannot decide
what is right. That is, people are quite
often sure they are right, but they
disagree. Roughly 1/3 of the people say
BOTH, 1/3 say ONE, and 1/3 say it cannot
be decided. There are many published
articles defending each alternative.
• What does this say about ‘rationality’?
• Rationality is what people say it is, but
which people? And what proportion of
them?
• I believe the rational decision is ONE, but how does one convince those in the other two camps that this is right?
• Because rationality is a ‘cognitive’
phenomenon, we can try ‘reframing’ the
problem to convince others. Of course the
recipient of the reframed problem must
also be convinced that the new problem
retains the critical elements of the original.
• I’ll try this out with a few examples.
• Scenario 1:
• B watches A decide. Whatever A gets, B gets
also. B knows people fall into two classes, those
choosing ONE and those choosing BOTH. B
would very much want A to be someone who
chose ONE last time. Therefore B wants A to
choose ONE this time– i.e. to be a ONE chooser
type.
• Suppose A chooses, but before the envelopes
are open, B can sell the profit. If A’s choice was
ONE, B expects 1000, so would sell for
something like 900. If A’s choice was BOTH, B
expects 2, so would sell for something like 5.
• How is this thinking different for A, the
chooser? After a choice, but before
opening the envelopes, how much would
A sell the (unknown) profit for? The sales
prices would be similar to those demanded
by B. E.g. following a BOTH choice by A,
A knows herself to be a BOTH chooser,
and expects to get $2. Following a ONE
choice by A, A knows herself to be a ONE
chooser, and expects $1000.
• Why should A wait to notice this? Why not
choose ONE and ‘produce’ $1000?
• Scenario 2:
• A and B are together, each discussing the
choice they are about to make for their own sets
of envelopes. They know they were together on
trial 1 also. They argue– A says BOTH, B says
ONE; neither can convince the other.
• Just before the choice B pulls a knife and forces
A to choose ONE, saying “I won’t let your
foolishness cost you money”.
• What does A now expect to get? If on trial 1, B
had also forced A at knifepoint to choose ONE,
then A expects to get 1000.
• Why should A need a knife to choose ONE? A
can choose ONE freely, with the same result.
• Perhaps these ‘reframings’ make you
reconsider your answer. But what is wrong
with the argument that one’s choice now
can’t change what is in the envelopes?
This seems like ‘backward causality’.
• This is a confusion of causality and
correlation. The two choices are
correlated, but neither causes the other.
Rather, both are caused by the ‘thinking
processes’ used by A to decide. Given no
memory, these thinking processes tend to
repeat most of the time.
• Scenario 3: At the time A makes her
choice, the envelopes are empty. A is told
the envelopes will be filled on the basis of
the next choice A will make (under
midazolam, without memory, with identical
instructions). A is told to maximize present
gain. She chooses, forgets, makes a
subsequent choice, and later gets the
payoff determined by the subsequent
choice. Should she choose ONE or
BOTH?
• A good case can be made that the forward
and backward versions are identical in
structure. Yet most people believe that
backward causality doesn’t apply in the
forward case, and most people faced with
the forward case see good reason to
choose ONE. They reason: “If I choose
ONE now, I will likely do so again,
because the choice situation is identical”.
• Why is anything different in the backward
case?
Scenario 4:
Non-intuitive Information
• Suppose A is having trouble deciding, and
is told: “If you like, you can open the
envelopes, look at the contents, and then
decide. I gave you this same option the
previous time you participated.”
• Should A look?
• Strangely, A should decline the offer.
Why?
• If A looks, A will certainly choose BOTH,
whatever is found. A does not have to look
at the contents to know this. Hence a
decision to look is just a decision to
choose BOTH.
• If A now chooses to look, A would have
likely chosen to look the previous time,
making an outcome of $2 very likely.
• In general, the more A knows about the
likely contents of the envelopes, the more
A would want to choose BOTH.
Scenario 5: Height
• The psychologist does not use a previous
trial to fill the envelopes. Instead, he tells A
that much research has shown that height
is a reliable predictor of choice, enabling
the psychologist to predict choice correctly
52% of the time (for reasons unknown).
• The envelopes have been filled on the
basis of A’s height.
• A does not know whether being tall or short
causes $1000 to be in the envelopes.
• Whatever size A is, if A chooses ONE, it makes
that height more likely to be the height
associated with $1000, and conversely for a
choice of BOTH.
• Hence A should choose ONE, if the payoff
difference is large enough.
• In the earlier versions, A’s decision processes
were the causative agent producing the contents
of the envelopes on both occasions. In this
version, A’s height is the causative agent.
• Scenario 6: Symmetry of Decisions
• Some people are bothered by the Midazolam scenarios because it seems there is an infinite regress of past (or future) decision situations required.
• So let there be exactly two trials under Midazolam, with the payoff for trial A determined by the choice made on trial B, and vice versa, these facts known at both trials.
• There are more reframings of this sort, but
this gives you the idea. I have found that
even the most fervent BOTH choosers
either change their position, or at least
express uncertainty, when given these and
other problem reframings.
• Like the centipede game, the resolution I
argue for is based on the correlation
between the two different decisions, a third
factor being the cause that produces the
correlation.
• Regardless of the ‘answer’, we see once again that rationality is what people say it is. There is no guarantee people will agree, producing a disturbing situation.
• Connecting Newcomb’s, Backward Induction, and
Prisoner’s Dilemma
• A ‘prisoner’s dilemma’ has two players, A, B, each of
whom has a decision option E that will guarantee them a
better outcome, regardless of what the other player
does. However, if both players choose this dominating
strategy, they each will get a poor outcome:
The payoff matrix, written as (A's payoff, B's payoff):

            B: D         B: E
  A: D     (5, 5)      (-10, 8)
  A: E     (8, -10)    (-5, -5)

• For both A and B, 8 is better than 5 and -5 is better than -10, so both prefer E and both get -5. But if both choose D, both get +5!
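• A small sketch (mine, not from the slides) that checks the dominance claim for the matrix above:

```python
# Payoffs (A, B) for the matrix above; D = 'cooperate', E = 'defect'.
payoffs = {
    ('D', 'D'): (5, 5),   ('D', 'E'): (-10, 8),
    ('E', 'D'): (8, -10), ('E', 'E'): (-5, -5),
}

# E dominates D for player A: A does better with E whatever B chooses ...
for b in ('D', 'E'):
    assert payoffs[('E', b)][0] > payoffs[('D', b)][0]
# ... and symmetrically for B; yet (E, E) is worse for both than (D, D).
print(payoffs[('E', 'E')], payoffs[('D', 'D')])   # (-5, -5) vs (5, 5)
```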
• The centipede game argument would
imply that two known rational players
would both choose D, the ‘cooperate
strategy’.
• But what would a ‘real’ player do? Doug
Hofstadter once ran a test in Scientific
American– he thought his bright friends
would cooperate; however, they defected.
• But what would a ‘real’ player do playing
against him- or herself?
• Have a player make a choice under
midazolam, ‘twice’. Knowing that the
opponent is oneself, would a player
cooperate? Would it matter if the choice is
the first or second one made, as long as
there is no memory?
• The choices are symmetric so the choice
order seems irrelevant, but suppose your
current choice is second. Whatever you
chose the other time, you know you will do
better by defecting this time. Should you
cooperate nonetheless? This is
Newcomb’s in another guise. You can
make yourself a ‘cooperative’ person.
• In all these cases, what is rational is not
defined absolutely, by rule, or by axiomatic
system. What is a rational decision is a
cognitive process, context dependent, and
subject to some sort of general agreement
by thoughtful humans.
The Exchange Paradox
• (In economic circles, known as Nalebuff's Paradox (1988), and related to Siegel's Paradox in foreign exchange (1972).)
• One envelope has 10 times the money of
another (i.e. M and 10M).
• The player (P) chooses an envelope and
chooses to keep the contents (X), or
irrevocably exchange for the other.
[Figure: two envelopes, one containing $D and the other $10D.]
The Strategy
• P reasons the other envelope has half a
chance of having 10X, and half a chance
of having (1/10)X. The expected value for
exchanging is then:
• E(V) = (1/2)(1/10)X + (1/2)(10)X = 5.05X
• This is larger than X, so P exchanges.
The Puzzle
• This reasoning applies regardless of X.
• Hence P should always exchange.
• But if P always exchanges, why even look at the
contents of the first envelope? Why not save a
step and just take the second? But then the
same reasoning says one should switch back.
• More critically, since P chooses randomly, and
always exchanges, symmetry requires that the
amounts rejected have the same probability
distribution as those accepted.
The Paradox
• How can P gain by exchanging, and yet
not gain at all?
Resolution Number One
• The savvy among you notice that it
matters how the envelopes are filled. One
needs to know what are the amounts X
and what probabilities they have. The
problem doesn’t say. It might be that once
we know exactly how the envelopes are
filled, the paradox will disappear.
• Do you think this is the case?
An Algorithm for Filling
• We fill as follows: Flip a coin until the first heads appears, on the n-th flip. Then put 10^(n-1) and 10^n in the two envelopes.
• E.g. with prob ½ a heads comes up on flip 1, so we put 1 and 10 in the envelopes. With probability ¼ a heads comes up first on the second flip, so we put in 10 and 100. Etc.
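• A direct transcription of this filling rule, as a Python sketch (my own, for illustration):

```python
import random

def fill_envelopes():
    """Flip a fair coin until the first heads, on flip n;
    put 10**(n-1) and 10**n in the two envelopes."""
    n = 1
    while random.random() < 0.5:   # tails with probability 1/2: keep flipping
        n += 1
    return (10 ** (n - 1), 10 ** n)

# e.g. (1, 10) with probability 1/2, (10, 100) with probability 1/4, ...
print(fill_envelopes())
```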
Paradox Not Solved
• If we observe X > 1, P(pair is {X/10, X}) = 2·P(pair is {X, 10X}).
• So P(other envelope has X/10) = 2/3 and P(other envelope has 10X) = 1/3.
• Hence E(V) = (2/3)(1/10)X + (1/3)(10X) = (102/30)X = 3.4X > X, so exchange.
• If X = 1, exchanging gains 9, for sure.
• So, always exchange.
But
• By symmetry, distribution of amounts
rejected and accepted have to be identical
(same reasoning as before). How can we
gain and not gain simultaneously?
• This game is easy to program on your PC.
One can verify that exchanging produces
3.4X, and also that the amounts rejected
and accepted have the same distribution.
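• A sketch of such a program (mine; it reuses the filling rule above and checks both claims):

```python
import random
from collections import Counter

def fill_envelopes():
    n = 1
    while random.random() < 0.5:
        n += 1
    return (10 ** (n - 1), 10 ** n)

random.seed(0)
accepted, rejected = Counter(), Counter()
gain_ratios = []

for _ in range(200_000):
    small, large = fill_envelopes()
    first = random.choice((small, large))   # P picks an envelope at random
    other = large if first == small else small
    accepted[other] += 1                    # always exchange: keep the other envelope
    rejected[first] += 1
    if first > 1:                           # check E(V|X) ~ 3.4 X for observed X > 1
        gain_ratios.append(other / first)

print(sum(gain_ratios) / len(gain_ratios))  # close to 3.4
# The accepted and rejected amounts have (up to noise) the same distribution:
for amount in sorted(set(accepted) | set(rejected))[:5]:
    print(amount, accepted[amount], rejected[amount])
```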
Paradox Resolution Number 2
The even savvier among you will notice that
this game has infinite expectation– the gains keep
going up by a factor of 10, but the probabilities
keep going down by a factor of ½.
E(G) = (1/2)(10) + (1/2)^2(10^2) + (1/2)^3(10^3) + … = ∞
Everyone knows we can’t compare gains
when both lines of play have infinite
expectation. One infinity can’t be larger
than the other (e.g. another example some
of you may know is called the St.
Petersburg Paradox).
Perhaps this explains the paradox: Perhaps
such a paradox couldn’t appear in a finite
game.
• Most writers think this, and stop here.
Can we make the game finite?
• Make the game finite by terminating the coin
flips with a heads if a very large number Z of
consecutive tails occurs.
• A possible strategy would then be: exchange always, except when X = 10^Z, in which case STAY, because this is the largest possible amount.
• (This is still paradoxical, but I’ll return later to this
point).
• But this loses some paradoxical essence,
because P does not ALWAYS exchange.
Always Exchange in a Finite Game
• Ask someone to choose a largest limit N, vastly smaller than Z, but still vast. This is easy to do if Z is something like 10^100,000.
• If N coin flips come up tails, then one
stops anyway, and pretends the last flip
was a heads.
• One way to get the number N:
• Ask a friend to fill a sheet of paper with
digits. Permute these, and the result is N.
• There are only so many digits that can fit
on a sheet of paper, so N is obviously
finite, though unknown.
• For any number X observed, P has an
infinitesimally small chance, c, of guessing
correctly that this is the largest possible.
• I.e. How likely is it that X is the number N,
rather than any number smaller than N?
• For exchanging, E(V|X) = (1 - c)(3.4X) + c(0.1X) ≈ 3.4X > X
Unfortunately, this means the paradox returns in
a finite game: One should always exchange.
But how can this be sensible?
Paradox Made Worse?
• Instead of P choosing an envelope at
random, someone examines the two and
hands P the largest with probability .8, and
the smallest with probability .2.
• P(other envelope is larger), given an observed X > 1, is:
(.5)Q(.2) / [(.5)Q(.2) + Q(.8)] = 1/9,
where Q = P(pair is {X/10, X}) and (.5)Q = P(pair is {X, 10X}).
• E(V) = (8/9)(1/10)X + (1/9)(10)X = 1.2X > X
• So always exchange.
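• A quick simulation sketch (mine, not from the slides) of this version, checking the 1/9 and the 1.2X figures:

```python
import random

def fill_envelopes():
    n = 1
    while random.random() < 0.5:
        n += 1
    return (10 ** (n - 1), 10 ** n)

random.seed(1)
larger_count, total, ratio_sum = 0, 0, 0.0

for _ in range(200_000):
    small, large = fill_envelopes()
    # the intermediary hands P the larger envelope with probability .8
    handed = large if random.random() < 0.8 else small
    other = small if handed == large else large
    if handed > 1:                      # condition on an observed X > 1
        total += 1
        larger_count += (other > handed)
        ratio_sum += other / handed

print(larger_count / total)             # close to 1/9
print(ratio_sum / total)                # close to 1.2, i.e. E(V|X) ~ 1.2 X
```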
This is bad news
• If P always exchanges, then P will
exchange the larger for the smaller .8 of
the time. Further, the distribution of
rejected numbers strictly dominates those
accepted.
• If one exchanges one gets, and rejects:
• Amount        P(Gets)           P(Rejects)        P(Gets):P(Rejects)
  1             .4                .1                4:1
  ...
  10^n          (½)^n (.6)        (½)^n (.9)        6:9
  ...
  10^(N-1)      (½)^(N-1)         (½)^(N-1)         1:1
  10^N          (½)^(N-1)(.2)     (½)^(N-1)(.8)     2:8
• Exchanging gets 1 more often, and every
other outcome less often!
• Another way to look at things: Suppose
you are handed the .8 and .2 envelopes,
but don’t open the .8 one. You clearly
would want the .8 envelope, since it is 4
times more likely to have the larger
amount. However, once you open it, you
seem to want to exchange for the .2,
regardless of what you find.
• This probably makes you uneasy.
• If you see X, the contents of the .8
envelope, you want to exchange.
• If you see X*, the contents of the .2
envelope, you also want to exchange, only
you expect to gain more than the first
case.
• Given this, why exchange a .8?
• This might make you feel uneasy.
What a good empiricist would do
• P, a scientist, carries out a test: P
programs this game on his PC, always
exchanges, and tables the outcomes for
many thousands of trials.
• Now P will learn if P is winning or losing!
• (No more worries about poor reasoning
abilities or bad mathematical derivations).
P gets an answer or two
• For X =1, P gains 9
• For all other X, P gains ~.2X (the larger
the number of simulation trials, the closer
is the convergence to .2X).
• Case closed.
• But..
Another answer
• P tables the results accepted and rejected:
• Amount        P(Accepted)       P(Rejected)
  1             .4                .1
  ...
  10^n          (½)^n (.6)        (½)^n (.9)
  ...
  10^(N-1)      (½)^(N-1)         (½)^(N-1)
  10^N          (½)^(N-1)(.2)     (½)^(N-1)(.8)
• Oops: P is rejecting larger amounts.
• If one exchanges one gets (G), and rejects (R):
Amount          P(G):P(R)
1               4:1
...
10^n            6:9
...
10^(N-1)        1:1
10^N            2:8
The empirical ratios get closer to the ones listed above, the
more simulations are run.
MOST IMPORTANT: THE NUMBERS COME FROM THE
SAME TABLE SHOWING GAINS FOR EXCHANGING
Failure of empirical testing
• So, is P winning or losing?
• It is tempting to think the second test is
better– more money is piling up in the
rejected envelopes. P might therefore
decide never to exchange.
• But, surely P would exchange for X = 1!
• So P should decide to exchange for X =1
only? But exchanging on 1 and 10 would
be even better!
Egads!
• So exchanging on all X’s up to K is
dominated by the strategy of exchanging
on all X’s up to K+1!
• So P should always exchange!
• But this loses money!
• Help!!
Paradox Resolved
• Both answers are correct. The problem is that
the goal is ambiguous. Maximizing expected
gain has several interpretations:
1) Expected gain on the present trial,
given X [E|X]
2) Expected gain for playing the game,
given one uses a strategy S, where
S specifies the X at which one stops
exchanging and starts STAYING [EG|X]
• A few remarks:
• As noted E|X is higher for exchanging, for every X.
• But EG|X for the strategy ‘always exchange’ is exactly
the same as for the strategy ‘never exchange’. How is
this? The amount lost when one exchanges the highest
possible number balances the gains for all smaller
numbers to produce equality.
• But of course some strategies produce higher EG|X than
others. E.g. EG|1 is clearly not as good as EG|10: It is
obviously better to exchange a 1 for the certain gain of 9.
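• The two expectations can be separated by computing, exactly, the whole-game return of threshold strategies ('exchange when X is below a cutoff, otherwise stay'). A sketch (mine, not from the slides), using a small cap N for illustration:

```python
from fractions import Fraction as F

N = 10                                   # a small cap, for illustration only

def pairs():
    """(small, large, probability) under the truncated filling rule."""
    for n in range(1, N):
        yield 10 ** (n - 1), 10 ** n, F(1, 2 ** n)
    yield 10 ** (N - 1), 10 ** N, F(1, 2 ** (N - 1))   # forced top pair

def expected_keep(threshold):
    """Expected amount kept by: exchange whenever the observed X < threshold."""
    total = F(0)
    for small, large, p in pairs():
        for observed, other in ((small, large), (large, small)):  # handed at random
            kept = other if observed < threshold else observed
            total += p * F(1, 2) * kept
    return total

never  = expected_keep(1)                 # never exchange
always = expected_keep(10 ** N + 1)       # always exchange
print(float(never), float(always))        # identical
for k in range(N + 1):                    # raising the cutoff keeps helping ...
    print(k, float(expected_keep(10 ** k + 1)))
# ... until the cutoff passes the top value 10**N, where the whole gain vanishes.
```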
• Most people have a strong intuition that
the expected value of this game will be
maximized if one makes the decision for
each observed X that maximizes expected
gain for that trial.
• That this is not the case is seen in this
paradox, but is hard to believe.
• Perhaps the case is clearer in a simplified
example.
Simplified example 1
• Consider a version of the St. Petersburg
Paradox. One starts with $1, and flips a
coin. HEADS causes the current total to
triple. TAILS causes all the money to be
lost, and the game ends.
• One can play as long as one wants.
• The goal is to ‘maximize expected gain’.
• There is one exception to the above.
• Exception: The game has an upper limit, N, unknown but chosen to be vastly smaller than a very large number U (such as 10^100). If the current total = N, and the decision is made to PLAY, then regardless of the coin flip all money is lost and the game ends.
• The probability of guessing that the current total is N is infinitesimally small, so the expected value for playing stays about 1.5X.
• Hence A will always play, for every X. But this strategy maximizes conditional expected gain on the current trial. It minimizes expected gain for the game, because it guarantees with probability 1.0 that all money will be lost (eventually N will be reached).
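• A small sketch (mine, not from the slides) of this contrast: the expected value of 'play exactly k rounds, then stop' is 1.5^k, while 'always play' eventually loses everything:

```python
import random

def play_k_rounds(k):
    """Start with $1; each HEADS triples the total, TAILS loses everything.
    Stop voluntarily after k rounds (assuming the unknown cap N is not hit)."""
    total = 1
    for _ in range(k):
        if random.random() < 0.5:     # tails: everything is lost
            return 0
        total *= 3
    return total

random.seed(2)
trials = 400_000
for k in (1, 5, 10):
    avg = sum(play_k_rounds(k) for _ in range(trials)) / trials
    print(k, round(avg, 1), 1.5 ** k)  # empirical average vs. the 1.5**k prediction

# 'Always play' is the k -> infinity limit: a tails (or, eventually, the cap N)
# arrives with probability 1, so the game return is 0.
```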
• Clearly, there the decision to PLAY at each current total X gives a positive expected gain for that X, but the strategy of playing at every X produces a total game return of 0. It is surprisingly easy to confuse these different expectation quantities.
– Note: Choosing a strategy to maximize expected return for the game is a very complex matter.
• The exchange paradox presents a similar
confusion.
• Perhaps partly due to such a confusion, some people have concluded that one should always exchange, others that one should never exchange, others that it does not matter, and yet others that no rational strategy exists.
• Thus the exchange paradox reveals yet
another facet of the general claim that
rationality is a cognitive process, subject to
reinterpretation and reanalysis in the
context of a given problem.
• Given this, it is not too surprising that
persuasive arguments for the rationality of
one or another decision are heavily
weighted by problem framing, and by
examples.
• The finite exchange paradox also reveals
a difference that a number of researchers
have noted between ‘uncertainty’ and
‘vagueness’. The highest number N is
vague. There is no Bayesian prior we can
place on N that makes ‘sense’. If we
specify a prior it is easy to calculate a
number Y at which one would not
exchange.
• But having done so, if Y were reached,
one would not believe that stopping was
the best strategy.
• Joe Halpern (for example) has argued that
for vaguely specified situations, one might
have a set of priors that are possible,
without probabilities one can assign to
them. Then one can use some minimax
type strategy (e.g. protect against the
worst outcome) to choose which prior to
assume.
• This would not work here (stopping at 1 is
not a good idea), but it seems to be the
case that vague problems produce vague
optimization.
The ‘surprise exam’
• On day one of a twenty lecture course, the
teacher tells the class that there will be
one surprise exam, but that the class
members will not be able to predict with
certainty the occurrence of the exam on
the morning before it will occur.
• Backward Induction leads to a
contradiction: The class reasons that the
exam cannot occur on the last day,
because the exam occurrence could be
predicted. If so, then it cannot occur on the
second to last day, for the same reason.
Backward Induction continues until every
day is ruled out. Ruling out every day
seems to imply that the teacher has lied.
• But the exam does occur on, say, lecture
seven, and indeed the class is ‘surprised’.
• Indeed, ruling out every day seems to
guarantee ‘surprise’. But is this because
the teacher might have lied?
• With a little thought, both teacher and
students can see that the statement is
true, as long as some exam day ‘in the
middle’ is chosen.
• However not everyone agrees. Russ
Lyons (math here at IU) looks at the case
of one….
• “Do you agree that the number of days is
irrelevant? If so, let's take just one day. The
setter says "I will give you an exam today but
you won't know whether I will do that". If this is
false, he lies. If this is true, then it must be
because he might not give an exam today (so he
might be lying), and the truth is unknown (since
it deals with the future) or because the student is
confused/stupid/doesn't speak English/etc. If he
does give the exam, then it becomes true,
otherwise false. Of course, an alternative is that
the statement is neither true nor false. That's
probably the best; it's like saying "let S be the
set of all sets". That's meaningless as there is no
such S."
• But is it the case that ‘truth’ is unknown?
• ‘Truth’, like ‘rationality’ may be a cognitive
construct, and socially reified.
• When the number of lectures is very large
(say 10**100 if you like), and the instructor
says the surprise exam will be on a day
vastly short of the end of the course, then
it seems clear to both instructor and
student that the statement is ‘true’, and
must be so.
• But for one lecture, there is a contradiction, and
perhaps no truth value.
• How about two lectures? Three? Four?
• At what point does the truth value change from
‘uncertain’ to ‘true’?
• Lyons would say the truth value is always
uncertain. I am less dogmatic. I see truth
ultimately being decided by the real universe we
live in. In this universe, for large N, the
statement seems true by all sensible measures.
• But at what N the transition occurs, I’m unsure.
‘Sleeping Beauty Paradox’
• (Seems to have grown from an earlier set of
paradoxes introduced by Piccione and
Rubinstein in 1995: On the interpretation of
decision problems with imperfect recall).
• One version: A coin is flipped. HEADS means
sleeping beauty (SB) is awakened Monday and
asked to estimate the probability that a HEADS
had been flipped. TAILS means SB is awakened
on both Monday and Tuesday, without memory
of any other awakenings, and asked the same
query both days.
• What should SB answer?
• Opinions seem sharply split with
vociferous defenders of both 1/2 and 1/3.
• As with Newcomb’s, many people find the
question ridiculous, but they disagree on
the answer. Other people are unsure.
• More generally, the ‘thirders’ believe that if
TAILS produces N awakenings, the
answer is 1/(N+1) for P(HEADS).
• In a slight variant, HEADS causes SB to
awaken in Room A once, and be queried.
TAILS causes SB to be awakened in
Room B twice, without memory, and be
queried each time.
• The ‘thirders’ believe that the probability of
Room A is 1/3.
• I think there is a ‘surreal’ aspect to this.
• Suppose SB is told there will be a prize of
$1,000,000 if a tails occurs and she is in Room
B, awarded if SB requests it when awoken. SB
may request whatever number of awakenings in
room B she desires, up to say, 100.
• SB thinks: This is terrific; I can increase my odds
of getting the million to 100/101 by requesting
100 wake ups.
• If this seems strange, perhaps the thirders can
argue that there is a 50% chance of the million,
but once an awakening occurs the conditional
probability of a tails is then 100/101 (sic). If so,
then SB ought to be willing to turn down an offer
of say $900,000 for her potential winnings.
• Dave Chalmers gives the following as his argument for 1/3: The query occurs on either Monday or Tuesday. Given Monday, P(H|M) = 1/2. Given Tuesday, P(H|Tue) = 0. So P(H) = P(M)P(H|M) + P(Tue)P(H|Tue) = P(M)P(H|M) + 0 = (1/2)P(M).
• Since P(M) is between 0 and 1, P(H) must be less than 1/2.
• What do you think of this?
• Of course the condition changes from the
beginning to the end of this argument. At
the outset Dave says P(H|M) = ½ because
there is an equal probability of a query on
Mon following a Heads and a Tails (Heads
always produces a Monday query, and
Tails always produces a Monday query.)
But this implies P(M|H) = P(M|T) = 1.0.
Thus P(M) = 1.0. Thus the last equation
from Chalmers implies P(H) = ½.
• Of course Chalmers argues P(M) is less
than 1.0, because he considers that SB
might awaken Tuesday.
• This means we are selecting from events
on the Tails side. For this condition, the
probability of Monday given Tails is 1/2
(whereas P(M|Heads) stays at 1.0). This
means that p(H|M) = 2/3, not 1/2.
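• A simulation sketch (mine, not from the slides) makes the disagreement concrete: the relative frequency of HEADS is about 1/2 when counted per experiment and about 1/3 when counted per awakening; which of these frequencies answers SB's question is exactly what halfers and thirders dispute.

```python
import random

random.seed(3)
experiments = 200_000
heads_experiments = 0
awakenings = 0
heads_awakenings = 0

for _ in range(experiments):
    heads = random.random() < 0.5
    if heads:                    # HEADS: one awakening (Monday)
        heads_experiments += 1
        awakenings += 1
        heads_awakenings += 1
    else:                        # TAILS: two awakenings (Monday and Tuesday)
        awakenings += 2

print(heads_experiments / experiments)   # ~ 1/2: frequency of HEADS per experiment
print(heads_awakenings / awakenings)     # ~ 1/3: frequency of HEADS per awakening
```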
• So far, Chalmers and Terry Horgan,
among others, steadfastly believe that the
answer is 1/3 (and have written journal
articles saying so). What then is the
rational answer?
The Absent-Minded Driver
• Perhaps Sleeping Beauty is not such a paradox
(social consensus notwithstanding), but Piccione
and Rubinstein raise a more interesting issue.
• A driver is in a bar about to set off for home.
There are two exits and then a long drive to a
distant city. The driver lives at exit 2. Exit 1 is
dangerous. If he passes 2, he must take a motel
for the night in the distant city. However he is
forgetful and when reaching any exit does not
remember whether he has passed any exits
already.
• His payoff for taking exit 1 is 0, for exit 2 is
4, and for going past 2, is 1. What should
he plan to do? If he cannot choose a
probabilistic plan, he should decide to
always go, getting 1 rather than 0.
Suppose he so decides. He now finds
himself at an exit. Knowing he decided to
keep driving, he guesses there is a
probability of ½ that this is exit 2. He
therefore changes his mind and decides to
exit (for an expected gain of 0/2 + 4/2 = 2).
• Of course, this is circular. Once he
decides to change his mind, he knows he
would have changed his mind at exit 1, so
this must be exit one, so he should not
exit.
• The issue is less silly when he can decide
to exit with some probability, p. A simple
calculation shows p = 1/3 maximizes
expected gain, if applied consistently. E(G)
= (1/3)0 + (2/3)(1/3)4 + (2/3)(2/3)1 = 4/3.
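• The planning-stage calculation can be checked numerically; a sketch (mine, not from the original) with the payoffs 0, 4, 1 given above:

```python
def expected_gain(p):
    """Exit with probability p at each exit (indistinguishable to the driver):
    exit 1 pays 0, exit 2 pays 4, driving past both pays 1."""
    return p * 0 + (1 - p) * p * 4 + (1 - p) ** 2 * 1

best_p = max((i / 1000 for i in range(1001)), key=expected_gain)
print(best_p, expected_gain(best_p))     # ~0.333, ~1.333 (= 4/3)
```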
• However, having chosen this strategy, he now
finds himself at an exit. He now forms an
estimate, e, of the probability that this is exit 1,
knowing his strategy. He then adjusts his
strategy. This leads to various forms of circular
adjustments and possible convergence on some
strategy.
• Strangely, a good case can be made for deciding at an exit to exit with p = 5/9, even though, if used consistently, this strategy gives a lower E(G) than p = 1/3. The local decision is sensible even though the global outcome is not.
• The original article and replies and counterreplies are worth a look.
FINI