On Bounded Rationality and Computational Complexity Christos Papadimitriou and Mihalis Yannakakis

Bounded Rationality
Economic theory usually does not treat computation
as a costly resource. It therefore sometimes
predicts that rational agents will invest a large
amount of computation for small payoffs.
If we bound the agents’ ability to compute, new
solutions might appear.
The Prisoner’s Dilemma
         II: C   II: D
I: C     3,3     0,4
I: D     4,0     1,1
(Rows: player I. Columns: player II. Each cell lists the payoffs to I, II.)
D strictly dominates C for both players; therefore
the only Nash equilibrium is (D,D).
Problem: This contradicts natural human behavior.
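
The dominance claim can be checked mechanically. The sketch below uses the payoffs from the table above; the function names are illustrative only, and it enumerates pure action pairs only (since D strictly dominates C, mixed strategies add no further equilibria).

```python
from itertools import product

# Payoffs (player I, player II) indexed by (action of I, action of II),
# taken from the slide's payoff table.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def strictly_dominates(a, b):
    """True if action a pays player I strictly more than b against every reply."""
    return all(PAYOFF[(a, o)][0] > PAYOFF[(b, o)][0] for o in ACTIONS)


def pure_nash_equilibria():
    """Return all action pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        best_for_1 = all(PAYOFF[(a1, a2)][0] >= PAYOFF[(x, a2)][0] for x in ACTIONS)
        best_for_2 = all(PAYOFF[(a1, a2)][1] >= PAYOFF[(a1, y)][1] for y in ACTIONS)
        if best_for_1 and best_for_2:
            equilibria.append((a1, a2))
    return equilibria


print(strictly_dominates("D", "C"))
print(pure_nash_equilibria())
```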
The n-round Prisoner’s Dilemma
Play the game n times. Compute the average
payoff.
A strategy depends upon the history of the game.
Lemma: The only equilibrium outcome is (D,D)^n, i.e. both
players defect in every round.
Proof: Playing D in the last round dominates any other
choice for that round. Then use a backward induction argument.
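[Figure: payoff region of the game with corner points CC, CD, DC and DD, showing the Pareto-optimal frontier, the individually rational region and the threat point.]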
Goal:
Use bounded rationality in order to find an
equilibrium which approximates the cooperative
outcome.
Find an equilibrium in which both players play ‘C’ in
most of the rounds.
Main Idea
Limit the computational powers of the players in
such a way that a new equilibrium would emerge.
The players would be too ‘dumb’ to count the
number of rounds, and therefore would not be able
to defect in the last round.
Use the fact that the number of strategies is doubly
exponential in the number of rounds.
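
A rough count illustrates the doubly exponential growth. The sketch assumes that a pure strategy picks C or D for every possible sequence of the opponent's earlier moves (the player's own moves are determined by the strategy itself); the function names are illustrative only.

```python
def num_opponent_histories(n: int) -> int:
    """Number of opponent move sequences of length 0..n-1 over {C, D}."""
    return sum(2 ** t for t in range(n))  # equals 2**n - 1


def num_pure_strategies(n: int) -> int:
    """One binary choice (C or D) per history: 2 ** (2**n - 1) strategies."""
    return 2 ** num_opponent_histories(n)


for n in range(1, 6):
    print(n, num_opponent_histories(n), num_pure_strategies(n))
```

Under this assumption, an automaton with far fewer than 2^n states can realize only a small fraction of these strategies.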
Model Of Computation
• The model of computation is automata.
• Each pure strategy would be played by a single
automaton.
• A mixed strategy is a distribution over automata.
• The resource we limit is the number of states an
automaton is allowed to have.
Automata Theory - Basics
An automaton consists of states and transitions.
The states represent the output, i.e. the action
played by the player. The transitions represent the
input, i.e. the action played by the opponent.
Example: Tit for Tat
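[Figure: two-state automaton. State C (the start state) loops on input C and moves to state D on input D; state D loops on input D and returns to state C on input C.]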
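
A minimal sketch of this model, assuming the payoffs of the one-shot game above. The names `Automaton`, `play`, `TIT_FOR_TAT` and `ALWAYS_DEFECT` are illustrative and not taken from the paper: each state outputs an action, and the opponent's action selects the next state.

```python
from dataclasses import dataclass

# Payoffs (first player, second player) of the one-shot prisoner's dilemma.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}


@dataclass
class Automaton:
    start: str        # initial state
    output: dict      # state -> action played in that state
    transition: dict  # (state, opponent's action) -> next state


TIT_FOR_TAT = Automaton(
    start="c",
    output={"c": "C", "d": "D"},
    transition={("c", "C"): "c", ("c", "D"): "d",
                ("d", "C"): "c", ("d", "D"): "d"},
)

ALWAYS_DEFECT = Automaton(
    start="d",
    output={"d": "D"},
    transition={("d", "C"): "d", ("d", "D"): "d"},
)


def play(a, b, rounds):
    """Run two automata against each other; return the history and average payoffs."""
    state_a, state_b = a.start, b.start
    total_a = total_b = 0
    history = []
    for _ in range(rounds):
        act_a, act_b = a.output[state_a], b.output[state_b]
        pay_a, pay_b = PAYOFF[(act_a, act_b)]
        total_a += pay_a
        total_b += pay_b
        history.append((act_a, act_b))
        # Each automaton moves according to the action its opponent just played.
        state_a = a.transition[(state_a, act_b)]
        state_b = b.transition[(state_b, act_a)]
    return history, total_a / rounds, total_b / rounds


print(play(TIT_FOR_TAT, TIT_FOR_TAT, 5))
print(play(TIT_FOR_TAT, ALWAYS_DEFECT, 4))
```

Tit for Tat needs only two states because it remembers nothing except the opponent's last move.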
Outline of Argument
• Limit the number of states an automaton is allowed to
have.
• Present two complex strategies, in which any
deviation is punished by defecting forever.
• Show that playing the strategies utilizes all of the
states, thus no states are left to count the rounds.
Question: What should be the bounds on the
number of states?
Trivial Lower Bound
Lemma: An automaton with fewer than n states cannot
count to n.
Corollary: If both of the automata are bounded
to have fewer than n states, then ‘Tit for Tat’ is
an equilibrium.
If only one of the automata has fewer than n states,
then ‘Tit for Tat’ is still an equilibrium, with the
‘smart’ player (the one that can count) defecting in the last round.
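
To see why counting matters, the sketch below computes the best total payoff an unrestricted player can obtain against Tit for Tat over n rounds, by backward dynamic programming over (Tit for Tat's state, rounds left). This is an illustration under the payoffs above, not a construction from the paper, and the names are hypothetical.

```python
from functools import lru_cache

# Payoff to "me", indexed by (my action, opponent's action).
MY_PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

# Tit for Tat: state "c" plays C, state "d" plays D; it moves to the state
# that copies my last action.
TFT_OUTPUT = {"c": "C", "d": "D"}
TFT_NEXT = {("c", "C"): "c", ("c", "D"): "d",
            ("d", "C"): "c", ("d", "D"): "d"}


def best_response(n, start="c"):
    """Return (best total payoff, one optimal action sequence) against Tit for Tat."""

    @lru_cache(maxsize=None)
    def value(state, rounds_left):
        if rounds_left == 0:
            return 0, ""
        best = None
        for mine in ("C", "D"):
            future, plan = value(TFT_NEXT[(state, mine)], rounds_left - 1)
            total = MY_PAYOFF[(mine, TFT_OUTPUT[state])] + future
            if best is None or total > best[0]:
                best = (total, mine + plan)
        return best

    return value(start, n)


print(best_response(5))
```

The optimal plan cooperates until the final round and defects only then, so carrying it out requires knowing when round n has arrived, which is what the lemma rules out for an automaton with fewer than n states.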
Upper Bound
Theorem 1: If both size bounds are at least 2^{n-1},
then the only equilibrium is the one in which both
players defect in all the rounds.
Proof: Bottom up dynamic programming on the
decision tree of each player.
Main Result
If at least one of the state bounds in the n-round
prisoner’s dilemma is at most 2^{O(εn)}, then, for
large enough n, there is a (mixed) equilibrium with
average payoff for each player at least 3-ε.
Equilibrium – General Description
• The strategies are mixed. The support of each
consists of 2^d pure strategies, each represented by a
string in {C,D}^d, where d << n.
• First the players exchange their strings.
• The players then cooperate, while periodically
checking that the opponent remembers their
string.
• A deviation is punished by defecting
forever.
The Equilibrium Automaton
[Figure: the automaton as a sequence of segments: Announce (d rounds), then Cooperation + Checking.]
All states contain also a retaliating edge that leads
to an ever defecting state.
Technical Points I
Problem: Some ‘business cards’ are more
advantageous than others. Therefore it is better to
have a ‘business card’ with many ‘D’.
Solution: After the first segment comes a ‘fix up’
segment of d rounds: in round i (i = 1…d) both players
play C iff the i-th letter of both their cards is C.
Now in the first 2d steps, the average payoff is 1.75,
for every business card.
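
The 1.75 figure can be reproduced by direct enumeration. The sketch below rests on assumptions that the slides do not spell out: in the announce segment each player plays the letters of their own card one per round, the opponent's card is uniformly random over {C,D}^d, and in a fix-up round both players play D whenever they do not both play C. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Payoff to the card holder, indexed by (own action, opponent's action).
PAYOFF_I = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}


def expected_average(card, d):
    """Expected average payoff over the first 2d rounds for the holder of `card`,
    assuming the opponent's card is uniform over {C,D}^d."""
    total = Fraction(0)
    opponents = list(product("CD", repeat=d))
    for other in opponents:
        for mine, theirs in zip(card, other):
            # Announce round: each player plays the i-th letter of their own card.
            total += PAYOFF_I[(mine, theirs)]
            # Fix-up round: both play C iff both letters are C, otherwise both play D.
            both = "C" if (mine == "C" and theirs == "C") else "D"
            total += PAYOFF_I[(both, both)]
    return total / (len(opponents) * 2 * d)


def all_card_averages(d):
    """Map every card in {C,D}^d to its expected average over the 2d opening rounds."""
    return {"".join(card): expected_average(card, d)
            for card in product("CD", repeat=d)}


print(all_card_averages(3))
```

Per letter, a C earns an expected 1.5 in its announce round and 2 in its fix-up round, while a D earns 2.5 and 1, so every card totals 3.5 per pair of rounds.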
The Equilibrium Automaton
[Figure: the automaton as a sequence of segments: Announce (d rounds), Fix up (d rounds), then Cooperation + Checking.]
Technical Points II
Problem: The Checking segment might not be fair,
making one business card more advantageous than
another.
Solution: The Checking segment is done by XORing
the C’s and the D’s. Thus on average the payoffs
are the same for all business cards.
Technical Points III
Problem: If the opponent is playing according to his
equilibrium strategy, then he will never trigger
the punitive edge of a state.
A ‘cheating’ automaton might be able to use this in
order to save states, and then use the saved states
to count the rounds and defect in the last round.
Suppose player 1 has a state q in which he plays C
and expects C, and another state p in which he
plays C and expects D. A cheating player can
merge the two states into one and save a state.
The ‘Honest’ Player’s Automaton
[Figure: states q and p both play C; q expects C and p expects D. In each state the unexpected input leads to the punitive state.]
The ‘Cheating’ Player’s Automaton
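[Figure: a single state p that plays C and continues on both C and D, replacing the honest player’s separate states p and q.]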
The Solution
After the business card exchange, both players play
in unison, i.e. each player expects C whenever he
plays C and expects D whenever he plays D.
Therefore the problematic scenario does not occur.
Generalization
Let G be an arbitrary game and let p = (p1,p2) be a
point in the individually rational region realized
by pure strategies. For every ε>0 there are c>0 and
N>0 such that for all n>N, in the n-round repeated
game G played by automata, the following holds:
if at least one of the automata is bounded by 2^{cn}
states, then there exists an equilibrium with average
payoff of at least p-ε (componentwise).