On Bounded Rationality and Computational Complexity
Christos Papadimitriou and Mihalis Yannakakis

Bounded Rationality
In economic theory, computation is usually not treated as a costly resource. As a result, the theory sometimes predicts that rational agents will invest a large amount of computation for small payoffs. If we bound the agents’ ability to compute, new solutions may appear.

The Prisoner’s Dilemma
Payoff matrix (rows: Player I, columns: Player II; each entry lists Player I’s payoff, then Player II’s):
I \ II | C | D
C | 3,3 | 0,4
D | 4,0 | 1,1
D strictly dominates C; therefore the only Nash equilibrium is (D,D).
Problem: This contradicts natural human behavior.

The n-round Prisoner’s Dilemma
Play the game n times; each player’s payoff is the average of its per-round payoffs. A strategy may depend on the history of the game.
Lemma: The only equilibrium is $(D,D)^n$, i.e. both players defect in every round.
Proof: In the last round, playing D dominates C, so both players defect there. A backward-induction argument extends this to every earlier round.

[Figure: the payoff region spanned by the outcomes CC, CD, DC and DD, marking the Pareto-optimal boundary, the individually rational region and the threat point.]

Goal: Use bounded rationality to find an equilibrium that approximates the cooperative outcome, i.e. one in which both players play ‘C’ in most of the rounds.

Main Idea
Limit the computational power of the players so that a new equilibrium emerges. The players will be too ‘dumb’ to count the rounds, and therefore unable to defect in the last round. Use the fact that the number of strategies is doubly exponential in the number of rounds.

Model of Computation
• Strategies are modeled as finite automata.
• Each pure strategy is played by a single automaton.
• A mixed strategy is a distribution over automata.
• The resource we limit is the number of states an automaton is allowed to have.

Automata Theory - Basics
An automaton consists of states and transitions. Each state is labeled with an output: the action the player plays in that state. Each transition is labeled with an input: the action the opponent just played.
Example: Tit for Tat, a two-state automaton. In state C it plays C and in state D it plays D; after the opponent plays C it moves to state C, and after the opponent plays D it moves to state D.

Outline of Argument
• Limit the number of states an automaton is allowed to have.
• Present two complex strategies in which any deviation is punished by defecting forever.
• Show that playing these strategies uses up all of the states, so no states are left to count the rounds.
Question: What should the bounds on the number of states be?

Trivial Lower Bound
Lemma: An automaton with fewer than n states cannot count to n.
Corollary: If both automata are limited to fewer than n states, then ‘Tit for Tat’ is an equilibrium. If only one of the automata has fewer than n states, then ‘Tit for Tat’, with the ‘smart’ player defecting in the last round, is an equilibrium.

Upper Bound
Theorem 1: If both size bounds are at least $2^{n-1}$, then the only equilibrium is the one in which both players defect in every round.
Proof: Bottom-up dynamic programming on the decision tree of each player.

Main Result
If at least one of the state bounds in the n-round Prisoner’s Dilemma is $2^{O(\varepsilon n)}$, then for large enough n there is a (mixed) equilibrium with average payoff at least $3-\varepsilon$ for each player.

Equilibrium – General Description
• The strategies are mixed. The support of each consists of $2^d$ pure strategies, each represented by a string in $\{C,D\}^d$ (a ‘business card’), where $d \ll n$.
• First, the players exchange their strings.
• The players then cooperate, while periodically checking that the opponent remembers their string.
• Any deviation is punished by defecting forever.

[Figure: the equilibrium automaton: d ‘announce’ states followed by a ‘Cooperation + Checking’ segment.]
All states also contain a retaliating edge that leads to an always-defecting state.

Technical Points I
Problem: Some ‘business cards’ are more advantageous than others. During the announce segment D earns more than C against either opponent action (4 > 3 and 1 > 0), so it is better to have a card with many ‘D’s.
Solution: The announce segment is followed by a ‘fix-up’ segment of d rounds: in round i (i = 1…d), both players play C iff the i-th letter of both their cards is C, and D otherwise. Now, with the opponent’s card drawn uniformly at random, the expected average payoff over the first 2d rounds is 1.75 for every business card.
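The 1.75 figure is an expectation over the opponent’s card. The Python sketch below checks it by exhaustive enumeration. It assumes the mixed strategy is uniform over the $2^d$ cards and reads the fix-up rule as “both play D otherwise”; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Payoff to Player I (the row player) in the Prisoner's Dilemma matrix.
PAYOFF_I = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}


def prefix_payoff(card, other):
    """Player I's total payoff over the announce and fix-up segments (2d rounds)."""
    total = 0
    for a, b in zip(card, other):
        total += PAYOFF_I[(a, b)]           # announce: each plays its own letter
        joint = "C" if a == b == "C" else "D"
        total += PAYOFF_I[(joint, joint)]   # fix-up: C iff both letters are C (assumed D otherwise)
    return total


def announce_average(card):
    """Expected per-round payoff of `card` over the announce segment alone."""
    d = len(card)
    others = list(product("CD", repeat=d))  # assumption: opponent's card uniform over {C,D}^d
    total = sum(PAYOFF_I[(a, b)] for o in others for a, b in zip(card, o))
    return Fraction(total, len(others) * d)


def prefix_average(card):
    """Expected per-round payoff of `card` over the first 2d rounds."""
    d = len(card)
    others = list(product("CD", repeat=d))  # assumption: opponent's card uniform over {C,D}^d
    total = sum(prefix_payoff(card, o) for o in others)
    return Fraction(total, len(others) * 2 * d)


for card in product("CD", repeat=3):
    print("".join(card), announce_average(card), prefix_average(card))
```

Every row ends in 7/4. A letter C earns 3/2 in expectation in its announce round and 2 in its fix-up round, while a letter D earns 5/2 and then 1, so each letter contributes 7/2 over its two rounds. Without the fix-up segment, the announce-only column ranges from 3/2 (all C) to 5/2 (all D).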
[Figure: the revised equilibrium automaton: d ‘Announce’ states, d ‘Fix up’ states, then ‘Cooperation + Checking’.]

Technical Points II
Problem: The Checking segment might not be fair, making one business card more advantageous than another.
Solution: The Checking segment is done by XOR-ing the C’s and the D’s. Thus, on average, the payoffs are the same for all business cards.

Technical Points III
Problem: If the opponent plays according to his equilibrium strategy, the punitive edge of a state is never used. A ‘cheating’ automaton might exploit this to save states, and then defect in the last round.
Suppose player 1 has a state q in which he plays C and expects C, and another state p in which he plays C and expects D. A cheating player can merge the two states into one and save a state.
[Figure: the ‘honest’ player’s automaton, with states q and p and a punitive state; the ‘cheating’ player’s automaton, in which q and p are merged into a single state p.]

The Solution
After the business card exchange, both players play in unison: whenever a player plays C it expects C, and whenever it plays D it expects D. Therefore no state plays C while expecting D, and the problematic scenario above does not occur.

Generalization
Let G be an arbitrary game and let $p=(p_1,p_2)$ be a point in the individually rational region realized by pure strategies. For every $\varepsilon>0$ there exist $c>0$ and $N>0$ such that for all $n>N$ the following holds in the n-round repeated game G played by automata: if at least one of the automata is bounded by $2^{cn}$ states, then there exists an equilibrium with average payoff at least $p-\varepsilon$, i.e. at least $p_i-\varepsilon$ for each player i.
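A small simulation makes the automaton model, Tit for Tat, the retaliating edge and play in unison concrete. The Python sketch below is illustrative only. `Automaton`, `play`, `unison` and `scripted` are hypothetical names. The cyclic schedule CCCD is a made-up stand-in for the Cooperation + Checking segment and is not the paper’s construction: there is no announce, fix-up or XOR checking here.

```python
from dataclasses import dataclass

# Stage-game payoffs (Player I, Player II) from the Prisoner's Dilemma matrix.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
          ("D", "C"): (4, 0), ("D", "D"): (1, 1)}


@dataclass
class Automaton:
    """A Moore machine: states output actions, transitions read the opponent's action."""
    output: dict  # state -> action played in that state
    delta: dict   # (state, opponent's action) -> next state
    start: object

    def num_states(self):
        return len(self.output)


def play(a, b, n):
    """Run n rounds of a against b; return each player's average payoff."""
    sa, sb = a.start, b.start
    tot_a = tot_b = 0
    for _ in range(n):
        xa, xb = a.output[sa], b.output[sb]
        pa, pb = PAYOFF[(xa, xb)]
        tot_a, tot_b = tot_a + pa, tot_b + pb
        # Each automaton reads the action its opponent just played.
        sa, sb = a.delta[(sa, xb)], b.delta[(sb, xa)]
    return tot_a / n, tot_b / n


def tit_for_tat():
    # Two states, named after their output; always move to the opponent's last action.
    return Automaton({"C": "C", "D": "D"},
                     {(s, x): x for s in "CD" for x in "CD"}, "C")


def unison(schedule):
    """Hypothetical cyclic 'unison' automaton: state i plays schedule[i] and expects the
    same action back; any mismatch follows the retaliating edge to PUNISH."""
    k = len(schedule)
    output = dict(enumerate(schedule))
    output["PUNISH"] = "D"
    delta = {(i, x): (i + 1) % k if x == schedule[i] else "PUNISH"
             for i in range(k) for x in "CD"}
    for x in "CD":
        delta[("PUNISH", x)] = "PUNISH"  # defect forever
    return Automaton(output, delta, 0)


def scripted(actions):
    """Plays a fixed action sequence and ignores its input: one state per round,
    i.e. an automaton that counts the rounds (and stays in its last state)."""
    n = len(actions)
    output = dict(enumerate(actions))
    delta = {(i, x): min(i + 1, n - 1) for i in range(n) for x in "CD"}
    return Automaton(output, delta, 0)


n = 10
schedule = list("CCCD")                   # hypothetical stand-in for Cooperation + Checking
honest = unison(schedule)
follow = [schedule[i % len(schedule)] for i in range(n)]
late = follow[:-1] + ["D"]                # defect only in the last round
early = follow[:1] + ["D"] + follow[2:]   # defect in round 2

print(play(tit_for_tat(), tit_for_tat(), n))
print(play(scripted(["C"] * (n - 1) + ["D"]), tit_for_tat(), n))
print(play(honest, honest, n))
print(play(scripted(late), honest, n))
print(play(scripted(early), honest, n))
```

With n = 10 the printed pairs are (3.0, 3.0), (3.1, 2.7), (2.6, 2.6), (2.7, 2.3) and (0.9, 2.9). Defecting in round 2 is punished by the retaliating edge. Defecting in the last round pays off because the punishment comes too late. However, the `scripted` deviators need n = 10 states, one per round, against 2 for Tit for Tat and 5 for `honest`. Bounding the number of states is meant to rule out exactly this kind of counting.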