Learning and teaching in games: Statistical models of human play in experiments
Colin F. Camerer, Social Sciences, Caltech (camerer@hss.caltech.edu)
Teck Ho, Berkeley (Haas Business School)
Kuan Chong, National Univ Singapore

How can bounded rationality be modelled in games?
Theory desiderata: precise, general, useful (game theory); cognitively plausible, empirically disciplined (cog sci)
Three components:
– Cognitive hierarchy thinking model (one parameter, creates initial conditions)
– Learning model (EWA, fEWA)
– Sophisticated "teaching" model (repeated games)
Shameless plug: Camerer, Behavioral Game Theory (Princeton, Feb '03), or see website hss.caltech.edu/~camerer

Behavioral models use some game theory principles and weaken others
Principles underlying the equilibrium concept of a game: strategic thinking, best response, mutual consistency, learning, strategic foresight
– Thinking model: retains strategic thinking and best response, weakens mutual consistency
– Learning model: adds learning
– Teaching model: restores strategic foresight

(Typical) experimental economics methods
– Repeated matrix stage game (Markov w/ 1 state)
– Repeated with "one night stand" ("stranger") rematching protocol and feedback (to allow learning without repeated-game reputation-building)
– Game is described abstractly; payoffs are public knowledge (e.g., read out loud)
– Subjects paid $ according to choices (~$12/hr)
Why this style?
– Basic question is whether subjects can "compute" equilibrium*; not meant to be realistic
– Establish regularity across subjects and across different game structures
– Statistical fitting: parsimonious (1+ parameter) models; fit (in sample), predict (out of sample), and compute economic value
*Question now answered (No): it would be useful to move to low-information MAL (multi-agent learning) designs

"Beauty contest" game (Ho, Camerer, Weigelt, Amer Ec Rev '98): pick numbers x_i in [0,100]; the number closest to (2/3)×(average number) wins $20
[Figure: beauty contest results (Expansion, Financial Times, Spektrum): relative frequencies of numbers chosen, with axis marks at 0, 22, 33, 50, 100; average 23.07]
[Figure: thinking-model predicted frequency by number choice]
[Figure: beauty contest choice frequencies by bin for portfolio managers, econ PhDs, CEOs, and Caltech students]

Table: Data and estimates of τ in p-beauty contest games (equilibrium = 0)

subjects/game     data mean   data std dev   steps of thinking (τ)
game theorists    19          21.8           3.7
Caltech           23          11.1           3.0
newspaper         23          20.2           3.0
portfolio mgrs    24          16.1           2.8
econ PhD class    27          18.7           2.3
Caltech g=3       22          25.7           1.8
high school       33          18.6           1.6
1/2 mean          27          19.9           1.5
70 yr olds        37          17.5           1.1
Germany           37          20.0           1.1
CEOs              38          18.8           1.0
game p=0.7        39          24.7           1.0
Caltech g=2       22          29.9           0.8
PCC g=3           48          29.0           0.1
game p=0.9        49          24.3           0.1
PCC g=2           54          29.2           0.0
mean τ = 1.56, median τ = 1.30
(A simulation sketch of the thinking model appears below.)

[Figures: empirical results and thinking-model predictions: choice frequencies by bin (1~10 … 91~100) across rounds 1–9]

EWA learning
Attraction A_i^j(t) for strategy j is updated by
  A_i^j(t) = [φ·N(t-1)·A_i^j(t-1) + π_i(s_i(t), s_-i(t))] / [φ·(1-κ)·N(t-1) + 1]   (chosen j)
  A_i^j(t) = [φ·N(t-1)·A_i^j(t-1) + δ·π_i(s_i^j, s_-i(t))] / [φ·(1-κ)·N(t-1) + 1]   (unchosen j)
where N(t) = φ·(1-κ)·N(t-1) + 1 is the experience weight.
logit response (softmax): P_i^j(t) = e^{λ·A_i^j(t)} / Σ_k e^{λ·A_i^k(t)}
Key parameters:
– δ: imagination (weight on foregone payoffs)
– φ: decay (forgetting) or change-detection
– κ: growth rate of attractions (κ=0 averages; κ=1 cumulates; κ=1 gives "lock-in" after exploration)
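The thinking-steps estimates in the τ table above can be turned into choice predictions. Below is a minimal Python sketch of a Poisson cognitive-hierarchy model for the p-beauty contest, assuming level-0 players randomize uniformly over 0..100 and each level-k player best responds to the renormalized mix of levels 0..k-1; the function name, the choice grid, and the rounded point-prediction best response are illustrative choices, not details from the talk.

```python
import math
import numpy as np

def poisson_ch_beauty_contest(tau=1.5, p=2/3, max_level=10, n_choices=101):
    """Choice distribution implied by a Poisson cognitive-hierarchy model
    in the p-beauty contest (pick x in 0..100 closest to p * average)."""
    choices = np.arange(n_choices)
    # Poisson(tau) frequencies of thinking levels 0..max_level, renormalized.
    freq = np.array([math.exp(-tau) * tau**k / math.factorial(k)
                     for k in range(max_level + 1)])
    freq = freq / freq.sum()

    # Level-0 randomizes uniformly; store each level's choice distribution.
    dists = [np.full(n_choices, 1.0 / n_choices)]
    for k in range(1, max_level + 1):
        # A level-k player believes opponents are levels 0..k-1,
        # mixed in proportion to their (renormalized) Poisson frequencies.
        w = freq[:k] / freq[:k].sum()
        belief = sum(w_j * d for w_j, d in zip(w, dists))
        target = p * float(belief @ choices)   # best response: p * expected average
        best = np.zeros(n_choices)
        best[int(round(target))] = 1.0         # point prediction at nearest integer
        dists.append(best)

    # Population choice distribution: mix the levels by their Poisson weights.
    return sum(f_k * d for f_k, d in zip(freq, dists))

dist = poisson_ch_beauty_contest(tau=1.5)
print("predicted mean choice:", round(float(dist @ np.arange(101)), 2))
```

With τ around 1.5 (the cross-game mean in the table), the model concentrates mass near 50, 33, and 22, the same spikes visible in the newspaper data.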
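The EWA update above maps almost line-for-line into code. This is a minimal sketch under the same notation (δ, φ, κ, λ, N(0)); the class name, parameter values, and the matching-pennies demo are illustrative, not from the talk.

```python
import numpy as np

class EWALearner:
    """Parametric EWA: delta (imagination), phi (decay), kappa (growth), lam (logit sensitivity)."""
    def __init__(self, n_strategies, delta, phi, kappa, lam, N0=1.0):
        self.A = np.zeros(n_strategies)       # attractions A_i^j(0) = 0
        self.N = N0                           # experience weight N(0)
        self.delta, self.phi, self.kappa, self.lam = delta, phi, kappa, lam

    def choice_probs(self):
        # logit (softmax) response: P_i^j(t) = e^{lam*A_j} / sum_k e^{lam*A_k}
        z = np.exp(self.lam * (self.A - self.A.max()))   # stabilized
        return z / z.sum()

    def update(self, chosen, payoffs):
        """payoffs[j] = pi_i(s_i^j, s_{-i}(t)): own payoff to each strategy j
        against the opponent's realized play."""
        N_new = self.phi * (1 - self.kappa) * self.N + 1.0
        w = np.full(self.A.size, self.delta)  # foregone payoffs weighted by delta
        w[chosen] = 1.0                       # realized payoff weighted fully
        self.A = (self.phi * self.N * self.A + w * payoffs) / N_new
        self.N = N_new

# demo: two EWA learners in matching pennies (parameter values are illustrative)
payoff_row = np.array([[1.0, -1.0], [-1.0, 1.0]])
rng = np.random.default_rng(0)
p1 = EWALearner(2, delta=0.8, phi=0.9, kappa=0.5, lam=2.0)
p2 = EWALearner(2, delta=0.8, phi=0.9, kappa=0.5, lam=2.0)
for t in range(100):
    a = rng.choice(2, p=p1.choice_probs())
    b = rng.choice(2, p=p2.choice_probs())
    p1.update(a, payoff_row[:, b])     # row player's payoffs vs column's choice b
    p2.update(b, -payoff_row[a, :])    # zero-sum: column player gets the negative
print("row player mixed strategy after 100 periods:", p1.choice_probs())
```

Setting δ=1, κ=0 recovers weighted fictitious play, and δ=0 recovers simple choice reinforcement, the special cases listed just below.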
"In nature a hybrid [species] is usually sterile, but in science the opposite is often true" -- Francis Crick '88
Special cases of EWA:
– Weighted fictitious play (δ=1, κ=0)
– Simple choice reinforcement (δ=0)

Studies comparing EWA and other learning models

Reference: type of game
Amaldoss and Jain (Mgt Sci, in press): cooperate-to-compete games
Cabrales, Nagel and Armenter ('01): stag hunt "global games"
Camerer and Anderson ('99, Ec Theory): sender-receiver signaling
Camerer and Ho ('99, Econometrica): median-action coordination, 4x4 mixed-equilibrium games, p-beauty contest
Camerer, Ho and Wang ('99): normal-form centipede
Camerer, Hsia and Ho (in press): sealed-bid mechanism
Chen ('99): cost allocation
Haruvy and Erev ('00): binary risky choice decisions
Ho, Camerer and Chong ('01): "continental divide" coordination, price-matching, patent races, two-market entry games
Hsia ('99): N-person call markets
Morgan & Sefton (Games Ec Beh, '01): "unprofitable" games
Rapoport and Amaldoss ('00 OBHDP, '01): alliances, patent races
Stahl ('99): 5x5 matrix games
Sutter et al ('01): p-beauty contest (groups, individuals)

[Figure: 20 estimates of learning-model parameters plotted in the (δ, φ, κ) cube; corners mark the special cases Cournot, weighted fictitious play, cumulative reinforcement, and average reinforcement]

Functional EWA learning ("EWA Lite")
Use functions of experience to create parameter values (only free parameter is λ)
φ_i(t) is a change detector:
  φ_i(t) = 1 - 0.5·Σ_k [ s_-i^k(t) - (1/t)·Σ_{τ=1}^{t} s_-i^k(τ) ]²
– Compares the average of past frequencies s_-i(1), s_-i(2), … with current play s_-i(t)
– Decays old experience (low φ) if change is detected
– φ = 1 when other players always repeat strategies
– φ falls after a "surprise"; it falls more if others have been highly variable, less if they have been consistent
δ = φ/(number of Nash strategies) (creates low δ in mixed games); see the sketch below
Questions:
– (now) Do functional values pick up differences across games? (Yes.)
– (later) Can function changes create sensible, rapid switching in stochastic games?
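The change detector is easy to express directly. A minimal sketch, assuming one observed opponent strategy per period, encoded as an indicator vector over strategies; the strategy-bin demo values are illustrative.

```python
import numpy as np

def change_detector(history, n_strategies):
    """fEWA change detector from the formula above:
    phi_i(t) = 1 - 0.5 * sum_k (s_{-i}^k(t) - (1/t) * sum_{tau<=t} s_{-i}^k(tau))^2,
    where s_{-i}^k(t) = 1 if the others played strategy k in period t, else 0.
    history: list of opponent strategy indices observed in periods 1..t."""
    t = len(history)
    current = np.zeros(n_strategies)
    current[history[-1]] = 1.0                               # indicator for s_{-i}(t)
    past_freq = np.bincount(history, minlength=n_strategies) / t
    # In self-tuning fEWA, delta would then be set to phi / (number of Nash strategies).
    return 1.0 - 0.5 * np.sum((current - past_freq) ** 2)

# phi = 1 when the opponent always repeats; it drops sharply after a surprise
print(change_detector([5, 5, 5, 5], n_strategies=13))   # 1.0
print(change_detector([5, 5, 5, 1], n_strategies=13))   # ~0.44: sudden switch detected
```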
Example: Price matching with loyalty rewards (Capra, Goeree, Gomez, Holt AER '99)
– Players 1, 2 pick prices in [80, 200] cents
– The price is P = min(P1, P2)
– The low-price firm earns P+R; the high-price firm earns P-R (e.g., R = 50)
What happens?

[Figures: empirical frequency of prices by price bin (80, 81~90, …, 191~200) and period (1–9), alongside the frequencies predicted by the thinking model and by fEWA]

Teaching in repeated (partner) games
Finitely-repeated trust game (Camerer & Weigelt Econometrica '88):
– Lender chooses loan or no loan; if loan, borrower chooses repay or default
– Payoffs (lender, borrower): repay (40, 60); default (-100, 150); no loan (10, 10)
– 1 borrower plays against 8 lenders
– A fraction p(honest) of borrowers prefer to repay (controlled by the experimenter)

Empirical results (conditional frequencies of no loan and default):
[Figure a: empirical frequency of no loan, by sequence (1–9) and period (1–8)]
[Figure b: empirical frequency of default conditional on loan (dishonest borrowers), by sequence (1–9) and period (1–8)]

Teaching in repeated trust games (Camerer, Ho, Chong J Ec Theory '02)
– Some (α = 89%) borrowers know lenders learn by fEWA
– Actions in t "teach" lenders what to expect in t+1
– ρ (= .93) is a "peripheral vision" weight
E.g., entering period 4 of sequence 17:
  Seq. 16, periods 1–8: Repay, Repay, Repay, Default, …  (look "peripherally" at the previous sequence, with weight ρ)
  Seq. 17, periods 1–3: Repay, No loan, Repay  (look back within the current sequence)

Teaching: strategies have reputations
(Bayesian-Nash equilibrium: borrowers have reputations, i.e., types)
Heart of the model: the attraction of sophisticated borrower strategy j after sequence k, before period t, is

  A_B^j(s, k, t) = Σ_{j'=Loan}^{NoLoan} P_L^{j'}(a, k, t+1)·π_B(j, j')
                 + max_{J_{t+1}} { Σ_{v=t+2}^{T} Σ_{j'=Loan}^{NoLoan} P̂_L^{j'}(a, k, v | j_{v-1} ∈ J_{t+1})·π_B(j_v ∈ J_{t+1}, j') }

– J_{t+1} is a possible sequence of future choices by the borrower
– The first term is the expected (myopic) payoff from strategy j
– The second term is the sum of expected future payoffs (undiscounted), given the effect of j and optimally planned future choices J_{t+1}
(A simplified lookahead sketch appears below.)

Empirical results (top) and teaching model (bottom):
[Figures a–b: empirical frequency of no loan, and of default conditional on loan (dishonest borrowers), by sequence (1–9) and period (1–8)]
[Figures c–d: predicted frequency of no loan, and of default conditional on loan (dishonest borrowers), by sequence (1–9) and period (1–8)]
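The attraction formula can be illustrated with a heavily simplified lookahead for the trust game above: a sophisticated borrower adds to the myopic expected payoff of strategy j the best undiscounted payoff stream over planned future choices. The lender-response function p_loan below is a hypothetical stand-in for the fEWA forecast P̂_L, and the code ignores sequence-level reputation bookkeeping; this is a sketch of the teaching logic, not the paper's estimated model.

```python
from functools import lru_cache

# Borrower payoffs from the trust game above: repay -> 60, default -> 150, no loan -> 10.
PAYOFF = {"repay": 60.0, "default": 150.0}
NO_LOAN = 10.0
T = 8                                    # 8 periods (one lender per period)

def p_loan(defaults):
    """Hypothetical lender response: loan probability collapses with observed defaults."""
    return max(0.0, 0.9 - 0.8 * defaults)

@lru_cache(maxsize=None)
def future_value(t, defaults):
    """Max over planned future choices of expected undiscounted payoff, periods t..T."""
    if t > T:
        return 0.0
    pl = p_loan(defaults)
    def value(j, d):
        return pl * PAYOFF[j] + (1.0 - pl) * NO_LOAN + future_value(t + 1, d)
    return max(value("repay", defaults), value("default", defaults + 1))

def teaching_attraction(j, t, defaults):
    """Myopic expected payoff of j in period t plus the optimally planned future stream."""
    pl = p_loan(defaults)
    myopic = pl * PAYOFF[j] + (1.0 - pl) * NO_LOAN
    return myopic + future_value(t + 1, defaults + (j == "default"))

for t in (1, 7, 8):
    vals = {j: round(teaching_attraction(j, t, defaults=0), 1)
            for j in ("repay", "default")}
    print(f"period {t}: {vals}")
```

Even this toy version reproduces the qualitative pattern in the figures: repaying dominates early in a sequence (it keeps future loans coming) and defaulting takes over near the final period.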
Conclusions
Learning (λ = response sensitivity):
– Hybrid EWA fits and predicts well (20+ games)
– One-parameter fEWA fits well and is easy to estimate
– Well-suited to Markov games, because φ means players can "relearn" if a new state is quite different
Teaching (α = fraction of teachers):
– Retains strategic foresight in repeated games with partner matching
– Fits trust and entry deterrence better than softmax Bayesian-Nash (aka QRE)
Next? Field applications; explore low-information Markov domains…

Summary of models:
Thinking steps (parameter τ)
Parametric EWA learning (E'metrica '99)
• free parameters δ, φ, κ, λ, N(0)
Functional EWA learning
• functions for parameters
• one free parameter (λ)
Strategic teaching (JEcTheory '02)
• Reputation-building w/o "types"
• Two parameters (α, ρ)