Learning and teaching in games: Statistical models
of human play in experiments
Colin F. Camerer, Social Sciences Caltech (camerer@hss.caltech.edu)
Teck Ho, Berkeley (Haas Business School)
Kuan Chong, National Univ Singapore
How can bounded rationality be modelled in games?
Theory desiderata: precise, general, useful (game theory);
cognitively plausible, empirically disciplined (cognitive science)
Three components:
– Cognitive hierarchy thinking model (one parameter; creates initial conditions)
– Learning model (EWA, fEWA)
– Sophisticated "teaching" model (repeated games)
Shameless plug: Camerer, Behavioral Game Theory (Princeton, Feb '03),
or see website hss.caltech.edu/~camerer
Behavioral models use some game theory principles, and weaken other principles
Principles of the equilibrium concept of a game:
– strategic thinking
– best response
– mutual consistency
– learning
– strategic foresight
[Table in original: which principles each model (Thinking, Learning, Teaching)
retains — the thinking model keeps strategic thinking and best response but
weakens mutual consistency; the learning model adds learning; the teaching
model adds strategic foresight]
(Typical) experimental economics methods
Repeated matrix stage game (Markov with one state)
Repeated with "one night stand" ("stranger") rematching protocol and
feedback (to allow learning without repeated-game reputation building)
Game is described abstractly; payoffs are public knowledge (e.g., read out loud)
Subjects paid $ according to choices (~$12/hr)
Why this style? Basic question is whether subjects can "compute"
equilibrium*; not meant to be realistic
Establish regularity across subjects and different game structures
Statistical fitting: parsimonious (1+ parameter) models; fit (in sample),
predict (out of sample), and compute economic value
*Question now answered (no): it would be useful to move to low-information
MAL designs
Beauty contest game: pick numbers in [0,100];
closest to (2/3)*(average number) wins
[Figure: relative frequencies of number choices (Expansion, Financial Times,
Spektrum newspaper samples); average 23.07; spikes near 0, 22, 33, 50, 100]
"Beauty contest" game (Ho, Camerer, Weigelt, Amer Ec Rev '98):
Pick numbers x_i in [0,100]
Closest to (2/3)*(average number) wins $20
[Figures: relative frequencies of number choices (Expansion, Financial Times,
Spektrum samples), average 23.07, spikes near 22, 33, 50, 100; and predicted
frequency of number choices]
Beauty contest results
[Figure: choice frequencies by subject pool — portfolio managers, econ PhDs,
CEOs, Caltech students — choices binned in tens from 0 to 90]
Table: Data and estimates of τ in p-beauty-contest games (equilibrium = 0)

subjects/game     data: mean   data: std dev   steps of thinking (τ)
game theorists       19            21.8               3.7
Caltech              23            11.1               3.0
newspaper            23            20.2               3.0
portfolio mgrs       24            16.1               2.8
econ PhD class       27            18.7               2.3
Caltech g=3          22            25.7               1.8
high school          33            18.6               1.6
1/2 mean             27            19.9               1.5
70 yr olds           37            17.5               1.1
Germany              37            20.0               1.1
CEOs                 38            18.8               1.0
game p=0.7           39            24.7               1.0
Caltech g=2          22            29.9               0.8
PCC g=3              48            29.0               0.1
game p=0.9           49            24.3               0.1
PCC g=2              54            29.2               0.0

mean τ: 1.56    median τ: 1.30
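The τ estimates above come from a Poisson cognitive hierarchy model. A minimal sketch of how such a model generates p-beauty-contest predictions — a simplified point-belief version that tracks mean choices per level rather than full choice distributions; the function names and defaults are illustrative:

```python
import math

def poisson_weights(tau, k_max):
    """Truncated Poisson(tau) distribution over thinking steps 0..k_max."""
    w = [math.exp(-tau) * tau**k / math.factorial(k) for k in range(k_max + 1)]
    total = sum(w)
    return [x / total for x in w]

def ch_beauty_contest(tau=1.5, p=2/3, k_max=10):
    """Mean choices by thinking step in a p-beauty contest.

    Level-0 players choose uniformly on [0,100] (mean 50); a level-k
    player best-responds to the Poisson-weighted, renormalized mix of
    levels 0..k-1. Returns (choices by level, population mean)."""
    f = poisson_weights(tau, k_max)
    choices = [50.0]  # level-0 mean choice
    for k in range(1, k_max + 1):
        g = [f[h] for h in range(k)]      # beliefs over lower levels
        z = sum(g)                        # renormalize to sum to 1
        belief_mean = sum(g[h] / z * choices[h] for h in range(k))
        choices.append(p * belief_mean)   # best response to believed average
    pred_mean = sum(f[k] * choices[k] for k in range(k_max + 1))
    return choices, pred_mean
```

With τ near the estimates above, the predicted population mean stays well above the equilibrium of 0, and higher τ pushes choices toward 0, mirroring the ordering of the subject pools in the table.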
[Figures: observed results vs. model predictions — choice frequencies
(bins 1~10 through 91~100) by round (1–9)]
EWA learning
Attraction A_i^j(t) for strategy j is updated by
  A_i^j(t) = (φ·A_i^j(t-1) + π_i[s_i(t), s_-i(t)]) / (φ(1-κ)+1)    (chosen j)
  A_i^j(t) = (φ·A_i^j(t-1) + δ·π_i[s_i^j, s_-i(t)]) / (φ(1-κ)+1)   (unchosen j)
logit response (softmax): P_i^j(t) = e^{λ·A_i^j(t)} / Σ_k e^{λ·A_i^k(t)}
key parameters:
  δ  imagination (weight on foregone payoffs)
  φ  decay (forgetting) or change-detection
  κ  growth rate of attractions (κ=0 → averaging; κ=1 → cumulation,
     which can "lock in" a strategy after exploration)
"In nature a hybrid [species] is usually sterile, but in science the
opposite is often true" – Francis Crick '88
Special cases: weighted fictitious play (δ=1, κ=0);
simple choice reinforcement (δ=0)
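The update can be sketched in code. This follows the general EWA form (Camerer-Ho '99) with an explicit experience weight N(t); the slide's denominator φ(1-κ)+1 is its steady-state value. Parameter defaults here are illustrative, not estimates:

```python
import math

def ewa_update(A, N, chosen, payoffs, phi=0.9, delta=0.5, kappa=0.0):
    """One EWA attraction update.

    A: attractions A_i^j(t-1), one per strategy
    N: experience weight N(t-1)
    chosen: index of the strategy actually played
    payoffs: payoffs[j] = payoff strategy j would have earned against
             the opponents' realized choices s_-i(t)
    phi: decay, delta: imagination, kappa: growth rate of attractions."""
    N_new = phi * (1 - kappa) * N + 1
    A_new = [(phi * N * a + (1.0 if j == chosen else delta) * payoffs[j]) / N_new
             for j, a in enumerate(A)]
    return A_new, N_new

def logit_probs(A, lam=1.0):
    """Softmax (logit) response P_i^j = e^{lam*A_j} / sum_k e^{lam*A_k}."""
    m = max(A)
    e = [math.exp(lam * (a - m)) for a in A]  # shift by max for stability
    z = sum(e)
    return [x / z for x in e]
```

Setting delta=1, kappa=0 recovers weighted fictitious play and delta=0 gives simple choice reinforcement, matching the special cases on the slide.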
Studies comparing EWA and other learning models

References: Amaldoss and Jain (Mgt Sci, in press); Cabrales, Nagel and
Ermenter ('01); Camerer and Anderson ('99, Ec Theory); Camerer and Ho
('99, Econometrica); Camerer, Ho and Wang ('99); Camerer, Hsia and Ho
(in press); Chen ('99); Haruvy and Erev ('00); Ho, Camerer and Chong
('01); Hsia ('99); Morgan and Sefton (Games Ec Beh, '01); Rapoport and
Amaldoss ('00 OBHDP, '01); Stahl ('99); Sutter et al ('01)

Types of games: cooperate-to-compete games; stag hunt "global games";
sender-receiver signaling; median-action coordination; 4x4
mixed-equilibrium games; p-beauty contest; normal-form centipede;
sealed-bid mechanism; cost allocation; binary risky choice decisions;
"continental divide" coordination; price-matching; patent races;
two-market entry games; N-person call markets; "unprofitable" games;
alliances; 5x5 matrix games; p-beauty contest (groups, individuals)
20 estimates of learning model parameters
[Figure: estimated parameter configurations plotted against the special
cases — weighted fictitious play, Cournot, fictitious play, cumulative
reinforcement, average reinforcement]
Functional EWA learning ("EWA Lite")
Use functions of experience to create parameter values (only free
parameter: λ)
φ_i(t) is a change detector:
  φ_i(t) = 1 - 0.5·Σ_k [ S_-i^k(t) - Σ_{τ=1}^t S_-i^k(τ)/t ]^2
Compares the average of past frequencies S_-i(1), S_-i(2), … with S_-i(t)
→ decay old experience (low φ) if change is detected
φ=1 when other players always repeat their strategies
φ falls after a "surprise":
  falls more if others have been highly variable,
  falls less if others have been consistent
δ_i(t) = φ_i(t)/(number of Nash strategies) (creates low δ in mixed games)
Questions:
(now) Do functional values pick up differences across games? (Yes.)
(later) Can function changes create sensible, rapid switching in
stochastic games?
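A sketch of the change detector. It takes the single most recent period's observed strategy as the "recent" frequency vector — an assumed simplification of the slide's formula, which compares recent with cumulative frequencies:

```python
def change_detector(history):
    """fEWA surprise-based decay phi_i(t).

    history: opponents' strategy choices observed in periods 1..t.
    Compares the indicator vector of the current choice with the
    cumulative frequency of each strategy; returns 1 minus half the
    squared distance, so phi=1 when opponents always repeat."""
    t = len(history)
    surprise = 0.0
    for k in set(history):
        recent = 1.0 if history[-1] == k else 0.0
        cumulative = sum(1 for s in history if s == k) / t
        surprise += (recent - cumulative) ** 2
    return 1.0 - 0.5 * surprise
```

A stable opponent gives phi=1 (no decay), while a sudden switch after a long run pushes phi well below 1, so old experience is discounted quickly — the "relearning" property mentioned in the conclusions.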
Example: price matching with loyalty rewards
(Capra, Goeree, Gomez, Holt, AER '99)
Players 1, 2 pick prices in [80,200] cents
Market price is P = min(P1, P2)
Low-price firm earns P+R; high-price firm earns P-R
What happens? (e.g., R=50)
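The stage-game payoffs can be written down directly (the tie case, both firms earning the common price with no reward changing hands, is an assumption here):

```python
def price_match_payoffs(p1, p2, R=50):
    """Payoffs (firm 1, firm 2) in the price-matching game: both sell at
    the minimum price P; the low-price firm earns P+R and the high-price
    firm earns P-R, where R is the loyalty reward."""
    P = min(p1, p2)
    if p1 == p2:
        return P, P                # tie: assumed no reward changes hands
    return (P + R, P - R) if p1 < p2 else (P - R, P + R)
```

Because undercutting by one cent captures the reward, iterated reasoning unravels prices toward the bottom of the range (80), while observed play depends on the size of R — the contrast the figures below illustrate.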
[Figures: empirical frequency of price choices (bins 80 through 191~200)
by period (1–9), and thinking-fEWA predicted probabilities for the same
strategies and periods]
Teaching in repeated (partner) games
Finitely-repeated trust game (Camerer & Weigelt, Econometrica '88)
  Lender chooses: loan or no loan
  No loan → (10, 10)
  Loan → borrower chooses: repay → (40, 60), or default → (-100, 150)
  (payoffs listed as lender, borrower)
1 borrower plays against 8 lenders
A fraction p(honest) of borrowers prefer to repay (controlled by
experimenter)
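The stage game on this slide, as a quick reference (payoffs ordered lender, borrower):

```python
def trust_stage_payoffs(lender, borrower=None):
    """Stage-game payoffs (lender, borrower) in the finitely repeated
    trust game: no loan ends the stage at (10, 10); after a loan the
    borrower either repays (40, 60) or defaults (-100, 150)."""
    if lender == "no loan":
        return (10, 10)
    if borrower == "repay":
        return (40, 60)
    return (-100, 150)             # default
```

Default maximizes the borrower's one-shot payoff (150 vs. 60), which is why repayment in early periods requires reputation, or in this model, teaching.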
Empirical results (conditional frequencies of no loan and default)
[Figure a: empirical frequency of no loan, by sequence (1–9) and period
(1–8). Figure b: empirical frequency of default conditional on loan
(dishonest borrowers), by sequence and period]
Teaching in repeated trust games
(Camerer, Ho, Chong, J Ec Theory '02)
Some borrowers (est. 89%) know lenders learn by fEWA
Actions in period t "teach" lenders what to expect in t+1
A "peripheral vision" weight (est. .93) puts weight on the corresponding
periods of earlier sequences
E.g., entering period 4 of sequence 17:
  Sequence 16, periods 1–8: Repay, Repay, Repay, Default, …
    ← look "peripherally" (peripheral-vision weight)
  Sequence 17, periods 1–3: Repay, No loan, Repay  ← look back
Teaching: strategies have reputations
(Bayesian-Nash equilibrium: borrowers have reputations as types)
Heart of the model: the attraction of sophisticated borrower strategy j,
after sequence k, before period t.
J_{t+1} is a possible sequence of future choices by the borrower.
The first term is the expected (myopic) payoff from strategy j; the
second term is the sum of expected future payoffs (undiscounted), given
the effect of j and optimally planned future choices J_{t+1}.
A_B^j(s,k,t) = Σ_{j'=Loan}^{NoLoan} P_L^{j'}(a,k,t+1)·π_B(j,j')
             + max over J_{t+1} of
               { Σ_{v=t+2}^{T} Σ_{j'=Loan}^{NoLoan}
                 P̂_L^{j'}(a,k,v | j_{v-1} ∈ J_{t+1})·π_B(j_v ∈ J_{t+1}, j') }
Empirical results (top) and teaching model (bottom)
[Figures a–b: empirical frequency of no loan, and of default conditional
on loan (dishonest borrowers), by sequence (1–9) and period (1–8).
Figures c–d: the teaching model's predicted frequencies for the same
panels]
Conclusions
Learning (λ: response sensitivity)
  Hybrid EWA fits and predicts well (20+ games)
  One-parameter fEWA fits well, and is easy to estimate
  Well-suited to Markov games, because the change-detecting φ means
  players can "relearn" if a new state is quite different?
Teaching (fraction of teachers)
  Retains strategic foresight in repeated games with partner matching
  Fits trust and entry deterrence better than softmax Bayesian-Nash
  equilibrium (aka QRE)
Next?
  Field applications; explore low-information Markov domains…
Thinking steps (parameter τ)
Parametric EWA learning (Econometrica '99)
• free parameters: δ, φ, κ, λ, N(0)
Functional EWA learning
• functions replace parameters
• one free parameter (λ)
Strategic teaching (J Ec Theory '02)
• reputation-building without "types"
• two parameters (fraction of teachers, peripheral-vision weight)