Learning and the economics of small decisions

Ido Erev, Technion
Based on a chapter written with Ernan Haruvy for the 2nd Vol. of the Handbook
of Experimental Economics edited by Kagel and Roth.
The chapter focuses on the relationship between basic learning phenomena
and mainstream behavioral economic research.
To clarify the analysis we start with replications of the basic learning
phenomena in a simplified “standard paradigm” (see Hertwig and Ortmann,
2002)
The clicking paradigm
The current experiment includes many trials. Your task, in each trial, is to click
on one of the two keys presented on the screen. Each click will be followed by
the presentation of the keys’ payoffs. Your payoff for the trial is the payoff of the
selected key.
[Left key: 0]   [Right key: 1]
You selected Right. Your payoff in this trial is 1
Had you selected Left, your payoff would be 0
Not a test of rational decision theory; the rationality assumption is not even wrong.
1. Underweighting of rare events (Barron & Erev, 2003)
400 trials, ¼ cent per point
Problem   S   R                P(R) in %   Pattern
5         0   (10, .1; -1)     27          Risk aversion
6         0   (-10, .1; +1)    56          Risk seeking
Experience-description gap (Hertwig et al., 2004)
Underestimation? (Barron & Yechiam, 2009)
Robust to prior information (Jessup et al., 2008)
Reversed certainty effect
Similar pattern in honey bees (Shafir et al., 2008)
Taleb’s Black Swan effect
Sensitivity to magnitude: -20 vs. -10 (Ert & Erev, 2010)
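A small worked example may help connect these choice rates to the small-sample account given in the Summary. The sketch below computes the expected value of each risky option listed above and the chance that a small sample of past outcomes contains no rare event at all. The function names and the sample size of 5 are illustrative assumptions, not values reported in the talk.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Expected value of a gamble given as (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)

def prob_sample_misses_rare(p_rare, sample_size):
    """Chance that an i.i.d. sample of past outcomes contains no rare event."""
    return (1 - p_rare) ** sample_size

# The two risky options listed above; the safe option pays 0 in both problems.
risky_gain = [(10, Fraction(1, 10)), (-1, Fraction(9, 10))]   # (10, .1; -1)
risky_loss = [(-10, Fraction(1, 10)), (1, Fraction(9, 10))]   # (-10, .1; +1)

print(expected_value(risky_gain))   # 1/10
print(expected_value(risky_loss))   # -1/10
# Sample size 5 is a hypothetical choice made only for illustration.
print(float(prob_sample_misses_rare(Fraction(1, 10), 5)))
```

The option with the positive expected value is chosen less often (27%) than the option with the negative expected value (56%). With a sample of 5, the rare outcome is absent from the sample about 59% of the time, so a decision maker who relies on such a sample will often behave as if the rare event did not exist.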
2. The payoff variability effect
(Myers & Sadler, 1960; Busemeyer & Townsend, 1993)
Problem   H               L               P(H) in %
1         1               0               96
2         (11, .5; -9)    0               58
3         0               (9, .5; -11)    53
Risk aversion? Or loss aversion? Neither!
[Figure: proportion of H choices in Problems 1, 2 and 3 over 10 blocks of 20 trials]
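The three problems can be compared on expected value and payoff variability with a few lines of arithmetic. The sketch below assumes the reading of the slide in which H is the option with the higher expected value (so in Problem 3 the safe 0 is H and the gamble is L); the helper name mean_and_sd is hypothetical.

```python
import math

def mean_and_sd(outcomes):
    """Mean and standard deviation of a gamble given as (payoff, probability) pairs."""
    mean = sum(x * p for x, p in outcomes)
    variance = sum(p * (x - mean) ** 2 for x, p in outcomes)
    return mean, math.sqrt(variance)

# Options as read from the slide; H is assumed to be the higher-EV option.
problems = {
    1: {"H": [(1, 1.0)], "L": [(0, 1.0)]},
    2: {"H": [(11, 0.5), (-9, 0.5)], "L": [(0, 1.0)]},
    3: {"H": [(0, 1.0)], "L": [(9, 0.5), (-11, 0.5)]},
}

for number, options in problems.items():
    mean_h, sd_h = mean_and_sd(options["H"])
    mean_l, sd_l = mean_and_sd(options["L"])
    print(number, "EV(H) - EV(L) =", mean_h - mean_l, "| SD(H) =", sd_h, "| SD(L) =", sd_l)
```

The expected-value advantage of H is 1 point in all three problems. What changes is the variability: it is attached to H in Problem 2 and to L in Problem 3, and the H rate drops in both. Risk aversion would predict a drop only in Problem 2, which is why neither risk aversion nor loss aversion alone accounts for the pattern.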
3. The Big Eye effect (Ben Zion et al., 2010, Grosskopf et al., 2006)
x ~ N(0,300), y ~ N(0, 300)
R1: x
R2: y
M: Mean(R1,R2) + 5
Deviation from: maximization, risk aversion, loss aversion.
Implies under-diversification
Robust to prior information
[Figure: proportion of asset M choices as a function of trial]
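Under the small-sample account given in the Summary, an option is attractive when it recently produced the best payoff. A short calculation shows how rarely M can be the best of the three on any single trial, even though it has the highest mean. The sketch is hedged on one point: the slide does not say whether 300 in N(0, 300) is a standard deviation or a variance, so both readings are computed. The function name is hypothetical.

```python
import math

def prob_m_is_best(sd, bonus=5.0):
    """P(M beats both R1 and R2 on one trial) for x, y i.i.d. Normal(0, sd).

    M = (x + y) / 2 + bonus exceeds max(x, y) exactly when |x - y| < 2 * bonus.
    Since x - y is Normal with standard deviation sd * sqrt(2), this
    probability simplifies to erf(bonus / sd).
    """
    return math.erf(bonus / sd)

# Reading 1 (assumption): 300 is the standard deviation.
print(round(prob_m_is_best(300.0), 3))
# Reading 2 (assumption): 300 is the variance.
print(round(prob_m_is_best(math.sqrt(300.0)), 3))
```

Under the first reading M yields the best payoff on roughly 2% of trials, and under the second on roughly 32%. In both cases one of the two riskier assets is the best performer on most trials, which is consistent with a low rate of M choices among decision makers who follow recent best outcomes.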
4. The hot stove effect (Hogarth & Einhorn, 1992; March and Denrell, 2002).
5. Surprise-triggers-change (Nevo & Erev, 2010)
Evaluation of the sequential dependency in 2-alternative studies reveals a 4-fold
recency pattern:
Problem: 0 or (+1, .9; -10)
               Proportion of repeated R choices (%)   Proportion of switches to R (%)
  After +1     84                                     21
  After -10    69                                     31
Problem: 0 or (+10, .1; -1)
               Proportion of repeated R choices (%)   Proportion of switches to R (%)
  After +10    60                                     23
  After -1     79                                     6
This pattern violates reinforcement learning and similar “positive recency models”.
It is consistent with the observation of high correlation between price change and
volume of trade in the stock market (Karpoff, 1987), and with decrease of
compliance found after an audit (Kastlunger, Kirchler, Mittone & Pitters, 2010).
The pattern can be captured with the assumption that surprise triggers change.
6. High sensitivity to sequential dependencies (Biele, Erev & Ert, 2008)
S: 0 with certainty
R: 1 if the state is High; -1 if the state is Low
State transition probabilities:
State at t    State at t+1: High    State at t+1: Low
High          .95                   .05
Low           .05                   .95
Proportion of R choices:
R after +1    R after -1    R after Safe (0)
0.98          0.23          0.15
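Because the state persists with probability .95, the last payoff from R is a strong signal about the next one. The sketch below computes the expected payoff of choosing R on the next trial given the current state. The dictionary and function names are hypothetical; the probabilities and payoffs are those on the slide.

```python
# Transition probabilities and R payoffs as given on the slide.
TRANSITION = {
    "High": {"High": 0.95, "Low": 0.05},
    "Low": {"High": 0.05, "Low": 0.95},
}
PAYOFF_R = {"High": 1, "Low": -1}

def expected_r_payoff(current_state):
    """Expected payoff of choosing R at t+1 when the state at t is known."""
    return sum(
        prob * PAYOFF_R[next_state]
        for next_state, prob in TRANSITION[current_state].items()
    )

for state in ("High", "Low"):
    print(state, round(expected_r_payoff(state), 2))
```

A payoff of +1 from R reveals that the state is High, so R is worth about +0.9 on the next trial, well above the safe 0. A payoff of -1 reveals that the state is Low, so R is worth about -0.9. The observed R rates move in the same direction as this calculation.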
Implications for descriptive models:
The basic learning phenomena are extremely robust: They appear to be common
to humans and other animals, are consistent with stock market phenomena, and can
be easily replicated.
The current replications kept the environment fixed and focused on a single
variable: the incentive structure. Thus, it should be possible to capture these
regularities with a general model without “situation specific parameters.”
In addition, the results show important limitations of traditional reinforcement
learning models.
I-SAW (Inertia, Sampling and Weighting; Nevo & Erev, 2010)
Three response modes: Exploration, exploitation and inertia.
At each exploitation trial, player i computes the estimated value of alternative j as:
ESV(j) = (1 - w_i)(mean of a sample of m_i past outcomes from j) + w_i(grand mean of j)
Sampling is by similarity, and the very last outcome is more likely to be in the sample.
The alternative with the highest ESV is selected.
Exploration implies random choice.
Inertia implies repetition of the last choice. The probability of inertia decreases when
the outcomes are surprising. Surprise is computed from the gap between the payoff at
t and the payoffs in the previous trials.
An example of a case-based decision model (Gilboa & Schmeidler, 1995; see also
related ideas in Kareev, 2000; Osborne and Rubinstein, 1998; Gonzalez et al., 2003).
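The three response modes and the ESV rule can be sketched in a few functions. This is a simplified illustration, not the published model: sampling by similarity is replaced by uniform sampling with extra weight on the most recent outcome, the inertia probability is assumed to be pi raised to the power of a surprise score in [0, 1], and the parameter names (epsilon, pi, w, m, rho) and their values are hypothetical.

```python
import random

def esv(sample, all_outcomes, w):
    """ESV(j) = (1 - w) * mean(sample) + w * grand mean of alternative j."""
    sample_mean = sum(sample) / len(sample)
    grand_mean = sum(all_outcomes) / len(all_outcomes)
    return (1 - w) * sample_mean + w * grand_mean

def draw_sample(history, m, rho, rng):
    """Draw m past outcomes; each draw is the most recent one with probability rho.

    Assumption: stands in for 'sampling by similarity' in the source.
    """
    return [history[-1] if rng.random() < rho else rng.choice(history)
            for _ in range(m)]

def choose(histories, last_choice, surprise, params, rng):
    """Pick an alternative under a simplified three-mode rule.

    histories: dict mapping alternative -> non-empty list of observed payoffs.
    surprise: number in [0, 1]; how it is computed is left to the caller.
    """
    if rng.random() < params["epsilon"]:
        # Exploration: random choice.
        return rng.choice(sorted(histories))
    if last_choice is not None and rng.random() < params["pi"] ** surprise:
        # Inertia: repeat the last choice; less likely when surprise is high.
        return last_choice
    # Exploitation: select the alternative with the highest ESV.
    values = {
        j: esv(draw_sample(h, params["m"], params["rho"], rng), h, params["w"])
        for j, h in histories.items()
    }
    return max(values, key=values.get)

rng = random.Random(0)
# Hypothetical parameter values, chosen only for illustration.
params = {"epsilon": 0.1, "pi": 0.6, "w": 0.3, "m": 3, "rho": 0.2}
histories = {"S": [0, 0, 0, 0], "R": [-1, -1, 10, -1]}
print(esv([-1, -1, -1], histories["R"], params["w"]))
print(choose(histories, "S", 0.5, params, rng))
```

In the first printed value the sample happens to miss the rare +10, so the ESV of R is negative even though its grand mean is positive. This is how reliance on a small sample can reproduce underweighting of rare events, while the grand-mean term keeps some sensitivity to the full history.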
Choice prediction competitions (Erev, Ert & Roth, 2008, 2010)
1. Individual choice tasks http://tx.technion.ac.il/~eyalert/Comp.html
The task: Predicting the proportion of risky choices in a binary choice task in the
clicking paradigm without information concerning forgone payoffs.
Two studies (estimation and competition), each with 60 conditions. We published the
estimation study and challenged other researchers to predict the results of the second.
The models were ranked by their squared error.
The best baseline is a predecessor of I-SAW.
The winning submission, submitted by Stewart, West & Lebiere, is based on a
similar instance-based (“episodic”) logic (with a quantification in ACT-R).
Reinforcement learning and similar “semantic” models did not do well.
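Ranking models by squared error can be sketched as follows. This is a minimal illustration that assumes the score is the mean squared deviation between predicted and observed choice proportions across conditions; the model names and all numbers are made up for the example and are not competition data.

```python
def mean_squared_deviation(predicted, observed):
    """Average squared gap between predicted and observed choice proportions."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def rank_models(predictions, observed):
    """Return (model name, score) pairs sorted from best (lowest error) to worst."""
    scores = {
        name: mean_squared_deviation(pred, observed)
        for name, pred in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical observed rates and predictions for three conditions.
observed = [0.25, 0.5, 0.75]
predictions = {
    "model_a": [0.25, 0.5, 0.5],
    "model_b": [0.5, 0.5, 0.5],
}
for name, score in rank_models(predictions, observed):
    print(name, round(score, 4))
```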
2. Market Entry game http://tx.technion.ac.il/~eyalert/Comp.html
The task: Predicting behavior in a four-person market entry game.
The best baseline is a predecessor of I-SAW.
The winning submission, submitted by Chen et al. is a version of I-SAW.
Reinforcement learning and similar “semantic” models did not do well.
A second look at the experience-description gap (Marchiori, Di Guida & Erev)
The tendency to overweight rare events in decisions from description (the pattern
captured by PT) may not be a reflection of a distinct decision process.
It can be a product of the nature of past experiences in similar situations.
For example, when an agent is asked to choose between:
S: -5 with certainty
R: -5000 with probability 1/1000
She recalls past experiences with events estimated with similarly low probabilities.
Previous research (e.g., Erev et al., 1994) suggests that events estimated to occur with
probability 1/1000 occur with much higher probability (around 1/10). Thus, reliance on
these experiences can lead to the pattern predicted by prospect theory.
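The arithmetic behind this example is short. The sketch below compares the expected value of R under the stated probability with its value under the experienced frequency of about 1/10 mentioned above; the variable names are hypothetical and the 1/10 figure is the rough value quoted on the slide.

```python
from fractions import Fraction

def expected_value(payoff, probability):
    """Expected value of a gamble that pays `payoff` with `probability`, else 0."""
    return payoff * probability

safe = -5
stated = expected_value(-5000, Fraction(1, 1000))     # probability as described
experienced = expected_value(-5000, Fraction(1, 10))  # rough experienced frequency

print(stated)       # -5
print(experienced)  # -500
```

At the stated probability R and S have the same expected value. An agent who relies on past experience with events labeled as 1/1000 treats R as far worse than S and picks the safe option, which looks like overweighting of the rare event without requiring a separate decision process.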
2. The effect of social interaction and prior information
1. In certain situations the additional complexity has a limited effect:
Constant sum game (Erev & Roth, 1998)
The Market Entry Game competition.
2. In some cases the prior information affects the learning process.
Reciprocation in the repeated prisoner's dilemma game
3. The experience-description gap in games.
Other-regarding preferences and the mythical fixed pie
Interpersonal conflicts (Erev & Greiner, 2010)
       S        B1       B2       B3       E
S      10, 5    9, 0     9, 0     9, 0     9, 0
B1     0, 4     0, 0     0, 0     0, 0     0, 0
B2     0, 4     0, 0     0, 0     0, 0     0, 0
B3     0, 4     0, 0     0, 0     0, 0     0, 0
E      0, 4     0, 0     0, 0     0, 0     12, 12
When the game is played with fixed matching, known payoff matrix, and
noiseless feedback, players select the fair and efficient outcome (E, E).
Violating any one of these conditions leads to the (S, S) outcome, as predicted by
I-SAW and similar models.
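Both outcomes discussed here are equilibria of the one-shot game, which can be checked by enumerating best replies. The sketch below assumes each cell lists the row player's payoff first and the column player's payoff second; the function names are hypothetical.

```python
ACTIONS = ["S", "B1", "B2", "B3", "E"]

def payoff(row, col):
    """(row payoff, column payoff) as read from the slide's matrix.

    Assumption: the first number in each cell belongs to the row player.
    """
    if row == "S":
        return (10, 5) if col == "S" else (9, 0)
    if col == "S":
        return (0, 4)
    if row == "E" and col == "E":
        return (12, 12)
    return (0, 0)

def pure_nash_equilibria():
    """Cells where neither player can gain by deviating unilaterally."""
    equilibria = []
    for row in ACTIONS:
        for col in ACTIONS:
            row_pay, col_pay = payoff(row, col)
            row_ok = all(payoff(r, col)[0] <= row_pay for r in ACTIONS)
            col_ok = all(payoff(row, c)[1] <= col_pay for c in ACTIONS)
            if row_ok and col_ok:
                equilibria.append((row, col))
    return equilibria

print(pure_nash_equilibria())  # [('S', 'S'), ('E', 'E')]
```

(E, E) gives both players more than (S, S), but choosing E is risky for the row player: E pays 0 against anything other than E, while S pays at least 9. A learner who responds to a small sample of noisy experiences can therefore drift to S.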
Optimistic implications from a different story
3. The economics of small decisions
The experimental paradigms considered here focus on small decisions from
experience: The expected stakes were very low (a few cents or less per choice),
and the decision makers did not spend more than a few seconds on each choice.
We believe that this set of paradigms is not just a good test bed for basic learning
phenomena; it is also a good simulation of natural environments in which
experience is likely to shape economic behavior. In many of these environments
small decisions from experience can lead to consequential outcomes.
1. Gentle COP: Enforcement of safety rules (Erev & Rodansky, 2004)
Enforcement is necessary
Workers like enforcement programs
Probability is more important than magnitude
Large punishments are too costly; therefore, gentle enforcement can be optimal
Safety Climate (Zohar, 1980; Zohar and Luria, 1994)
2. Car recall (Barron, Leider & Stack, 2008)
3. The decision to explore and the NCAA (Gopher et al., 1989)
Two teams in 2005/6 and 2006/7
Memphis and U of Florida
Ten additional teams in
2007/8 including Kansas
The world's first brain training tool
for basketball players
4. Stock market patterns
Black Swan
Insufficient investment in the stock market,
and insufficient diversification.
High correlation between Price change and volume of trade in the
following day
4. Summary
Many of the classical properties of human and animal learning can be reliably
reproduced in the easy to run (and to model) clicking paradigm.
The main results can be captured with models that assume best reply to small
samples of experiences in similar cases. The implied behavioral processes are
evolutionarily reasonable, but can lead to robust deviations from maximization in
relatively static environments.
These simple models fail when the description of the game suggests easy and
efficient super-game strategies. The clearest example is reciprocation in the
prisoner's dilemma game with full information, fixed matching, and low noise. We believe
that this set of situations is interesting but small and overrated.
The current understanding of decisions from experience is sufficient to shed light on
many natural problems.