Learning and the Economics of Small Decisions
Ido Erev, Technion

Based on a chapter written with Ernan Haruvy for the 2nd volume of the Handbook of Experimental Economics, edited by Kagel and Roth. The chapter focuses on the relationship between basic learning phenomena and mainstream behavioral economic research. To clarify the analysis, we start with replications of the basic learning phenomena in a simplified "standard paradigm" (see Hertwig & Ortmann, 2002).

The clicking paradigm

Instructions to participants: "The current experiment includes many trials. Your task, in each trial, is to click on one of the two keys presented on the screen. Each click will be followed by the presentation of the keys' payoffs. Your payoff for the trial is the payoff of the selected key."

Example of the feedback screen (here the two keys paid 0 and 1): "You selected Right. Your payoff in this trial is 1. Had you selected Left, your payoff would be 0."

This is not a test of rational decision theory; in this setting the rationality assumption is not even wrong.

1. Underweighting of rare events (Barron & Erev, 2003)

400 trials, ¼ cent per point.

Problem   S   R               P(R)   Observed pattern
5         0   (10, .1; -1)    27%    Risk aversion
6         0   (-10, .1; +1)   56%    Risk seeking

- Experience-description gap (Hertwig et al., 2004)
- Underestimation? (Barron & Yechiam, 2009)
- Robust to prior information (Jessup et al., 2008)
- Reversed certainty effect
- Similar pattern in honey bees (Shafir et al., 2008)
- Taleb's Black Swan effect
- Sensitivity to magnitude: -20 vs. -10 (Ert & Erev, 2010)

2. The payoff variability effect (Myers & Sadler, 1960; Busemeyer & Townsend, 1993)

Problem   H              L   P(H)
1         1              0   96%
2         (11, .5; -9)   0   58%
3         (9, .5; -11)   0   53%

Risk aversion? Or loss aversion? Neither!

[Figure: proportion of H choices over blocks of 20 trials, Problems 1-3.]

3. The Big Eye effect (Ben Zion et al., 2010; Grosskopf et al., 2006)

x ~ N(0, 300), y ~ N(0, 300)
R1: x
R2: y
M: Mean(R1, R2) + 5

- A deviation from maximization, risk aversion, and loss aversion
- Implies under-diversification
- Robust to prior information

[Figure: proportion of Asset M choices over trials 1-90.]

4. The hot stove effect (Hogarth & Einhorn, 1992; March & Denrell, 2002)

Bad experiences with an alternative reduce the tendency to sample it again, so early losses are not corrected by later experience; the result is a bias toward the safer option.

5. Surprise-triggers-change (Nevo & Erev, 2010)

Evaluation of the sequential dependencies in two-alternative studies reveals a four-fold recency pattern:

Problem: 0 or (+1, .9; -10)
  Proportion of repeated R choices:   after +1: 84    after -10: 69
  Proportion of switches to R:        after +1: 21    after -10: 31

Problem: 0 or (+10, .1; -1)
  Proportion of repeated R choices:   after +10: 60   after -1: 79
  Proportion of switches to R:        after +10: 23   after -1: 6

This pattern violates reinforcement learning and similar "positive recency" models. It is consistent with the observed high correlation between price change and volume of trade in the stock market (Karpoff, 1987), and with the decrease in compliance found after an audit (Kastlunger, Kirchler, Mittone & Pitters, 2010). It can be captured with the assumption that surprise triggers change.

6. High sensitivity to sequential dependencies (Biele, Erev & Ert, 2008)

S: 0 with certainty
R: +1 if the state is High; -1 if the state is Low

The state follows a Markov chain:

State at t   P(High at t+1)   P(Low at t+1)
High         .95              .05
Low          .05              .95

Observed choice proportions: P(R) = 0.98 after a +1, 0.23 after a -1, and 0.15 after a safe (0) choice.

Implications for descriptive models

The basic learning phenomena are extremely robust: they appear to be common to humans and other animals, are consistent with stock market phenomena, and can be easily replicated. The current replications kept the environment fixed and focused on a single variable: the incentive structure. Thus, it should be possible to capture these regularities with a general model without "situation-specific parameters." In addition, the results show important limitations of traditional reinforcement learning models.
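Before turning to I-SAW, it may help to see how little machinery is needed to produce the first regularity above. The sketch below is our own illustration, not I-SAW and not the original experimental code; the sample size of 4, the number of simulated agents, and the assumption of full feedback (both payoffs shown, as in the clicking paradigm above) are arbitrary modeling choices. It simulates an agent who, on each trial, best-replies to a small random sample of each key's past payoffs, and it reproduces the qualitative pattern of Problems 5 and 6: P(R) well below 50% when the rare outcome is attractive, and above 50% when it is painful.

```python
import random

def simulate(problem, trials=400, sample_size=4, agents=200, seed=1):
    """Each agent best-replies to a small random sample of each key's past payoffs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(agents):
        history = {key: [] for key in problem}
        r_choices = 0
        for t in range(trials):
            if t == 0:
                choice = rng.choice(list(problem))
            else:
                # estimate each key by the mean of a small sample drawn (with
                # replacement) from that key's observed payoffs
                def estimate(key):
                    draws = [rng.choice(history[key]) for _ in range(sample_size)]
                    return sum(draws) / sample_size
                choice = max(problem, key=estimate)
            if choice == "R":
                r_choices += 1
            # full feedback: obtained and forgone payoffs are both observed
            for key, payoff in problem.items():
                history[key].append(payoff(rng))
        total += r_choices / trials
    return total / agents

# Problem 5: S = 0 vs R = (10, .1; -1);  Problem 6: S = 0 vs R = (-10, .1; +1)
problem5 = {"S": lambda r: 0, "R": lambda r: 10 if r.random() < 0.1 else -1}
problem6 = {"S": lambda r: 0, "R": lambda r: -10 if r.random() < 0.1 else 1}
print("P(R) in Problem 5:", round(simulate(problem5), 2))  # well below .5: risk aversion
print("P(R) in Problem 6:", round(simulate(problem6), 2))  # above .5: risk seeking
```

Because a small sample usually omits the rare outcome, the agent's evaluation is dominated by the common outcome; this is the sense in which rare events are underweighted by reliance on small samples. The I-SAW model described next builds on the same sampling idea and adds weighting of the grand mean, inertia, and surprise-triggered change.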
I-SAW (Inertia, Sampling and Weighting; Nevo & Erev, 2010)

The model assumes three response modes: exploration, exploitation, and inertia.

At each exploitation trial, player i computes the estimated subjective value of alternative j as

    ESV(j) = (1 - w_i) * (mean of a sample of m_i past payoffs from j) + w_i * (grand mean of j's payoffs)

Sampling is by similarity, and the very last outcome is more likely than older outcomes to be included in the sample. The alternative with the highest ESV is selected. Exploration implies random choice. Inertia implies repetition of the last choice; the probability of inertia decreases when the outcomes are surprising, where surprise is computed from the gap between the payoff at trial t and the payoffs observed in previous trials.

I-SAW is an example of a case-based decision model (Gilboa & Schmeidler, 1995; see related ideas in Kareev, 2000; Osborne & Rubinstein, 1998; Gonzalez et al., 2003).

Choice prediction competitions (Erev, Ert & Roth, 2008, 2010)

1. Individual choice tasks (http://tx.technion.ac.il/~eyalert/Comp.html)

The task: predicting the proportion of risky choices in binary choice tasks in the clicking paradigm, without information concerning forgone payoffs. Two studies (estimation and competition) were run, each with 60 conditions. We published the estimation study and challenged other researchers to predict the results of the second; the submitted models were ranked based on their squared error. The best baseline model is a predecessor of I-SAW. The winning submission, by Stewart, West & Lebiere, is based on a similar instance-based ("episodic") logic, quantified in ACT-R. Reinforcement learning and similar "semantic" models did not do well.

2. Market entry game (http://tx.technion.ac.il/~eyalert/Comp.html)

The task: predicting behavior in a four-person market entry game. The best baseline model is a predecessor of I-SAW, and the winning submission, by Chen et al., is a version of I-SAW. Again, reinforcement learning and similar "semantic" models did not do well.

A second look at the experience-description gap (Marchiori, Di Guida & Erev)

The tendency to overweight rare events in decisions from description (the pattern captured by prospect theory) may not reflect a distinct decision process. It can be a product of the nature of past experiences in similar situations. For example, when the agent is asked to choose between

S: -5 with certainty
R: -5000 with probability 1/1000

she recalls past experiences with events that were estimated to have similarly low probabilities. Previous research (e.g., Erev et al., 1994) suggests that events estimated at 1/1000 occur with much higher probability (around 1/10). Thus, reliance on these experiences can lead to the pattern predicted by prospect theory.
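The arithmetic behind this argument can be made explicit. The sketch below is our own illustration (the 1/10 frequency is the figure cited above from Erev et al., 1994; everything else is bookkeeping): an agent who evaluates R through recalled experiences with "1-in-1000" events that in fact occurred about one time in ten treats R as far worse than S, and therefore behaves as if the rare event were overweighted.

```python
# Illustrative arithmetic for the Marchiori, Di Guida & Erev argument (our sketch).
# Choice: S = lose 5 for sure, versus R = lose 5000 with stated probability 1/1000.
stated_p = 1 / 1000    # the described probability of the bad outcome
recalled_p = 1 / 10    # how often events "estimated at 1/1000" actually occurred
                       # in past experience (Erev et al., 1994)

ev_R_described = -5000 * stated_p     # -5.0: objectively about as good as S (-5)
ev_R_recalled = -5000 * recalled_p    # -500.0: far worse than S

print(f"EV(R) from the description: {ev_R_described}")
print(f"EV(R) from recalled similar experiences: {ev_R_recalled}")
# Relying on the recalled experiences, the agent avoids R -- the same pattern
# prospect theory attributes to overweighting of the rare event.
```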
2. The effect of social interaction and prior information

1. In certain situations the additional complexity has a limited effect: constant-sum games (Erev & Roth, 1998); the market entry game competition.
2. In some cases prior information affects the learning process: reciprocation in the repeated prisoner's dilemma game.
3. The experience-description gap in games.

Other-regarding preferences and the mythical fixed pie

Interpersonal conflicts (Erev & Greiner, 2010)

Payoffs (row player, column player):

      S       B1     B2     B3     E
S     10, 5   9, 0   9, 0   9, 0   9, 0
B1    0, 4    0, 0   0, 0   0, 0   0, 0
B2    0, 4    0, 0   0, 0   0, 0   0, 0
B3    0, 4    0, 0   0, 0   0, 0   0, 0
E     0, 4    0, 0   0, 0   0, 0   12, 12

When the game is played with fixed matching, a known payoff matrix, and noiseless feedback, players select the fair and efficient outcome (E, E). Violating any one of these conditions leads to the (S, S) outcome, as predicted by I-SAW and similar models. Optimistic implications from a different story.

3. The economics of small decisions

The experimental paradigms considered here focus on small decisions from experience: the expected stakes were very low (a few cents or less per choice), and the decision makers did not spend more than a few seconds on each choice. We believe that this set of paradigms is not just a good test bed for basic learning phenomena; it is also a good simulation of natural environments in which experience is likely to shape economic behavior. In many of these environments, small decisions from experience can lead to consequential outcomes.

1. Gentle COP: enforcement of safety rules (Erev & Rodansky, 2004)
- Enforcement is necessary.
- Workers like enforcement programs.
- Probability is more important than magnitude.
- Large punishments are too costly; therefore, gentle enforcement can be optimal.
- Related: safety climate (Zohar, 1980; Zohar & Luria, 1994).

2. Car recall (Barron, Leider & Stack, 2008)

3. The decision to explore and the NCAA (Gopher et al., 1989)
- Two teams in 2005/6 and 2006/7: Memphis and the University of Florida.
- Ten additional teams in 2007/8, including Kansas.
- "The world's first brain training tool for basketball players."

4. Stock market patterns
- The Black Swan effect.
- Insufficient investment in the stock market, and insufficient diversification.
- High correlation between price change and the volume of trade on the following day.

4. Summary

Many of the classical properties of human and animal learning can be reliably reproduced in the easy-to-run (and easy-to-model) clicking paradigm. The main results can be captured with models that assume best reply to small samples of experiences in similar cases. The implied behavioral processes are evolutionarily reasonable, but they can lead to robust deviations from maximization in relatively static environments. These simple models fail when the description of the game suggests easy and efficient super-game strategies; the clearest example is reciprocation in the prisoner's dilemma game with full information, fixed matching, and low noise. We believe that this set of situations is interesting, but small and overrated. The current understanding of decisions from experience is sufficient to shed light on many natural problems.
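As a closing numerical illustration, return to the Erev & Greiner (2010) interpersonal conflict game above. The sketch below is our own simplification, not part of the original chapter: it ignores the B1-B3 actions (against them S still earns the row player 9 while E earns 0, so including them only makes E look worse) and simply contrasts S with E. It shows why experience pushes players toward (S, S) unless coordination on E is very reliable: against a partner who plays E with probability p and S otherwise, E earns the row player more than S only when p exceeds 10/13 (about 0.77).

```python
# Row player's expected payoff in the Erev & Greiner (2010) game when the partner
# plays E with probability p and S with probability 1 - p (our simplification:
# the B1-B3 actions are left out).
def ev_safe(p):        # S earns 10 against S and 9 against E
    return 10 * (1 - p) + 9 * p

def ev_efficient(p):   # E earns 0 against S and 12 against E
    return 12 * p

for p in (0.0, 0.25, 0.5, 0.75, 10 / 13, 0.9, 1.0):
    print(f"p = {p:.2f}   EV(S) = {ev_safe(p):5.2f}   EV(E) = {ev_efficient(p):5.2f}")
# E overtakes S only when p > 10/13 (about 0.77); a few failed attempts to
# coordinate on E are therefore enough to send experience-based learners to (S, S).
```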