ELICITING RISK PREFERENCES USING CHOICE LISTS

DAVID FREEMAN, YORAM HALEVY AND TERRI KNEELAND

Abstract. We experimentally study the effect of embedding pairwise choices between lotteries within a choice list on measured risk attitude. Subjects choose the riskier lottery significantly more often when responding to a choice list. This failure of incentive compatibility can be rationalized by the interaction between non-expected utility and the random incentive system, as suggested by Karni and Safra (1987).

Keywords: random incentive system, isolation, independence axiom, multiple price list, reduction of compound lotteries, preference reversals.

Date: June 8, 2015 (first version May 2012).

We are grateful for comments from audiences at CEA 2012 in Calgary, SITE Experimental Economics session 2012, ESA North American Winter Meetings 2012 and 2013 (special panel discussion on incentives in experiments), FUR 2014 in Rotterdam, Decisions: Theory, Experiments and Applications (D-TEA) 2014 at HEC Paris, SEA 2014 in Atlanta, CEA 2015 in Toronto, and M-BEES 2015. The research reported in this paper was conducted under UBC Behavioural Research Ethics Board certificate of approval H11-01719. Financial support from SSHRC is gratefully acknowledged.

1. Introduction

A preference relation is, by definition, a binary relation over alternatives. As such, the gold standard for revealing preferences through choice is the observation of a single pairwise choice. However, such an experiment provides very limited information about preferences: for example, it cannot reveal a subject's certainty equivalent of a lottery. To elicit finer information about preferences, a common experimental practice presents to a subject a sequence of related pairwise choices arranged in a list, known as a Choice List (or Multiple Price List). A randomization device is used to pick one decision to determine the subject's payment, a procedure known as the Random Incentive System (RIS). In recent years, choice lists have become the workhorse method in experimental economics to measure individual preferences. Usually, each pairwise choice a subject makes in a list is interpreted as if she had faced only a single binary choice.

This paper investigates whether subjects' choices are influenced by whether or not they are embedded in a choice list. In one group of treatments, subjects respond to one or two choice lists. In a second group of treatments, subjects make a single (or two) pairwise choice(s); each such choice corresponds exactly to a single line of the corresponding choice list. We find that embedding a pairwise choice in a choice list increases the fraction of subjects choosing the riskier lottery from 23% to 45% when the safer alternative is certain (p < .001), but does not significantly affect choices when the safer alternative is risky (p = .17). This suggests that embedding choices in a list can affect subjects' responses.

These findings are consistent with Karni and Safra's (1987) impossibility result concerning the experimental observability of non-expected utility preferences. They show that in an experiment that uses the RIS to select a decision to be paid, if the subject reduces compound lotteries, the experiment may fail to be incentive compatible if the independence axiom does not hold. We conjecture that the list presentation invokes reduction-like evaluation of the experiment, and since independence is known [1] to be prone to failure in the vicinity of certainty, the certainty effect can rationalize our experimental findings.

Our work includes a practical recommendation for experimentalists who would like to continue to use the more precise information contained in choice lists.
We believe that the between-subject design employed in the current study, in which a control group of subjects who make a single pairwise choice is used to test for systematic bias in the choice list, could and should be easily incorporated in future studies.

[1] Section 6 in Cerreia-Vioglio, Dillenberger, and Ortoleva (2015) provides a recent discussion of this evidence.

2. Experimental design

Subjects were recruited from Amazon's Mechanical Turk online labor market and completed the experiment on our server; the full details of the procedure are described in Appendix A. The details of a robustness check using a student sample are described in Appendix B.2.

The main experiment consisted of sixteen different treatments. The whole design is between-subjects, so each subject was randomly assigned to one of the treatments. In treatments P1 and P2, subjects made only a single pairwise (binary) choice and were paid based on their choice. In P12 and P21 subjects made two pairwise choices (in reverse order), one of which was randomly selected to determine payment; these choice problems corresponded to choice tasks subjects faced on line 11 of the choice lists appearing in Table 1. In treatments beginning with the letter L (list), each question consisted of a list of pairwise choices in which the left-hand-side lottery (Option A) was held constant throughout the list while the probability of winning the higher prize in the right-hand-side lottery (Option B) decreased as subjects proceeded down the list.

Table 1. The list questions

          Q1                       Q2
Line      Option A    Option B     Option A    Option B
1         (3, 1)      (4, 1)       (3, .5)     (4, .50)
2         (3, 1)      (4, .98)     (3, .5)     (4, .49)
...       ...         ...          ...         ...
11        (3, 1)      (4, .80)     (3, .5)     (4, .40)
...       ...         ...          ...         ...
26        (3, 1)      (4, .50)     (3, .5)     (4, .25)
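To make the structure of the two lists concrete, the following Python sketch generates them from the description above. It assumes, based on the visible rows of Table 1, that Option B's winning probability falls by .02 per line in Q1 and by .01 per line in Q2. The function and variable names are illustrative and are not part of the original study materials.

```python
from fractions import Fraction


def build_choice_list(safe_prob, risky_start, step, n_lines=26):
    """Return a list of (line, option_a, option_b) rows.

    Each option is a (prize, win_probability) pair; the lottery pays the
    prize with that probability and $0 otherwise. Option A is held constant,
    while Option B's chance of winning $4 falls by `step` on each line
    (an assumption read off the visible rows of Table 1).
    """
    rows = []
    for line in range(1, n_lines + 1):
        option_a = (3, safe_prob)
        option_b = (4, risky_start - step * (line - 1))
        rows.append((line, option_a, option_b))
    return rows


# Exact fractions avoid floating-point drift in the probabilities.
Q1 = build_choice_list(Fraction(1), Fraction(1), Fraction(2, 100))
Q2 = build_choice_list(Fraction(1, 2), Fraction(1, 2), Fraction(1, 100))

for line in (1, 2, 11, 26):
    b1 = Q1[line - 1][2][1]
    b2 = Q2[line - 1][2][1]
    print(f"line {line:>2}: Q1 Option B wins $4 w.p. {float(b1):.2f}, "
          f"Q2 Option B wins $4 w.p. {float(b2):.2f}")
```

Under these assumptions each row of Q2 is the corresponding row of Q1 with both winning probabilities halved, which is what makes the pair of lists a common-ratio test of the independence axiom.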
In treatments L1 and L2, subjects responded to a single choice list while in treatments LO12, LO21, LA12, and LA21, subjects responded to two choice lists; Table 1 presents the list of pairwise choices employed in all list treatments. In treatments LO12 and LO21, one of the two choice lists was randomly selected to determine payment, while in treatments LA12 and LA21 both choice lists determined the payment. Treatments LO21 and LA21 reversed the order of the list questions of treatments LO12 and LA12 respectively. Subjects were informed that whenever a choice list was used to determine their payment, one line from that list, which corresponds to a single pairwise choice, would be randomly selected to be played out to determine the subject's bonus payment. In the L treatments, we allowed subjects to switch from Option A to Option B at any number of points on the list, but used a pop-up to warn subjects who switched from Option B to Option A and then back to Option B.

The S (separate screens) treatments mirror the L treatments, except that before completing each list subjects responded to a sequence of (non-incentivized) pairwise choices that appeared on separate screens. In the S treatments, the pairwise choice tasks were presented so as to converge towards the switching point for a subject with monotone preferences. Subjects then responded to an incentivized list that was already filled in using their responses to the pairwise choice tasks but was otherwise identical to that in the corresponding L treatment (crucially, subjects were free to change their answers in the list).

Subjects completed the experiment (HIT) by submitting a completion code generated by our website to the Mechanical Turk interface. A random number generator was used to resolve all risks automatically, and subjects were informed of how much of a bonus would be paid after completing the study. Payments were credited to subjects' Mechanical Turk accounts within 30 minutes of completing the experiment. Given that the experiment took at most 15 minutes to complete, our $1 base payment was somewhat high, and the bonus payments of $3 or $4 provided very high incentives for this subject pool. In Appendix C we discuss Amazon's Mechanical Turk as a subject pool for economic experiments. Table 2 presents the assignment of subjects to the different treatments.

Table 2. Treatments and treatment labels

                                   One Q                  Two Qs, Order
                                   Q1 (1)     Q2 (2)      Q1, Q2 (12)   Q2, Q1 (21)
Pairwise choice (P)                39         41          20            22
Standard List (L)                  47 (43)    45 (41)
  Pay One List (O)                                        36 (27)       35 (29)
  Pay Both Lists (A)                                      36 (31)       33 (26)
Separate Screens then List (S)     48 (46)    49 (48)
  Pay One List (O)                                        37 (36)       32 (29)
  Pay Both Lists (A)                                      26 (25)       25 (25)

Entries indicate the number of subjects and, for L and S treatments, the number of monotone subjects in brackets (where applicable).

3. Results

Some of the analysis focuses on monotone subjects who exhibit single-switching in each choice list and who do not choose a dominated option in the first line of a list.

3.1. Choice lists versus pairwise choice. Table 3 presents the distribution of choices for line 11 of the list, grouped by the incentives provided.

Table 3. Answers to line 11

              Pairwise choice              Choice list
              One choice    Two choices    One list          Two lists
Q1            23%           24%            40% (42%)         46% (47%)
Q2            27%           33%            39% (39%)         38% (38%)
n             39/41         42             95/94 (89/89)     260 (228)
Treatments    P1, P2        P12, P21       L1, L2, S1, S2    All O and A

Fraction choosing the riskier option, all subjects (all monotone subjects in brackets).
There are no significant differences between asking one question or asking two questions shown on separate screens when only pairwise choices are made (p = .92 for Q1 and p = .53 for Q2, exact tests), a finding consistent with the literature supporting the incentive compatibility of the RIS in which subjects respond to a small number of pairwise choices (Starmer and Sugden, 1991; Cubitt, Starmer, and Sugden, 1998).

The most obvious difference in Table 3 is that in Q1 only 23% of subjects responding in the pairwise choice treatments chose the risky (B) option, but 45% of subjects responding to the choice list chose this option, a significant difference (p < .001, exact test). In Q2, 30% of subjects chose lottery B in pairwise choice, and 38% chose this lottery in the choice list, a difference that is not statistically significant (p = .17, exact test). Comparable results would hold if we include only monotone subjects or only subsets of list treatments.

3.2. The independence axiom. Under both pairwise choice and choice list, responses are close to expected utility. Choices exhibit a slight common ratio effect with pairwise choice, and a slight reverse common ratio effect when using choice lists. In pairwise choice, the violations of the independence axiom are not significant (p = .80 for an exact test for P1 vs P2, p = .38 for an exact aggregate test pooling all of the P treatments). However, since these two questions only look in a very particular region of the Marschak-Machina triangle, we do not view this as strong evidence in favor of the independence axiom.
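The exact tests reported in this section compare the fraction choosing the riskier option across two independent groups. The text does not name the specific test, so the sketch below assumes a two-sided Fisher exact test on a 2x2 table, implemented with the Python standard library only. The counts in the usage lines are back-calculated from the rounded percentages and sample sizes in Table 3; they are approximate illustrations, not the authors' raw data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are (chose risky, chose safe).
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x subjects in group 1 chose risky, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tol = 1e-9  # relative tolerance for comparing floating-point values
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= observed * (1 + tol))
    return min(1.0, total)


# Assumed, approximate Q1 counts: 19 of 81 pairwise-choice subjects and
# 158 of 355 choice-list subjects chose the riskier option on line 11.
p_q1 = fisher_exact_two_sided(19, 81 - 19, 158, 355 - 158)
print(f"Q1, pairwise choice vs choice list: p = {p_q1:.4f}")
```

Because the counts are reconstructed from rounded figures, the printed p-value should be read as an order-of-magnitude check on the reported significance rather than a replication.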
Pooling all the list treatments and ignoring the within-subject nature of part of the treatments, the median switching points (for monotone subjects) in the choice lists are consistent with the following ranking:

($4, .86) ≻ ($3, 1) ≻ ($4, .84) and ($4, .44) ≻ ($3, .5) ≻ ($4, .43),

which is inconsistent with the independence axiom in the reverse direction of the standard common-ratio effect. This violation of the independence axiom is quantitatively small and is statistically insignificant at 1% (p = .02, rank-sum test). [2]

[2] With the large sample size, it is appropriate to use a lower significance level to limit the probability of a Type I error.

The aggregate analysis of behavior masks substantial heterogeneity of individual decisions that within-subject analysis of choice list data picks up. In choice lists, we detect violations of the independence axiom for 78.5% of monotone subjects, split between standard common ratio and reverse common ratio violations with the latter being slightly more frequent (Table 4), a difference that is statistically insignificant at 1% (p = .052, [3] Sign test). Utilizing the ordinal information concerning the extent of deviation from the independence axiom, these deviations become statistically significant (p < .001, two-sided Wilcoxon signed-rank test). Pairwise choice data only detects violations of the independence axiom (AB and BA choice patterns) for 29% of subjects. Using only data from line 11 would detect a similar fraction of EU violations for the L and S treatments; in both cases, the deviations from the independence axiom are statistically insignificant at 1% (p = .388 and p = .019, Sign tests for pairwise and list data, respectively).

[3] In a Sign Test one omits the zeroes to achieve a UMP test. See Lehmann and Romano (2005), page 136.

Table 4. Behavior relative to the independence axiom: within-subject

                                        Common Ratio    Independence    Reverse Common Ratio
Choice List (L and S) (a)               33.3%           21.5%           45.2%
Choice List (L and S), line 11 (b)      11.4%           68.0%           20.6%
Pairwise Choice (P) (c)                 19.1%           71.4%           9.5%

(a) Percentage of monotone subjects who switched earlier to lottery A in Q1 than in Q2 (common ratio), switched on the same line in Q1 and Q2 (independence), or switched on an earlier line to lottery A in Q2 than in Q1 (reverse common ratio).
(b) Percentage of monotone subjects who answered both Q1 and Q2 and: chose lottery A in Q1 and lottery B in Q2 on line 11 (common ratio), made similar choices on line 11 (AA or BB, independence), chose B in Q1 and A in Q2 on line 11 (reverse common ratio).
(c) Percentage of subjects in the P12 and P21 treatments who chose A in Q1 and B in Q2 (common ratio), made similar choices in Q1 and Q2 (independence), chose B in Q1 and A in Q2 (reverse common ratio).

3.3. Treatment effects. By employing a choice list, the possibility of within-list contamination is equally present in all L and S treatments. Our treatments allow us to test whether any of the varied factors in the subset of two-list treatments induce differences in behavior. The 2x2x2 design embedded in the treatments ({L,S}×{O,A}×{12,21}) allows us to separately test for the presence of display (separate screen), payment mechanism, and order effects in each question. We find that none of these effects (Table 5) are significant at 1%, even without correcting for multiple hypothesis testing.

Table 5. Display, payment mechanism and order effects in choice list

                                     Q1      Q2
Separate screens effect (L=S)        .94     .81
Order effect (12=21)                 .15     .05
Payment mechanism effect (O=A)       .38     .77

p-values reported for a Wilcoxon rank-sum test of equality of distribution. Tests use only monotone subjects.

Some of our discussion has focused on monotone subjects. A striking treatment effect is that the proportion of monotone subjects is higher in the S (separate screens) treatments than in the L (list) treatments (96% vs. 85%; p < .001, exact test; see Table 2). As one might expect, there are relatively more (94% vs. 88%; p = .02, exact test) monotone subjects in the treatments in which subjects faced only a single list (as opposed to two). Neither order nor payment mechanism significantly affects the proportion of monotone subjects (p = .85, .57 respectively, exact tests).

We introduced the S (separate screens) treatment later in our experimental investigation, hoping it would bridge the difference between standard choice list and pairwise choice. We conjectured that a combination of isolation in pairwise choices made on different screens, a lack of influence of hypothetical versus real incentives, and a creation of a default when the actual list was displayed, would eliminate the observed differences in responses made in choice list versus pairwise choice. Table 5 shows that the incentivized choice data do not tend to support this view. Moreover, comparing the responses to line 11 in the S treatments to pairwise choice, we find a significant difference in Q1 (p < 0.001, exact test) and an insignificant difference in Q2 (p = 0.21, exact test), similar to the L treatment. [4]

4. Theory: Pairwise choice versus choice lists

This section demonstrates that the robust experimental findings documented so far can be rationalized within existing theoretical models of non-expected utility preferences. The theory provides the tools to evaluate the robustness of inference from existing studies to the behavior documented here, and enables one to improve future experimental designs.
The two main facts that we want to account for by all models considered are that subjects are more risk averse when making a pairwise choice involving certainty than when making choices in a choice list, although their choices in the latter tend to satisfy the independence axiom (on average).

[4] We find that 29% of monotone subjects who faced one list and 43% of monotone subjects who faced two lists in the S treatments amended at least one of their choices. They switched in both directions, with 65% of switches involving a move from riskier-preliminary to safer-incentivized choices. Looking only at line 11, the aggregate distribution of preliminary choices made in Q2 is identical to the incentivized choices, and the preliminary choices in Q1 are slightly more risk-taking (by 5 subjects) than the incentivized choices.

Consider a simple lottery \(p = (x_i, p_i)_{i=1}^{n}\) paying \(x_i\) with probability \(p_i\), where \(x_i > x_{i+1}\) for each \(i\). A compound lottery \(\pi = [p_i, \pi_i]_{i=1}^{m}\) pays the simple one-stage lottery \(p_i\) with probability \(\pi_i\). A subject who chooses Option B for the last time at line \(i\) of the list version of Q1, where Option B on that line is \((\$4, 1 - 0.02(i-1);\ \$0, 0.02(i-1))\), receives the two-stage compound lottery:

\[
\left[ (\$4, 1), \tfrac{1}{26};\ \ldots;\ (\$4, 1.02 - .02i;\ \$0, 0.02i - 0.02), \tfrac{1}{26};\ (\$3, 1), \tfrac{26 - i}{26} \right] \tag{4.1}
\]

A subject with preferences over compound lotteries and certainty equivalent function \(c(\cdot)\) over one-stage lotteries who satisfies Segal's (1990) compound independence axiom (recursivity) is indifferent between the compound lottery in (4.1) and the single-stage lottery:

\[
\left( c((\$4, 1)), \tfrac{1}{26};\ \ldots;\ c((\$4, 1.02 - .02i;\ \$0, 0.02i - 0.02)), \tfrac{1}{26};\ \$3, \tfrac{26 - i}{26} \right) \tag{4.2}
\]

A subject whose preferences are monotone with respect to first order stochastic dominance and satisfy compound independence (she evaluates the choice of switching line in a choice list according to (4.2)) will choose the risky option on line \(i\) (Option B) whenever \(c((\$4, 1.02 - .02i;\ \$0, 0.02i - 0.02)) > \$3\). Thus, she will make identical choices in each line of the choice list as she would have made in an experiment in which she only faced the single pairwise choice. This prediction is inconsistent with the main finding of our experiment.

We believe that the key observation in understanding our experimental findings is that a subject who chooses the risky option (B) on the first few lines of the list does not face a choice that involves certainty on line 11 of the choice list. In other words, choosing the sure outcome (A) on line 11 does not lead to payment with certainty since the RIS may select one of the first lines, reducing the attractiveness of the safe option for a subject sensitive to certainty.

We suggest modeling this behavior by assuming that the subject chooses her switching line in the choice list as if she reduces the compound lottery in (4.1) according to the laws of probability (Reduction of Compound Lotteries Axiom, ROCL; Samuelson 1952). Therefore, switching to the sure outcome (Option A) on line \(i+1\) in the choice list induces the one-stage lottery

\[
\left( \$4, \frac{1.01i - .01i^2}{26};\ \$3, \frac{26 - i}{26};\ \$0, \frac{.01i^2 - .01i}{26} \right) \tag{4.3}
\]

[Figure 4.1: Marschak-Machina triangle showing the lotteries A1, B1, A2 and B2, the curves List1 and List2, and the point corresponding to Line 11 on each curve.]

Figure 4.1. The decision problem in a Marschak-Machina triangle under reduction of compound lotteries: the pairwise choice in Q1 is between A1 and B1. The choice of a switching point in Q1 corresponds to a choice on the curve List1, such that an earlier switching point corresponds to a point on List1 that is closer to A1. The slope of List1 at line 11 equals the slope of A1B1: consider an expected utility subject (satisfying compound independence in addition to ROCL) who is indifferent between A1 and B1. She will be indifferent between Option A and Option B on line 11 of the choice list. Since her indifference curves are parallel straight lines, it follows that her indifference curves' tangency point to List1 occurs at the point corresponding to Line 11.

This follows the modeling approach used by Karni and Safra (1987) to explain preference reversals. In our view, the presentation of the choice list and instructions describing the RIS make the incentive structure particularly transparent to subjects, leading them to choose the switching point as if they reduce the compound lottery formed by their choices in the list and the RIS. [5] We show by way of examples that plausible specifications of preferences from the non-expected utility literature can generate the type of behavior we observe in the experiment.

4.1. Cautious Expected Utility. Suppose a subject's preferences are represented by Cautious Expected Utility (Cerreia-Vioglio, Dillenberger, and Ortoleva, 2015); that is, she ranks any (single-stage) lottery according to

\[
U\big((x_i, p_i)_{i=1}^{n}\big) = \min_{u \in \mathcal{U}} u^{-1}\Big( \sum_{i=1}^{n} p_i u(x_i) \Big),
\]

where \(\mathcal{U}\) is a set of expected utility functions from \(\mathbb{R}_+ \to \mathbb{R}\). Cerreia-Vioglio, Dillenberger, and Ortoleva show that Cautious Expected Utility is characterized by the Negative Certainty Independence (NCI; Dillenberger, 2010) axiom (in addition to continuity and weak payoff monotonicity). In our setting, the NCI axiom implies that

\[
(\$4, p;\ \$0, 1-p) \succsim (\$3, 1) \implies
\lambda (\$4, p;\ \$0, 1-p) \oplus (1-\lambda)(\$4, \ell_1;\ \$3, \ell_2;\ \$0, 1-\ell_1-\ell_2)
\succsim
\lambda (\$3, 1) \oplus (1-\lambda)(\$4, \ell_1;\ \$3, \ell_2;\ \$0, 1-\ell_1-\ell_2) \tag{4.4}
\]

where \(\oplus\) is the linear mixture operator, for any \(\ell_1, \ell_2 \ge 0\) with \(\ell_1 + \ell_2 \le 1\) and any \(\lambda \in (0, 1)\).
This is equivalent to:

\[
(\$4, \lambda p + (1-\lambda)\ell_1;\ \$3, (1-\lambda)\ell_2;\ \$0, \lambda(1-p) + (1-\lambda)(1-\ell_1-\ell_2))
\succsim
(\$4, (1-\lambda)\ell_1;\ \$3, \lambda + (1-\lambda)\ell_2;\ \$0, (1-\lambda)(1-\ell_1-\ell_2))
\]

Consider a subject who chooses the risky alternative in the pairwise choice, hence she ranks \((\$4, .8;\ \$0, .2) \succsim (\$3, 1)\). Monotonicity with respect to first order stochastic dominance and transitivity imply that:

\[
(\$4, 1 - .02(i-1);\ \$0, .02(i-1)) \succ (\$3, 1) \quad \text{for } i < 11 \tag{4.5}
\]

This means that the subject prefers the risky option (B) to the sure outcome (A) on lines 1, ..., 10.

Note that for each line \(i\) of the Q1 choice list, the reduced one-stage lottery corresponding to arbitrary choices on all other lines, and the risky or safe alternative on line \(i\), can be written as a mixture of \((\$4, \ell_1^i;\ \$3, \ell_2^i;\ \$0, 1-\ell_1^i-\ell_2^i)\) for some \(\ell_1^i, \ell_2^i \ge 0\) for which \(\ell_1^i + \ell_2^i \le 1\) (corresponding to the lottery induced by choices on lines other than \(i\)), and \((\$4, 1 - .02(i-1);\ \$0, .02(i-1))\) or \((\$3, 1)\) (corresponding to the choice on line \(i\)). By (4.5), NCI (4.4) implies that for each \(i \le 11\) she would also rank

\[
\tfrac{1}{26}(\$4, 1 - .02(i-1);\ \$0, .02(i-1)) \oplus \tfrac{25}{26}(\$4, \ell_1^i;\ \$3, \ell_2^i;\ \$0, 1-\ell_1^i-\ell_2^i)
\succsim
\tfrac{1}{26}(\$3, 1) \oplus \tfrac{25}{26}(\$4, \ell_1^i;\ \$3, \ell_2^i;\ \$0, 1-\ell_1^i-\ell_2^i)
\]

strictly for \(i < 11\). Thus it follows that the subject will choose the risky option on line 11 and all preceding lines.

[5] There exists evidence against reduction of compound lotteries as a descriptive axiom in other contexts. Reduction of compound lotteries is a convenient modeling simplification that is sufficient but not necessary. Moreover, in the examples that we provide the full force of reduction is inessential.

An alternative way to view this result is through Figure 4.1. Recall that by Lemma 3 in Dillenberger (2010), indifference curves of preferences that satisfy the NCI axiom are steepest through ($3, 1), point A1; this indifference curve must also be linear. Therefore, if the subject chose lottery B1 ($4, 0.8; $0, 0.2) in the pairwise choice, it must be that the steepest indifference curve is flatter than the line A1B1. It follows that the optimal switching line in the Q1 choice list must be to the right of the point representing line 11 on the curve List1, since the slope at the point corresponding to line 11 is equal to the slope of the line A1B1.

Preferences that satisfy NCI are consistent with the reverse pattern of choosing ($3, 1) in the pairwise choice and ($4, 0.8; $0, 0.2) on line 11 of the Q1 choice list. NCI does not have any implication for Q2, except that it is inconsistent with the combination of choosing ($3, 0.5; $0, 0.5) in Q2 and ($4, 0.8; $0, 0.2) in Q1 in the pairwise choice problems. Cautious Expected Utility preferences are therefore consistent with our main findings, and exclude the opposite of the main behavioral pattern observed in the experiment.

4.2. Rank Dependent Utility. Suppose a subject has rank-dependent utility preferences; that is, she ranks any (single-stage) lottery according to

\[
U\big((x_i, p_i)_{i=1}^{n}\big) = \sum_{i=1}^{n} \Big[ f\Big(\sum_{j \le i} p_j\Big) - f\Big(\sum_{j < i} p_j\Big) \Big] u(x_i).
\]

Suppose further the subject evaluates her choices in our experiment according to (4.1). Take the neo-additive weighting function \(f\) (see Wakker, 2010, pages 208-210), of the form: [6]

\[
f(p) = \begin{cases} bp & \text{if } 0 \le p < 1 \\ 1 & \text{if } p = 1 \end{cases} \tag{4.6}
\]

where \(0 \le b \le 1\). Normalize \(u(0) = 0\), \(u(4) = 1\). Risk aversion implies that \(u(3) \in [.75, .8]\) (Chew, Karni, and Safra, 1987). Assume the subject is indifferent between ($3, 1) and ($4, 0.8; $0, 0.2) in pairwise choice. Then \(u(3) = f(0.8) = 0.8b\). If \(b < 1\) it follows that [7] \((\$3, 0.5;\ \$0, 0.5) \prec (\$4, 0.4;\ \$0, 0.6)\).

[6] It is trivial to add a constant \(a \ge 0\) to the weighting function in order to capture the possibility effect, and the same results as below hold as long as \(a + b < 1\) and \(a + 0.8b < 0.8\).

[7] Since \(U(\$4, 0.4;\ \$0, 0.6) = 0.4b > 0.4b^2 = (0.5b)(0.8b) = U(\$3, 0.5;\ \$0, 0.5)\).

Consider a continuous approximation to the choice lists in which the subject chooses a switching point from B to A, \(q_1\) and \(q_2\) in Q1 and Q2 respectively. [8] Let the random line be selected for payment with a uniform probability on \([0, 0.5]\) in Q1 and on \([0.5, .75]\) in Q2. Choosing \(q_1\) induces the lottery \(Q1(q_1) = (\$4, 2q_1 - q_1^2;\ \$3, 1 - 2q_1;\ \$0, q_1^2)\), so the subject would choose \(q_1^*\) to maximize:

\[
U(Q1(q_1)) = f(2q_1 - q_1^2) + u(3)\big[ f(1 - q_1^2) - f(2q_1 - q_1^2) \big]
\]

[8] So the probability of winning $4 at the switching point is \(1 - q_i\).

Substituting the neo-additive weighting function (4.6) and assuming an indifference in the pairwise choice version of Q1 results in an optimal switching point of \(q_1^* = 1 - f(0.8) = 1 - 0.8b\). Therefore the subject will switch to the safe outcome between the lines corresponding to 0.2 and 0.25 in the choice list. The subject's payoff at \(q_1^*\) is higher than her payoff of always choosing ($3, 1). [9] That is, if \(b < 1\) the subject will strictly prefer ($4, 0.8; $0, 0.2) to ($3, 1) in the choice list.

[9] \(U(Q1(q_1^*)) = f\big(1 - (1 - q_1^*)^2\big) + f(0.8)\big[ f\big(1 - (q_1^*)^2\big) - f\big(1 - (1 - q_1^*)^2\big) \big] = b\big[1 - (0.8b)^2\big] + 0.8b^2\big[ 2(0.8b) - (0.8b)^2 - \big(1 - (0.8b)^2\big) \big] = b + b(0.8b)\big((0.8b) - 1\big)\). Thus: \(U(Q1(q_1^*)) - U(\$3, 1) = b\big[0.2 - (0.8b)(1 - 0.8b)\big] \ge 0\) since \(0.75 \le 0.8b \le 0.8\).

Similarly, \(Q2(q_2) = \big(\$4, -2q_2^2 + 4q_2 - \tfrac{3}{2};\ \$3, \tfrac{3}{2} - 2q_2;\ \$0, 2q_2^2 - 2q_2 + 1\big)\). So the subject chooses \(q_2\) to maximize:

\[
U(Q2(q_2)) = f\big(-2q_2^2 + 4q_2 - \tfrac{3}{2}\big) + u(3)\big[ f\big(-2q_2^2 + 2q_2\big) - f\big(-2q_2^2 + 4q_2 - \tfrac{3}{2}\big) \big]
\]

Substituting the neo-additive weighting function (4.6) and calculating the optimal switching point yields \(1 - q_2^* = \frac{u(3)}{2} = \frac{1 - q_1^*}{2}\). Therefore, the subject will switch at the same line in Q1 and Q2. That is, although she exhibited the certainty effect in pairwise choice, the choices made in the two lists will be consistent with expected utility.
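The closed-form switching points in the rank-dependent example can be checked numerically. The sketch below evaluates the two objective functions from the continuous approximation on a fine grid and compares the grid maximizers with \(q_1^* = 1 - 0.8b\) and \(1 - q_2^* = u(3)/2\). The value b = 0.95, the grid size, and all function names are assumptions chosen for illustration; any b with 0.75 ≤ 0.8b ≤ 0.8 should behave the same way.

```python
def f(p, b):
    """Neo-additive weighting function (4.6): b*p below 1, and 1 at p = 1."""
    return 1.0 if p >= 1.0 else b * p


def u_q1(q, b, u3):
    """Rank-dependent utility of switching at q in Q1 (u(0)=0, u(4)=1, u(3)=u3)."""
    p4 = 2 * q - q ** 2        # probability of winning $4
    p4_or_3 = 1 - q ** 2       # probability of winning $4 or $3
    return f(p4, b) + u3 * (f(p4_or_3, b) - f(p4, b))


def u_q2(q, b, u3):
    """Rank-dependent utility of switching at q in Q2."""
    p4 = -2 * q ** 2 + 4 * q - 1.5
    p4_or_3 = -2 * q ** 2 + 2 * q
    return f(p4, b) + u3 * (f(p4_or_3, b) - f(p4, b))


def argmax_on_grid(fn, lo, hi, steps=5000):
    """Return the grid point in [lo, hi] at which fn is largest."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=fn)


b = 0.95          # assumed illustrative value
u3 = 0.8 * b      # indifference between ($3, 1) and ($4, .8) in pairwise choice

q1 = argmax_on_grid(lambda q: u_q1(q, b, u3), 0.0, 0.5)
q2 = argmax_on_grid(lambda q: u_q2(q, b, u3), 0.5, 0.75)

# Map the continuous switching points back to list lines (steps of .02 and .01).
line_q1 = q1 / 0.02 + 1
line_q2 = (q2 - 0.5) / 0.01 + 1

# Pairwise Q2: the risky lottery ($4, .4) is strictly preferred when b < 1.
certainty_effect = f(0.4, b) > f(0.5, b) * u3

print(f"q1* = {q1:.4f} (closed form {1 - 0.8 * b:.4f})")
print(f"1 - q2* = {1 - q2:.4f} (closed form {u3 / 2:.4f})")
print(f"switching lines: Q1 {line_q1:.2f}, Q2 {line_q2:.2f}")
```

With these assumed parameters the grid maximizers coincide with the closed forms, the implied switching lines in the two lists agree, and the list payoff at \(q_1^*\) exceeds the utility of always choosing the sure $3, in line with the argument in the text.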
These results can be visualized using Figure 4.1. The indifference curves of the neo-additive weighting function are parallel straight lines in the interior of the triangle, but are discontinuous on its boundary in which the probability of earning $0 equals 0 (if \(b < 1\)). The indifference curves are flatter (their slope equals \(\frac{u(3)}{1 - u(3)}\)) than the dashed line A1B1, so the indifference curve passing through B1 approaches the vertical axis above A1. [10] A subject who is indifferent between ($3, 1) and ($4, 0.8; $0, 0.2) will therefore choose a switching point in List1 that is to the right of Line 11. Since the indifference curves in the interior of the triangle are parallel straight lines, this subject will strictly prefer B2 to A2 in pairwise choice (exhibit the certainty effect) but will have the same switching point in List2 as in List1 (just like an expected utility subject).

5. Related literature

To elicit finer information about preferences, Becker, DeGroot, and Marschak (1964) proposed a convenient method (BDM, equivalent to a second-price sealed bid auction) for measuring the valuation of an alternative, which under standard assumptions is incentive compatible. That is, the elicited valuation is consistent with the underlying preferences. Generally, the BDM mechanism is viewed to be cognitively difficult for subjects (Plott and Zeiler, 2005; Cason and Plott, 2014).

In the domain of risk preferences, the BDM mechanism has been used to elicit certainty equivalents of lotteries. The preference reversal literature demonstrated that these valuations can be systematically inconsistent with subjects' pairwise choices (Grether and Plott, 1979). Holt (1986), Karni and Safra (1987), and Segal (1988) identified that a failure of a version of the independence axiom may undermine the incentive compatibility of the BDM and rationalize observed behavior.
Meanwhile, the experimental economics literature (confronted with the challenges involving the BDM in other domains) opted to use choice lists (e.g. Holt and Laury (2002), Andersen, Harrison, Lau, and Rutström (2006), Abdellaoui, Baillon, Placido, and Wakker (2011), Bruhin, Fehr-Duda, and Epper (2010)), which are a discrete implementation of the BDM through a sequence of related pairwise choices. [11] This experimental literature usually cites Kahneman and Tversky's (1979) isolation hypothesis as a sufficient condition for the RIS to be incentive compatible, that is, for choices made under the RIS to be consistent with pairwise choices. Indeed, Starmer and Sugden (1991) and Cubitt, Starmer, and Sugden (1998) provide evidence that suggests that incentive compatibility tends to hold in experiments in which subjects make a small number of pairwise choices and are paid according to the RIS (see also Davis and Holt (1993), p. 451-3 for a critical discussion of this evidence). Choices made by subjects in these studies are generally inconsistent with the reduction of compound lotteries axiom.

In experiments that use the RIS, incentive compatibility is equivalent to assuming compound independence (Segal, 1990). This observation has been recently generalized by Azrieli, Chambers, and Healy (2015) who analyze the optimal incentive scheme as a mechanism design problem that applies to a wider variety of environments. In a recent work, Cox, Sadiraj, and Schmidt (2015) use comparable designs to Starmer and Sugden (1991) but consider a larger number of mechanisms for paying subjects who answer multiple questions, which are incentive compatible for alternative theories of choice under risk. They find evidence that the specific mechanism employed affects choices.

[10] The limit on the vertical axis equals \(\frac{0.8(1 - b)}{1 - 0.8b}\).
None of the papers cited above uses choice lists, which have become the workhorse method of experimental economists studying individual preferences. Moreover, the earlier experimental papers that support incentive compatibility when subjects make a small number of pairwise choices have been taken to justify the usage of choice lists, notwithstanding the existing theoretical critiques of the BDM mechanism. Our results point to the empirical relevance of these critiques for experiments that use choice lists. 11A 12 choice list with varying probabilities, like the one used in the present study, was rst used (without incentives) by Davidson, Suppes, and Siegel (1957), revisited by McCord and De Neufville (1986), and was revisited (with the RIS) by Sprenger (forthcoming). 12Concurrent work by Castillo and Eil (2014) also explores behavior in choice lists by proposing and testing a model inspired by status-quo bias; however they do not compare behavior in choice lists to behavior in pairwise choice. Work in progress by Brown and Healy (2014) follows up and extends the current study. ELICITING RISK PREFERENCES USING CHOICE LISTS 15 6. Conclusion Our between-subjects study documents a signicant violation of incentive compatibility in choice lists for some pairwise choices that involve certain payment. We demonstrate how these ndings could be understood in light of the interaction between non-expected utility preferences and the RIS (Karni and Safra, 1987). A typical experiment whose primary goal is to measure preferences at the individual level must present a subject a sequence of decision problems. Usually (but not always) one of them is selected for payment. A core feature of our design is a between-subject comparison with a group of subjects who make a single pairwise choice. We believe that this design feature can and should be incorporated into future studies to evaluate existing or proposed methods for eliciting preferences. References Abdellaoui, M., A. Baillon, L. 
Placido, and P. P. Wakker (2011): "The rich domain of uncertainty: Source functions and their experimental implementation," American Economic Review, 101(2), 695–723.
Andersen, S., G. Harrison, M. Lau, and E. Rutström (2006): "Elicitation using multiple price list formats," Experimental Economics, 9(4), 383–405.
Azrieli, Y., C. Chambers, and P. Healy (2015): "Incentives in Experiments: A Theoretical Analysis."
Becker, G., M. DeGroot, and J. Marschak (1964): "Measuring utility by a single-response sequential method," Behavioral Science, 9(3), 226–232.
Berinsky, A., G. Huber, G. Lenz, et al. (2012): "Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk," Political Analysis.
Brown, A., and P. Healy (2014): "Monotonicity failure versus framing effects in list elicitation procedures," presented at the Southern Economic Association Meeting in Atlanta.
Bruhin, A., H. Fehr-Duda, and T. Epper (2010): "Risk and rationality: Uncovering heterogeneity in probability distortion," Econometrica, 78(4), 1375–1412.
Buhrmester, M., T. Kwang, and S. Gosling (2011): "Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?," Perspectives on Psychological Science, 6(1), 3–5.
Cason, T., and C. Plott (2014): "Misconceptions and game form recognition of the BDM method: challenges to theories of revealed preference and framing," Journal of Political Economy.
Castillo, M., and D. Eil (2014): "Tariffing the Multiple Price List: Imperceptive Preferences and the Reversing of the Common Ratio Effect."
Cerreia-Vioglio, S., D. Dillenberger, and P. Ortoleva (2015): "Cautious Expected Utility and the Certainty Effect," Econometrica, 83(2), 693–728.
Chew, S. H., E. Karni, and Z. Safra (1987): "Risk Aversion in the Theory of Expected Utility with Rank Dependent Probabilities," Journal of Economic Theory, 42, 370–381.
Cox, J., V. Sadiraj, and U.
Schmidt (2015): "Paradoxes and mechanisms for choice under risk," Experimental Economics, 18, 215–250.
Cubitt, R., C. Starmer, and R. Sugden (1998): "On the validity of the random lottery incentive system," Experimental Economics, 1(2), 115–131.
Davidson, D., P. Suppes, and S. Siegel (1957): Decision making: an experimental approach.
Davis, D. D., and C. A. Holt (1993): Experimental Economics. Princeton University Press.
Dillenberger, D. (2010): "Preferences for one-shot resolution of uncertainty and Allais-type behavior," Econometrica, 78(6), 1973–2004.
Fudenberg, D., and A. Peysakhovich (2014): "Recency, Records and Recaps: The effect of feedback on behavior in a simple decision problem," Proceedings of EC.
Greiner, B. (2004): "An online recruitment system for economic experiments."
Grether, D., and C. Plott (1979): "Economic theory of choice and the preference reversal phenomenon," American Economic Review, 69(4), 623–638.
Holt, C. (1986): "Preference reversals and the independence axiom," American Economic Review, 76(3), 508–515.
Holt, C., and S. Laury (2002): "Risk aversion and incentive effects," American Economic Review, 92(5), 1644–1655.
Horton, J., D. Rand, and R. Zeckhauser (2011): "The online laboratory: conducting experiments in a real labor market," Experimental Economics, 14(3), 399–425.
Kahneman, D., and A. Tversky (1979): "Prospect theory: an analysis of decision under risk," Econometrica, 47, 263–291.
Karni, E., and Z. Safra (1987): "'Preference reversal' and the observability of preferences by experimental methods," Econometrica, pp. 675–685.
Lehmann, E., and J. Romano (2005): Testing Statistical Hypotheses, Springer Texts in Statistics. Springer, 3rd edn.
Mason, W., and S. Suri (2011): "Conducting behavioral research on Amazon's Mechanical Turk," Behavior Research Methods, pp. 1–23.
McCord, M., and R.
De Neufville (1986): "'Lottery Equivalents': Reduction of the Certainty Effect Problem in Utility Assessment," Management Science, pp. 56–60.
Paolacci, G., J. Chandler, and P. Ipeirotis (2010): "Running experiments on Amazon Mechanical Turk," Judgment and Decision Making, 5(5), 411–419.
Plott, C., and K. Zeiler (2005): "The Willingness to Pay–Willingness to Accept Gap," American Economic Review, 95(3), 530–545.
Samuelson, P. (1952): "Probability, Utility, and the Independence Axiom," Econometrica, 20(4), 670–678.
Segal, U. (1988): "Does the preference reversal phenomenon necessarily contradict the independence axiom?," American Economic Review, 78(1), 233–236.
Segal, U. (1990): "Two-stage lotteries without the reduction axiom," Econometrica, 58(2), 349–377.
Sprenger, C. (forthcoming): "An Endowment Effect for Risk: Experimental Tests of Stochastic Reference Points," Journal of Political Economy.
Starmer, C., and R. Sugden (1991): "Does the random-lottery incentive system elicit true preferences? An experimental investigation," American Economic Review, 81(4), 971–978.
Wakker, P. (2010): Prospect Theory: For Risk and Ambiguity. Cambridge University Press, Cambridge, UK.

Appendix A. Experimental details

Our online experiment was composed of two components: the Mechanical Turk (mTurk) interface used to recruit and pay subjects, and an external experiment website where subjects made their choices. Using the mTurk interface, we (as the recruiter) released an ad for a task (a HIT, for Human Intelligence Task) which could be viewed by online workers (turkers). All turkers who satisfy the required criteria can view a description of the HIT (in our case, we required turkers to have a US-based account and a completion record of 95% or greater). Our HIT description was a short description of the experiment that included a unique HIT passcode along with a link to our experiment webpage, which was hosted on a private server. Turkers could either accept or decline the HIT once they read the description. If a turker accepted the HIT, he would click on the link to the external experiment website and enter his unique mTurk identifier and the HIT passcode. The passcode was unique per HIT and one-time use. The passcode would expire after the turker completed the experiment. This prevented a turker from completing the HIT multiple times. Figure A.1 provides an example of one of our HIT descriptions.

Figure A.1. Mechanical Turk HIT description

  Description: This HIT asks you to make a series of choices among alternatives that involve monetary prizes. The HIT should take between 5-10 minutes to complete. Your answers will be used in an academic study on decision-making.

  Please click the link below to begin the HIT. Please enter your mTurk Worker ID and the following mTurk HIT ID where prompted in order to begin the survey.

  mTurk HIT ID: ${pw}

  When you are finished, you will receive a Completion Code that you must enter in the box below to receive credit for participation.

  Completion Code:

  Please do not take this HIT if you are not willing to commit 10 minutes of your full concentration to the HIT. The data we collect is being used for scientific research. We greatly appreciate your full attention and careful consideration of each question.

  Note: Any versions of this HIT can only be taken once by each worker. If you complete this HIT more than once, you will only be paid for the first time. Click here for list of workers who have completed a version of this HIT.

  *Note: Javascript is required for this HIT

  Please accept the HIT before you begin!

  CLICK HERE TO BEGIN HIT

Once subjects logged into the external experiment website, they consented to the experiment, read the instructions, answered a short quiz to indicate understanding, made their choices, and were then informed of their bonus payment (determined by a computer random number generator) and received a unique completion code. Subjects then entered the completion code back in the HIT page in the mTurk interface to complete the HIT. Figures A.2 and A.3 provide a set of example instructions and questions from treatment L1. Instructions and questions from other treatments were similar. Subjects were linked in our dataset (that contained the choice and payment data) to the Mechanical Turk site by both their mTurk identifier and their completion code.

Figure A.2. Instructions for treatment L1

  **Do not use the BACK or REFRESH Buttons**

  Instructions

  You will be paid based on your choices in this experiment. When you have finished, you will be given a Completion Code. Please retain that completion code, you will need it for payment.

  You will be asked to answer one question in which you make a series of choices between two options (Option A and Option B). Your payment will be determined by your choices in this question.

  Options A and B will consist of a monetary payment (either 10 or 13 dollars) to be paid with some random chance. The random chance is determined by a stated number X which is between 1 and 100 and a numbered ball drawn from a box. The box contains 100 balls numbered 1 to 100. If the number on the ball drawn is less than or equal to the stated number X, then the random draw is successful.

  The box contains 100 balls numbered 1 to 100. Each number is in the box exactly once. Each number is equally likely to be drawn.

  To ensure you understand the instructions, please answer the following quiz. You must successfully complete the quiz before you may continue.

  Quiz

  Click on the answers below to select your answers. Your selected answer will be highlighted in yellow.

  1. A ball is drawn from a box containing 100 balls numbered 1 to 100 as described above.
     (a) The ball with the number 11 has the same chance of being drawn as the ball with the number 85?
         True / False
     (b) Which is more likely?
         The ball drawn has a number less than or equal to 90 / The ball drawn has a number less than or equal to 20
     (c) Which is more likely?
         The ball drawn has a number less than or equal to 40 / The ball drawn has a number less than or equal to 60

  2. Suppose you choose the Option "$13 if the number on the ball chosen is less than or equal to 50."
     (a) What is your payment if the ball drawn is numbered 32?
         $0 / $13
     (b) What is your payment if the ball drawn is numbered 69?
         $0 / $13
     (c) What is your payment if the ball drawn is numbered 50?
         $0 / $13

  You may continue when you have completed the Quiz

  Continue

Figure A.3. List for treatment L1

  **Do not use the BACK or REFRESH Buttons**

  For each line below, please choose Option A or Option B. Your payment will be determined by your choice (Option A or Option B) from a randomly selected line. Each choice could be the one that counts, so you should treat each and every line as if that choice will determine your payment.

  A number will be drawn from a box containing 100 balls numbered 1-100 as described in the instructions. If the number on the ball drawn is less than or equal to the number indicated in the question, then you will be paid according to your choice in the selected line.

  For example, suppose the second line is randomly selected:
  Option A is $10 if the number on the ball drawn is less than or equal to 100.
  Option B is $13 if the number on the ball drawn is less than or equal to 98.
  If you choose Option A, then you would be paid $10. If you choose Option B, then you would be paid $13 if the number on the ball drawn is less than or equal to 98.

  Click in the box below to select your choice. Your selected choice will be highlighted in yellow. Please select either Option A or Option B in each line.
  Line #   Option A: $10 if the number on the      Option B: $13 if the number on the
           ball drawn is less than or equal to:    ball drawn is less than or equal to:
  1        100                                     100
  2        100                                     98
  3        100                                     96
  4        100                                     94
  5        100                                     92
  6        100                                     90
  7        100                                     88
  8        100                                     86
  9        100                                     84
  10       100                                     82
  11       100                                     80
  12       100                                     78
  13       100                                     76
  14       100                                     74
  15       100                                     72
  16       100                                     70
  17       100                                     68
  18       100                                     66
  19       100                                     64
  20       100                                     62
  21       100                                     60
  22       100                                     58
  23       100                                     56
  24       100                                     54
  25       100                                     52
  26       100                                     50

  After you have made your choice please press 'Continue'. This will complete the part of the experiment that involves your payment and on the next page you will be asked to provide some basic information about yourself.

  Continue

This allowed us to match a turker's account with his payment information recorded in our dataset, and pay the turker accordingly.

Subjects were paid a flat-rate payment for completing the HIT and earned a 'bonus' based on their choices. In our experiment, the payment corresponded to the show-up fee and the bonus corresponded to the incentivized payment. Payments must be set equal for all turkers who complete a HIT in the same batch, but bonuses may differ. Both payments and bonuses are at the recruiter's discretion; thus turkers do not need to be paid unless they complete the task. We offered a payment of $1 for completing the HIT, and a bonus of $0, $3, or $4 corresponding to the risky outcomes in our lotteries. Bonuses depended upon the subject's choices and the element of chance described in the RIS and the lotteries. All payments were in American dollars.

A recruiter can recruit n subjects for an experiment by releasing a 'batch' with n HITs. These tasks can be identical or individualized by the inputs in a csv file. We used individualized tasks differentiated by unique HIT passcodes. A batch with n HITs recruits n different subjects. However, the same subject could potentially complete HITs in different batches.
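As an illustration of the individualized batch described above, the following sketch generates one-time passcodes and writes them to a batch input file. It is not the code used in the experiment. It assumes that the `${pw}` placeholder in Figure A.1 is filled from a csv column named `pw`; the function and class names (`make_passcodes`, `batch_csv`, `PasscodeRegistry`) and the passcode format are hypothetical.

```python
import csv
import io
import secrets


def make_passcodes(n, n_bytes=4):
    """Return n distinct random passcodes (hex strings), one per HIT."""
    codes = set()
    while len(codes) < n:
        codes.add(secrets.token_hex(n_bytes))
    return sorted(codes)


def batch_csv(codes):
    """Build the batch input file: a header row 'pw', then one passcode per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["pw"])  # assumed to match the ${pw} placeholder
    for code in codes:
        writer.writerow([code])
    return buf.getvalue()


class PasscodeRegistry:
    """Tracks unexpired passcodes; a passcode expires once its HIT is completed."""

    def __init__(self, codes):
        self._unused = set(codes)

    def is_valid(self, code):
        """True if the passcode was issued and has not yet expired."""
        return code in self._unused

    def complete(self, code):
        """Expire the passcode on completion; return False if it was not valid."""
        if code in self._unused:
            self._unused.remove(code)
            return True
        return False


codes = make_passcodes(5)
csv_text = batch_csv(codes)
registry = PasscodeRegistry(codes)
```

Expiring the passcode at completion, rather than at login, mirrors the description in the text; either choice blocks a second completion with the same passcode.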
The mTurk interface has no method to block turkers who have completed a HIT in a previous batch from completing future HITs. Our external experimental interface prevented this by matching each entered mTurk identifier against a list of those who had previously completed a HIT (this list was automatically updated each time a turker completed a HIT) and blocking any match. We also built in a secondary feature to ensure subject uniqueness by recording a subject's IP address when he completed a HIT. We could then cross-check the list of IP addresses to ensure that the same IP address did not appear for multiple subjects.

Appendix B. Robustness checks

B.1. Excluding extreme risk attitude. Table 3 contains our main result. One might worry that it is driven by a subset of extreme subjects. Table 6 below restricts the analysis to subjects who neither exhibit extreme risk seeking by sticking with B throughout the list nor exhibit extreme risk aversion by switching immediately to A on the second line (non-extreme subjects). We repeat our statistical analysis on this subset of subjects. In Q1, 23% of subjects responding in pairwise choice treatments chose the risky option, but 49% of non-extreme subjects responding to the choice list chose the risky option, a significant difference (p < .001, exact test). In Q2, 30% of subjects chose the risky option in pairwise choice, but 45% of non-extreme subjects chose the risky option in a choice list, which (given our sample size) we consider to be an insignificant difference (p = .02, exact test).

Table 6. Answers to line 11

                Pairwise choice              Choice list
                One choice    Two choices    One list          Two lists
  Q1            23%           24%            47%               50%
  Q2            27%           33%            46%               44%
  n             39/41         42             66/74             163
  Treatments    P1, P2        P12, P21       L1, L2, S1, S2    All O and A

  Fraction choosing the riskier option, non-extreme subjects

Table 7. mTurk workers vs. student subjects

                Pairwise choice              Choice list
                mTurk         Students       mTurk             Students
  Q1            23%           33%            45%               44%
  n             81            27             355               27
  Treatments    All P                        All L, S, O, A

  Fraction choosing the riskier option

B.2. Comparison to a student subject pool. A reader may have a legitimate concern that our main finding, which is the substantial and significant difference in responses to Q1 when it is made in a pairwise choice (P1) and when it is embedded in a choice list (line 11 in L1), can be attributed to the subject pool of mTurk workers. We therefore decided to check whether there exists a significant difference between the behavior of students (who comprise the standard subject pool used in economics experiments) and mTurk workers.

Subjects were recruited through UBC's Economics department subject pool using ORSEE (Greiner, 2004) and offered the opportunity to participate in an online experiment in which they would be paid by online money transfer.13 There was no show-up fee, which was disclosed to subjects in advance in accordance with UBC's BREB requirement. Our expectation was that students who are willing to sign up for an experiment with no show-up fee may be less sensitive to the certainty of a reward compared to a typical student subject. This possible selection bias could attenuate our results. Since student subjects tend to be paid much higher amounts per hour than mTurk workers typically earn, we scaled up the payoffs to $13, $10, and $0 from $4, $3, and $0. Table 7 presents our findings in the two treatments: Q1 under pairwise choice (P1), and Q1 in a choice list (L1).

13 All major Canadian banks have a $10 minimum transfer.

The students' behavior demonstrates exactly the same pattern as the data from mTurk workers: students are more likely to choose the sure payment in a pairwise choice task than when the choice is embedded in a choice list.
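Comparisons between subject pools such as those in Table 7 can be checked with an exact test on a 2x2 table of counts. The sketch below is a minimal pure-Python two-sided Fisher exact test. Two assumptions should be kept in mind: the text says only "exact test", so the use of Fisher's test here is a guess, and the counts in the example (19 of 81 mTurk workers, 9 of 27 students) are reconstructed from the rounded percentages in Table 7 rather than taken from the data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts reconstructed from Table 7 (pairwise choice, Q1):
# rows are mTurk workers and students; columns are riskier and safer option.
p = fisher_exact_two_sided(19, 81 - 19, 9, 27 - 9)
print(f"two-sided exact p-value: {p:.2f}")
```

Because the counts are reconstructed from rounded percentages, the printed value need not match the reported p-values exactly.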
Students' responses to Q1 in pairwise choice were not significantly different from mTurk workers' choices in Q1 under pairwise choice (p = .32, exact test), nor were students' responses to Q1 in the list significantly different from turkers' responses to Q1 under choice list (p > .99, exact test).

Appendix C. Review of experiments on Mechanical Turk

As a large online labor market, Mechanical Turk provides a convenient way to recruit and pay subjects over the internet. Mechanical Turk allows researchers to economize on costs and experiment on a different population from undergraduates. Mechanical Turk has been advocated as a platform for recruiting subjects by psychologists studying judgement and decision-making (Mason and Suri (2011), Paolacci, Chandler, and Ipeirotis (2010), Buhrmester, Kwang, and Gosling (2011)), political scientists (Berinsky, Huber, Lenz, et al., 2012), and economists (Horton, Rand, and Zeckhauser, 2011).

A potential downside of running experiments on Mechanical Turk is that subjects complete the experiment from their home computer, and not in a controlled lab environment, making it difficult to know for sure who the subjects really are and how much attention they are paying to the tasks. Paolacci, Chandler, and Ipeirotis (2010) find that the population of US-based turkers who participate in experiments is heterogeneous and is more representative of the US population than typical undergraduate samples, and that turkers pay as much attention to experimental tasks as undergraduates in a lab. Paolacci, Chandler, and Ipeirotis (2010) and Horton, Rand, and Zeckhauser (2011) show that some standard experimental results in the judgement and decision-making literature can be qualitatively and quantitatively replicated using turkers. Our paper also replicates our result with a set of student subjects. Fudenberg and Peysakhovich (2014) provide a recent use of the Mechanical Turk subject pool for an incentivized experiment on economic decision-making under imperfect information; their stakes are smaller but comparable to ours (a 50-cent show-up fee plus up to a $2 bonus) and their experiment is slightly more time consuming than ours (10-17 minutes).

Freeman: Department of Economics, Simon Fraser University. e-mail: david_freeman@sfu.ca. Web: http://www.sfu.ca/~dfa19/

Halevy: Vancouver School of Economics, The University of British Columbia. 997-1873 East Mall Vancouver BC V6T 1Z1 Canada. e-mail: yoram.halevy@ubc.ca. Web: http://www.economics.ubc.ca/yhalevy/

Kneeland: Department of Economics, University College London. e-mail: t.kneeland@ucl.ac.uk. Web: http://terri.microeconomics.ca/