Preference Identification Under Inconsistent Choice: A Reduced-Form Approach∗

Jacob Goldin† Daniel Reck‡

June 13, 2014

Abstract

Behavioral economics has documented numerous settings in which behavior varies according to seemingly-arbitrary features of the choice environment such as defaults, salience, or framing effects. Optimal policy design requires accounting for the preferences of inconsistent decision-makers but traditional revealed preference analysis breaks down when individuals exhibit systematic choice reversals. We consider binary choice problems in which preference-irrelevant “frames” affect the behavior of a subset of decision-makers in monotonic directions. In such settings, we show that preference identification hinges upon understanding the empirical relationship between decision-makers’ preferences and their propensity to optimize. We provide a range of tools for examining this relationship and identifying preferences, drawing on techniques analogous to those commonly employed in the program evaluation literature. We illustrate the usefulness of these techniques in an application to the optimal default problem.

∗ The authors wish to thank Jason Abaluck, Roland Benabou, Charlie Brown, James Hines, Bo Honore, Miles Kimball, Alvin Klevorick, David Lee, Alex Mas, Wolfgang Pesendorfer, Kareen Rozen, Joel Slemrod, Jesse Shapiro, and seminar participants at Princeton and the University of Michigan for helpful discussion and comments. Any errors are our own.
† Department of Economics, Princeton University, email: jgoldin@princeton.edu
‡ Department of Economics, University of Michigan, email: dreck@umich.edu

1 Introduction

Suppose policymakers wish to require that companies give their customers discretion over how their personal data, such as internet usage data, is collected and analyzed. Regulations of this kind are a subject of active debate in many countries.
A key feature of these regulations is that privacy controls can be opt-in - so that customers must actively give a company permission to collect and/or use their personal data - or they can be opt-out - so that customers must actively tell a company not to use their data.1 Some consumers may prefer that their data be used by companies, in order to improve the quality of some online service or to see advertisements that are relevant to their interests. Other consumers may wish that their personal data remain private. Suppose that with opt-in regulation, 40 percent of individuals allow a given company to use their data, but that with opt-out regulation, 70 percent do so (and 30 percent opt out). If regulators’ only goal is to maximize individual welfare, how should they decide whether privacy controls should be opt-in or opt-out? Answering questions such as this one through the lens of economic theory is complicated by the difficulty in measuring preferences when behavior varies based on seemingly irrelevant features of the choice environment – such as defaults, salience, or framing effects. For example, if decision-makers’ preferences over a menu do not depend on which option is the default, observing that individuals’ choices are sensitive to the default casts doubt on the standard revealed preference approach to welfare economics. The question of how to conduct welfare analysis in such situations lies at the heart of important controversies in behavioral economics. In particular, for a benevolent planner to design a choice environment in a way that maximizes well-being, he or she must first have some means of identifying the preferences of those individuals whose choices will be affected. Most prior work takes one of two approaches to addressing the problem of preference recovery under inconsistent choice. 
First, one may utilize a positive model of behavior that fully specifies the mapping from a decision-maker’s preferences to her (potentially sub-optimal) behavior [e.g. Rubinstein and Salant, 2012, Carroll et al., 2009, Kahneman et al., 1997, Köszegi and Rabin, 2006]. Such approaches yield important insights but in many cases the resulting welfare conclusions are sensitive to the modeler’s choice between competing positive models that are observationally similar [Bernheim and Rangel, 2009, Bernheim, 2009]. An alternative approach is to restrict preference inferences to a subset of observed choice situations in which decision-makers choose consistently [Bernheim and Rangel, 2009]. However, by design, such approaches yield no information on the preferences of those decision-makers whose choices are influenced by the frame – the very group whose preferences are most relevant for selecting the optimal policy regarding which frame to implement.2 Further “refinements” can provide a path forward for behavioral welfare analysis in contexts where choice situations are observed in which an observer is willing to assume that all decision-makers have chosen optimally.3 However, in many contexts, such as sensitivity to default options, there will be little reason to believe that any of the observed choice situations satisfy these strict requirements.4

In this paper we develop a framework for preference identification over binary choices when some – but not all – of the observed decision-makers optimize. In particular, we provide conditions under which one can recover the preferences of the optimizing decision-makers and methods for extrapolating those recovered preferences to the preferences of the population.

1 For a more detailed discussion of this issue, see Johnson et al. [2002].
To begin, we follow Salant and Rubinstein [2008] and Bernheim and Rangel [2009] by modeling decisions in terms of menus and frames – preference-irrelevant features of the choice environment that may affect behavior such as the default, the presence of irrelevant alternatives or irrelevant information, or the order in which choices are presented. When decision-makers choose consistently across frames, we assume that those choices reflect their preferences, an assumption we call the consistency principle. We also allow decision-makers to choose inconsistently across frames, but we limit our analysis to choice situations in which the frames affect all decision-makers in a uniform direction, an assumption we label frame monotonicity. Crucially, our approach does not require that an outside observer can identify ex ante which individual decision-makers are optimizing, nor that one can observe an individual decision-maker in multiple choice situations. Instead, we exploit the fact that frame monotonicity and the consistency principle imply that decision-makers who choose “against the frame” prefer the options that they choose. This insight, along with a statistical assumption concerning the assignment of decision-makers to frames, allows us to recover the preferences of consistent decision-makers – the subset of the population whose choices are unaffected by the frame.5 We next consider what can be learned about population preferences from the distribution of preferences among consistent choosers. We first show that population preferences may be partially identified using worst-case bounds. That is, once we recover the preferences of the consistent choosers, we can bound population preferences by alternatively assuming that the inconsistent decision-makers have the most extreme preferences possible (in either direction). 
2 The observed choice data may still admit to an incomplete preference ordering [Bernheim and Rangel, 2009, Bernheim, 2009], useful for various normative applications [Fleurbaey and Schokkaert, 2013]. We formalize the claim that the optimal frame depends on the preferences of the frame-sensitive decision-makers in Section 8.
3 Applications of this refinement approach can be found in the tax salience literature [Chetty et al., 2009, Goldin, 2014, Reck, 2014, Allcott and Taubinsky, 2013].
4 Another alternative is to turn from choice to survey data, either on hypothetical choice situations designed to elicit preference parameters [Barsky et al., 1997], or from surveys about subjective well-being [Benjamin et al., 2012]. While useful, survey approaches are also subject to numerous potential framing effects [Schwarz and Clore, 1987, Schwarz, 1987, Deaton, 2012]. A useful discussion of other approaches to preference recovery is provided in Beshears et al. [2008].
5 An alternative interpretation of the empirical evidence concerning “preference reversals” is to conclude that inconsistent decision-makers simply lack normatively-relevant preferences in the first place. For someone who takes that view as a starting point, the contribution of our paper is that it provides a method for backing out the preferences of the consistent decision-makers from the aggregate observed choice data.

Turning to full identification of population preferences, we show how, under our assumptions, the problem of population preference recovery under inconsistent choice boils down to understanding the empirical relationship between decision-makers’ preferences over the menu items and their propensity to optimize. When using the term “optimize” here and elsewhere in the paper, we mean that a decision-maker chooses her preferred option in both frames, so that optimizing decision-makers and consistent decision-makers are one and the same.
When optimizing behavior is uncorrelated with variation in decision-makers’ preferences – a condition which we refer to as decision-quality independence – population preferences may be recovered by extrapolating the preferences of consistent decision-makers directly to those whose behavior is sensitive to the frame. Whether decision-quality independence holds in a given context is an empirical question, one whose answer will vary based on the specific decision being observed. For the many situations in which decision-quality independence does not hold, we provide two types of tools for shedding light on the empirical relationship between optimizing behavior and decision-makers’ preferences. This information can then be used to make inferences about the aggregate preferences of the population. As in other empirical contexts, the more structure one is willing to impose on this relationship, the weaker the informational requirements are for making inferences about the parameter of interest. First, rather than simply extrapolating from consistent to inconsistent decision-makers – an invalid approach when decision-quality independence fails – we can adjust for observable differences between the two groups. That is, even though we cannot observe which individuals are consistent and which are inconsistent, we are able to recover the aggregate preferences of the consistent choosers as well as the aggregate observable characteristics of each group. If decision-quality independence holds conditional on these observable characteristics, we can recover population preferences by separately estimating the average preferences for each group of decision-makers and re-weighting those cell averages based on the distribution of observable characteristics among the inconsistent decision-makers. 
For example, it may be that the rich are more likely to optimize than the poor, and also that the rich have on average different preferences than the poor, but conditional on income, decision-makers’ optimizing behavior is uncorrelated with their preferences. As in other empirical contexts, the plausibility of this matching-on-observables estimator depends on what information about decision-makers can be observed. The more potential determinants of optimizing behavior that can be observed, the more likely it is that conditional decision-quality independence will hold.

Second, we develop techniques that utilize variation in the decision-making environment through a decision-quality instrument with the following properties: (1) the variation affects decision-makers’ propensity to optimize monotonically, and (2) conditional on whether an individual optimizes, the variation does not affect choice behavior. For example, this variation could take the form of differences in the time pressure under which a decision must be made, the costs of comparing the available options, or the presence or absence of other drains on the decision-maker’s “cognitive load.” Under these assumptions, we show how one may recover the preferences of decision-makers whose optimizing behavior varies between states.6

Imposing additional structure on the relationship between preferences and the propensity to optimize allows for recovery of the full distribution of preferences and extrapolation to the preferences of decision-makers who never optimize. We demonstrate this using a latent variable model imposing functional form assumptions on the distribution of preferences and the propensity to optimize. Variation in a decision-quality instrument identifies the correlation between decision-makers’ preferences and their propensity to optimize, the key unknown necessary for recovering population preferences.
With sufficient observed variation in decision-quality instruments, one can also model the relationship between decision-makers’ preferences and their propensity to optimize in a more flexible manner, which allows us to extrapolate to population preferences under weaker functional form assumptions. Finally, we describe how variation in the decision-quality instrument lends itself to an over-identification test of the decision-quality independence assumptions discussed above. In describing these various approaches to preference identification under inconsistent choice, we do not claim to have discovered a one-size-fits-all solution to this fundamental problem. Rather, we view our primary contribution as showing how imposing modest structure can transform the problem into one that is both more familiar and more tractable. The tools we introduce for recovering population preferences require additional assumptions relative to Bernheim-Rangel; the payoff to this additional structure is that our approach can be applied in a much broader range of situations – namely those in which only a subset of decision-makers are optimizing and in which each individual’s choice is observed only once. Even with this additional structure, our approach retains an important “reduced-form” flavor that allows us to draw conclusions about welfare without specifying the exact positive model that generates behavior. To illustrate the usefulness of our techniques, we formalize a planner’s choice of frame problem, such as that faced by a government seeking to “nudge” its citizens in a beneficial direction by choosing which of two options should be the default [Thaler and Sunstein, 2008]. We show that the solution depends on a weighted average of the preferences of the consistent and the inconsistent decision-makers, so that determining the optimal frame requires separately identifying the preferences of the consistent and the inconsistent decisionmakers. 
The tools we provide allow one to estimate these quantities from observed choice data.

Limiting our analysis to binary choices simplifies the analysis considerably, but the tools we develop are useful outside of that domain as well. Before concluding, we provide one generalization of our approach, focusing on situations where decision-makers choose from a discrete, ordered set of options. We briefly discuss additional generalizations to other choice situations.

6 As discussed below, these assumptions and results parallel the identification of a Local Average Treatment Effect using instrumental variables [Imbens and Angrist, 1994].

The remainder of the paper proceeds as follows: Section 2 sets up the model, Section 3 shows how to estimate the preferences of consistent choosers, Section 4 discusses the identification of population preferences, Section 5 describes the matching-on-observables approach, Section 6 describes the decision-quality instrumental variables approach, and Section 7 describes the recovery of inconsistent decision-makers’ preferences using the observed relationship between preferences and decision-quality. Section 8 shows how the optimal choice of default depends on the parameters our methods identify from choice data, and Section 9 generalizes the results beyond the binary choice setting. Section B in the Appendix shows how several positive models of framing effects relate to our assumptions.

2 Setup

Consider a population of decision-makers of density 1, with individuals denoted by i. Each individual chooses from a fixed menu of two items, X = {x, y}. A choice-situation g ∈ G consists of a menu X = {x, y} and a frame d ∈ {dx, dy}. Choice behavior for agent i is described by choice function ci : G → X. Each decision-maker’s choice over X is observed once, under one of the two possible frames (dx or dy). Let yi(d) indicate whether individual i would choose y from X under frame d, yi(d) = 1 ⇔ ci(X, d) = y.
Let dij indicate whether individual i is observed under frame dj, for j ∈ {x, y}. Let E[yi|dj] denote the population average of choices observed under frame dj, E[yi|dj] ≡ E[yi|dij = dj]. We assume throughout that these population moments are directly observable, putting aside issues of finite sample size. To illustrate the notation using the privacy controls example described in the introduction, we may suppose that yi(dj) indicates whether the individual allows a company to use her data when privacy controls are opt-in (dx) or opt-out (dy). The hypothetical data from the introduction are E[yi|dy] = 0.70 and E[yi|dx] = 0.40. Moments from this hypothetical example used to illustrate our techniques are contained in Tables 1 through 3.

We assume that agents have well-defined preferences over the elements in X. The preferences of agent i are represented by choice function mi : G → X. We assume that agents’ preferences are insensitive to the frame, which we call frame separability:

mi(X, dx) = mi(X, dy) ∀i (1)

As such, it will be convenient to write mi(X, d) = mi(X). Absent an assumption along these lines, observed differences in choices at dx versus dy pose no problem for standard revealed preference welfare analyses; that is, the fact that a decision-maker chooses differently in dx and dy could simply reflect the fact that her preferences over x and y are contingent on information contained in the frame.7 Examples of frames might include: (1) which option is framed as the default; (2) the order in which options are displayed; (3) whether the consequence of selecting an option is framed as a loss or a gain; (4) whether the menu of options includes an irrelevant alternative; (5) the point in time at which a decision is made; or (6) whether various features of the choice are made salient.

Let φi indicate whether individual i prefers y to x, φi = 1 ⇔ mi(X) = y. Let E[φi] denote average preferences for the population.
Because preferences are not sensitive to frames, an agent who optimizes over the elements in X (i.e. according to mi(·)) will make the same choice regardless of the frame. To accommodate the possibility that choices may depend upon the frame, we do not impose (as is done in conventional revealed preference analysis) that ci(X, d) = mi(X) ∀i, d. Rather, we allow agents to either choose consistently or to choose in a way that is sensitive to the frame. We assume that, when individuals choose consistently, those choices reveal their preferences – an assumption which we refer to as the consistency principle:

ci(X, dx) = ci(X, dy) ⇒ mi(X) = ci(X, dx) = ci(X, dy) (2)

As in Bernheim and Rangel [2009], (2) represents a weakening of the standard instrumental rationality assumption behind revealed preference analyses; to the extent that an agent makes sub-optimal choices consistently across frames, our approach will incorrectly treat such choices as revealing the agent’s true preferences.8 For example, if individuals choose whether or not to enroll in an individual retirement account, and the frame manipulates whether the default is enrollment or non-enrollment, then some individuals might consistently choose not to enroll in a retirement plan due to a “present bias” toward consuming in the present rather than in the future. To the extent that we believe that this present bias causes the individual to act against her own interests, the consistency principle would not be applicable.9

7 For example, if an agent chooses hot chocolate from {hot chocolate, ice cream} under dx and ice cream from {hot chocolate, ice cream} under dy, there would be no apparent deviation from rationality if the frame indicated whether the season was winter or summer. This assumption is explicit in Salant and Rubinstein [2008] and implicit in Bernheim and Rangel [2009], who require it for determining when two potentially conflicting choice situations differ in terms of the frame or in terms of the available menu items. In this sense, frame separability is the property that distinguishes variation in frames from variation in menu items.
8 Although our discussion of the results focuses on the case in which decision-makers’ inconsistency across frames represents a failure of rational choice, our approach works equally well if framing effects are due to neoclassical factors such as the presence of transaction costs associated with selecting an option other than the default. See Appendix Section B.
9 For additional criticism of the consistency principle, refer to Masatlioglu et al. [2012], whose relationship to our work we describe in the Appendix.

In addition to allowing the frame to affect choice, we will impose frame monotonicity, which requires that when the frame affects choice, it does so in the same direction for each decision-maker:

yi(dy) ≥ yi(dx) ∀i (3)

where dx and dy are labeled without loss of generality. This assumption rules out the possibility that some agents choose y if and only if x is the default. In conjunction with the consistency principle (2), frame monotonicity (3) implies ci(X, dx) ≠ ci(X, dy) ⇒ ci(X, dx) = x, ci(X, dy) = y. With data on each individual’s choices under both frames, which we assume are unavailable for the analysis in this paper, this implication is a testable hypothesis.10 In the privacy controls example, frame monotonicity implies that when an individual is not consistent about whether she allows a company to collect and use her data, she will always let the company use her data under opt-out policies and never under opt-in policies.11 We can embed frame monotonicity and the consistency principle in the following expression: yi(dy) ≥ φi ≥ yi(dx). Intuitively, our assumptions imply that the two frames lead inconsistent decision-makers away from their preferred choice in opposite directions. Let ψi indicate whether agent i chooses consistently across frames, ψi = 1 ⇔ ci(X, dx) = ci(X, dy).
Under (1), ψi = 0 implies that the decision-maker fails to choose her preferred option in at least one of the available frames. In contrast, when ψi = 1, (2) allows us to conclude that the agent’s (consistent) choice behavior is optimal. Since we wish to study situations where some individuals optimize, we assume that a non-empty set of decision-makers choose consistently across frames:12

∃i s.t. ψi = 1 (4)

Finally, in order to avoid conflating the effects of d with heterogeneity in decision-makers’ preferences, it must be the case that individuals are not systematically assigned to choice-situations in ways that are correlated with their preferences or their propensity to optimize (“unconfoundedness”). Thus we assume that

(φi, ψi) ⊥ dix (5)

Unconfoundedness might fail, for example, if new employees at a firm were presented with a different default than more senior employees when choosing health plans. Unconfoundedness is guaranteed when individuals are randomly assigned to frames.

10 Revising our estimators to allow for the presence of a known fraction of non-conformist inconsistent choosers is a straightforward exercise.
11 One way that frame monotonicity would fail in this example is if using an opt-in privacy policy signaled to users that a company was trustworthy in its respect for privacy, which could cause a user to allow data use under opt-in and not opt-out. However, this situation would also constitute a violation of frame separability, since the individual’s preferences over whether she allows the company to use her data change when she learns about the company’s policies.
12 More formally, we assume that there is a subset i∗ of the population with strictly positive measure, such that ∀i ∈ i∗, ψi = 1. We write the assumption in terms of the existence of a consistent chooser for clarity. The same goes for assumptions 15 and 24 later on in the paper.
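The setup in this section can be sketched as a small simulation. The Python fragment below is only an illustration under stated assumptions: the function names, the sample size, and the parameter values (a 0.7 share of consistent choosers, a 4/7 share preferring y) are hypothetical and chosen so that the simulated moments land near the 0.40 and 0.70 of the running example. For simplicity it also draws preferences and consistency independently, which the paper does not require.

```python
import random


def simulate_choices(n, p_consistent, p_prefers_y, seed=0):
    """Simulate one observed choice per decision-maker.

    All parameters are hypothetical illustration values, not estimates
    from the paper. Preferences (phi) and consistency (psi) are drawn
    independently here purely to keep the sketch short.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        phi = 1 if rng.random() < p_prefers_y else 0   # phi_i = 1 iff i prefers y
        psi = 1 if rng.random() < p_consistent else 0  # psi_i = 1 iff i is consistent
        # Potential choices: consistent choosers follow their preference in
        # both frames (consistency principle); inconsistent choosers pick x
        # under d_x and y under d_y (frame monotonicity).
        y_dx = phi if psi else 0
        y_dy = phi if psi else 1
        # Random assignment to a frame delivers unconfoundedness (5).
        frame = rng.choice(["dx", "dy"])
        observed = y_dx if frame == "dx" else y_dy
        records.append({"phi": phi, "psi": psi, "y_dx": y_dx,
                        "y_dy": y_dy, "frame": frame, "y": observed})
    return records


def mean_choice(records, frame):
    """Sample analogue of E[y_i | d_j]: share choosing y among those seen under `frame`."""
    ys = [r["y"] for r in records if r["frame"] == frame]
    return sum(ys) / len(ys)


records = simulate_choices(n=20000, p_consistent=0.7, p_prefers_y=4 / 7)
print("share choosing y under dx:", round(mean_choice(records, "dx"), 3))
print("share choosing y under dy:", round(mean_choice(records, "dy"), 3))
```

Only the `frame` and `y` fields would be visible to an outside observer; `phi`, `psi`, and the two potential choices are kept in each record solely so that the assumptions can be checked inside the simulation.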
3 Identification of Preferences of Consistent Choosers

Our first result shows how the above assumptions allow one to recover the preferences of the decision-makers who choose consistently across frames, despite observing each individual’s choice under a single frame only.

Proposition 1 Let YA ≡ E[yi|dx] / (E[yi|dx] + 1 − E[yi|dy]). Frame separability (1), the consistency principle (2), frame monotonicity (3), the existence of consistent choosers (4), and unconfoundedness (5) imply the following:

(1.1) The fraction of the population that chooses consistently, E[ψi], is given by E[ψi] = E[yi|dx] + 1 − E[yi|dy].

(1.2) The fraction of consistent choosers who prefer y, E[φi|ψi = 1], is given by E[φi|ψi = 1] = YA.

Proof of Proposition 1: The proof uses the fact that under our assumptions, choosing “against the frame” reveals preferences. We first analyze the case where the frame is dx. Use the law of iterated expectations to write:

E[yi(dx)|dij = dx] = E[yi(dx)|dij = dx, ψi = 1] p(ψi = 1|dij = dx) + E[yi(dx)|dij = dx, ψi = 0] p(ψi = 0|dij = dx) (6)

By the consistency principle (assumption (2)), we know ψi = 1 ⇒ yi(dx) = φi, which implies E[yi(dx)|dij = dx, ψi = 1] = E[φi|dij = dx, ψi = 1]. Unconfoundedness (5) then implies E[φi|dij = dx, ψi = 1] = E[φi|ψi = 1], and p(ψi = 1|dij = dx) = p(ψi = 1). Similarly, frame monotonicity (assumption (3)) and the definition of ψi jointly imply that individuals who do not optimize will choose x under dx. Formally, ψi = 0 ⇒ yi(dx) = 0. Hence we have E[yi(dx)|dij = dx, ψi = 0] = 0.
Substituting these results into (6) yields:

E[yi|dx] = E[φi|ψi = 1] p(ψi = 1) (7)

Now we apply a similar set of steps to E[yi|dy] = E[yi(dy)|dij = dy] to obtain that:13

E[yi|dy] = E[φi|ψi = 1] p(ψi = 1) + p(ψi = 0) (9)

Solving (7) and (9) for E[ψi] = p(ψi = 1), and applying the identity p(ψi = 1) = 1 − p(ψi = 0) yields (1.1). By (4), p(ψi = 1) > 0, so we can substitute (7) and (9) into the expression for YA to obtain:

YA ≡ E[yi|dx] / (E[yi|dx] + 1 − E[yi|dy]) = E[φi|ψi = 1] p(ψi = 1) / p(ψi = 1) (10)

The existence of consistent choosers (4) guarantees this quantity is well-defined. Simplifying the expression yields (1.2).

Discussion of Proposition 1: An outside observer cannot identify exactly which individual decision-makers are optimizing because each individual’s choice is observed only under a single frame. However, Proposition 1 follows from the insight that, under frame monotonicity, only consistent decision-makers choose against the frame (i.e. choose x when confronted with dy or choose y when confronted with dx). Formally, the useful property of choices under frame monotonicity is:14

ci(X, dx) = y or ci(X, dy) = x ⇔ ψi = 1 (11)

By (11) and the consistency principle (2), we can conclude that individuals who choose against the frame are revealing their preferences. Individuals’ frames are independent of preferences and optimizing by unconfoundedness, so we can regard the set of individuals choosing against the frame as a representative sample of all consistent choosers. Adding the fraction who choose y under dx to the fraction who choose x under dy yields the size of the consistent population, and the share of that sum contributed by those who choose y under dx yields the fraction of consistent choosers who prefer y. Note that both frame monotonicity and unconfoundedness would be unnecessary (and testable) were we able to observe individual choice data on each decision-maker under each frame.

13 By the law of iterated expectations:

E[yi(dy)|dij = dy] = E[yi(dy)|dij = dy, ψi = 1] p(ψi = 1|dij = dy) + E[yi(dy)|dij = dy, ψi = 0] p(ψi = 0|dij = dy) (8)

As before, (2) guarantees ψi = 1 ⇒ yi(dy) = φi, which implies E[yi(dy)|dij = dy, ψi = 1] = E[φi|dij = dy, ψi = 1]. Similarly, (3) and (2) imply that ψi = 0 ⇒ yi(dy) = 1, which allows us to write E[yi(dy)|dij = dy, ψi = 0] = 1. Finally, unconfoundedness (5) guarantees E[φi|dij = dy, ψi = 1] = E[φi|ψi = 1], p(ψi = 1|dij = dy) = p(ψi = 1), and p(ψi = 0|dij = dy) = p(ψi = 0). Substituting these results into (8) yields equation (9).
14 Proof: Suppose ci(X, dx) = y, so yi(dx) = 1. By frame monotonicity, we must also have yi(dy) = 1. Now suppose instead that ci(X, dy) = x, so yi(dy) = 0. By frame monotonicity, we must also have yi(dx) = 0. In either case, we therefore have yi(dy) = yi(dx). For the opposite direction, note that if yi(dx) = yi(dy), we must have that either ci(X, dx) = ci(X, dy) = x or ci(X, dx) = ci(X, dy) = y. So either ci(X, dy) = x or ci(X, dx) = y.

Applying Proposition 1 to hypothetical choice data for our running example of opt-in versus opt-out privacy policies, in Table 1, we have E[ψi] = 0.4 + 1 − 0.7 = 0.7, so that 70 percent of the population chooses consistently across frames. Similarly, we have YA = 0.4 / (0.4 + 0.3) = 4/7, which implies that approximately 57 percent of the consistent decision-makers prefer y to x. Just over half of consistent choosers in this example prefer to have the company collect and use their data.

Table 1: Average Choices by Frame

Quantity                                                  Value
Fraction choosing y under dy, E[yi|dy]                    0.70
Fraction choosing y under dx, E[yi|dx]                    0.40
Fraction consistent, E[ψi]                                0.70
Fraction of consistent that prefer y, E[φi|ψi = 1]        0.57

Identifying the preferences of the consistent decision-makers may be important in its own right.
First, as described in Section 8, the key parameter needed for implementing the optimal frame is the average preferences of the inconsistent decision-makers. Thus even if one is able to successfully implement a refinement approach (as described in the introduction) to recover the preferences of the aggregate population, one must still have some technique for isolating the preferences of the consistent decision-makers in order to back out the preferences of the inconsistent decision-makers (using Bayes’ rule). Second, contrary to our starting point, one might assume that individuals whose choices are inconsistent across frames simply lack normatively-relevant preferences over the available options.15 In that case, Proposition 1 provides a method for isolating the preferences of those decision-makers who do choose in a consistent way.

15 For a thoughtful discussion of this issue, refer to Fischhoff [1991].

4 Identification of Population Preferences

The remainder of the paper focuses on the question of how to recover the distribution of preferences of individuals who choose inconsistently across frames. This section describes what barriers must be overcome to solve this problem.

4.1 Bounds on Population Preferences

First, we note that we can partially identify population preferences using Proposition 1. Specifically, we can identify upper and lower bounds of E[φi] by measuring the revealed preferences of the subset of decision-makers who choose consistently and making the most extreme assumptions possible regarding the preferences of those who do not.

Proposition 2 Frame separability (1), the consistency principle (2), frame monotonicity (3), and unconfoundedness (5) imply that E[φi] ∈ [E[yi|dx], E[yi|dy]].

Proof: See Appendix.

Discussion of Proposition 2: Proposition 2 offers a conservative approach for identifying the range of possible population preference parameter values consistent with the observed choice data.
To illustrate, using the hypothetical choice data in Table 1, we have E[φi] ∈ [0.4, 0.7]. The bounds themselves are quite intuitive; the present analysis highlights that interpreting these population moments as bounds is correct only to the extent that our assumptions, notably including frame monotonicity, are satisfied. When the fraction of decision-makers failing to optimize is large, the bounds will be relatively uninformative and further assumptions or information will be required to shed light on population preferences.16

16 If choices are observed under multiple decision-quality states, as in Section 6, the bounds derived from the highest decision-quality state will yield the tightest bounds, because the preferences of a greater fraction of the population will be identified from consistent choice behavior.

4.2 Characterizing the Full Identification Problem

Proposition 3 characterizes the primary difficulty in fully identifying the preferences of two groups of interest: the full population and the population of inconsistent choosers.

Proposition 3 Frame separability (1), the consistency principle (2), frame monotonicity (3), the existence of consistent choosers (4), and unconfoundedness (5) imply the following:

(3.1) The fraction of the population who prefer y, E[φi], is given by

\[ E[\phi_i] = Y_A - \frac{\operatorname{cov}(\phi_i, \psi_i)}{E[\psi_i]} \tag{12} \]

(3.2) The fraction of inconsistent choosers who prefer y is given by

\[ E[\phi_i \mid \psi_i = 0] = Y_A - \frac{\operatorname{cov}(\psi_i, \phi_i)}{E[\psi_i]\,(1 - E[\psi_i])} \tag{13} \]

Proof of Proposition 3 (3.1): This result follows directly from equation (10). (3.2): By the law of iterated expectations, E[φi] = p(ψi = 1)E[φi|ψi = 1] + p(ψi = 0)E[φi|ψi = 0]. Substituting this into (12) and applying (2.1) yields the result.

Discussion of Proposition 3 The key problem with extrapolating from the preferences of consistent choosers to other populations is that preferences and optimizing behavior may be correlated. Individuals
who choose consistently might be more likely to prefer y to x than individuals who choose inconsistently. Propositions 3.1 and 3.2 show that a sufficient statistic for the difference between the preferences of consistent choosers (given by YA, by Proposition 1) and the preferences of other groups is the covariance between preferences and optimizing behavior (together with the size of the consistent population, which we can recover using Proposition 1). When this covariance is not negligible, consistent choosers are not representative of the full population, so YA would be a biased estimate of population preferences, and the bias is largest in magnitude when very few individuals choose consistently. Similarly, YA would be a biased estimate of the preferences of the inconsistent choosers alone, and the bias is even larger in magnitude for this group than for the full population.17

Crucially, any number of behavioral models could potentially be generating inconsistent choices across frames in a particular application. But for the purposes of identifying E[φi] and E[φi|ψi = 0] from population choice data, the underlying positive model matters only to the extent that it shapes cov(φi, ψi). Given knowledge of cov(φi, ψi), one does not need to take a stance on the exact behavioral model explaining behavior. This result is important because it focuses the problem of preference identification on understanding this quantity, which one can examine empirically with a variety of reduced-form methods.18

4.3 Decision-Quality Independence

Proposition 3 highlights that in one special set of cases, population preferences may be recovered by simply extrapolating the preferences of the consistent choosers. We refer to the necessary assumption as decision-quality independence.
It states that the variation in whether decision-makers optimize is not systematically related to the variation in decision-makers’ preferences over x and y:

\[ \operatorname{cov}(\phi_i, \psi_i) = 0 \tag{14} \]

Decision-quality independence is a strong assumption, likely to be unrealistic in many important applications. When it holds, however, we can recover population preferences via a straightforward application of Proposition 3:

Proposition 4: Assuming frame separability (1), the consistency principle (2), frame monotonicity (3), the existence of consistent choosers (4), and unconfoundedness (5), decision-quality independence (14) implies that YA = E[φi] = E[φi|ψi = 0].

Proof: The result follows directly from Proposition 3 and the assumption of decision-quality independence (14).

17 Note that the denominator of (13) is equal to the variance of ψi. Using the definition of the correlation between φi and ψi, we could also write this expression as

\[ E[\phi_i \mid \psi_i = 0] = Y_A - \operatorname{corr}(\psi_i, \phi_i) \sqrt{\frac{E[\phi_i]\,(1 - E[\phi_i])}{E[\psi_i]\,(1 - E[\psi_i])}} \]

which indicates that whenever preferences are highly variable in the population or the correlation between φi and ψi is large, the bias from using YA as an estimate of the preferences of inconsistent choosers will be large.

18 As in other reduced-form empirical contexts, understanding the underlying behavioral model is still important for determining which observables need to be included when matching on observables (Section 5) or for determining what type of variation meets the requirements of a decision-quality state (Section 6).

Discussion of Proposition 4 Proposition 4 identifies sufficient conditions for the recovery of population preferences in situations where some decision-makers systematically fail to optimize.
Intuitively, when individuals’ propensity to optimize is not systematically related to their underlying preferences, the preferences of the inconsistent decision-makers can be inferred from the revealed preferences of those decision-makers who are consistent, which we know from Proposition 1. Whether decision-quality independence holds in a particular setting is an empirical question, one whose answer will vary depending on the choice being made and the population of decision-makers. In Section B in the Appendix, we show that in certain positive models of framing effects, decision-quality independence obtains only under strong assumptions. For example, if variation in decision-making consistency across individuals stems only from variation in cognitive costs, decision-quality independence holds only when that variation is uncorrelated with variation in their preferences over the options being chosen. Decision-quality independence thus provides a useful reference point for identifying population preferences, but as an assumption it should not be adopted uncritically.

5 Preference Recovery By Matching on Observables

In some cases, decision-makers’ preferences may be systematically related to their propensity to optimize, based on observable factors. For example, it could be that the rich are more likely to optimize than the poor, and also more likely to prefer y to x.19 In such cases, decision-quality independence may hold after conditioning on the factors that are related to both propensity to optimize and underlying preferences. That is, it could be that the propensity to optimize is uncorrelated with one’s preferences, looking only within the sub-population of rich decision-makers (and similarly if one conditions on the poor decision-makers). When the factors that cause decision-quality independence to fail are observable, Proposition 5 shows how population preferences may be recovered.
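As a numerical check on Propositions 1 through 4, the following Python sketch recomputes the Table 1 quantities and then evaluates equations (12) and (13) at a few covariance values. The covariance values are purely illustrative assumptions (the covariance is not identified from Table 1), and the function names are ours, not part of the paper.

```python
# Moments from Table 1 (hypothetical data for the privacy-controls example).
Y_DY = 0.70  # E[y | d_y]: share choosing y when the frame favors y
Y_DX = 0.40  # E[y | d_x]: share choosing y when the frame favors x


def consistent_share(y_dx, y_dy):
    """E[psi] = E[y | d_x] + 1 - E[y | d_y], as in Proposition 1."""
    return y_dx + (1.0 - y_dy)


def consistent_pref(y_dx, y_dy):
    """Y_A = E[phi | psi = 1], the share of consistent choosers who prefer y."""
    return y_dx / consistent_share(y_dx, y_dy)


def population_pref(y_a, psi_bar, cov):
    """Equation (12): E[phi] = Y_A - cov(phi, psi) / E[psi]."""
    return y_a - cov / psi_bar


def inconsistent_pref(y_a, psi_bar, cov):
    """Equation (13): E[phi | psi = 0] = Y_A - cov / (E[psi] * (1 - E[psi]))."""
    return y_a - cov / (psi_bar * (1.0 - psi_bar))


psi_bar = consistent_share(Y_DX, Y_DY)
y_a = consistent_pref(Y_DX, Y_DY)
print(f"E[psi] = {psi_bar:.2f}, Y_A = {y_a:.2f}")
print(f"Proposition 2 bounds on E[phi]: [{Y_DX:.2f}, {Y_DY:.2f}]")

# Assumed covariance values, chosen only to illustrate the direction of the bias.
for cov in (-0.05, 0.0, 0.05):
    print(
        f"cov = {cov:+.2f}: E[phi] = {population_pref(y_a, psi_bar, cov):.3f}, "
        f"E[phi | psi = 0] = {inconsistent_pref(y_a, psi_bar, cov):.3f}"
    )
```

At a covariance of zero both functions return YA, which is the content of Proposition 4; a nonzero covariance moves the estimate for inconsistent choosers further from YA than the estimate for the full population.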
Suppose that decision-makers exhibit some observable characteristic wi ∈ {w0, ..., wJ}, which is potentially associated with their preferences for x and y as well as their optimizing behavior. We assume that some individuals with each realization of w choose consistently:

\[ \forall w,\ \exists\, i \ \text{s.t.}\ w_i = w,\ \psi_i = 1 \tag{15} \]

We will condition our unconfoundedness assumption (5) on wi:

\[ (\phi_i, \psi_i) \perp d_i \mid w_i \tag{16} \]

Note that for the above example, our approach will also be valid in the case where we are more likely to observe the rich under dx than under dy. Finally, we assume conditional decision-quality independence, i.e. that decision-quality independence holds conditional on the observable characteristic wi:

\[ \operatorname{cov}(\phi_i, \psi_i \mid w_i) = 0 \tag{17} \]

19 For example, a rapidly growing literature investigates the relationship between income and decision-quality across a range of contexts. See Spears [2011], Goldin and Homonoff [2013], and Syngjoo Choi and Silverman [2013] for some recent examples and Mullainathan and Shafir [2013] for a comprehensive treatment of the subject.

Proposition 5 Let

\[ Y_A(w) = \frac{E[y_i \mid d_x, w]}{E[y_i \mid d_x, w] + 1 - E[y_i \mid d_y, w]}. \]

Let Pj = Pr(w = j) for j ∈ {0, ..., J} and

\[ S_w = \frac{E[y \mid d_y, w] - E[y \mid d_x, w]}{E[y \mid d_y] - E[y \mid d_x]}\, P_w. \]

Assume that frame separability (1), the consistency principle (2), frame monotonicity (3), conditional unconfoundedness (16), the existence of consistent choosers (15), and conditional decision-quality independence (17) hold. Then

(5.1) E[φi] = Σj Pj YA(j)

(5.2) E[φi|ψi = 0] = Σj Sj YA(j)

Proof: See Appendix.

Corollaries to Proposition 5: Distribution of Types Under Assumptions (1), (2), (3), (4), and (5), the distribution of type j among the consistent and inconsistent decision-makers is as follows:

(5.3) The fraction of type j among the inconsistent decision-makers is given by

\[ \Pr(w_i = j \mid \psi_i = 0) = \frac{E[y \mid d_y, w = j] - E[y \mid d_x, w = j]}{E[y \mid d_y] - E[y \mid d_x]}\, P_j \]

(5.4) The fraction of type j among the consistent decision-makers is given by

\[ \Pr(w_i = j \mid \psi_i = 1) = \frac{E[y \mid d_x, w = j] + 1 - E[y \mid d_y, w = j]}{E[y \mid d_x] + 1 - E[y \mid d_y]}\, P_j \]

Proof: See Appendix.

Discussion of Proposition 5 Proposition 5 can be understood as applying the intuition of a matching estimator from the program evaluation literature to the case of preference recovery of inconsistent decision-makers.20 For example, suppose that decision-makers are either rich or poor. Suppose that 70 percent of the consistent decision-makers are rich, but that only 40 percent of the inconsistent decision-makers are rich. The approach is to measure the revealed preferences of the consistent rich and poor decision-makers separately, and then extrapolate that information to the associated group of inconsistent decision-makers. Because we can identify the fraction of rich and poor decision-makers among the optimizers and non-optimizers (respectively),21 we can re-weight the revealed preference information to recover the aggregate preferences for the non-optimizers. The conditional independence assumption guarantees the validity of the extrapolation of preferences from rich optimizers to rich non-optimizers (and from poor optimizers to poor non-optimizers).22

20 Consider the methods proposed by Abadie [2003] and Angrist and Fernandez-Val [2010] for identifying the fraction of compliers associated with an instrument and extrapolating the treatment effect for those compliers to a different population. In that application, as in ours, extrapolation based on observables is made more difficult by the fact that observers cannot determine whether any particular individual is a member of the relevant group (compliers in their context, consistent decision-makers in ours).

21 To identify these fractions, we exploit the fact that only optimizers “pick against the frame,” and measure the fraction of each w type among that group. This is an application of a conditional version of the corollary to Proposition 1. Because we can observe the overall distribution of w in the population, we can use Bayes’ rule to back out the distribution of w among the inconsistent decision-makers.

22 In applications to survey questions, one can think of the weights as consistency weights which correct for inconsistent response bias, applied exactly as one applies propensity score weights to correct for survey response bias. We explore this problem in Goldin and Reck [in progress].

The estimates of E[φi] and E[φi|ψi = 0] are both weighted averages of the estimated mean preferences for the various w subgroups. The weighted averages differ to the extent that the distribution of w differs between the non-optimizers and the aggregate population; Pw measures the distribution of w in the general population whereas Sw measures the distribution of w in the population of inconsistent decision-makers.

Intuitively, when preferences are independent of w, we will have YA(0) = YA(1) and E[φi] = E[φi|ψi = 0]. When the propensity to optimize is independent of w, we will have Sw = Pw for each w, and again E[φi|ψi = 0] = E[φi]. Consequently, note that the matching estimator admits a test of the null hypothesis that the unconditional decision-quality independence assumption is satisfied, provided that the conditional independence assumption is satisfied and YA(0) ≠ YA(1). Specifically, one can test whether E[φi] = E[φi|ψi = 0]. As discussed in the Appendix, the types of variables that must be accounted for in order for conditional decision-quality independence to hold will vary based on the underlying positive model that generates behavior.

To illustrate the technique, consider the hypothetical choice data described in Table 2, in which individuals are categorized based on whether they graduated high school. Because the population moments, aggregated across education groups, are equal to the moments in Table 1, the average preferences for the consistent decision-makers are the same as well, E[φi|ψi = 1] ≈ 0.57. However, applying Proposition 5 suggests that preferences over x and y (e.g. preferences over whether a company is permitted to use their data) are strongly correlated with education, E[φi|ψi = 1, HSi = 1] = 0.62 and E[φi|ψi = 1, HSi = 0] = 0.40. Additionally, decision-makers’ propensity to choose consistently across frames (e.g. the propensity to choose the same option regardless of whether the company adopts an opt-in or opt-out privacy policy) is strongly correlated with education: E[ψi|HSi = 1] = 0.90 and E[ψi|HSi = 0] = 0.40, implying that individuals who did not graduate high school constitute 80 percent of the inconsistent population.23 As a result, the fraction of inconsistent decision-makers who prefer y to x is estimated to be E[φi|ψi = 0] = 0.44. Intuitively, high school graduates constitute a disproportionately large share of the consistent decision-makers, so their contribution is scaled down when calculating the preferences of the inconsistent group. Note that re-weighting the estimates this way changes the optimal policy in this example: because fewer than half of the inconsistent choosers are now estimated to prefer y, the results suggest that a majority of inconsistent choosers would be better off under opt-in privacy controls (see Section 8), in contrast to what we would conclude if we wrongly imposed decision-quality independence. Weighting in the manner suggested by Proposition 5, we can see that aggregate preferences for the population are estimated to be E[φi] = 0.53.
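The re-weighting in Proposition 5 can be reproduced from the group-level moments of the high school example. The sketch below is a minimal illustration in Python, using the hypothetical moments reported in Table 2; the dictionary layout and function name are our own assumptions rather than anything specified in the paper.

```python
# Hypothetical group-level moments from Table 2, keyed by education group.
# "p" is the population share P_w; "y_dy" and "y_dx" are E[y | d_y, w] and E[y | d_x, w].
GROUPS = {
    "HS=1": {"p": 0.60, "y_dy": 0.66, "y_dx": 0.56},
    "HS=0": {"p": 0.40, "y_dy": 0.76, "y_dx": 0.16},
}


def matching_estimates(groups):
    """Apply Proposition 5: weight group-level Y_A(w) by P_w and by S_w."""
    # Pooled moments across groups, E[y | d_y] and E[y | d_x].
    y_dy = sum(g["p"] * g["y_dy"] for g in groups.values())
    y_dx = sum(g["p"] * g["y_dx"] for g in groups.values())

    by_group = {}
    pref_population = 0.0    # (5.1): sum_j P_j * Y_A(j)
    pref_inconsistent = 0.0  # (5.2): sum_j S_j * Y_A(j)
    for name, g in groups.items():
        share_consistent = g["y_dx"] + 1.0 - g["y_dy"]           # E[psi | w]
        y_a = g["y_dx"] / share_consistent                       # Y_A(w)
        s_w = (g["y_dy"] - g["y_dx"]) / (y_dy - y_dx) * g["p"]   # Pr(w | psi = 0)
        by_group[name] = {"E[psi|w]": share_consistent, "Y_A": y_a, "S_w": s_w}
        pref_population += g["p"] * y_a
        pref_inconsistent += s_w * y_a
    return by_group, pref_population, pref_inconsistent


by_group, pref_population, pref_inconsistent = matching_estimates(GROUPS)
for name, stats in by_group.items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
print(f"E[phi] = {pref_population:.2f}, E[phi | psi = 0] = {pref_inconsistent:.2f}")
```

The weights Sw sum to one by construction, so the estimate for inconsistent choosers is a proper weighted average of the group-level YA(w).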
Table 2: Average Choices by Frame and High School Education

                                                        HS = 1   HS = 0   Total
Fraction choosing y under dy, E[yi|dy]                  0.66     0.76     0.70
Fraction choosing y under dx, E[yi|dx]                  0.56     0.16     0.40
Fraction of population, P(w)                            0.60     0.40     1.00
Fraction consistent, E[ψ]                               0.90     0.40     0.70
Fraction of consistent population, P(w|ψi = 1)          0.77     0.23     1.00
Fraction of inconsistent population, P(w|ψi = 0)        0.20     0.80     1.00
Fraction of consistent who prefer y, E[φi|ψi = 1]       0.62     0.40     0.57
Fraction of inconsistent who prefer y, E[φi|ψi = 0]     0.62     0.40     0.44
Fraction of population who prefer y, E[φi]              0.62     0.40     0.53

23 That is,

\[ S_{HS=0} \equiv \frac{E[y \mid d_y, HS_i = 0] - E[y \mid d_x, HS_i = 0]}{E[y \mid d_y] - E[y \mid d_x]}\, P_{HS=0} = \frac{0.76 - 0.16}{0.3}\,(0.4) = 0.80. \]

6 Exploiting Variation in Decision Quality to Identify Preferences

In many cases, neither decision-quality independence (14) nor its weakened version, conditional decision-quality independence (17), will obtain. In such cases, we can make inferences about population preferences by directly examining the empirical relationship between agents’ preferences and their propensity to optimize. This section develops tools for learning about that relationship when one can observe variation in decision-quality states. Intuitively, decision-quality states make individuals more or less likely to optimize, but are not systematically related to individuals’ preferences. A change in decision-quality states allows us to observe empirically the preferences of individuals induced to optimize by that change.

6.1 Setup

We redefine a choice situation g ∈ G to consist of a menu X = {x, y} and a frame vector (d, z), where d ∈ {dx, dy} is a frame as before, and z ∈ Z is a decision-quality state. For simplicity, we initially focus on the case in which Z is binary, Z = {zh, zl}. Choice behavior for agent i is described by the choice function ci : G → X.
As before, we assume that each decision-maker’s choice over X is observed only once, but now there are four possible choice-situations (one for each d by z combination). Let dij and zki (respectively) indicate whether individual i is observed under biasing frame dj and decision-quality state zk, for j ∈ {x, y} and k ∈ {h, l}. Agents have well-defined preferences over the elements of X summarized by mi : G → X. We will assume frame separability for both d and z:

\[ m_i(X, d_x, z_k) = m_i(X, d_y, z_k) \quad \forall i, z_k \tag{18} \]

\[ m_i(X, d_j, z_h) = m_i(X, d_j, z_l) \quad \forall i, d_j \tag{19} \]

Let φi denote whether individual i prefers option y as before, and let yi(dj, zk) denote whether individual i chooses option y in choice situation (dj, zk). We continue to employ the consistency principle and frame monotonicity:

\[ \forall z_k,\quad c_i(X, d_x, z_k) = c_i(X, d_y, z_k) \iff y_i(d_x, z_k) = y_i(d_y, z_k) = \phi_i \tag{20} \]

\[ y_i(d_y, z_k) \ge y_i(d_x, z_k) \quad \forall i, z_k \tag{21} \]

Let ψi(zk) indicate whether agent i chooses consistently when z = zk, ψi(zk) = 1 ⇐⇒ yi(dx, zk) = yi(dy, zk). The decision-quality state affects the propensity of agents to optimize but does not affect an agent’s behavior conditional on whether she optimizes or not, which we call decision-quality state exclusivity:

\[ y_i(d_j, z_h) \ne y_i(d_j, z_l) \iff \psi_i(z_h) \ne \psi_i(z_l) \tag{22} \]

Finally, we assume that the decision-quality state affects whether an individual optimizes monotonically, decision-quality state monotonicity:

\[ \psi_i(z_h) \ge \psi_i(z_l) \quad \forall i \tag{23} \]

Moreover, we assume that this inequality is strict for a non-zero subset of the population, which implies the existence of contingent optimizers:

\[ \exists\, i \ \text{s.t.}\ \psi_i(z_h) > \psi_i(z_l) \tag{24} \]

Examples of z include the time pressure for making a decision, the cost of obtaining or processing information about the various available choices, the opportunity cost of cognitive resources at the time of decision-making, or the degree to which one alternative is more salient than another.
As discussed in the Appendix, the specific forms of variation that will satisfy these assumptions depend on the underlying model of behavior that generates framing effects in a particular application.

Before proceeding, it will be useful to simplify the notation. Because X is held fixed throughout, we will typically suppress it as an argument in the various choice functions. As before, let φ ≡ E[φi] denote the fraction of individuals who prefer y. Let E[yi|dj, zk] = E[yi|dij = dj, zki = zk] denote the fraction of individuals observed in situation (dj, zk) who choose y. Finally, we assume that individuals are not assigned to choice-situations in ways that are systematically correlated with either their preferences or their propensity to optimize. Thus we assume that

\[ (\phi_i, \psi_i(z_h), \psi_i(z_l)) \perp d_{ij}, z_{ki} \tag{25} \]

for j ∈ {x, y} and k ∈ {h, l}. As before, this unconfoundedness assumption would be satisfied if individuals are randomly assigned to choice-situations.

6.2 Identifying the Preferences of Sometimes-Inconsistent Choosers

Although we cannot observe exactly which decision-makers optimize under each decision-quality state, Proposition 6 allows us to recover the average preferences of the group of decision-makers who are inconsistent at one decision-quality state and consistent at the other.24 This information could be useful for three reasons. First, it allows for a test of the decision-quality independence assumption, described in Section 6.3 below. Second, it is useful for tracing out the relationship between preferences and propensity to optimize, which can be used to make inferences about preferences for other groups in the population. Third, we show in Section 8 that the preferences of this group will affect the optimal choice of the decision-quality environment.
24 There is a clear analogy between this result and that of Imbens and Angrist [1994], who show that under similar assumptions, an IV identifies the average parameter of interest for compliers.

Proposition 6 Define

\[ Y_C \equiv \frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}}. \]

Frame separability (18 and 19), the consistency principle (20), frame monotonicity (21), decision-quality state exclusivity (22), decision-quality state monotonicity (23), the existence of contingent optimizers (24), and unconfoundedness (25) imply that YC = E[φi | ψi(zh) = 1, ψi(zl) = 0].

Proof: See Appendix.

Discussion of Proposition 6 One way to understand Proposition 6 is by analogy to the technique of instrumental variables in empirical economics. In particular, under our framework, z serves as an instrument for optimizing. The “effect” of optimizing (i.e. the difference in aggregate choice behavior among decision-makers who optimize versus those who do not) is related to the fraction of the population that prefer y to x.25 The assumption of frame separability for z (19) and decision-quality state exclusivity (22) correspond to the standard exclusion restriction for instrumental variables: the instrument must affect the outcome only through the desired channel. Similarly, unconfoundedness (25) corresponds to the assumption that the instrument is uncorrelated with unobserved confounding variables. Finally, decision-quality state monotonicity (23) corresponds to the monotonicity assumption required for the IV estimator to recover the local average treatment effect [Imbens and Angrist, 1994].
Given these similarities, it is not surprising that the estimator (YC) itself corresponds to a standard Wald statistic: the numerator measures the aggregate change in choice behavior induced by the instrument, and the denominator scales this value by the change in the fraction of optimizers between the two levels of z.26

If one were interested only in measuring the aggregate preferences of the largest possible group, applying Proposition 1 to decisions observed under zh would accomplish that goal. In contrast, the primary value of Proposition 6 is that it offers a method for shedding light on the empirical relationship between preferences and propensity to optimize in the population. That is, rather than simply evaluating consistently revealed preferences under a single decision-quality state, YC is identified from the change in aggregate choice behavior between zl and zh. As a result, it provides preference information about a group of decision-makers selected based on the level of z at which they begin to optimize. Another use for Proposition 6 is motivated by the optimal decision-quality model set out in Section 8. In that model, the welfare benefits associated with moving from a lower to a higher decision-quality state (thus inducing more decision-makers to optimize) depend on the preferences of the decision-makers whose optimizing behavior is affected by the change. Proposition 6 shows that this quantity corresponds to YC.

25 That is, when no one optimizes, no one picks y under dx. When everyone optimizes, a fraction φ of decision-makers choose y. The “effect” of optimizing (under frame dx) corresponds to moving from the former situation to the latter.

26 Another way to understand the identification in Proposition 6 is as follows. If one could observe the effect on aggregate behavior of moving from a state in which no one optimizes to a state in which everyone optimizes, it would be straightforward to back out φ. The problem in practice is that states of the world in which all decision-makers optimize are rarely observed. By observing one state that is “closer” to full optimization than another, we can recover preferences by scaling the difference in aggregate behavior between states by the difference in optimizing behavior induced by the decision-quality instrument. We analyze this method for recovering population preferences formally in Section 6.4.

Given decision-quality state monotonicity, we can divide the population of decision-makers into three groups:27 Always-optimizers (A) who optimize at zl and zh, Never-optimizers (N) who do not optimize at zl or zh, and Contingent-optimizers (C) who optimize at zh but not at zl. Let φj denote average preferences for each group j ∈ {A, N, C}. That is, φA = E[φi|ψi(zl) = 1], φN = E[φi|ψi(zh) = 0], and φC = E[φi|ψi(zl) = 0, ψi(zh) = 1]. Note that we can identify φA using Proposition 1 (restricted to choices observed at zh or zl). Thus the difficulty for welfare analysis lies in making inferences about φN. Although φN may never be directly observed (since by assumption we do not observe that group of decision-makers choosing consistently), we may make inferences about it by extrapolating the observed relationship between preferences and propensity to optimize among decision-makers that we do observe. To take a simple example, if φA < φC, we may be willing to assume that φN will similarly be greater than or equal to φC. If so, we can infer a lower bound for φ by setting φN = φC. In contrast, if we had instead observed that φA > φC, then setting φN = φC would generate an upper bound on φ instead.

Table 3 illustrates the technique described in Proposition 6 for hypothetical data for the privacy controls example. We suppose that when privacy settings can be changed only by navigating through several web pages, individuals choose according to the moments in Table 1.
When privacy settings are one click away from the home page and alterable when a user sets up her account, individuals are less susceptible to framing effects, and E[yi|dy] = 0.55 and E[yi|dx] = 0.45.28 We can back out the fraction of consistent choosers and the fraction of the consistent choosers at either zl or zh who prefer y using Proposition 1, as before. Note that the fraction of consistent choosers who prefer y is lower under zh than zl, because E[yi|dy] changed by more than E[yi|dx]. This observation would suggest, intuitively, that the 20 percent of choosers on the margin of optimizing between zh and zl are less likely to prefer y than the typical individual optimizing at either zh or zl. When we apply Proposition 6, we see that, indeed, the fraction of contingent optimizers who prefer y is YC = (0.45 − 0.40)/(0.90 − 0.70) = 0.25. In the privacy controls example, these results would imply that contingent optimizers are far less likely than the always-optimizers to prefer that a company be allowed to collect and use their data.

27 As mentioned above, our notation emphasizes the analogy to the Imbens and Angrist [1994] framework.

28 In this example, one can also imagine useful variation in decision quality state coming from the length and readability of the privacy policy, whether the policy frames the company’s use of user data as a loss or a gain (i.e. whether this enhances or mitigates the bias coming from the default), or the amount of extraneous information visible when users are manipulating their privacy settings.

Table 3: Average Choices by Frame and Difficulty of Changing Privacy Settings

                                                          Hard to Change (zl)   Easy to Change (zh)
Fraction choosing y under dy, E[yi|dy]                    0.70                  0.55
Fraction choosing y under dx, E[yi|dx]                    0.40                  0.45
Fraction consistent, E[ψ(z)]                              0.70                  0.90
Fraction of consistent preferring y, E[φi|ψi(z) = 1]      0.57                  0.50
Fraction of contingent optimizers preferring y, φ̄C                   0.25

As the above discussion reveals, any extrapolation from φA and φC will be quite coarse with just two data points. The following corollary illustrates how observing choice behavior at a wider range of decision-quality states can provide a more precise understanding of the relationship between preferences and propensity to optimize, and hence, a more reliable basis for extrapolation to φ.

Corollary 6.1: Multiple Observed Decision-Quality States Suppose choices are observed under decision-quality states z0, z1, ..., zN, such that for k = 0, ..., N − 1, we have ψi(zk) ≤ ψi(zk+1) ∀i and ∃i s.t. ψi(zk) < ψi(zk+1). Assume (φi, ψi(zk)) ⊥ dij, zmi for j ∈ {x, y}, k = 0, ..., N, m = 0, ..., N. Let

\[ Y_{k,m} \equiv \frac{E[y_i \mid d_x, z_m] - E[y_i \mid d_x, z_k]}{E[y_i \mid d_x, z_m] + (1 - E[y_i \mid d_y, z_m]) - \{E[y_i \mid d_x, z_k] + (1 - E[y_i \mid d_y, z_k])\}} \]

for 0 ≤ k < m ≤ N and let

\[ Y_0 \equiv \frac{E[y_i \mid d_x, z_0]}{E[y_i \mid d_x, z_0] + 1 - E[y_i \mid d_y, z_0]}. \]

Assumptions (18), (19), (20), (21), and (22) imply that Yk,m = φk,m and Y0 = φ0, where φk,m ≡ E[φi|ψi(zk) = 0, ψi(zm) = 1] and φ0 = E[φi|ψi(z0) = 1].

Proof: See Appendix.

Discussion of Corollary 6.1 Corollary 6.1 provides a method to trace out differences in the preferences of decision-makers who begin to optimize at different decision-quality states. For example, one could observe qualitative features of the relationship between decision-makers’ propensity to optimize and their preferences for x versus y, such as whether it appears linear, concave, etc.
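The Wald-style ratio in Proposition 6 and Corollary 6.1 is simple to evaluate from frame-level moments. The Python sketch below applies it to the two decision-quality states whose choice moments appear in Table 3; the list-of-states layout and the function names are illustrative assumptions, and further states could be appended in order of increasing decision quality.

```python
def consistent_share(m):
    """E[psi(z)] = E[y | d_x, z] + 1 - E[y | d_y, z]."""
    return m["y_dx"] + 1.0 - m["y_dy"]


def consistent_pref(m):
    """Proposition 1 at a single state: share of consistent choosers preferring y."""
    return m["y_dx"] / consistent_share(m)


def contingent_pref(low, high):
    """Y_{k,m} of Corollary 6.1 (Y_C when only two states are observed).

    Numerator: change in the share consistently choosing y between the two states.
    Denominator: change in the share of consistent choosers between the two states.
    """
    return (high["y_dx"] - low["y_dx"]) / (consistent_share(high) - consistent_share(low))


# Choice moments by state, ordered from lowest to highest decision quality (Table 3).
STATES = [
    ("z_l (hard to change)", {"y_dy": 0.70, "y_dx": 0.40}),
    ("z_h (easy to change)", {"y_dy": 0.55, "y_dx": 0.45}),
]

for name, moments in STATES:
    print(f"{name}: E[psi] = {consistent_share(moments):.2f}, "
          f"Y_A = {consistent_pref(moments):.2f}")

# Preferences of those induced to optimize between each pair of adjacent states.
for (name_lo, lo), (name_hi, hi) in zip(STATES, STATES[1:]):
    print(f"{name_lo} -> {name_hi}: Y = {contingent_pref(lo, hi):.2f}")
```

Because the ratio divides by the change in the share of consistent choosers, it is undefined when two states induce the same share to optimize, which mirrors the requirement that some contingent optimizers exist.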
One could also estimate this relationship econometrically in order to extrapolate it out of sample, to the preferences of those decision-makers who were inconsistent at each observed z. Section 7 describes an approach along these lines.

6.3 Over-Identifying Test of Decision-Quality Independence

This section develops a test for whether decision-quality independence holds in a particular setting. The test may be applied in situations in which one can observe exogenous variation in some factor that affects the propensity of decision-makers to optimize. Later in the paper, Section 7 will show how such variation may be exploited to estimate population preferences when decision-quality independence fails to hold.

Now that the decision-quality state has been added into the model, the decision-quality independence assumption becomes:

\[ \phi_i \perp \psi_i(z_k), \quad k = h, l \tag{26} \]

Assuming that Assumptions (18) through (25) are satisfied, decision-quality independence may be tested based on the following condition:

Proposition 7 Under Assumptions (18) through (25), decision-quality independence (26) is satisfied only if the following quantities are equal:

1. \( \dfrac{E[y_i \mid d_x, z_h]}{E[y_i \mid d_x, z_h] + 1 - E[y_i \mid d_y, z_h]} \),

2. \( \dfrac{E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_l] + 1 - E[y_i \mid d_y, z_l]} \), and

3. \( \dfrac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}} \).

Proof: See Appendix.

Discussion of Proposition 7 Proposition 7 states a necessary condition for decision-quality independence to hold, which may be tested given observed variation in some factor that satisfies the assumptions to be a decision-quality state. To understand the rationale behind the test, note that the numerator of the first of the three quantities measures the fraction of decision-makers at zh who choose y consistently (i.e. under both frames). Similarly, the denominator of the first quantity measures the fraction of decision-makers at zh who consistently choose x or y.
Thus the first quantity denotes the fraction of optimizing decision-makers who prefer y, evaluated at zh. The second quantity denotes the corresponding quantity, evaluated at zl. Moving from zl to zh increases the number of decision-makers who optimize. But if decision-quality independence holds, moving from zl to zh should not affect the fraction of optimizing decision-makers preferring y to x. In contrast, when this assumption fails to hold, optimizing behavior is systematically related to decision-makers’ underlying preferences for x versus y, causing these two quantities to diverge. As discussed above, the third quantity is the preferences of individuals on the margin of optimizing between zh and zl. Under decision-quality independence, these individuals should also have the same preferences as all individuals optimizing at zh or zl. The comparison of the third quantity with either of the first two is an analogue of Hausman’s [1978] test of endogeneity using instrumental variables (see also Wu [1973]).

Note, however, that the condition is necessary but not sufficient. Even if decision-quality independence fails to hold, it could be the case that individuals induced to optimize by a change to zh happen to have the same preferences as those optimizing at zl. Also note that, from a statistical perspective, if the number of individuals induced to optimize or the change in the fraction who prefer x to y at a different z is sufficiently small compared to the overall size of the optimizing population, then we will not have sufficient statistical power to test the decision-quality independence assumption by comparing the first two quantities in Proposition 7.
In this sense, the test comparing the third quantity to the first two is the higher-power test, since it requires only that the change in the population of optimizers be large enough to identify $Y_C$ precisely, not that the population of optimizers be much larger at $z_h$ than at $z_l$.

When decision-quality independence does not hold, recovering population preferences is harder because one cannot impute the revealed preferences of those who optimize to the rest of the population. However, the recovery of the preferences of several localized subsets of individuals, as in Proposition 6 and Corollary 6.1, suggests a way forward even in the absence of decision-quality independence. The next section explores such an approach, in which we impose more structure on the relationship between preferences and optimizing behavior to overcome this difficulty.

7 Using the Observed Preference-Decision-Quality Relationship to Estimate Preferences of Inconsistent Decision-Makers

In the previous sections, we sought to impose as little positive structure as possible on the relationship between preferences and choices. We have shown that under relatively straightforward assumptions, it will be possible to recover the distribution of preferences in any group of individuals so long as 1) we can infer their preferences by comparing them to a subset of individuals who optimize (Sections 4.3 through 5), or 2) we can induce those individuals to optimize in some environment (Section 6). In this section, we impose more structure on the relationship between preferences and optimization, which yields an approach for recovering population preferences using a decision-quality instrument.

7.1 Setup

Assume that for any individual $i$,

\[ P_i = P + \theta z_i + \varepsilon_i, \qquad \psi_i = 1 \iff P_i > 0 \]

where $\theta$ is a parameter and $z_i$ is an observable decision-quality variable, as before. The variable $\varepsilon_i$ captures idiosyncratic variation in the propensity to optimize, whose distribution is characterized below.
An individual optimizes only given a sufficiently high value of the propensity-to-optimize variable $P_i$. Intuitively, we could think of $z_i$ as an instrument which increases cognitive costs, or increases the strength of the biasing frame. When the cognitive costs are sufficiently low, or the biasing frame sufficiently weak, or the individual's own propensity to optimize sufficiently high, an individual optimizes. Note that $P_i$ does not depend on the frame $d$, which reflects that individuals optimize if and only if they are consistent across frames in our model (the consistency principle). Note also that the decision-quality monotonicity assumption (assumption 23) and, because $\varepsilon_i$ will have strictly positive density for all $\varepsilon_i \in \mathbb{R}$, the assumption that all changes in $z$ affect some individuals (assumption 24) will be satisfied here when $\theta \neq 0$.

Next, we assume that preferences are determined according to the following latent variable model:

\[ M_i = M + \nu_i, \qquad \phi_i = 1 \iff M_i > 0 \]

where $\nu_i$ captures idiosyncratic variation in the preference for $y$ over $x$. We could also add observables $w_i$ to the equations for $M_i$ and/or $P_i$, which would allow for examination of conditional decision-quality independence as in Section 5. The assumption that $M_i$ does not depend on $z_i$ corresponds to the exclusion restriction, assumption (19).

We will assume that $\varepsilon_i$ and $\nu_i$ have a bivariate standard normal distribution, where the normalization is without loss of generality. For all individuals $i$,

\[ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \]

where $\rho \in (-1, 1)$ is the correlation between the error terms. Note that the independence of the error terms from $z_i$ and from the frame $d$ embeds assumption (25) from before. Note that decision-quality independence is satisfied if and only if $\rho = 0$. Note also that the assumption that changes in $z$ are related to choices only if they cause an individual to optimize, assumption (20), is also embedded here.
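To make the role of $\rho$ concrete, the latent variable model above can be simulated. The sketch below is only illustrative: the parameter values, sample size, and function names are assumptions chosen for the example, not estimates from the paper. It shows that the share preferring $y$ among optimizers matches the population share $\Phi(M)$ when $\rho = 0$ and departs from it when $\rho \neq 0$.

```python
import math
import random


def normal_cdf(value):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))


def simulate_latent_model(P, theta, M, rho, z, n=100_000, seed=0):
    """Simulate optimizing (psi) and preference (phi) indicators."""
    rng = random.Random(seed)
    n_opt = n_pref = n_opt_pref = 0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # nu is standard normal with correlation rho to eps.
        nu = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        psi = P + theta * z + eps > 0  # individual optimizes
        phi = M + nu > 0               # individual prefers y
        n_opt += psi
        n_pref += phi
        n_opt_pref += psi and phi
    return {
        "share_optimizing": n_opt / n,
        "share_preferring_y": n_pref / n,
        "share_preferring_y_among_optimizers": n_opt_pref / n_opt,
    }


# Hypothetical parameter values, chosen only for illustration.
independent = simulate_latent_model(P=0.0, theta=1.0, M=-0.2, rho=0.0, z=0)
selected = simulate_latent_model(P=0.0, theta=1.0, M=-0.2, rho=0.6, z=0)
print(normal_cdf(-0.2), independent, selected)
```

With a positive correlation, individuals who optimize are disproportionately those who prefer $y$, so imputing the preferences of optimizers to the whole population would overstate the population share.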
The final assumptions that close the model correspond directly to earlier assumptions on the relationship between biased choices in different frames. We maintain our assumptions about frames: frames do not affect preferences (18) (since $M_i$ does not depend on $d$), frames affect all individuals in a known fashion (21), and consistent choices reveal preferences (20). Together these three justify the following identifying assumption:

\[ c_i(X, d_x, z_i) = y \implies \psi_i(z_i) = 1;\ \phi_i = 1 \]
\[ c_i(X, d_y, z_i) = x \implies \psi_i(z_i) = 1;\ \phi_i = 0 \]

Re-formulating these in terms of the latent variables $P_i$ and $M_i$, we have

\[ c_i(X, d_x, z_i) = y \implies \varepsilon_i > -P - \theta z_i;\ \nu_i > -M \]
\[ c_i(X, d_y, z_i) = x \implies \varepsilon_i > -P - \theta z_i;\ \nu_i < -M \]

7.2 Population Preference Recovery with Binary z

For each individual we have data on a binary choice, $y_i$, under frame $d_{ij}$ and decision-quality variable $z_{ki}$. We wish to use this information to identify the parameters $\theta$, $M$, $P$, and $\rho$. The following provides a reduced-form method for recovering these parameters and testing decision-quality independence using methods previously described.

Proposition 8 Suppose $\psi_i = 1 \iff P_i > 0$ and $\phi_i = 1 \iff M_i > 0$, where $P_i = P + \theta z_i + \varepsilon_i$, $M_i = M + \nu_i$,

\[ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \]

and $z_k \in \{0, 1\}$. Then under frame monotonicity (21) and the assumption that consistent choices reveal preferences (20):

(8.1) One may identify the parameters $P$, $M$, $\theta$, and $\rho$.

(8.2) $E[\phi_i] = \Phi(M)$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function.

(8.3) $E[\phi_i \mid \psi_i(z_i) = 0] = \dfrac{1}{1 - \bar{\psi}(z)} \displaystyle\int_{-\infty}^{-P - \theta z} \int_{-M}^{\infty} \phi^{BVSN}(\varepsilon, \nu; \rho)\, d\nu\, d\varepsilon$, where $\phi^{BVSN}(a, b; \rho)$ is the density function of a bivariate standard normal with correlation coefficient $\rho$, evaluated at $(a, b)$.

Proof: See Appendix.

Discussion of Proposition 8 Intuitively, we can think of the model here as an analogue of the Heckman [1979] bivariate normal model of selection bias.
Instead of selecting into or out of the sample or, in the most common application, electing to work or not to work, individuals in our model end up choosing consistently or not. We wish to restrict our analysis to an individual's choices conditional on her choosing consistently, because these choices are informative of preferences. Failing to account for selection induces bias if the determinants of optimizing and the determinants of preferences are correlated. The restriction that $z$ does not affect preferences amounts to an exclusion restriction of the form necessary to avoid identifying the parameters of the model on functional form alone [Puhani, 2000].

7.3 Population Preference Recovery with Multiple Decision-Quality States

With two values of $z$, the parameters of the latent variable model were exactly identified. With multiple values of $z$, we may employ maximum likelihood estimation to recover the underlying parameters of the model. In particular, suppose we observe decisions under $N + 1$ decision-quality states where $N > 0$; label these $z_0, z_1, \ldots, z_N$. Suppose that each enters into the propensity-to-optimize equation but satisfies the exclusion restriction with respect to preferences:

\[ \psi_i = 1 \iff P_i > 0, \qquad P_i = P + \sum_{m=1}^{N} \theta_m z_m^i + \varepsilon_i, \]

where $z_m^i$ is an indicator equal to one if individual $i$ chooses under decision-quality state $m$ and zero otherwise. As before, we assume $\phi_i = 1 \iff M_i > 0$, $M_i = M + \nu_i$, and

\[ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right). \]

Under these assumptions, an individual's likelihood contribution is

\[
\begin{aligned}
l_i ={}& I\{d_i = d_x;\ y_i = 1\}\, \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\ M + \nu_i > 0\Big) \\
&+ I\{d_i = d_x;\ y_i = 0\} \Big[1 - \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\ M + \nu_i > 0\Big)\Big] \\
&+ I\{d_i = d_y;\ y_i = 0\}\, \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\ M + \nu_i < 0\Big) \\
&+ I\{d_i = d_y;\ y_i = 1\} \Big[1 - \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\ M + \nu_i < 0\Big)\Big]
\end{aligned}
\]

where

\[ \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\ \nu_i > -M\Big) = \int_{-P - \sum_{m=1}^{N} \theta_m z_m^i}^{\infty} \int_{-M}^{\infty} \phi^{BVSN}(e, v; \rho)\, dv\, de \]

\[ \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\ \nu_i < -M\Big) = \int_{-P - \sum_{m=1}^{N} \theta_m z_m^i}^{\infty} \int_{-\infty}^{-M} \phi^{BVSN}(e, v; \rho)\, dv\, de \]

Intuitively, the first and third terms in $l_i$ represent individuals who optimize and choose against the frame, and the second and fourth terms combine individuals who do not optimize with individuals who optimize but do not choose against the frame.

Variation in choices and in the consistency of choices identifies each parameter as follows. The parameter $P$ is implied by the fraction of the population that is consistent at $z_0$. Each $\theta_m$ is identified by observing how the fraction of consistent choosers, $\bar{\psi}(z)$, changes as $z$ moves from $z_0$ to $z_m$. The key parameter $\rho$ is identified by the degree to which changes in $z$, which cause more individuals to be consistent, also cause individuals to be more likely to choose $y$. Given a value of all other parameters, the parameter $M$ is implied by the fraction of consistent individuals choosing $y$ over $x$ at a given value of $z$. From these parameters, we can identify $E[\phi_i]$ and $E[\phi_i \mid \psi_i = 0]$ using Proposition 8.

7.4 Population Preference Recovery under Flexible Functional Form Assumptions

If we observe multiple decision-quality states $z$, we can measure the preferences of decision-makers who optimize at each $z$ using our earlier results. In this section, we illustrate a technique for out-of-sample prediction of the preferences of the population, and the preferences of optimizers, under the assumption that the preferences of optimizers can be written as a flexible polynomial in the fraction of optimizers.29

Formally, we assume, as in Corollary 6.1, that we observe behavior in one of $N + 1$ decision-quality states, indexed $z_0, z_1, \ldots, z_N$. Suppose each of these decision-quality states comes from an ordered, continuous set of decision-quality states, $[\underline{z}, \bar{z}] \subset \mathbb{R}$.
For each individual, let $z_i^*$ denote the value of $z$ at which they begin to optimize, with probability density function $f(z^*)$ and cumulative distribution function $F(z^*)$. The decision-quality monotonicity assumption means that for any $i$, $z > z_i^* \implies \psi_i(z) = 1$, so $\bar{\psi}(z) = F(z)$. We will also assume that $f(z) > 0$ for all $z \in [\underline{z}, \bar{z}]$, so that $F(z^*)$ is strictly increasing and has a well-defined inverse function. Finally, we will assume that $F(z)$ and the preferences of marginal compliers $\bar{\phi}_C(z) = E[\phi_i \mid z_i^* = z]$ are $D$-times differentiable, so that we can use Taylor series approximations of degree $D$.

Lemma 1 Under these assumptions, there exists a function $g(\bar{\psi})$ such that, given an accurate Taylor series expansion of $g(\bar{\psi})$ of degree $D$, there exist constants $b_0, b_1, \ldots, b_D$ such that

\[ E[\phi_i \mid \psi_i(z_k) = 1] = b_0 + b_1 \bar{\psi}_k + b_2 (\bar{\psi}_k)^2 + \ldots + b_D (\bar{\psi}_k)^D \tag{27} \]

Proof: See Appendix.

This lemma implies that we can define an out-of-sample prediction problem strictly in terms of $\bar{\psi}(z_k) \equiv \bar{\psi}_k$.

29 Our approach here shares some similarity with the literature on marginal treatment effects and local average treatment effects [Heckman and Vytlacil, 2005]. However, note that the non-parametric identification techniques in that literature require instrumental variables that drive the propensity to participate in the treatment over a range from 0 to 1. In our context, if we were able to observe decisions at a decision-quality state that induced everyone to optimize, we could simply look at preferences revealed in that state to recover average preferences for the population.

Note that in equation (27), when $\bar{\psi}(z) = 1$ for some $z$, we have $E[\phi_i \mid \psi_i(z) = 1] = E[\phi_i]$, which can be written simply as $\bar{\phi} = b_0 + b_1 + \ldots + b_D$. This insight forms the basis of the extrapolation procedure we use. Intuitively, when $N = D$, we will have $D + 1$ equations in $D + 1$ unknowns, so that $b_0, \ldots, b_D$ are just identified.
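For the just-identified case $N = D$, the extrapolation to $\bar{\psi} = 1$ can be sketched without solving for the coefficients explicitly: the unique degree-$D$ polynomial through the $D + 1$ observed points can be evaluated at 1 using Lagrange interpolation, and its value there equals $b_0 + b_1 + \ldots + b_D$. The function name and the example moments below are hypothetical, and Lagrange interpolation is our choice of numerical device, not a method stated in the paper.

```python
def extrapolate_population_preference(psi_bar, y_a):
    """Evaluate at psi_bar = 1 the polynomial through the observed points.

    psi_bar: share of consistent choosers in each decision-quality state
    y_a:     share preferring y among consistent choosers in the same states
    With D + 1 distinct points the degree-D polynomial is unique, and its
    value at 1 equals b_0 + b_1 + ... + b_D.
    """
    if len(psi_bar) != len(y_a) or len(set(psi_bar)) != len(psi_bar):
        raise ValueError("need one y_a value per distinct psi_bar value")
    total = 0.0
    for k, (x_k, y_k) in enumerate(zip(psi_bar, y_a)):
        weight = 1.0
        for j, x_j in enumerate(psi_bar):
            if j != k:
                weight *= (1.0 - x_j) / (x_k - x_j)  # Lagrange basis at 1
        total += y_k * weight
    return total


# Hypothetical moments from two states (linear case, D = 1).
linear_estimate = extrapolate_population_preference([0.5, 0.8], [0.40, 0.46])
# Hypothetical moments from three states (quadratic case, D = 2).
quadratic_estimate = extrapolate_population_preference(
    [0.4, 0.6, 0.8], [0.272, 0.332, 0.408]
)
print(linear_estimate, quadratic_estimate)
```

Extrapolating a high-degree polynomial far outside the observed range of $\bar{\psi}$ can be numerically unstable, so in applications a low degree is probably the safer choice.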
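When $N > D$, we will have more equations than unknowns, and we can use a best-fit technique such as least squares to estimate $b_0, \ldots, b_D$. To illustrate the technique, we will solve analytically the extrapolation to population preferences when $N = D = 1$. Equation (27) then becomes simply:30

\[ E[\phi \mid \psi_i(z) = 1, z] = \alpha + \beta \bar{\psi}(z) \tag{28} \]

30 This extrapolation will be exact if

\[ \frac{E[\phi \mid \psi_i(z_h) = 1] - E[\phi_i \mid \psi_i(z_l) = 1]}{\Delta\bar{\psi}(z_h)} = \frac{E[\phi] - E[\phi_i \mid \psi_i(z_l) = 1]}{1 - \bar{\psi}(z_l)} \]

Proof: Suppose the above condition is satisfied. Let $\beta = \dfrac{E[\phi \mid \psi_i(z_h) = 1] - E[\phi_i \mid \psi_i(z_l) = 1]}{\Delta\bar{\psi}(z_h)}$, and let $\alpha = E[\phi \mid \psi_i(z_l) = 1] - \beta \bar{\psi}(z_l)$. Solving these three conditions (the equations for $\alpha$ and $\beta$ and the assumed condition) yields $E[\phi \mid \psi_i(z_l) = 1] = \alpha + \beta \bar{\psi}(z_l)$, $E[\phi_i \mid \psi_i(z_h) = 1] = \alpha + \beta \bar{\psi}(z_h)$, and $E[\phi_i] = \alpha + \beta$.

Proposition 9 Suppose choices are observed under decision-quality states $z_h$ and $z_l$. For any $z$, let $Y_A(z) \equiv \dfrac{E[y \mid d_x, z]}{E[y \mid d_x, z] + 1 - E[y \mid d_y, z]}$ and let $\bar{\psi}(z) = E[y_i \mid d_x, z] + 1 - E[y_i \mid d_y, z]$. Under assumptions (18)–(25) and (28),

1. Average preferences in the population as a function of $Y_A(z_h)$ and $Y_A(z_l)$ are given by

\[ E[\phi] \approx Y_A(z_h) + \frac{1 - \bar{\psi}(z_h)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)} \left[ Y_A(z_h) - Y_A(z_l) \right] \tag{29} \]

2. At any level of $\bar{\psi}(z)$, conditional preferences of optimizers and non-optimizers are given by

\[ E[\phi \mid \psi_i(z) = 1] \approx \frac{\bar{\psi}(z_h) Y_A(z_l) - \bar{\psi}(z_l) Y_A(z_h)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)} + \frac{Y_A(z_h) - Y_A(z_l)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)}\, \bar{\psi}(z) \tag{30} \]

\[ E[\phi \mid \psi_i(z) = 0] \approx \frac{\bar{\psi}(z_h) Y_A(z_l) - \bar{\psi}(z_l) Y_A(z_h)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)} + \frac{Y_A(z_h) - Y_A(z_l)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)}\, [1 + \bar{\psi}(z)] \tag{31} \]

where (31) follows from (28) and the law of total expectation, $E[\phi] = \bar{\psi}(z) E[\phi \mid \psi_i(z) = 1] + (1 - \bar{\psi}(z)) E[\phi \mid \psi_i(z) = 0]$, with $E[\phi] = \alpha + \beta$.

3. Average preferences of the population as a function of $Y_C$ and $Y_A(z_h)$ are given by

\[ E[\phi_i] \approx Y_A(z_h) + \frac{1 - \bar{\psi}(z_h)}{\bar{\psi}(z_l)} \left[ Y_C - Y_A(z_h) \right] \tag{32} \]

4. Average preferences of the population as a function of $Y_C$ and $Y_A(z_l)$ are given by

\[ E[\phi_i] \approx Y_A(z_l) + \frac{1 - \bar{\psi}(z_l)}{\bar{\psi}(z_h)} \left[ Y_C - Y_A(z_l) \right] \tag{33} \]

Proof: See Appendix.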
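The formulas in Proposition 9 can be checked numerically. The sketch below computes the population estimate three ways, from equations (29), (32), and (33), and they should coincide whenever the linear form (28) is imposed. The function name and the four input shares are hypothetical; $Y_C$ is computed as the third quantity in Proposition 7, and the non-optimizer preference is backed out from the law of total expectation rather than from a closed-form expression.

```python
def linear_extrapolation(y_dx_zh, y_dy_zh, y_dx_zl, y_dy_zl):
    """Population preference estimates under the linear form (28)."""
    psi_h = y_dx_zh + 1.0 - y_dy_zh  # share of consistent choosers at z_h
    psi_l = y_dx_zl + 1.0 - y_dy_zl  # share of consistent choosers at z_l
    d_psi = psi_h - psi_l
    if d_psi <= 0:
        raise ValueError("z_h must raise the share of consistent choosers")
    ya_h = y_dx_zh / psi_h            # Y_A(z_h)
    ya_l = y_dx_zl / psi_l            # Y_A(z_l)
    y_c = (y_dx_zh - y_dx_zl) / d_psi  # Y_C, contingent optimizers
    beta = (ya_h - ya_l) / d_psi
    alpha = ya_l - beta * psi_l
    return {
        "alpha": alpha,
        "beta": beta,
        "Y_C": y_c,
        "eq29": ya_h + (1.0 - psi_h) / d_psi * (ya_h - ya_l),
        "eq32": ya_h + (1.0 - psi_h) / psi_l * (y_c - ya_h),
        "eq33": ya_l + (1.0 - psi_l) / psi_h * (y_c - ya_l),
    }


def conditional_preferences(alpha, beta, psi):
    """Preferences of optimizers, equation (30), and of non-optimizers."""
    if not 0.0 <= psi < 1.0:
        raise ValueError("psi must lie in [0, 1)")
    optimizers = alpha + beta * psi
    population = alpha + beta
    # Law of total expectation, solved for the non-optimizers.
    non_optimizers = (population - psi * optimizers) / (1.0 - psi)
    return optimizers, non_optimizers


# Hypothetical shares choosing y in each (frame, state) cell.
est = linear_extrapolation(0.368, 0.568, 0.20, 0.70)
opt_90, non_opt_90 = conditional_preferences(est["alpha"], est["beta"], 0.9)
print(est, opt_90, non_opt_90)
```

In this example the three formulas all return 0.5, and the extrapolated preference of optimizers at $\bar{\psi}(z) = 0.9$ is 0.48.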
Discussion of Proposition 9 Proposition 9 shows how to recover population preferences using a functional form assumption on the relationship between the average preferences of consistent choosers and the fraction of consistent choosers. Intuitively, changes in the decision-quality state increase the fraction of consistent choosers, which allows us to recover the relationship between preferences and the propensity to optimize. To facilitate its interpretation, we can rewrite equation (29) as follows:

\[ \bar{\phi} = Y_A(z_h) + \frac{\pi_N}{\pi_C} \left[ Y_A(z_h) - Y_A(z_l) \right] \]

where $\pi_N$ and $\pi_C$ are the fractions of the population who are never-optimizers and contingent optimizers, respectively. When everyone optimizes at $z_h$, $\pi_N = 0$ and $\bar{\phi} = Y_A(z_h)$, as expected. The larger $\pi_N$ is relative to $\pi_C$, the more weight we put on the difference between the average preferences of those optimizing at $z_h$ and those optimizing at $z_l$.

Note that equations (30) and (31) apply at any value of $\bar{\psi}(z)$, even values that are not observed in the data. For example, given data from decision-quality states where 30 percent of individuals choose consistently and then 80 percent of individuals choose consistently, we could extrapolate to the preferences of optimizers that would result from continuing to change the decision-quality state until 90 percent of individuals choose consistently (so $\bar{\psi}(z) = 0.9$).

The proposition also shows that we can use the estimated preferences of the contingent optimizers, $Y_C$, to identify population preference parameters given a functional form assumption like (28). The third and fourth equations in the proposition provide intuitive formulas for extrapolation from the average preferences of contingent optimizers and optimizers in a given decision-quality state to population preferences.
In addition, equations (32) and (33) formally justify our intuition that when individuals on the margin of optimizing between two decision-quality states have substantially different preferences than the average preference of all consistent individuals, we can obtain only an upper or lower bound on population preferences $\bar{\phi}$ from $Y_A(z)$ alone.31 From equation (49), we can see that $Y_C = \bar{\phi}_C$ tells us the average preference of individuals on the margin between optimizing and not optimizing between $z_h$ and $z_l$, and $\Delta\bar{\psi}$ tells us how many individuals are on that margin. With the additional functional form assumption embedded in (47), this information will allow us to extrapolate to obtain the average preferences of never-optimizers, yielding the average preferences in the population.

31 Note that the intuition here does rely on the monotonicity of equation 47.

Finally, equation (32) has intuitive properties. We can rewrite it as

\[ \bar{\phi} = Y_A(z_h) + \frac{\pi_N}{\pi_A} \left[ Y_C - Y_A(z_h) \right] \]

When there are no never-optimizers, equation (32) implies that we will have $\bar{\phi} = Y_A(z_h)$, as before. When there are few individuals who always optimize, $\pi_A$ is small and $\bar{\phi}$ can differ substantially from $Y_A(z_h)$. When $Y_C$ and $Y_A(z_h)$ are very different, population preferences should be expected to be very different from the average preferences of individuals who optimize at $z_h$.

[Figure 1: Extrapolation from Preferences of Marginal and Average Optimizers. Horizontal axis: $\bar{\psi}$; vertical axis: $\bar{\phi}$; plotted lines: $E[\phi_i \mid \psi_i = 1]$ (slope $\beta$) and $\bar{\phi}_C$ (slope $2\beta$); extrapolated value $E[\phi] \approx 0.48$ at $\bar{\psi} = 1$.]

Figure 1 depicts the proposed extrapolation graphically, using moments from Table 3. When $E[\phi_i \mid \psi_i = 1]$ is linear in $\bar{\psi}(z)$, the approximations suggested by Proposition 9 imply that the fraction of the population preferring $y$ is $E[\phi_i] \approx 0.48$. We show in the appendix that $\bar{\phi}_C$ is also linear, with twice the slope of $E[\phi_i \mid \psi_i = 1]$.
The figure shows that the preference of the average optimizer is pulled down by the preference of the marginal optimizer at a given level of $\bar{\psi}$, and we can use this information, or the change in $E[\phi_i \mid \psi_i = 1]$ directly, to extrapolate to $E[\phi_i \mid \psi_i = 1]$ at $\bar{\psi} = 1$, which equals $E[\phi_i]$ since everyone optimizes.

8 Application to the Optimal Default Problem

This section shows how the parameters we focus on in the previous sections are relevant for the selection of an optimal default and an optimal decision-quality state. We deliberately take a broad approach relative to others who have examined the optimal default problem, such as Carroll et al. [2009], seeking to impose as little positive structure as possible. We show how a planner can use choice data to maximize the planner's objective. Examples of such a planner include a regulator deciding whether privacy policies should be required to be opt-in or opt-out and how clearly they must be written, or a benevolent employer selecting a default retirement plan and the width of the enrollment window.

This section derives three key results. First, when decision-makers' welfare depends only on the outcome they end up selecting, the optimal frame depends solely on the average preferences of the inconsistent decision-makers, that is, the quantity $E[\phi_i \mid \psi_i = 0]$. Intuitively, the choice of frame does not affect the outcomes experienced by the consistent decision-makers, and consequently the planner should ignore the preferences of that group when determining the optimal policy. Second, when decision-makers experience transaction costs associated with selecting an option other than the default, the preferences of the consistent decision-makers become relevant as well. In particular, the optimal frame depends on the weighted average of preferences between the consistent and inconsistent decision-makers, where the weights depend on the size of the transaction costs and the fraction of consistent choosers in the population.
Third, we consider the problem faced by a planner who must decide whether to adopt a (potentially more expensive) decision-quality state. We show that the benefits of doing so depend on the difference between the preferences of the inconsistent decision-makers at the high decision-quality state and the preferences of the decision-makers who would be induced to optimize by the policy change. Intuitively, when this difference is large, more sometimes-consistent decision-makers benefit from the increase in the decision-quality state and the social planner may be able to provide the never-optimizers with a better default.

8.1 Setup

Assume a continuum population of measure $N$ chooses from a fixed menu $X = \{x, y\}$. A benevolent planner chooses between two frames, $d_x$ and $d_y$. The decision environment is given by $z \in Z$, which could be fixed (if $Z$ is a singleton set) or chosen at some cost $\kappa(z)$. For simplicity we assume that the planner cares only about giving an individual the option she prefers, any transaction cost that individual incurs, and the cost of implementing the decision environment $z$.32 The social planner seeks to maximize

\[ \max_{d \in \{d_x, d_y\},\, z \in Z} \int_i \Big[ I\{c_i(X, d, z) = \phi_i\} - \gamma\, I\{c_i(X, d, z) \neq d\} \Big]\, di - \kappa(z) \]

where $I\{\cdot\}$ is an indicator function equal to 1 when the condition inside the brackets is satisfied and zero otherwise; the first term indicates whether individual $i$ chose her preferred option $\phi_i$; $\gamma$ is a transaction cost incurred if an individual deviates from the default option; $c_i(X, d, z) \neq d$ when the individual does not choose the default option (so $d = x$ when $x$ is the default and $d = y$ when $y$ is the default); and $\kappa(z)$ is the cost of implementing decision-quality state $z$. The units of $\gamma$ and $\kappa(z)$ are the number of individuals the planner would need to give their preferred option to justify incurring a cost of $\gamma$ or $\kappa(z)$.

32 Alternatively, one could specify the planner's objective function to account for the intensity of decision-makers' preferences, rather than just the ordinal preferences between $x$ and $y$. That is, the planner would maximize a weighted sum of each individual's (interpersonally comparable) utility from her chosen option, $u_i(c_i(X, d, z))$. Implementing the solution to the planner's problem in this case would require an estimate of the distribution of the relative valuation of $y$ compared to $x$, $M_i = u_i(y) - u_i(x)$, which would require estimating the distribution of $M_i$ as in the model in Section 7. In this case, the units of transaction costs and decision-quality-environment costs would be the same as the units of the utility function. We could also estimate the transaction cost directly from choice data in this model, rather than take it as a primitive parameter.

8.2 Results

We prove two simple propositions characterizing the solution to the planner's problem. The first considers the optimal default when the decision-quality state is fixed. The second considers the joint choice of the optimal default and the optimal decision-quality state, assuming for simplicity that there are no transaction costs.

Proposition 10 Suppose $Z$ is a singleton, $Z = \{z\}$. Assume the planner observes choices and assumes frame separability, the consistency principle, frame monotonicity, and unconfoundedness. Let $\bar{\phi}_N = E[\phi_i \mid \psi_i(z) = 0]$, let $\bar{\phi}_A = E[\phi_i \mid \psi_i(z) = 1]$, and let $\bar{\psi}(z) = E[\psi_i(z)]$. The planner should choose $d_y$ iff

\[ \bar{\phi}_N (1 - \bar{\psi}(z)) + \gamma \bar{\phi}_A \bar{\psi}(z) > \frac{1 + (\gamma - 1) \bar{\psi}(z)}{2} \tag{34} \]

Proof: See Appendix.

Discussion of Proposition 10 The optimal default will be option $y$ when 1) the number of non-optimizers who prefer $y$ is large, and 2) the number of optimizers who prefer $y$ is large. The first group is helped by the default being $y$, since the default directly influences their choice. The second group is also helped by the default being $y$, since they will not incur a transaction cost to receive their preferred option.
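The comparison behind Proposition 10 can also be carried out by evaluating the planner's objective from Section 8.1 directly under each default, which avoids relying on any closed-form threshold. The following is a minimal sketch on a per-capita basis with $z$ fixed (so $\kappa(z)$ drops out); the function names and example values are hypothetical.

```python
def welfare(default, phi_n, phi_a, psi_bar, gamma):
    """Per-capita planner objective under a given default, with z fixed.

    phi_n:   share of non-optimizers who prefer y
    phi_a:   share of optimizers who prefer y
    psi_bar: share of the population who optimize
    gamma:   transaction cost of choosing the non-default option
    """
    if default not in ("x", "y"):
        raise ValueError("default must be 'x' or 'y'")
    # Non-optimizers take the default, which matches their preference with
    # probability phi_n under default y and 1 - phi_n under default x.
    share_n_satisfied = phi_n if default == "y" else 1.0 - phi_n
    # Optimizers always obtain their preferred option, but pay gamma when
    # that option is not the default.
    share_a_opting_out = 1.0 - phi_a if default == "y" else phi_a
    return ((1.0 - psi_bar) * share_n_satisfied
            + psi_bar * (1.0 - gamma * share_a_opting_out))


def optimal_default(phi_n, phi_a, psi_bar, gamma):
    """Return 'y' if default y yields strictly higher welfare, else 'x'."""
    w_y = welfare("y", phi_n, phi_a, psi_bar, gamma)
    w_x = welfare("x", phi_n, phi_a, psi_bar, gamma)
    return "y" if w_y > w_x else "x"


# Hypothetical values: with no transaction cost only non-optimizers matter.
print(optimal_default(phi_n=0.6, phi_a=0.1, psi_bar=0.8, gamma=0.0))
# With a transaction cost, the preferences of optimizers can tip the choice.
print(optimal_default(phi_n=0.45, phi_a=0.9, psi_bar=0.8, gamma=0.5))
```

Ties are resolved in favor of $d_x$ here, which is an arbitrary convention of the sketch.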
The first term of the left-hand side of equation (34) is the size of the first of these groups, and the second term is the size of the second group, weighted by the transaction cost $\gamma$. The right-hand side tells us how large the number of individuals helped by the default's being $y$ must be for $d_y$ to be optimal. Note that when $\gamma = 0$, the condition for optimality of $d_y$ simplifies to $\bar{\phi}_N > \frac{1}{2}$. When the planner does not care about transaction costs borne by optimizers, she seeks only to give as many of the non-optimizers as possible their optimal choice. The larger $\gamma$ or $\bar{\psi}(z)$ is, the more weight the planner's decision places on the welfare of optimizers.

One limitation of the approach taken here is that we assume the planner knows $\gamma$. None of our methods speak to how $\gamma$ may or may not be revealed by choice data. We discuss this issue and the related problem of cognitive costs in the conclusion to the paper.

Proposition 11 Suppose the planner has recourse to two decision environments, $Z = \{z_h, z_l\}$. Suppose $\psi_i(z_h) \geq \psi_i(z_l)$ for all $i$, and $\exists i$, $\psi_i(z_h) > \psi_i(z_l)$. Suppose $\kappa(z_h) > \kappa(z_l)$, and let $\Delta\kappa = \frac{1}{N} [\kappa(z_h) - \kappa(z_l)]$ be the change in per-person cost of increasing the decision-quality state to $z_h$. Assume frame separability over $z$ and frame exclusivity. Continue to assume frame irrelevance over $d$, the consistency principle, frame monotonicity, and unconfoundedness. Suppose that $\gamma = 0$. Then the solution to the planner's problem is given case-wise by

1. $(d_y, z_l)$ if
   (a) $\bar{\phi}_N > \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} > \frac{1}{2}$ and $\Delta\kappa > (1 - \bar{\phi}_C) \pi_C$, OR if
   (b) $\bar{\phi}_N > \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} < \frac{1}{2}$ and $\Delta\kappa > \bar{\phi}_C \pi_C + (2\bar{\phi}_N - 1) \pi_N$

2. $(d_x, z_l)$ if
   (a) $\bar{\phi}_N < \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} < \frac{1}{2}$ and $\Delta\kappa > \bar{\phi}_C \pi_C$, OR if
   (b) $\bar{\phi}_N < \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} > \frac{1}{2}$ and $\Delta\kappa > \pi_C (1 - \bar{\phi}_C) + (1 - 2\bar{\phi}_N) \pi_N$

3. $(d_x, z_h)$ if
   (a) $\bar{\phi}_N < \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} < \frac{1}{2}$ and $\Delta\kappa < \bar{\phi}_C \pi_C$, OR if
   (b) $\bar{\phi}_N > \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} < \frac{1}{2}$ and $\Delta\kappa < \bar{\phi}_C \pi_C + (2\bar{\phi}_N - 1) \pi_N$

4. $(d_y, z_h)$ if
   (a) $\bar{\phi}_N > \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} > \frac{1}{2}$ and $\Delta\kappa < (1 - \bar{\phi}_C) \pi_C$, OR if
   (b) $\bar{\phi}_N < \frac{1}{2}$ and $\dfrac{\bar{\phi}_N \pi_N + \bar{\phi}_C \pi_C}{\pi_N + \pi_C} > \frac{1}{2}$ and $\Delta\kappa < (1 - \bar{\phi}_C) \pi_C + (1 - 2\bar{\phi}_N) \pi_N$

where $\bar{\phi}_A = E[\phi_i \mid \psi_i(z_l) = \psi_i(z_h) = 1]$, $\bar{\phi}_N = E[\phi_i \mid \psi_i(z_l) = \psi_i(z_h) = 0]$, $\bar{\phi}_C = E[\phi_i \mid \psi_i(z_l) = 0, \psi_i(z_h) = 1]$, $\pi_A = \bar{\psi}(z_l)$, $\pi_N = 1 - \bar{\psi}(z_h)$, and $\pi_C = \bar{\psi}(z_h) - \bar{\psi}(z_l)$.

Proof: See Appendix.

Discussion of Proposition 11 The planner should switch to $z_h$ from $z_l$ if the number of individuals who receive their preferred option increases by enough to justify the increase in implementation cost $\Delta\kappa$. In general there are two possibilities for the solution to the problem in this proposition: the optimal choice of default either depends on the choice of decision-quality environment or it does not. In the latter case, switching to $z_h$ from $z_l$ helps only those individuals who optimize at $z_h$ but not $z_l$ (group C), and who prefer the non-default option. Parts (1a), (2a), (3a), and (4a) correspond to this situation. In the former case, the optimal default changes as the planner moves from $z_l$ to $z_h$. This occurs if the individuals who switch to optimizing at $z_h$ (group C) have different average preferences from the group who never optimize (group N). In this case, moving from $z_l$ to $z_h$ not only gives individuals in group C who prefer the non-default option their preferred option, but also allows the planner to set a better default for group N.
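The trade-off described in this discussion can be illustrated by evaluating the planner's objective with $\gamma = 0$ for each of the four (default, decision-quality state) pairs and picking the largest. The sketch below works on a per-capita basis and normalizes the cost of $z_l$ to zero, so only $\Delta\kappa$ matters; the function name and the example values are hypothetical, and the enumeration is our illustration rather than the case-wise statement in the proposition.

```python
def best_policy(phi_n, phi_c, pi_a, pi_n, pi_c, delta_kappa):
    """Return the welfare-maximizing (default, state) pair when gamma = 0.

    phi_n, phi_c: shares preferring y among never-optimizers (group N)
                  and contingent optimizers (group C)
    pi_a, pi_n, pi_c: population shares of always-optimizers, group N,
                      and group C
    delta_kappa: per-person extra cost of z_h relative to z_l
    """
    if abs(pi_a + pi_n + pi_c - 1.0) > 1e-9:
        raise ValueError("group shares must sum to one")
    options = {}
    for default in ("x", "y"):
        # Share of each group whose preferred option is the default.
        n_match = phi_n if default == "y" else 1.0 - phi_n
        c_match = phi_c if default == "y" else 1.0 - phi_c
        # At z_l, groups N and C take the default; group A optimizes.
        options[(default, "l")] = pi_a + pi_n * n_match + pi_c * c_match
        # At z_h, group C also optimizes, at extra cost delta_kappa.
        options[(default, "h")] = pi_a + pi_c + pi_n * n_match - delta_kappa
    return max(options, key=options.get), options


# Hypothetical example in which group C mostly prefers x, group N mostly
# prefers y, and group C is large enough to determine the default at z_l.
cheap, _ = best_policy(0.7, 0.2, 0.3, 0.2, 0.5, delta_kappa=0.10)
costly, _ = best_policy(0.7, 0.2, 0.3, 0.2, 0.5, delta_kappa=0.25)
print(cheap, costly)
```

In this example the switch to $z_h$ is worthwhile only if $\Delta\kappa$ is below $\bar{\phi}_C \pi_C + (2\bar{\phi}_N - 1)\pi_N = 0.18$, and the optimal default changes from $x$ to $y$ when the planner makes the switch.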
For example, suppose the planner would want to set $d_x$ in $z_l$ but $d_y$ in $z_h$. This corresponds to Parts (1b) and (3b) of the proposition. Then we must have $\bar{\phi}_C < \frac{1}{2}$ and $\bar{\phi}_N > \frac{1}{2}$, and the preferences of the C group dominate when determining optimal policy under $z_l$, which would occur if there are more of them or their preferences are more homogeneous. In this case the benefit of switching to $z_h$ includes not only the benefit of giving those in group C who prefer the non-default option their preferred option, but also the benefit of setting a default which is more in accordance with the preferences of the remaining group who do not optimize, group N. How large this benefit is depends on how far $\bar{\phi}_N$ is from $\frac{1}{2}$ (i.e. how bad the previous default was for this group) and the size of group N.

9 Generalization to Ordered Choices with Two Frames

This section develops an approach for preference recovery over larger menus, generalizing the theory from earlier in the paper, to illustrate that our results are useful outside the context of binary choices. There are many interesting possibilities for generalizations, but we focus here on choice situations $g \in G$ consisting of a fixed, finite menu of ordered choices $X = \{x_1, \ldots, x_K\}$ and one of two frames, $d \in \{d_h, d_l\}$. Intuitively, one can think of a "high" frame and a "low" frame. For example, we might suppose that an individual chooses from a menu of insurance plans, ordered from low-cost, low-benefit plans to high-cost, high-benefit plans, and the frame either emphasizes or de-emphasizes the individual's risk of serious illness. We will assume that we observe each individual $i$ in exactly one frame, denoted $d_{ij}$ as before, for $j \in \{h, l\}$.

Recall that in the binary case, the consistency principle and frame monotonicity imply that individuals who choose the "low" option in the "high" frame prefer the low option. We will use this same intuition to develop an identification strategy for the non-binary setting.
The preferences of agent $i$ are represented by the choice function $m_i : G \to X$. We continue to assume frame separability:

\[ \forall i,\quad m_i(X, d_h) = m_i(X, d_l) \tag{35} \]

and we will suppress the irrelevant input, writing individual $i$'s optimal choice as $m_i(X)$. Define $y_i(d_j)$ as follows for $k = 1, \ldots, K$:

\[ y_i(d_j) = k \iff c_i(X, d_j) = x_k \]

We strengthen the frame monotonicity assumption as follows:

\[ \forall i,\quad y_i(d_h) \geq y_i(d_l) \tag{36} \]

The frame monotonicity assumption imposes an implicit ordering on the menu and assumes that all individuals are pushed in the same direction by the frames. We introduce notation to encode preferences in a similar fashion to $y_i(\cdot)$. Define $y_i^*$ as follows for $k = 1, \ldots, K$:

\[ y_i^* = k \iff m_i(X) = x_k \]

We also strengthen the consistency principle with the following assumption, which we will call the partition-consistency principle:

\[ \forall i,\quad y_i(d_l) \geq k \implies y_i^* \geq k \tag{37} \]
\[ \forall i,\quad y_i(d_h) \leq k \implies y_i^* \leq k \tag{38} \]

The name of this assumption comes from the following: suppose that we partition the menu into $X' = \{x_J, \ldots, x_K\}$ and $X'' = X \setminus X'$, for some $J \leq K$. If the individual consistently chooses within $X'$ across both frames, so that $c(X, d_h) \in X'$ and $c(X, d_l) \in X'$, then assumptions (37) and (38) imply that $m(X) \in X'$. Note also that the partition-consistency principle implies the consistency principle used in previous sections: if $c_i(X, d_h) = c_i(X, d_l)$, then assumptions (37) and (38) imply that $c_i(X, d_h) = m_i(X)$. Note also that the partition-consistency principle and frame monotonicity together imply that $\forall i$, $y_i(d_h) \geq y_i^* \geq y_i(d_l)$.

Similarly to before, we will indicate whether individual $i$ prefers option $k$ by $\phi_i^k \equiv I\{m_i(X) = x_k\}$, and denote the fraction of the population preferring option $k$ by $\bar{\phi}^k$. For each $k = 1, \ldots, K$, we define partition consistency at $k$, $\psi_i^k$, as follows:

\[ \psi_i^k \equiv I\{y_i(d_h) \leq k \text{ and } y_i(d_l) \leq k\} + I\{y_i(d_h) > k \text{ and } y_i(d_l) > k\} \]

Intuitively, $\psi_i^k$ captures whether an individual consistently chooses an option above or below $k$.
Note also that frame monotonicity implies that one of the conditions inside each indicator function will be implied by the other condition. We denote the fraction of individuals who are partition consistent at $k$ by $\bar{\psi}^k \equiv E[\psi_i^k]$. Finally, we assume unconfoundedness, which here requires that frames are independent of preferences and partition consistency at $k$ for every $k$:

\[ \forall k = 1, \ldots, K,\quad (\phi_i^k, \psi_i^k) \perp d_{ij} \tag{39} \]

Proposition 12 Let $G_j(k) \equiv P(y_i(d_j) \leq k \mid d_{ij} = d_j)$ for $k = 1, \ldots, K$, $j = h, l$, and let $G_j(0) \equiv 0$. Let $Y_k \equiv \dfrac{G_h(k)}{G_h(k) + 1 - G_l(k)}$ for $k = 0, \ldots, K$. Frame separability (35), frame monotonicity (36), partition consistency (37)–(38), and unconfoundedness (39) imply that for $k = 1, \ldots, K$,

(12.1) The fraction of partition-consistent individuals at $k$ with $y_i^* \leq k$ is given by

\[ P(y_i^* \leq k \mid \psi_i^k = 1) = Y_k \]

(12.2) The fraction of partition-consistent individuals at $k$ is given by

\[ \bar{\psi}^k = G_h(k) + 1 - G_l(k) \]

(12.3) The fraction of the population who prefer option $k$ is bounded as follows:

\[ \bar{\phi}^k \in [G_h(k) - G_l(k - 1),\ G_l(k) - G_h(k - 1)] \]

where the ordering of the endpoints follows from $G_h(k) \leq P(y_i^* \leq k) \leq G_l(k)$, which holds because $y_i(d_h) \geq y_i^* \geq y_i(d_l)$.

(12.4) If we additionally assume strong decision-quality independence,

\[ \forall k, k',\quad \mathrm{cov}(\phi_i^k, \psi_i^{k'}) = 0, \]

then the fraction of the population who prefer option $k$ is

\[ \bar{\phi}^k = Y_k - Y_{k-1} \]

Proof: See Appendix.

Discussion of Proposition 12 If we partition the menu of choices into options above and below some option $x_k$, then frame monotonicity and the partition-consistency principle transform the problem into a binary problem, allowing us to use earlier propositions to identify individuals whose preferred choice is above or below $x_k$. The first two results, (12.1) and (12.2), are therefore the analogue of Proposition 1 in this setting. With only two observed frames, we cannot identify the fraction of individuals who are consistent across frames or the fraction of consistent individuals who prefer each option.
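The quantities in Proposition 12 depend only on the distribution of choices under each frame, so they are straightforward to compute. The following sketch takes the share choosing each ordered option under $d_h$ and under $d_l$ and returns $Y_k$, $\bar{\psi}^k$, the two bound endpoints (written with the smaller endpoint first and truncated below at zero, which is a convention of this sketch), and the point estimate $Y_k - Y_{k-1}$ that applies under strong decision-quality independence. The function name and the example shares are hypothetical.

```python
def ordered_choice_recovery(shares_high, shares_low):
    """Compute the Proposition 12 quantities for each option k = 1..K.

    shares_high[k-1], shares_low[k-1]: observed share choosing option k
    under the high frame d_h and the low frame d_l, respectively.
    """
    if len(shares_high) != len(shares_low):
        raise ValueError("both frames must cover the same menu")
    if abs(sum(shares_high) - 1.0) > 1e-9 or abs(sum(shares_low) - 1.0) > 1e-9:
        raise ValueError("choice shares must sum to one in each frame")

    # Cumulative choice distributions G_h and G_l, with G(0) = 0.
    g_h, g_l = [0.0], [0.0]
    for s_h, s_l in zip(shares_high, shares_low):
        g_h.append(g_h[-1] + s_h)
        g_l.append(g_l[-1] + s_l)

    results, y_prev = [], 0.0
    for k in range(1, len(shares_high) + 1):
        psi_k = g_h[k] + 1.0 - g_l[k]  # share partition-consistent at k
        if psi_k <= 0:
            raise ValueError("no partition-consistent individuals at k")
        y_k = g_h[k] / psi_k           # share of those with y* <= k
        results.append({
            "k": k,
            "Y_k": y_k,
            "psi_k": psi_k,
            "bounds": (max(0.0, g_h[k] - g_l[k - 1]), g_l[k] - g_h[k - 1]),
            "phi_k": y_k - y_prev,     # valid under strong independence
        })
        y_prev = y_k
    return results


# Hypothetical three-option menu; the high frame shifts choices upward.
recovered = ordered_choice_recovery([0.2, 0.3, 0.5], [0.4, 0.4, 0.2])
for row in recovered:
    print(row)
```

Because $Y_K = 1$ by construction, the point estimates sum to one across options.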
An individual's choices in this setting will not indicate whether her choices are consistent across frames under the assumptions we make, but in some cases choices indicate that an individual is partition-consistent. Namely, choosing an option in the lower partition under $d_h$ or choosing an option in the upper partition under $d_l$ will imply that an individual is partition-consistent with respect to that $x_k$. As such, we can gain insight into the preferences of several subsets of the population using the alternative property of partition consistency.

Return to the insurance example described above, where the frame either emphasizes or de-emphasizes the risk of serious illness. When some individuals choose a low-benefit, low-cost plan under the frame that emphasizes the risk of serious illness, our assumptions imply that they prefer an option with costs and benefits at least as low as the ones they chose. The first two results allow us to estimate the fraction of decision-makers who consistently choose an insurance plan that is above or below some specified cost-benefit level, and among those people, how many prefer the low-cost plan.

As before, we can also bound population preferences, reflected in (12.3). In this case, the many-options problem has a new and interesting structure relative to the binary case. In particular, even if individuals are highly susceptible to framing effects when they prefer some option far away from $x_k$, our estimate for the fraction of people preferring some option $x_k$ can still be precise, because we are able to use the partition-consistency principle to ignore individuals who are highly subject to framing effects far away from $k$.

Finally, with a stronger version of the decision-quality independence assumption, we can recover the distribution of preferences for the full population. Strong decision-quality independence guarantees that the tendency to be partition consistent for any partition is unrelated to an individual's preferences.
This assumption implies our earlier definition of decision-quality independence, since individuals who are consistent across frames will be partition consistent for all partitions. However, the previous concept of decision-quality independence is insufficient for the recovery of population preferences in the two-frames situation because the only useful notion of consistency in this setting is partition consistency. If we were to assume that individuals are partition consistent only if they are fully consistent across frames, which is trivially true in the binary case, then the two assumptions about decision-quality independence would be equivalent. Under strong decision-quality independence, obtaining the preferences of partition-consistent individuals from (12.1) will yield the cumulative distribution of optimal choices in the population. Differencing that cumulative distribution across adjacent options, as in (12.4), then recovers the full distribution of population preferences. In the insurance example, consider the individuals choosing the lowest-cost, lowest-benefit plan after having the risk of serious illness emphasized, and the individuals not choosing this plan after having the risk of serious illness de-emphasized. Under strong decision-quality independence, the size of the first group relative to the combined size of both groups is the fraction of the population who prefer the lowest-cost plan. Proceeding similarly for any partition between relatively low-cost plans and relatively high-cost plans yields the fraction of the population who prefer one of the low-cost plans, which is the cumulative distribution of optimal choices. Because we can recover the cumulative distribution at every possible partition, we can recover the distribution of optimal choices in the population. The equivalence of this problem to the binary problem directly implies that we could generalize other identification strategies from the binary case.
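As a rough illustration of how (12.1), (12.2), and (12.4) might be taken to data, the sketch below computes $Y_k$, $\bar\psi^k$, and $Y_k - Y_{k-1}$ from the observed choice shares in each frame. The function name, the list-based inputs, and the example shares are hypothetical and are not taken from the paper. The last output is a preference distribution only under strong decision-quality independence.

```python
from itertools import accumulate


def partition_estimates(shares_h, shares_l):
    """Sketch of the Proposition 12 statistics (names are illustrative).

    shares_h[k-1], shares_l[k-1]: observed share choosing option k
    under frame d_h and frame d_l, respectively (each list sums to 1).
    Returns (Y, psi_bar, phi) with Y and psi_bar indexed k = 0..K
    and phi indexed by option 1..K (as a list of length K).
    """
    K = len(shares_h)
    if len(shares_l) != K:
        raise ValueError("both frames must cover the same K options")

    # Empirical CDFs G_j(k), with G_j(0) = 0 by definition.
    G_h = [0.0] + list(accumulate(shares_h))
    G_l = [0.0] + list(accumulate(shares_l))

    # (12.2): share of partition-consistent individuals at each k.
    psi_bar = [G_h[k] + 1.0 - G_l[k] for k in range(K + 1)]
    if any(p <= 0.0 for p in psi_bar):
        raise ValueError("Y_k is undefined when no one is partition consistent")

    # (12.1): Y_k = G_h(k) / (G_h(k) + 1 - G_l(k)).
    Y = [G_h[k] / psi_bar[k] for k in range(K + 1)]

    # (12.4): valid only under strong decision-quality independence.
    phi = [Y[k] - Y[k - 1] for k in range(1, K + 1)]
    return Y, psi_bar, phi


# Hypothetical three-plan example: d_h shifts choices toward higher options.
Y, psi_bar, phi = partition_estimates([0.1, 0.3, 0.6], [0.3, 0.4, 0.3])
print(Y, psi_bar, phi)
```

Because $Y_0 = 0$ and $Y_K = 1$ by construction, the differenced shares sum to one whenever the inputs are valid choice distributions.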
For example, we can identify the preferences of the population using observables via a conditional strong decision-quality independence assumption (the generalization of Proposition 5), and in the absence of any decision-quality independence assumptions we can recover the preferences of K groups of decision-makers who are contingently-consistent at k between two decision-quality states zh and zl (Proposition 6). 10 Conclusion Recovering preferences from choice data is a fundamental problem in behavioral economics; the presence of systematic “choice-reversals” casts doubt on the revealed preference approach that underlies neoclassical welfare analysis. We relax the standard revealed preference approach to accommodate the evidence that decision-makers sometimes choose differently based on preference-irrelevant features of the choice situation. Like Bernheim and Rangel [2009], there is a sense in which our relaxation of the standard approach is the minimum required to accommodate the observed choice inconsistencies; that is, we assume that decisionmakers who choose consistently across frames are revealing their true preferences. By imposing additional structure on the problem in the form of a frame monotonicity assumption, the problem of preference recovery is transformed into a problem of endogeneity: whether an individual reveals her preferences through choice may depend on her preferences over the objects being chosen. In many ways, this transformed problem is both more familiar and more tractable: over the last 50 years, economists have developed a wide range of tools for dealing with endogeneity in the recovery of parameters of this sort. This paper shows how many of these tools can be adapted to the problem of identifying preferences in the presence of inconsistent decision-making. An important feature of our approach is its reduced-form nature. 
Within the wide range of models consistent with our frame-monotonicity assumption, the basic identification problem – i.e., understanding the empirical correlation between decision-makers' preferences and their optimizing behavior – is the same regardless of the specific structural model generating behavior. On the other hand, our approach is not a replacement for traditional behavioral models. As in other areas of empirical economics, the parameters identified by reduced-form approaches depend on the underlying structural model that generates behavior. In particular, understanding the underlying structural model provides guidance about which types of control variables are needed for conditional decision-quality independence to hold and about which types of variation constitute valid decision-quality state variables. The Appendix considers these questions within a range of positive models that could explain framing effects in particular applications. The Appendix also shows, intuitively, that the more rational are the decision-makers, the less likely it is that preferences and the propensity to optimize will be independent. The framework studied here can also be thought of as a special case of a more general approach, in which an observer first identifies the preferences of a reference group of decision-makers whose choices, under some assumptions, will reveal their preferences, and then extrapolates those preferences to the population. In our approach, the reference group consists of those decision-makers who choose consistently across frames.
This choice of reference group allows us to avoid ex ante assumptions about which decision-makers are likely to optimize.33 In other applications the reference group might consist of experts, experienced choosers, or those thought to be immune to the framing effect in question [Johnson and Rehavi, 2013, Bronnenberg et al., 2013].34 The approaches we have proposed may be utilized in such contexts; for example, one might want to adjust the recovered preferences of experts based on observable characteristics before extrapolating those preferences to the rest of the population. Similarly, given observed variation in a decision-quality instrument, one could test the assumptions required for extrapolation in such applications along the lines we have proposed.

33 Nevertheless, incorporating such ex ante information (when available) into the analysis may be desirable in applications where there is reason to suspect that even consistent decision-makers are choosing sub-optimally, so that the consistency principle fails.

34 Another interesting example is [Handel and Kolstad, 2013]. These authors explicitly make an assumption about the relationship between risk preference and information about insurance which is a parametric version of what we call conditional decision-quality independence.

Although our focus has been on choice data, the methods we propose here apply equally well to situations in which survey response data reflect framing effects, such as sensitivity to question phrasing or the order in which answers are displayed. For example, assuming such framing effects satisfy monotonicity, one could apply Corollary 1 to Proposition 1 to recover the responses of those respondents whose answers do not vary by frame. We explore such issues in Goldin and Reck [in progress].

The methods described here are subject to important limitations. First, in certain applications, contrary to our assumptions, consistent choices may not in fact reveal preferences. For example, even decision-makers who consistently choose one retirement plan over another, regardless of the default option, may still be choosing sub-optimally because of, for example, present bias. Similarly, biases in judgment and perception, such as over-optimism or a tendency to underweight low-risk events, may manifest themselves consistently across frames. Accurately identifying preferences in such contexts requires moving further away from observed choice behavior, along the lines of the models proposed in Rubinstein and Salant [2012]. Another limitation is that the ordinal preferences over menu objects that our approach identifies may not be the only preferences that are welfare-relevant in a particular application. For example, the analysis in Section 8 indicated that the optimal choice of frame may depend on the relative magnitude of utility costs incurred by choosing "against the frame." Without further assumptions, our approach cannot identify costs of this nature from choice data. Put differently, we provide methods for identifying one normatively-relevant type of preference; in some contexts, other types of preferences will be relevant as well. Finally, we have focused on the binary choice setting to build intuition about preference identification over relatively simple choices. Apart from the generalization to ordered discrete choices considered here, more work remains to be done on preference identification in more complicated choice settings. For example, with more than two frames, an outside observer would need to impose additional structure beyond frame monotonicity to recover preferences from observed choices. Further work on harder identification problems should explore what may be gained by imposing such structure, perhaps drawing from recent work on revealed attention [Masatlioglu et al., 2012] or salience [Chetty et al., 2009].
Nonetheless, the basic approach we outline here should provide guidance in more complicated settings as well: as long as an outside observer can conclude that certain choices made under certain frames accurately reveal the preferences of a subpopulation of decision-makers, outside observers may gain insight by examining endogenous selection into that sub-population. References Alberto Abadie. Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2):231–263, 2003. Hunt Allcott and Dmitry Taubinsky. The lightbulb paradox: Evidence from two randomized experiments. Working paper, National Bureau of Economic Research, 2013. Joshua Angrist and Ivan Fernandez-Val. Extrapolate-ing: External validity and overidentification in the late framework. Working paper, National Bureau of Economic Research, 2010. Robert B Barsky, F Thomas Juster, Miles S Kimball, and Matthew D Shapiro. Preference parameters and behavioral heterogeneity: An experimental approach in the health and retirement study. The Quarterly Journal of Economics, 112(2):537–579, 1997. 41 Daniel J Benjamin, Miles S Kimball, Ori Heffetz, and Alex Rees-Jones. What do you think would make you happier? what do you think you would choose? The American economic review, 102(5):2083, 2012. B Douglas Bernheim. Behavioral welfare economics. Journal of the European Economic Association, 7(2-3): 267–319, 2009. B Douglas Bernheim and Antonio Rangel. Beyond revealed preference: choice-theoretic foundations for behavioral welfare economics. The Quarterly Journal of Economics, 124(1):51–104, 2009. John Beshears, James J Choi, David Laibson, and Brigitte C Madrian. How are preferences revealed? Journal of Public Economics, 92:1787–1794, 2008. Bart Bronnenberg, Jean-Pierre Dube, Matthew Gentzkow, and Jesse Shapiro. Do pharmacists buy bayer? sophisticated shoppers and the brand premium. Working paper, Yale University, 2013. 
Gabriel D Carroll, James J Choi, David Laibson, Brigitte C Madrian, and Andrew Metrick. Optimal defaults and active decisions. The Quarterly Journal of Economics, 124(4):1639–1674, 2009.
Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation: Theory and evidence. American Economic Review, 99(4):1145–1177, 2009.
John Conlisk. Why bounded rationality? Journal of Economic Literature, 34(2):669–700, 1996.
Angus Deaton. The financial crisis and the well-being of americans. 2012.
Baruch Fischhoff. Value elicitation: is there anything in there? American Psychologist, 46(8):835, 1991.
Marc Fleurbaey and Erik Schokkaert. Behavioral welfare economics and redistribution. American Economic Journal: Microeconomics, 5(3):180–205, 2013.
Jacob Goldin. Optimal tax salience. Unpublished working paper, SSRN, 2014.
Jacob Goldin and Tatiana Homonoff. Smoke gets in your eyes: Cigarette tax salience and regressivity. American Economic Journal: Economic Policy, 2013.
Jacob Goldin and Daniel Reck. Survey response inconsistency. Technical report.
Benjamin R Handel and Jonathan T Kolstad. Health insurance for "humans": Information frictions, plan choice, and consumer welfare. Working paper, National Bureau of Economic Research, 2013.
Jerry A Hausman. Specification tests in econometrics. Econometrica: Journal of the Econometric Society, pages 1251–1271, 1978.
James J Heckman. Sample selection bias as a specification error. Econometrica, pages 153–161, 1979.
James J Heckman and Edward Vytlacil. Structural equations, treatment effects, and econometric policy evaluation. Econometrica, 73(3):669–738, 2005.
Guido W Imbens and Joshua D Angrist. Identification and estimation of local average treatment effects. Econometrica: Journal of the Econometric Society, pages 467–475, 1994.
Eric J Johnson, Steven Bellman, and Gerald L Lohse. Defaults, framing and privacy: Why opting in-opting out. Marketing Letters, 13(1):5–15, 2002.
Erin M Johnson and M Marit Rehavi.
Physicians treating physicians: Information and incentives in childbirth. Working paper, National Bureau of Economic Research, 2013.
Daniel Kahneman, Peter P Wakker, and Rakesh Sarin. Back to bentham? explorations of experienced utility. The Quarterly Journal of Economics, 112(2):375–406, 1997.
Botond Köszegi and Matthew Rabin. A model of reference-dependent preferences. The Quarterly Journal of Economics, 121(4), 2006.
Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y Ozbay. Revealed attention. The American Economic Review, 102(5):2183–2205, 2012.
Sendhil Mullainathan and Eldar Shafir. Scarcity. Times Books, 2013.
Patrick Puhani. The heckman correction for sample selection and its critique. Journal of Economic Surveys, 14(1):53–68, 2000.
Daniel Reck. Taxes and mistakes: What's in a sufficient statistic? Unpublished working paper, SSRN, 2014.
Ariel Rubinstein and Yuval Salant. Eliciting welfare preferences from behavioural data sets. The Review of Economic Studies, 79(1):375–387, 2012.
Yuval Salant and Ariel Rubinstein. (a, f): Choice with frames. The Review of Economic Studies, 75(4):1287–1296, 2008.
Norbert Schwarz and Gerald Clore. Mood, misattribution, and judgments of well-being: Informative and directive functions of affective states. Journal of Personality and Social Psychology, 45:512–523, 1987.
N Schwarz, F Strack, D Kommer, and D Wagner. Soccer, rooms and the quality of your life: Mood effects on judgments of satisfaction with life in general and with specific life-domains. European Journal of Social Psychology, 17:69–79, 1987.
Dean Spears. Economic decision-making in poverty depletes behavioral control. B.E. Journal of Economic Analysis and Policy, 11, 2011.
Syngjoo Choi, Shachar Kariv, Wieland Muller, and Dan Silverman. Who is (more) rational? Working Paper, 2013.
Richard H Thaler and Cass R Sunstein. Nudge: Improving decisions about health, wealth, and happiness. Yale University Press, 2008.
De-Min Wu.
Alternative tests of independence between stochastic regressors and disturbances. Econometrica: Journal of the Econometric Society, pages 733–750, 1973.

A Proofs of Propositions

Proof of Proposition 2 By the law of iterated expectations, we can write:

$$E[\phi_i] = E[\phi_i \mid \psi_i = 1]\, p(\psi_i = 1) + E[\phi_i \mid \psi_i = 0]\, p(\psi_i = 0)$$

Because $\phi_i \in \{0, 1\}\ \forall i$, we have $E[\phi_i \mid \psi_i = 0] \in [0, 1]$. Consequently, $p(\psi_i = 0) \ge 0$ implies

$$E[\phi_i \mid \psi_i = 1]\, p(\psi_i = 1) \le E[\phi_i] \le E[\phi_i \mid \psi_i = 1]\, p(\psi_i = 1) + p(\psi_i = 0)$$

From Proposition 1, we have $E[\phi_i \mid \psi_i = 1] = Y_A \equiv \frac{E[y_i \mid d_x]}{E[y_i \mid d_x] + 1 - E[y_i \mid d_y]}$ and $p(\psi_i = 1) = E[y_i \mid d_x] + 1 - E[y_i \mid d_y]$. Applying these results to the above equation yields

$$E[y_i \mid d_x] \le E[\phi_i] \le E[y_i \mid d_x] + p(\psi_i = 0)$$

The result then follows from noting that $p(\psi_i = 0) = 1 - p(\psi_i = 1) = E[y_i \mid d_y] - E[y_i \mid d_x]$.

Proof of Proposition 5

Proof of (5.1) By the law of iterated expectations, $E(\phi) = \sum_w E[\phi \mid w]\, P_w$. Repeating the proof of Proposition 1 while conditioning on $w$ yields

$$Y_A(w) = \frac{E[y_i \mid d_x, w]}{E[y_i \mid d_x, w] + 1 - E[y_i \mid d_y, w]} = E[\phi_i \mid w_i] + \frac{\operatorname{cov}(\phi_i, \psi_i \mid w_i)}{E[\psi_i \mid w_i]} \qquad (40)$$

Applying (17) and substituting the resulting expression into (40) yields the intended result.

Proof of (5.2) First, note that the law of iterated expectations gives

$$E[\phi_i \mid \psi_i = 0] = \sum_w E[\phi_i \mid \psi_i = 0, w]\, p(w \mid \psi_i = 0) \qquad (41)$$

We will complete the proof by showing that (1) $Y_A(w) = E[\phi_i \mid \psi_i = 0, w]$, and (2) $S_j = p(w_i = j \mid \psi_i = 0)$.

Lemma 1: $Y_A(w) = E[\phi_i \mid \psi_i = 0, w]$

By the definition of conditional probability,

$$E[\phi_i \mid \psi_i = 0, w_i] = p(\phi_i = 1 \mid \psi_i = 0, w_i) = \frac{p(\psi_i = 0, \phi_i = 1 \mid w_i)}{p(\psi_i = 0 \mid w_i)}$$

$$E(\phi_i \mid \psi_i = 0, w_i) = \frac{E[(1 - \psi_i)\phi_i \mid w_i]}{E[1 - \psi_i \mid w_i]} \qquad (42)$$

First, we focus on the numerator of this expression: $E[(1 - \psi_i)\phi_i \mid w_i] = E[\phi_i \mid w_i] - E[\psi_i \phi_i \mid w_i]$. Using the identity $E[\psi_i \phi_i \mid w_i] = E[\psi_i \mid w_i] E[\phi_i \mid w_i] + \operatorname{cov}(\psi_i, \phi_i \mid w_i)$, along with the conditional decision-quality independence assumption (17), lets us write $E[(1 - \psi_i)\phi_i \mid w_i] = E[\phi_i \mid w_i] - E[\psi_i \mid w_i] E[\phi_i \mid w_i] = E[\phi_i \mid w_i]\,(1 - E[\psi_i \mid w_i])$.
Substituting this result into (42) yields E(φi |ψi = 0, wi ) = E[φi |wi ]. From Prop 4.1, we also have that WA (w) = E[φi |w], which completes the proof of Lemma 1. Lemma 2: Sj = p(wi = j|ψi = 0) First, using Bayes Rule, we have p(wi = j|ψi = 0) = p(ψi = 0|wi = j) Pj p(ψi = 0) (43) From the Corollary to Proposition 1, we have p(ψi = 1) = E[yi |dx ] + 1 − E[yi |dy ]. Hence p(ψi = 0) = 1 − p(ψi = 1) = E[y|dy ] − E[y|dx ]. Additionally, repeating the proof of the Corollary while conditioning on w yields p(ψi = 1|wi = j) = E[yi |dx , wi = j] + 1 − E[yi |dy , wi = j], so that p(ψi = 0|wi = j) = E[yi |dy , wi = j] − E[yi |dx , wi = j]. Substituting these results into (43) yields p(wi = j|ψi = 0) = E[yi |dy , wi = j] − E[yi |dx , wi = j] Pj E[yi |dy ] − E[yi |dx ] which is the definition of Sj . Proof of (5.3) This statement is identical to Lemma 2. 45 Proof of (5.4) First, write p(w) = p(w|ψ = 1)p(ψ = 1) + p(w|ψ = 0)p(ψ = 0) Applying Proposition 1, (5.3), and the definition of Pw , we have: Pw = p(w|ψ = 1) (E[y|dx ] + 1 − E[y|dy ])+ E[yi |dy ,w]−E[yi |dx ,w] P (E[yi |dy ] − E[yi |dx ]) w E[yi |dy ]−E[yi |dx ] Rearranging terms yields the intended result: p(w|ψ = 1) = E[y|dx , w] + 1 − E[y|dy , w] Pw E[y|dx ] + 1 − E[y|dy ] Proof of Proposition 3 Note that we can re-write the numerator of (10) as E[φi |ψi = 1] p(ψi = 1) = p(φi = 1|ψi = 1) p(ψi = 1) = p(φi = 1 , ψi = 1)= E[φi ψi ]. Exploiting the identity E[φi ψi ] = E[φi ] E[ψi ] + cov(φi , ψi ) and re-writing yields (2.1). By the law of iterated expectations, E[φi ] = p(ψi = 1)E[φi |ψi = 1] + p(ψi = 0)E[φi |ψi = 0]. Substituting this into (12) and applying Proposition 1 yields (2.2). Proof of Proposition 6 Step 0: Decision-quality monotonicity (23) allows us to partition the population into three groups based on whether they optimize. The Always-optimizers (A), the Never-Optimizers (N), and the Contingent-optimizers (C). Group A will have ψ(zh ) = ψi (zl ) = 1. Group N will have ψi (zh ) = ψi (zl ) = 0. 
Group C will have $\psi_i(z_l) = 0$ and $\psi_i(z_h) = 1$. Denote the share of each group in the population by $\pi_A$, $\pi_N$, and $\pi_C$. Note that (23) rules out the possibility that $\psi_i(z_l) = 1$ but $\psi_i(z_h) = 0$. Note also that

$$\pi_C = p(\psi_i(z_h) = 1, \psi_i(z_l) = 0) = E[\psi_i(z_h)(1 - \psi_i(z_l))] = E[\psi_i(z_h)] - E[\psi_i(z_h)\psi_i(z_l)] = E[\psi_i(z_h)] - E[\psi_i(z_l)],$$

where the last equality follows from (23).35 By the existence of contingent optimizers (24), $\pi_C > 0$.

35 Specifically, (23) guarantees that $\psi_i(z_h) = 1$ and $\psi_i(z_l) = 1$ occur if and only if $\psi_i(z_l) = 1$.

Step 1: Assume the biasing frame is $d_x$. By frame separability (18), frame monotonicity (21) and the consistency principle (20), for any $z \in \{z_h, z_l\}$,

$$c_i(X, d_x, z) = y \iff \psi_i(z) = 1,\ \phi_i = 1$$

Consequently,

$$p(y_i = 1 \mid d_x, z) = E[y_i \mid d_x, z] = E[\phi_i \psi_i(z) \mid z]$$

By the law of iterated expectations,

$$E[y_i \mid d_x, z] = E[\phi_i \psi_i(z) \mid \psi_i(z_h) = 1, \psi_i(z_l) = 1]\,\pi_A + E[\phi_i \psi_i(z) \mid \psi_i(z_h) = 1, \psi_i(z_l) = 0]\,\pi_C + E[\phi_i \psi_i(z) \mid \psi_i(z_h) = 0, \psi_i(z_l) = 0]\,\pi_N$$

Evaluating this formula at $z = z_h$ and at $z = z_l$, and noting that contingent optimizers optimize only under $z_h$, we have that

$$E[y_i \mid d_x, z_h] = \bar\phi_A \pi_A + \bar\phi_C \pi_C$$

$$E[y_i \mid d_x, z_l] = \bar\phi_A \pi_A$$

where $E[\phi_i \mid \psi_i(z_h) = 1, \psi_i(z_l) = 1] \equiv \bar\phi_A$ and $E[\phi_i \mid \psi_i(z_h) = 1, \psi_i(z_l) = 0] \equiv \bar\phi_C$.

Step 2: Assume the frame is $d_y$. We proceed similarly to the previous step.
By (18), (21) and (20), for any $z \in \{z_h, z_l\}$,

$$c_i(X, d_y, z) = x \iff \psi_i(z) = 1,\ \phi_i = 0$$

Consequently,

$$p(y_i = 0 \mid d_y, z) = 1 - E[y_i \mid d_y, z] = E[(1 - \phi_i)\psi_i(z) \mid z]$$

Using the same approach as in the previous step, we use the law of iterated expectations and the definitions of groups A, C, and N to write

$$1 - E[y_i \mid d_y, z_h] = E[\psi_i(z_h)] - \bar\phi_A \pi_A - \bar\phi_C \pi_C$$

$$1 - E[y_i \mid d_y, z_l] = E[\psi_i(z_l)] - \bar\phi_A \pi_A$$

Step 3: Now we construct the statistic $Y_C$,

$$Y_C \equiv \frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}}$$

Substituting the expressions for $E[y_i \mid d_x, z]$ and $1 - E[y_i \mid d_y, z]$ into this expression and simplifying, we obtain

$$Y_C = \frac{\bar\phi_C \pi_C}{E[\psi_i(z_h)] - E[\psi_i(z_l)]}$$

Using the previously derived fact that $\pi_C = E[\psi_i(z_h)] - E[\psi_i(z_l)]$, we have $Y_C = \bar\phi_C$.

Proof of Corollary 6.1: Fix some $z_k$ and $z_m$. Note that the first three assumptions here will imply that assumptions (23), (24), and (25) for Proposition 6 obtain. Proposition 6 then implies that $W_{k,m} = \bar\phi_{k,m}$. The result that $\phi_0 = E[\phi_i \mid \psi_i(z_0) = 1]$ follows directly from Proposition 1.

Proof of Proposition 7 Suppose that decision-quality independence is satisfied. Then by Proposition 1,

$$\frac{E[y_i \mid d_x, z_h]}{E[y_i \mid d_x, z_h] + 1 - E[y_i \mid d_y, z_h]} = E[\phi_i] = \bar\phi \quad \text{and} \quad \frac{E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_l] + 1 - E[y_i \mid d_y, z_l]} = \bar\phi.$$

By Proposition 6,

$$\frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}} = E[\phi_i \mid \psi_i(z_h) = 1, \psi_i(z_l) = 0].$$

Under decision-quality independence, $E[\phi_i \mid \psi_i(z_h) = 1, \psi_i(z_l) = 0] = \bar\phi$.

Proof of Proposition 8: First, note that for any $z \in \{0, 1\}$, $E[\psi_i \mid z] = p(\varepsilon_i > -P - \theta z) = 1 - \Phi(-P - \theta z) = \Phi(P + \theta z)$. Note that our setup satisfies the conditions in Proposition 1 for each $z$, so that we can measure $E[\psi_i \mid z]$ by $\psi(z) \equiv E[y_i \mid d_x, z] + 1 - E[y_i \mid d_y, z]$. Thus we have $\psi(z) = \Phi(P + \theta z)$ for $z \in \{0, 1\}$. This implies that

$$P = \Phi^{-1}(\psi(0))$$

$$\theta = \Phi^{-1}(\psi(1)) - \Phi^{-1}(\psi(0)).$$
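The inversion in this first step of the proof of Proposition 8 can be sketched numerically as follows. This is a minimal illustration that assumes the four choice shares have already been estimated; the function name, argument names, and example values are hypothetical. It uses the standard normal quantile function from the Python standard library in place of $\Phi^{-1}$.

```python
from statistics import NormalDist


def probit_optimization_params(y_dx_z0, y_dy_z0, y_dx_z1, y_dy_z1):
    """Recover P and theta from psi(z) = Phi(P + theta * z), z in {0, 1}.

    Each argument is the observed share choosing y under the named
    frame (d_x or d_y) and decision-quality state z (0 or 1).
    """
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function

    # psi(z) = E[y | d_x, z] + 1 - E[y | d_y, z], as in Proposition 1.
    psi0 = y_dx_z0 + 1.0 - y_dy_z0
    psi1 = y_dx_z1 + 1.0 - y_dy_z1

    # inv_cdf raises StatisticsError unless 0 < psi < 1, so corner cases
    # (everyone or no one optimizing) are outside this sketch.
    P = inv_cdf(psi0)
    theta = inv_cdf(psi1) - P
    return P, theta


# Hypothetical shares: half the population optimizes at z = 0, 80% at z = 1.
P, theta = probit_optimization_params(0.2, 0.7, 0.3, 0.5)
print(P, theta)
```

The remaining parameters ($M$ and $\rho$) require the bivariate normal integrals discussed next and are not covered by this sketch.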
Similarly, note that E[φi |ψi (z) = 1] = p(εi > −P − θzi |νi > −M ) = 1 = ψ(z) ˆ ˆ ∞ p(εi >−P −θzi , νi >−M ) p(εi >−P −θzi ) ∞ φBV SN (ε, ν; ρ)∂ν∂ε −P −θz −M where φBV SN (a, b; ρ) is the density function for a bivariate standard normal with correlation coefficient ρ and evaluated at (a, b). By applying Proposition 1 for each z we also have that YA (z) ≡ E[y|dx , z] = E[φi |ψi (z)] E[y|dx , z] + 1 − E[y|dy , z] Combining these results yields YA (z) = 1 ψ(z) ˆ ∞ ˆ ∞ φBV SN (ε, ν; ρ)∂ν∂ε −P −θz −M for z = 0, 1. With two equations, we can solve for the two remaining unknowns (M and ρ). (6.2) and (6.3) follow directly from the distributions of normally distributed and jointly normally distributed random variables. Proof of Lemma 1: Let ni (ψ) indicate whether individual i optimizes when the overall fraction of opti- mizers is ψ. That is, ni (ψ) = 1 ⇐⇒ ψi (F −1 (ψ)) = 1 ⇐⇒ zi∗ ≤ F −1 (ψ) 48 Let h(z) ≡ E[φi |zi∗ = z] = p(φi = 1|zi∗ = z), be the average preferences for the marginal optimizers at z. The object of interest is E[φi |ψi (z) = 1] = E[φi |ni (ψ) = 1], which we can write as ´ z0 =F −1 (ψ) E[φi |zi∗ ≤F −1 (ψ)] == p (φi = 1 , zi∗ = z 0 ) ∂z 0 p zi∗ ≤ F −1 (ψ) z 0 =z Using the definition of conditional probability and the definition of F , this becomes ´ z0 =F −1 (ψ) = z 0 =z h(z 0 )f (z 0 )∂z 0 (44) ψ Now we can use a change of variables, using F (z 0 ) = ψ̄ 0 and f (z 0 )dz = dψ̄ 0 , to write the numerator as ˆ ˆ z 0 =F −1 (ψ) 0 0 0 ψ =ψ 0 h(z )f (z )∂z = z 0 =z 0 0 g(ψ )∂ψ 0 ψ =0 where g(ψ̄) = h(F −1 (ψ̄)). Now, we can approximate g(ψ) as a polynomial of degree D using Taylor’s theorem: D g(ψ) ≈ a0 + a1 ψ + ... + aD ψ ,36 Substituting this into equation (44) and evaluating the integral yields 1 1 1 2 D E[φi |ψi (z) = 1] ≈ a0 + a1 ψ + a2 ψ + ... + ak ψ 2 3 k+1 Letting bj ≡ aj j+1 , we obtain the desired result. 
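To make the integration step concrete, consider the linear case $D = 1$, chosen here purely for illustration. With $g(\psi') = a_0 + a_1 \psi'$, the average over the optimizers is

```latex
\begin{align*}
E[\phi_i \mid \psi_i(z) = 1]
  &\approx \frac{1}{\psi} \int_{0}^{\psi} \left( a_0 + a_1 \psi' \right) d\psi' \\
  &= \frac{1}{\psi} \left( a_0 \psi + \tfrac{1}{2} a_1 \psi^{2} \right) \\
  &= a_0 + \tfrac{1}{2} a_1 \psi ,
\end{align*}
```

so that $b_0 = a_0$ and $b_1 = a_1 / 2$, in line with $b_j \equiv a_j / (j + 1)$. The coefficient on $\psi$ is halved because the statistic averages preferences over everyone who optimizes, not only over the marginal optimizers.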
Proof of Proposition 9: By Proposition 1, given any z, ψ̄(z)=E[ψi (z)] = E[yi |dx ] + 1 − E[yi |dy ] YA (z) = E[φi |ψi (z) = 1] By equation (28) evaluated at zh and then at zl E[φi |ψi (zh ) = 1] = α + β ψ̄(zh ) (45) φ̄A = E[φi |ψi (zl ) = 1] = α + β ψ̄(zl ) (46) 36 We know these constants a ....a exist and Taylor’s Theorem applies by the assumption that F (z) and φ (z) = E[φ |z ∗ = z] 0 i i D C are D-times differentiable, along with familiar properties of the derivatives of inverse functions and composite functions. Taylor’s theorem also indicates that this approximation will be accurate as ψ̄ becomes large, which is intuitive. We will for the moment ignore the issue of bounding the accuracy of the approximation, which for practical purpose may be relevant. 49 Derivation of 7.1 Solving (45) and (46) for α and β yields β= α= YA (zh ) − YA (zl ) ψ̄(zh ) − ψ̄(zl ) ψ̄(zh )YA (zl ) − ψ̄(zl )YA (zh ) ψ̄(zh ) − ψ̄(zl ) Note that, since ψi = 1 for all individuals when ψ̄ = 1, E[φ] = α + β. (47) E[φ|ψi (z) = 0, z] = α + β[1 − ψ̄(z)] (48) Derivation of 7.2 Equation (28) also implies37 Substituting for α and β in equations (47), (28), and (48) and re-arranging yield the desired results. Derivation of 7.3 and 7.4 Divide the population into always-optimizers, never-optimizers and contingent-optimizers as in Proposition 6. By the law of iterated expectations on E[φi |ψi (zh ) = 1] we can write (πC + πA )E[φi |ψi (zh ) = 1] = φ̄C πC + φ̄A πA Substituting what we know about πA and πC , (see the proof of Proposition 4), we can write this as E[φi |ψi (zh ) = 1]ψ̄(zh ) = φ̄C ∆ψ̄ + φ̄A ψ̄(zl ) (49) where ∆ψ̄ = ψ̄(zh ) − ψ̄(zl ). Plugging (45) and (46) into (49) yields [α + β ψ̄(zh )] = ψ̄(zh ) = φ̄C ∆ψ̄ + [α + β ψ̄(zl )]ψ̄(zl ) 37 Proof : By the law of iterated expectations, φ̄ = E[φ|ψi (z) = 1]ψ̄(z) + E[φ|ψi (z) = 0](1 − ψ̄(z)) Equation (28) and the fact that φ̄ = α + β imply equation (48). 
50 (50) If we use (50) and (45) to solve for α and β, noting that E[φ|ψi (zh ) = 1] = YA (zh ) and φ̄C = YC we obtain α = YA (zh ) − β= ψ̄(zh ) [YC − YA (zh )] ψ̄(zl ) YC − YA (zh ) ψ̄(zl ) As in the previous proposition, we use that E[φ] = α + β to arrive at equation (32). If instead we use (50) and (46) to solve for α and β, we obtain α = YA (zl ) − β= ψ̄(zl ) [YC − YA (zl )] ψ̄(zh ) YC − YA (zl ) ψ̄(zh ) and by adding these two together we obtain equation (33). Proof of Proposition 10 Note that the planner’s problem above is equivalent to the following max d∈{dx dy },z∈Z p(ci (X, d, z) = φi ) − γp(ci (X, d, z) 6= d}) − 1 κ(z) N Since z is fixed by assumption, the solution to the planner’s problem simplifies to the comparison of the objective function evaluated at dx and dy . We will have that dy is superior if and only if p(ci (.) = φi |d = dy ) − γp(ci (.) 6= y|d = dy ) > p(ci (.) = φi |d = dx ) − γp(ci (.) 6= x|d = dx ) {z } | {z } | {z } | {z } | 1 3 2 (51) 4 We next derive each of these probabilities. Term 1 of Equation (52) By the law of iterated expectations, p(ci (.) = φi |d = dy ) = p(ci (.) = φi |d = dy , ψi = 1)p(ψi = 1|d = dy )+p(ci (.) = φi |d = dy , ψi = 0)p(ψi = 0|d = dy ) When ψi = 1, ci (X, dy , z) = φi always, by the consistency principle. So, p(ci (.) = φi |d = dy , ψi = 1) = 1 51 When ψi = 0, and d = dy , ci (X, dy , z) = φi ⇐⇒ φi = 1 by frame monotonicity. So p(ci (.) = φi |d = dy , ψi = 0) = p(φi = 1|dy , ψi = 0) By unconfoundedness and frame separability, p(φi = 1|dy , ψi = 0) = p(φi = 1|ψi = 0). By unconfoundedness, p(ψi = 1|d = dy ) = p(ψi = 1) and p(ψi = 0|d = dy ) = p(ψi = 0). Collecting terms, we have that the first term of (51) is p(ci (.) = φi |d = dy ) = p(ψi = 1) + p(φi = 1|ψi = 0)p(ψi = 0) Term 3 of Equation 51 We obtain this term symmetrically to the first term. By the law of iterated expectations p(ci (.) = φi |d = dx ) = p(ci (.) = φi |d = dx , ψi = 1)p(ψi = 1|d = dx )+p(ci (.) 
= φi |d = dx , ψi = 0)p(ψi = 0|d = dx ) When ψi = 1 ci (.) = φi always, by the consistency principle. When ψi = 0 and d = dx , ci (.) = φi ⇐⇒ φi = 0 by frame monotonicity, thus: p(ci (.) = φi |d = dy , ψi = 0) = p(φi = 0|dy , ψi = 0) By unconfoundedness and frame separability, p(φi = 0|dy , ψi = 0) = p(φi = 0|ψi = 0). Finally, we apply unconfoundedness to p(ψi = 0|d = dx ) and p(ψi = 0|d = dx ) to obtain that the third term of (51) is: p(ci (.) = φi |d = dx ) = p(ψi = 1) + p(φi = 0|ψi = 0)p(ψi = 0) Term 2 of Equation 51 By the law of iterated expectations, γp(ci (.) 6= y|d = dy ) = γ[p(ci (.) 6= y|d = dy , ψi = 1)p(ψi = 1|d = dy )+p(ci (.) 6= y|d = dy , ψi = 0)p(ψi = 0|d = dy )] Since ci (X, dy , z) = φi when ψi = 1 by the consistency principle, we have ci (X, dy , z) = y ⇐⇒ φi = 1. So p(ci (.) = d|d = dy , ψi = 1) = p(φi = 0|ψi = 1, d = dy ) Since ψi = 0 =⇒ ci (X, dy , z) = y by frame monotonicity, we know that p(ci (.) = d|d = dy , ψi = 0) = 0. 52 By unconfoundedness and frame separability, p(φi = 0|d = dy , ψi = 1) = p(φi = 0|ψi = 1) and p(ψi = 1|d = dy ) = p(ψi = 1). So the second term of (51) becomes γp(ci (.) = φi |d = dy ) = γp(φi = 0|ψi = 1)p(ψi = 1) Term 4 of Equation (51) We obtain this term symmetrically to the second term. By the law of iterated expectations γp(ci (.) 6= x|d = dx ) = γ p(ci (.) 6= x|d = dx , ψi = 1)p(ψi = 1|d = dx )+p(ci (.) 6= x|d = dx |ψi = 0)p(ψi = 0|d = dx ) The second term of this equation will be zero by frame monotonicity. By the consistency principle, p(ci (.) 6= x|d = dx , ψi = 1) = p(φi = 1|d = dx , ψi = 1) By unconfoundedness and frame separability, p(φi = 1|d = dx , ψi = 1) = p(φi = 1|ψi = 1) and p(ψi = 1|d = dx ) = p(ψi = 1). So the fourth term becomes γp(ci (.) 
6= x|d = dx ) = γp(φi = 1|ψi = 1)p(ψi = 1) Last Step Combining terms and simplifying, we have that dy is optimal if and only if p(φi = 1|ψi = 0)p(ψi = 0) + γp(φi = 0|ψi = 1)p(ψi = 1) > 1 + (γ − 1)p(ψi = 1) 2 Now note that by definition p(φi = 1|ψi = 0) = φ̄N , p(φi = 1|ψi = 1) = φ̄A , p(ψi = 1) = ψ̄(z), and p(ψi = 0) = 1 − ψ̄(z). Substituting these terms yields the desired result. Proof of Proposition 11 When γ = 0 the planner’s objective evaluated at each of the four possible d by z combinations is dy , zl : p(ci (.) = φi |d = dy , z = zl ) − 1 κ(zl ) N dx , zl : p(ci (.) = φi |d = dx , z = zl ) − 1 κ(zl ) N dy , zh : p(ci (.) = φi |d = dy , z = zh ) − 1 κ(zh ) N dz , zh : p(ci (.) = φi |d = dx , z = zh ) − 1 κ(zh ) N 53 In the proof of Proposition 10, we showed that these four expressions can be re-written as dy , zl : p(ψi (zl ) = 1|zl ) + p(φi = 1|ψi (zl ) = 0, zl )p(ψi (zl ) = 0|zl ) − 1 κ(zl ) N dx , zl : p(ψi (zl ) = 1|zl ) + p(φi = 0|ψi (zl ) = 0|zl )p(ψi (zl ) = 0|zl ) − 1 κ(zl ) N dy , zh : p(ψi (zh ) = 1|zh ) + p(φi = 1|ψi (zh ) = 0, zh )p(ψi (zh ) = 0|zh ) − 1 κ(zh ) N dz , zh : p(ψi (zh ) = 1|zh ) + p(φi = 0|ψi (zh ) = 0, zh )p(ψi (zh ) = 0|zh ) − 1 κ(zh ) N By decision-quality monotonicity and the existence of continent optimizers, we can divide the population into always optimizers (A), never optimizers (N) and sometimes optimizers (C), exactly as in Proposition 6. The average preferences in each population are given by φA , φN , and φC , respectively, and the size of each population is given by πA , πN , and πC , respectively. By unconfoundedness with respect to z, p(ψi (zl ) = 1|zl ) = p(ψi (zl ) = 1) = πA , p(ψi (zh ) = 0|zh ) = p(ψi (zh ) = 0) = πN , p(ψi (zh ) = 0|zh ) = p(ψi (zh ) = 0) = 1 − πA = πN + πC . 
By the law of iterated expectations,

$$p(\phi_i = 1 \mid \psi_i(z_l) = 0, z_l) = p(\phi_i = 1 \mid \psi_i(z_l) = 0, \psi_i(z_h) = 0)\, p(\psi_i(z_h) = 0 \mid \psi_i(z_l) = 0) + p(\phi_i = 1 \mid \psi_i(z_l) = 0, \psi_i(z_h) = 1)\, p(\psi_i(z_h) = 1 \mid \psi_i(z_l) = 0),$$

which, using the definition of conditional probability and the various $\pi$'s and $\phi$'s, yields

$$p(\phi_i = 1 \mid \psi_i(z_l) = 0, z_l) = \frac{\phi_N \pi_N + \phi_C \pi_C}{\pi_N + \pi_C}$$

$$p(\phi_i = 0 \mid \psi_i(z_l) = 0, z_l) = \frac{(1 - \phi_N)\pi_N + (1 - \phi_C)\pi_C}{\pi_N + \pi_C}$$

Our four conditions simplify to

$$d_y, z_l: \quad \pi_A + \phi_N \pi_N + \phi_C \pi_C - \tfrac{1}{N}\kappa(z_l) \qquad (52)$$

$$d_x, z_l: \quad \pi_A + (1 - \phi_N)\pi_N + (1 - \phi_C)\pi_C - \tfrac{1}{N}\kappa(z_l) \qquad (53)$$

$$d_y, z_h: \quad \pi_A + \pi_C + \phi_N \pi_N - \tfrac{1}{N}\kappa(z_h) \qquad (54)$$

$$d_x, z_h: \quad \pi_A + \pi_C + (1 - \phi_N)\pi_N - \tfrac{1}{N}\kappa(z_h) \qquad (55)$$

Note that the terms other than the cost term in each of these give the fraction of individuals who receive their preferred option when the planner chooses that $(d, z)$ combination. First, consider situations where the planner chooses $d_y$ regardless of $z$. This requires (52) > (53) and (54) > (55), which simplify to the first two conditions in (1a) and (4a). The planner will set $z_h$ if (54) > (52), which simplifies to $\Delta\kappa < (1 - \phi_C)\pi_C$, which yields (4a). With the inequality reversed, we get (1a). Second, consider situations where the planner chooses $d_x$ regardless of $z$. This requires (52) < (53) and (54) < (55), which simplify to the first two conditions in (2a) and (3a). Then the planner chooses $z_h$ if (55) > (53), which simplifies to $\Delta\kappa < \phi_C \pi_C$. This yields the final condition in (2a) and (3a). Third, consider the situation where the planner would want to choose $d_y$ under $z_h$ and $d_x$ under $z_l$. This requires (52) < (53) and (54) > (55), which provides the first two conditions in (1b) and (3b). In this situation, the planner chooses $z_h$ if (54) > (53) and $z_l$ otherwise. Performing this comparison, we have that the planner chooses $z_h$ if $\Delta\kappa < \phi_C \pi_C + (2\phi_N - 1)\pi_N$, which is the final condition in (3b). When the inequality is reversed, we obtain the final condition in (1b). Finally, consider the situation where the planner would want to choose $d_x$ under $z_h$ and $d_y$ under $z_l$.
This requires (52) > (53) and (54) < (55), which provide the first two conditions in (2b) and (4b). In this situation, the planner chooses $z_h$ if (55) > (52). Comparing these, we see that the planner chooses $z_h$ if $\Delta\kappa < (1 - \phi_C)\pi_C + (1 - 2\phi_N)\pi_N$, which is the final condition in (4b). When the inequality is reversed, we obtain the final condition in (2b).

Proof of Proposition 12

Proof of (12.1) and (12.2)

Fix some $k \in \{1, \ldots, K - 1\}$. Let $X' = \{x_1, \ldots, x_k\}$ and $X'' = \{x_{k+1}, \ldots, x_K\}$. Note that we can rewrite the many-choices problem as a binary menu choice problem between $X'$ and $X''$. Similarly, note that frame separability (35), frame monotonicity (36), partition consistency (37)/(38), and partition unconfoundedness (39) imply the binary analogues to these assumptions: (1), (3), (2), and (5). As such, (10.1) and (10.2) follow directly from the application of Proposition 1 to this problem.

Proof of (10.3)

First suppose that $k = 1$. Applying Proposition 2 to the binary menu choice problem with $X' = \{x_1\}$ and $X'' = \{x_2, \ldots, x_K\}$ implies that

$$E[\phi_1] \in [G_l(1), G_h(1)] \tag{56}$$

Note that this confirms the desired result for $k = 1$ since $G_h(0) = G_l(0) = 0$ by definition. Next, applying the same proposition for $k = 2$, we have $\phi_1 + \phi_2 \in [G_l(2), G_h(2)]$. Combined with (56), this implies

$$\phi_2 \in [G_l(2) - G_h(1),\; G_h(2) - G_l(1)] \tag{57}$$

Similarly with $k = 3$, we have that $\phi_1 + \phi_2 + \phi_3 \in [G_l(3), G_h(3)]$, and applying (56) and (57) implies that $\phi_3 \in [G_l(3) - G_h(2),\; G_h(3) - G_l(2)]$. Proceeding recursively, suppose that for some $k$, we know that for $k' < k$,

$$\phi_{k'} \in [G_l(k') - G_h(k' - 1),\; G_h(k') - G_l(k' - 1)] \tag{58}$$

Then application of Proposition 3 to the binary menu choice problem with $X' = \{x_1, \ldots, x_k\}$ yields $\phi_1 + \phi_2 + \ldots + \phi_k \in [G_l(k), G_h(k)]$, so

$$\phi_k \in [G_l(k) - (\phi_1 + \phi_2 + \ldots + \phi_{k-1}),\; G_h(k) - (\phi_1 + \phi_2 + \ldots + \phi_{k-1})].$$

Applying the lower and upper bounds from (58) and simplifying yields the desired result.
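The recursion in (56) through (58) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `share_bounds` and the numerical inputs are hypothetical, and the cumulative bounds $G_l(k)$ and $G_h(k)$ are simply taken as given lists rather than estimated from data.

```python
def share_bounds(g_low, g_high):
    """Bounds on each phi_k implied by cumulative bounds G_l(k) and G_h(k).

    g_low[k-1] and g_high[k-1] hold G_l(k) and G_h(k) for k = 1, ..., K.
    Returns a list of (lower, upper) pairs, one per option, following (58):
    phi_k in [G_l(k) - G_h(k-1), G_h(k) - G_l(k-1)].
    """
    if len(g_low) != len(g_high):
        raise ValueError("g_low and g_high must have the same length")
    bounds = []
    prev_low, prev_high = 0.0, 0.0  # G_l(0) = G_h(0) = 0 by definition
    for gl_k, gh_k in zip(g_low, g_high):
        bounds.append((gl_k - prev_high, gh_k - prev_low))
        prev_low, prev_high = gl_k, gh_k
    return bounds


# Hypothetical cumulative bounds for a three-option menu.
for k, (lo, hi) in enumerate(share_bounds([0.2, 0.5, 1.0], [0.3, 0.7, 1.0]), start=1):
    print(f"phi_{k} in [{lo:.2f}, {hi:.2f}]")
```

Because each interval subtracts the opposite cumulative bound at $k - 1$, the intervals widen when the two frames produce very different cumulative choice shares.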
Proof of (10.4)

Along with (10.1), strong decision-quality independence implies that for any $k$,

$$P(y_i^* \le k \mid \psi_{ik} = 1) = P(y_i^* \le k) = Y_k \tag{59}$$

Applying (59) at $k = 1$ yields

$$\bar\phi_1 = Y_1 \tag{60}$$

Applying (59) at $k = 2$ yields $\bar\phi_1 + \bar\phi_2 = Y_2$, and substituting equation (60) yields $\bar\phi_2 = Y_2 - Y_1$. As in the Proof of (10.3), we proceed recursively to obtain the desired result. Given some $k$, suppose that for any $k' < k$ we have

$$\bar\phi_{k'} = Y_{k'} - Y_{k'-1} \tag{61}$$

Applying (59) at $k$ yields $\bar\phi_1 + \bar\phi_2 + \ldots + \bar\phi_k = Y_k$. Applying (61) for $\bar\phi_1, \ldots, \bar\phi_{k-1}$ and simplifying yields the desired result.

B Positive Models of Framing Effects

In this section, we describe several different positive models of frame-sensitivity, and discuss how the various methods described in the body of the paper apply to each. We proceed roughly from models imposing the least rationality on choices, such as models where the variation in whether individuals optimize depends solely on individual characteristics unrelated to the choice at hand, to models imposing complete rationality, in which framing effects stem from the presence of neoclassical transaction costs.

Any of these models could potentially explain observed framing effects. Because each model satisfies the assumptions of frame monotonicity, the irrelevance of frames for preferences over menu items, and that consistent choices reveal preferences, the methods described in the paper may be applied to each as well. But as is generally the case, the parameters identified by our reduced-form techniques depend on which model is generating behavior.

In all cases, we assume each decision-maker (DM) $i$ chooses from a menu $\{x, y\}$. The DM's valuations of the two options are given by $u_i(x)$ and $u_i(y)$, and we write $u_i(y) - u_i(x) \equiv \bar u_i$.$^{38}$ We continue to denote $\phi_i = I\{u_i(y) > u_i(x)\}$ and $\psi_i = I\{c(X, d_x) = c(X, d_y)\}$. We denote the frame facing DM $i$ by $d_{ij} \in \{d_x, d_y\}$.
B.1 Optimization Based on Individual Characteristics

In this model, decision-makers optimize whenever the costs of optimizing $C_i$ are below a threshold value $\bar C$. Decision-makers who optimize make the same (optimal) choice regardless of the frame, whereas decision-makers who do not optimize choose according to the frame (they select $x$ under $d_x$ and $y$ under $d_y$).$^{39}$ What we call the costs of optimizing are very general: $C_i$ may reflect the decision-maker's expertise (or ignorance) in the choice being made, the opportunity cost of attention, the cognitive cost of expending mental effort on the decision, or psychological susceptibility to the frame. In contrast to later models, the variation in whether decision-makers optimize (i.e. the variation in $C_i$) is driven by variation in individual characteristics among decision-makers as opposed to the specific benefits to optimizing in the particular decision at hand.

Assuming that $C_i$ is distributed in the population with a cumulative distribution function $G(\cdot)$, we will have that $E[\psi_i] = G(\bar C)$. Whether decision-quality independence holds in this model depends on the empirical correlation between the determinants of optimization behavior, $C_i$, and individuals' preferences, represented by $\bar u_i$. In particular, we will have $\mathrm{cov}(\psi_i, \phi_i) = 0 \iff E[C_i \mid \bar u_i \ge 0] = E[C_i \mid \bar u_i < 0]$. Thus a sufficient condition for decision-quality independence is that $C_i$ is distributed independently of $\bar u_i$.$^{40}$ Assessing decision-quality independence in the context of such models thus requires considering the individual characteristics associated with frame-sensitivity and whether those same characteristics are also associated with preferences over $x$ and $y$. Whether these conditions hold will depend upon the application.
For example, with regard to choices over retirement savings plans, preferences over savings may be correlated with financial literacy, which may also be correlated with the latent characteristics driving the variation in optimization (such as cognitive ability). In such settings, the matching approach of Proposition 2 is most likely to succeed when one can observe the individual characteristics driving the endogeneity problem, such as financial literacy in the previous example.

38. The statistics in the body of the paper are concerned only with ordinal preferences of individuals, but in some models differences in relative utility between individuals will drive some individuals to optimize. Comparing optimizing individuals to frame-sensitive individuals in these models requires a utility concept comparable across individuals.

39. Note that this directly imposes the assumption of frame monotonicity and that consistent choices reveal preferences.

40. Note that this condition is sufficient but not necessary. For example, decision-quality independence will hold when the joint distribution of $\bar u_i$ and $C_i$ is symmetric around $\bar u_i = 0$. This may hold, for example, when $C_i$ is correlated with the utility “stakes” of the decision, $|\bar u_i|$, as explored in the next model.

When a decision-quality instrument, such as a treatment offering information on the choice, is available, this setting readily admits testing of decision-quality independence or conditional decision-quality independence using decision-quality state variation, as in Proposition 7. Furthermore, when $\bar u_i$ and $C_i$ are correlated and appropriate observables for a matching estimator are unavailable, models like the one in Section 7 provide a natural way to examine the joint distribution of the two variables.
With a joint normal distribution of $\bar u_i$ and $\log(C_i)$ and a homogeneous effect of a change in $z$ on $\log(C_i)$, this becomes identical to the latent variable model outlined in that section, so we can trace out the propensity to optimize as a function of the fraction of optimizers at a given level of $z$, and extrapolate to recover the full distribution of $\bar u_i$ and $C_i$.

B.2 Revealed Attention

Using the approach of Masatlioglu et al. [2012], we show here how the assumptions of frame monotonicity and the consistency principle are implied by an intuitive assumption in a revealed attention framework. Assume that an individual pays attention only to some subset of the menu $X$, but that she maximizes her preferences over the alternatives she notices. In order to incorporate framing effects from variation in the choice situation (which does not come from variation in $X$ itself), we must specify an attention filter $\Gamma$ which depends on $d_j$. Denote the attention filter by $\Gamma(X, d_j)$.$^{41}$ Given a utility function representation of individual $i$'s preferences, $u_i(\cdot)$, we could write the consumer's choice as the solution to the utility-maximization problem restricted to $\Gamma_i(X, d_{ij})$:

$$\max_{c \in \Gamma_i(X, d_{ij})} u_i(c) \tag{62}$$

Claim: When $X = \{x, y\}$ is binary, frame monotonicity and the consistency principle will be satisfied if the individual always pays attention to the option favored by the frame. Formally,

$$\forall i, \quad x \in \Gamma_i(X, d_x) \text{ and } y \in \Gamma_i(X, d_y) \tag{63}$$

Proof: Suppose condition (63) is satisfied. We proceed in two cases. First, suppose that $y(d_y) = 0$, i.e. $c(X, d_y) = x$. Since $c(X, d_y) \in \Gamma(X, d_y)$ by (62), we must have $x \in \Gamma(X, d_y)$, which together with condition (63) implies $\Gamma(X, d_y) = \{x, y\}$. By (62), $u(x) > u(y)$. Given that $x \in \Gamma(X, d_x) \subseteq \{x, y\}$ by (63) again, we know that $y(d_x) = 0$. Second, suppose that $y(d_x) = 1$. Then we must have $y \in \Gamma(X, d_x)$. Similar to before, we know that $\Gamma(X, d_x) = \{x, y\}$. So $u(y) > u(x)$. Then since $y \in \Gamma(X, d_y) \subseteq \{x, y\}$, $y(d_y) = 1$. These two conditions are sufficient to prove frame monotonicity, $\forall i,\; y(d_y) \ge y(d_x)$.$^{42}$ Note that whenever $c(X, d_x) = y$, we also have $c(X, d_y) = y$ and $u(y) > u(x)$, and whenever $c(X, d_y) = x$, we also have $c(X, d_x) = x$ and $u(x) > u(y)$. This guarantees that consistent choices reveal preferences.

41. It will have the property that $\forall X$, $\Gamma_i(X, d) = \Gamma_i(X \setminus x, d)$ whenever $x \notin \Gamma_i(X, d)$. Masatlioglu et al.'s assumption is not directly relevant in our setting because we examine binary choices. However, in the non-binary case it will place additional restrictions on when preferences are revealed by choices in the presence of frame monotonicity.

Granted this assumption (and unconfoundedness and frame separability), the results of this paper will all obtain in the revealed attention framework of Masatlioglu et al. [2012]. Intuitively, when individuals choose $y$ under $d_x$, they are “revealing” that they pay attention to $y$ under $d_x$, since an individual cannot choose an alternative not in the attention set $\Gamma(\cdot)$. Given the assumption that all individuals also pay attention to the favored choice in a given frame, by choosing $y$ a DM reveals that she prefers $y$ to $x$.

Variation in which individuals are consistent is equivalent to variation across individuals in $\Gamma_i(X, d_j)$ for each frame $d_j$, which could be endogenized using approaches similar to those in other sections of this appendix. Proposition 1 will apply. If $\Gamma_i$ is independent of preferences $\phi_i$, we will have that decision-quality independence is satisfied, and if we allow $\Gamma_i$ to depend on individual characteristics or if we allow $\Gamma_i$ to expand based on a decision-quality instrument, later propositions in the paper may be applied. Finally, note that, like frame monotonicity, property (63) could be tested if we are able to observe individuals' choices across frames.
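For the binary menu, the claim that (63) implies frame monotonicity and the consistency principle can also be checked by brute-force enumeration. The sketch below is illustrative only: the function names are hypothetical, preferences are assumed to be strict, and attention sets are represented as plain Python sets.

```python
from itertools import product


def choose(attention_set, prefers_y):
    """Maximize preferences over the options the DM notices, as in (62)."""
    best, worst = ("y", "x") if prefers_y else ("x", "y")
    return best if best in attention_set else worst


def check_claim():
    """Enumerate all attention filters satisfying (63) and both strict preferences."""
    filters_dx = [{"x"}, {"x", "y"}]  # x is always noticed under d_x
    filters_dy = [{"y"}, {"x", "y"}]  # y is always noticed under d_y
    for gamma_dx, gamma_dy, prefers_y in product(filters_dx, filters_dy, (False, True)):
        c_dx = choose(gamma_dx, prefers_y)
        c_dy = choose(gamma_dy, prefers_y)
        # Frame monotonicity: y(d_y) >= y(d_x).
        if int(c_dy == "y") < int(c_dx == "y"):
            return False
        # Consistency principle: a choice made under both frames is the preferred option.
        if c_dx == c_dy and (c_dx == "y") != prefers_y:
            return False
    return True


print(check_claim())
```

The enumeration covers only the binary case discussed in the text; it says nothing about larger menus, where the attention filter property in Masatlioglu et al. [2012] imposes further restrictions.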
In addition, note that when we move beyond the binary case, assumption (63) would justify the assumption that active choices reveal a preference for the chosen option over the default option, but not the stronger assumption that active choices reveal preferences over the entire menu.

42. The other two cases, where $c(X, d_x) = x$ and $c(X, d_y) = y$, will trivially satisfy $y(d_y) \ge y(d_x)$.

B.3 Bounded Rationality Models

The models in this section work towards stronger and stronger notions of bounded rationality, by which we mean that whether decision-makers optimize depends (in some form) on the gains to doing so. The “meta-optimization” problems used in these models create an infinite regress problem [Conlisk, 1996]. It is likely costly to perform a cost-benefit calculation to decide whether to incur some cost of optimizing, which may lead us to wonder how the individual acquires and processes information to solve the meta-optimization problem. Assumptions on what exogenous knowledge the individuals possess (and account for costlessly) about costs and benefits of optimizing in this and later models bypass the infinite regress issue in an ad hoc fashion. These assumptions are common in the literature, but we take no stance on which, if any, are appropriate. Note that these conceptual difficulties grow more severe the more that decision-makers are assumed to optimize or not based on the true utilities associated with the available menu items in the choice decision at hand.

B.3.1 Stakes-Based Optimization

Assume the DM knows the utility “stakes” of the meta-decision of whether to optimize, $|\bar u_i|$, and must decide whether to incur the cost of optimizing, $C_i$. As before, we assume that individuals who pay the cost to optimize select their most-preferred option under both frames, whereas individuals who do not optimize select the option associated with the frame under which they choose.
To motivate the inclusion of utility stakes into the decision of whether to optimize, consider an employee selecting a retirement savings plan. The employee may know how much selecting the right retirement savings plan matters to her and how costly it is to learn about the menu of plans, but she may not actually know which plan is best for her without incurring the optimization cost. Similarly, an individual may have a general sense that accounting for low-salience taxes when making purchasing decisions is more important for large purchases than for small purchases.

Suppose that individuals believe that $x$ is best with probability $\omega_x$, which is homogeneous for simplicity.$^{43}$ In the frame $d_x$, the DM decides whether to optimize based on the solution to the following problem:

$$\max_{\psi_i \in \{0, 1\}} \; \psi_i \left[ |\bar u_i| - C_i \right] + (1 - \psi_i)\left[ \omega_x |\bar u_i| - (1 - \omega_x)|\bar u_i| \right]$$

The individual pays the cost to learn the best option only if

$$C_i < 2(1 - \omega_x)|\bar u_i|$$

Symmetrically, when the frame is $d_y$, the individual pays the cost only if

$$C_i < 2\omega_x |\bar u_i|$$

The individual is consistent if and only if

$$\psi_i = 1 \iff C_i < \min\{2(1 - \omega_x)|\bar u_i|,\; 2\omega_x |\bar u_i|\} \tag{64}$$

and whenever the individual is consistent, she chooses her preferred option, so the consistency principle applies. The assumption of frame monotonicity is embedded in the optimization problem. As such, Propositions 1, 2, and 3 are readily applied to this situation.

43. Letting $\omega_x$ depend on $\bar u$ would bring this model closer in line with stronger versions of bounded rationality presented in later sections.

Decision-quality independence will not hold in general in this setting, since individuals with high $|\bar u|$ tend to choose consistently and may also be more likely to prefer $y$. More formally, the event in equation (64) will not be independent of the event $\bar u_i > 0$ without strong assumptions. One set of such assumptions is that $C_i$ is independent of $\bar u$, and the distribution of $\bar u$ is symmetric about 0. The latter is tantamount to assuming that $E[\phi_i] = 0.5$ ex ante.
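The thresholds behind (64) follow from comparing the two payoffs in the DM's meta-decision, and a short Python sketch makes the comparison explicit. The function names and numerical values here are hypothetical, chosen only for illustration; the frame is passed as the string "dx" or "dy".

```python
def optimizes(cost, u_bar, omega_x, frame):
    """Return True if the DM pays the optimization cost under the given frame."""
    stakes = abs(u_bar)
    # Payoff from optimizing: the DM learns and picks her preferred option.
    payoff_optimize = stakes - cost
    # Payoff from following the frame: gain the stakes if the framed option
    # turns out to be best, lose them otherwise.
    p_frame_best = omega_x if frame == "dx" else 1.0 - omega_x
    payoff_follow = p_frame_best * stakes - (1.0 - p_frame_best) * stakes
    return payoff_optimize > payoff_follow


def is_consistent(cost, u_bar, omega_x):
    """Condition (64): the DM optimizes under both frames."""
    stakes = abs(u_bar)
    return cost < min(2.0 * (1.0 - omega_x) * stakes, 2.0 * omega_x * stakes)


# Hypothetical DM with stakes |u| = 1 who believes x is best with probability 0.7.
for cost in (0.5, 0.7):
    print(cost, optimizes(cost, 1.0, 0.7, "dx"), optimizes(cost, 1.0, 0.7, "dy"),
          is_consistent(cost, 1.0, 0.7))
```

With $\omega_x = 0.7$ the binding threshold is the one under $d_x$, since following a frame that already points to the likely-best option is cheap to justify; a DM with an intermediate cost therefore optimizes under $d_y$ but not under $d_x$.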
In spite of the likely failure of decision-quality independence, conditional decision-quality independence will hold when one can observe sufficient characteristics to control for both $C_i$ and $|\bar u_i|$, in which case Proposition 5 will allow us to recover population preferences. That is, under stakes-based optimization models, the observer should control for variation among decision-makers associated with the utility stakes in the underlying decision. For example, in the retirement savings plan context, one could solicit and control for (1) the individual's knowledge of the definitions of various aspects of retirement plans, and (2) the self-reported importance of the savings decision to that person.

Propositions 1, 2, and 3 are applicable in general for this model. Valid decision-quality instruments consist of any variation in the choice environment that changes the cost of optimizing or the perceived stakes of the decision monotonically for all individuals.

B.3.2 Optimal Decision Rule

Assume the DM chooses whether to optimize in a given situation, or a given class of situations she may encounter multiple times. The DM knows her preferences $u_i(x)$ and $u_i(y)$ when she decides whether to optimize, but not when she actually chooses from the menu $\{x, y\}$. This phenomenon may arise because, when she actually chooses, there are some environmental influences that she cannot avoid without paying some cost ex ante (e.g. additional mental effort to stay focused or exercise self-control).

Suppose the DM decides whether to optimize given her beliefs about how likely she is to encounter a given frame $d_j$ and the cost of optimizing $C_i$. The likelihood of encountering frame $d_j$ is homogeneous across decision-makers and denoted $\alpha_j$. The choice of whether to optimize is given by:

$$\max_{\psi_i \in \{0, 1\}} \; (1 - \psi_i)\{\alpha_x u_i(x) + (1 - \alpha_x) u_i(y)\} + \psi_i \left[ \max\{u_i(x), u_i(y)\} - C_i \right]$$

Note that frame monotonicity is embedded inside the first term in curly brackets.
Solving this optimization problem yields

$$\psi_i = 1 \iff C_i < \max\{-(1 - \alpha_x)\bar u_i,\; \alpha_x \bar u_i\}$$

that is, $C_i < \alpha_x \bar u_i$ when $\bar u_i > 0$ and $C_i < -(1 - \alpha_x)\bar u_i$ when $\bar u_i < 0$.

Decision-quality independence will not in general obtain in this model, for similar reasons to the previous model: underlying utility over $x$ and $y$, $\bar u$, affects both $\psi_i$ and $\phi_i$. We will not have in general that $\mathrm{cov}(\phi_i, \psi_i) = 0$ without strong assumptions like the symmetry assumption discussed in the previous section.

When $\alpha_x = 0.5$, this model becomes like the previous model (with $\omega_x = 0.5$), in which case controlling for variation in costs $C_i$ and stakes $|\bar u_i|$ may yield conditional decision-quality independence. Absent a reason to believe that the DM considers herself equally likely to face either frame (so that $\alpha_x = 0.5$), conditional decision-quality independence is unlikely to be satisfied in this case, since “controlling” for how $\bar u$ enters the optimization decision requires knowing whether $\bar u$ is positive or negative, i.e. whether $\phi_i = 1$ or $\phi_i = 0$.

However, Proposition 2 can be applied in this setting to recover bounds on $E[\phi_i]$. We could apply Proposition 6 in this setting via a treatment affecting the cost of optimizing $C_i$ monotonically for all individuals, and recover the fraction of individuals who switch to optimizing who prefer $y$.

B.3.3 Framing Effects as Pure Transaction Costs

In this model, we assume a fully rational DM is affected by the default option due to a transaction cost. We will analyze this model in somewhat more detail to illustrate the relationship between the structural approach commonly used in the literature and our thinking, as well as the two-sided selection issue alluded to in the body of the paper.

We assume that choosing the option that is not the default incurs a transaction cost $\gamma_i$. The DM's choice $c_i$ solves

$$\max_{c_i \in \{x, y\}} \; u_i(c_i) - \gamma_i I\{c_i \neq d_{ij}\}$$

When $d_{ij} = d_x$, the solution to this problem is given by $c_i = y \iff \bar u_i > \gamma_i$. When $d_{ij} = d_y$, the solution is given by $c_i = y \iff -\bar u_i < \gamma_i$.
Note that individuals who choose $x$ under $d_y$ prefer $x$, and they will also choose $x$ when $x$ is the default, so consistent choices reveal preferences and frame monotonicity is satisfied. Note also that although we focus on the case where $\gamma_i$ is a real transactions cost, we could also think of this model as a model of bounded rationality similar to the previous one, but where the individual can condition her choice of whether to optimize on the frame.$^{44}$

44. Formally, we would assume that the individual $i$ updates $\alpha_x$ in the previous model to $\alpha_x \mid d_{ij} = I\{d_{ij} = d_x\}$.

We can summarize the three distinct possibilities for the choices of individual $i$ as follows:

$$(c_i(X, d_x), c_i(X, d_y)) = \begin{cases} (x, x) & \text{if } -\bar u_i > \gamma_i \\ (x, y) & \text{if } -\bar u_i < \gamma_i,\; \bar u_i < \gamma_i \\ (y, y) & \text{if } \bar u_i > \gamma_i \end{cases} \tag{65}$$

Note that the above implies that the consistency principle and frame monotonicity obtain. The two statistics studied in the bulk of the paper will be given in this model by

$$\phi_i = 1 \iff \bar u_i > 0, \qquad \psi_i = 1 \iff \bar u_i \in (-\infty, -\gamma_i) \cup (\gamma_i, \infty)$$

Decision-quality independence will not generally be satisfied:$^{45}$

$$\mathrm{cov}(\phi_i, \psi_i) = p(\bar u_i > \gamma_i) - p(\bar u_i > 0)\, p(\bar u_i < -\gamma_i \text{ or } \bar u_i > \gamma_i)$$

which will not generally equal zero.$^{46}$ Proposition 4 cannot be applied in a model such as this one, which is not surprising: whether an individual is consistent in this model depends strongly on her preferences. However, all the assumptions of Propositions 1-3 are satisfied, so, for example, we can apply Proposition 1:

$$E[\psi_i] = E[y_i \mid d_x] + 1 - E[y_i \mid d_y] = p(\bar u_i > \gamma_i) + p(-\bar u_i > \gamma_i)$$

$$E[\phi_i \mid \psi_i = 1] = \frac{E[y_i \mid d_x]}{E[y_i \mid d_x] + 1 - E[y_i \mid d_y]} = \frac{p(\bar u_i > \gamma_i)}{p(\bar u_i > \gamma_i) + p(-\bar u_i > \gamma_i)}$$

It is unlikely that conditioning on observables will help us identify population preferences, for the same reason as in the last section. One exception occurs when variation in benefits is negligible compared to variation in costs, which transforms this model (to some approximation) into the first model presented, in which optimization is a latent characteristic.
This transformation is especially useful in situations where $x$ and $y$ are non-transparently the same good, such as the example of store-brand versus brand-name pharmaceuticals with identical chemical components in Bronnenberg et al. [2013].

45. $\mathrm{cov}(\phi_i, \psi_i) = E[\phi_i \psi_i] - E[\phi_i]E[\psi_i] = p(\phi_i = \psi_i = 1) - p(\phi_i = 1)\, p(\psi_i = 1)$.

46. In particular, this condition holds if and only if $p(\bar u_i > \gamma_i \mid \bar u_i > 0) = p(\bar u_i < -\gamma_i \mid \bar u_i < 0)$.

We can use Proposition 2 here, because when $\gamma_i \ge 0$ for all $i$,

$$\bar\phi = p(\bar u_i > 0) \in [p(\bar u_i > \gamma_i),\; p(\bar u_i > -\gamma_i)] = [E[y \mid d_x],\; E[y \mid d_y]].$$

However, we can identify the preferences of a larger number of decision-makers in the presence of varying decision-quality environments, and changes in transactions costs provide a natural example of such variation. Reductions could be obtained by easing the administrative requirements (such as paperwork) for choosing the non-default option. Suppose that transactions costs change from $\gamma_i$ to $\gamma_i' \le \gamma_i$, with $\gamma_i' < \gamma_i$ for some $i$. Then the conditions for Proposition 6 are satisfied and we will have:

$$(c(X, d_x, \gamma), c(X, d_x, \gamma'), c(X, d_y, \gamma'), c(X, d_y, \gamma)) = \begin{cases} (x, x, x, x) & \text{if } \bar u_i < -\gamma_i \\ (x, x, x, y) & \text{if } \bar u_i \in [-\gamma_i, -\gamma_i'] \\ (x, x, y, y) & \text{if } \bar u_i \in [-\gamma_i', \gamma_i'] \\ (x, y, y, y) & \text{if } \bar u_i \in [\gamma_i', \gamma_i] \\ (y, y, y, y) & \text{if } \bar u_i > \gamma_i \end{cases} \tag{66}$$

The second and fourth cases correspond to the contingent optimizers whose ordinal preferences are captured by the statistic in Proposition 6. However, the two-sided nature of selection into observables in this model suggests that we might attempt to recover more meaningful parameters, such as those governing the distribution of $\bar u_i$ and $\gamma_i$. The remainder of this section describes such a model.

Assume that transaction costs are described by

$$\log(\gamma_i) = \mu_\nu + \theta I\{z = z_h\} + \nu_i$$

where $I\{z = z_h\}$ is an indicator for being in the high-quality decision state $z_h$. Suppose also that

$$\bar u_i = \mu_u + \varepsilon_i, \qquad \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$

Note the similarity between the setup of this model and the one in Section 7.
We can use equation (66) to calculate likelihood contributions, replacing $\gamma_i$ by $e^{\mu_\nu + \nu_i}$ and $\gamma_i'$ by $e^{\mu_\nu + \theta + \nu_i}$.
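To see how the latent structure maps into the five choice patterns in (66), one can simulate the model. The sketch below is not an estimator and does not compute the likelihood; it only draws latent types under assumed parameter values and tabulates the implied patterns. All parameter values and function names are hypothetical, and $\theta$ is assumed negative so that the high-quality state lowers transaction costs ($\gamma_i' < \gamma_i$).

```python
import math
import random
from collections import Counter


def choice(u_bar, gamma, default):
    """Choice from {x, y} when deviating from the default costs gamma."""
    if default == "x":
        return "y" if u_bar > gamma else "x"
    return "y" if u_bar > -gamma else "x"


def simulate_patterns(n, mu_u, mu_nu, theta, rho, seed=0):
    """Tabulate (c(d_x, g), c(d_x, g'), c(d_y, g'), c(d_y, g)) over n simulated DMs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        # Draw (eps, nu) as standard bivariate normal with correlation rho.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = z1
        nu = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        u_bar = mu_u + eps
        gamma = math.exp(mu_nu + nu)                # cost in the low-quality state
        gamma_prime = math.exp(mu_nu + theta + nu)  # cost in the high-quality state
        pattern = (
            choice(u_bar, gamma, "x"),
            choice(u_bar, gamma_prime, "x"),
            choice(u_bar, gamma_prime, "y"),
            choice(u_bar, gamma, "y"),
        )
        counts[pattern] += 1
    return counts


# Hypothetical parameter values, for illustration only.
patterns = simulate_patterns(n=10_000, mu_u=0.2, mu_nu=-0.5, theta=-1.0, rho=0.3)
for pattern, count in sorted(patterns.items()):
    print("".join(pattern), count / 10_000)
```

In actual data each decision-maker is typically observed under only one $(d, z)$ combination, so these joint patterns are latent; the simulated shares correspond to the probabilities of the regions in (66) that enter the likelihood.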