Preference Identification Under Inconsistent Choice: A
Reduced-Form Approach∗
Jacob Goldin†
Daniel Reck‡
June 13, 2014
Abstract
Behavioral economics has documented numerous settings in which behavior varies according to
seemingly-arbitrary features of the choice environment such as defaults, salience, or framing effects.
Optimal policy design requires accounting for the preferences of inconsistent decision-makers but traditional revealed preference analysis breaks down when individuals exhibit systematic choice reversals. We
consider binary choice problems in which preference-irrelevant “frames” affect the behavior of a subset of
decision-makers in monotonic directions. In such settings, we show that preference identification hinges
upon understanding the empirical relationship between decision-makers’ preferences and their propensity to optimize. We provide a range of tools for examining this relationship and identifying preferences,
drawing on techniques analogous to those commonly employed in the program evaluation literature. We
illustrate the usefulness of these techniques in an application to the optimal default problem.
∗ The authors wish to thank Jason Abaluck, Roland Benabou, Charlie Brown, James Hines, Bo Honore, Miles Kimball, Alvin
Klevorick, David Lee, Alex Mas, Wolfgang Pesendorfer, Kareen Rozen, Joel Slemrod, Jesse Shapiro, and seminar participants
at Princeton and the University of Michigan for helpful discussion and comments. Any errors are our own.
† Department of Economics, Princeton University, email: jgoldin@princeton.edu
‡ Department of Economics, University of Michigan, email: dreck@umich.edu
1 Introduction
Suppose policymakers wish to require that companies give their customers discretion over how their personal
data, such as internet usage data, is collected and analyzed. Regulations of this kind are a subject of active
debate in many countries. A key feature of these regulations is that privacy controls can be opt-in - so that
customers must actively give a company permission to collect and/or use their personal data - or they can
be opt-out - so that customers must actively tell a company not to use their data.1 Some consumers may
prefer that their data be used by companies, in order to improve the quality of some online service or to
see advertisements that are relevant to their interests. Other consumers may wish that their personal data
remain private. Suppose that with opt-in regulation, 40 percent of individuals allow a given company to use
their data, but that with opt-out regulation, 70 percent do so (and 30 percent opt out). If regulators’ only
goal is to maximize individual welfare, how should they decide whether privacy controls should be opt-in or
opt-out?
Answering questions such as this one through the lens of economic theory is complicated by the difficulty in
measuring preferences when behavior varies based on seemingly irrelevant features of the choice environment
– such as defaults, salience, or framing effects. For example, if decision-makers’ preferences over a menu do
not depend on which option is the default, observing that individuals’ choices are sensitive to the default casts
doubt on the standard revealed preference approach to welfare economics. The question of how to conduct
welfare analysis in such situations lies at the heart of important controversies in behavioral economics. In
particular, for a benevolent planner to design a choice environment in a way that maximizes well-being, he
or she must first have some means of identifying the preferences of those individuals whose choices will be
affected.
Most prior work takes one of two approaches to addressing the problem of preference recovery under
inconsistent choice. First, one may utilize a positive model of behavior that fully specifies the mapping from
a decision-maker’s preferences to her (potentially sub-optimal) behavior [e.g. Rubinstein and Salant, 2012,
Carroll et al., 2009, Kahneman et al., 1997, Köszegi and Rabin, 2006]. Such approaches yield important
insights but in many cases the resulting welfare conclusions are sensitive to the modeler’s choice between
competing positive models that are observationally similar [Bernheim and Rangel, 2009, Bernheim, 2009].
An alternative approach is to restrict preference inferences to a subset of observed choice situations in which
decision-makers choose consistently [Bernheim and Rangel, 2009]. However, by design, such approaches yield
no information on the preferences of those decision-makers whose choices are influenced by the frame – the
very group whose preferences are most relevant for selecting the optimal policy regarding which frame to
1 For a more detailed discussion of this issue, see Johnson et al. [2002].
implement.2 Further “refinements” can provide a path forward for behavioral welfare analysis in contexts
where choice situations are observed in which an observer is willing to assume that all decision-makers have
chosen optimally.3 However, in many contexts, such as sensitivity to default options, there will be little
reason to believe that any of the observed choice situations satisfy these strict requirements.4
In this paper we develop a framework for preference identification over binary choices when some – but
not all – of the observed decision-makers optimize. In particular, we provide conditions under which one
can recover the preferences of the optimizing decision-makers and methods for extrapolating those recovered
preferences to the preferences of the population. To begin, we follow Salant and Rubinstein [2008] and
Bernheim and Rangel [2009] by modeling decisions in terms of menus and frames – preference-irrelevant
features of the choice environment that may affect behavior such as the default, the presence of irrelevant
alternatives or irrelevant information, or the order in which choices are presented. When decision-makers
choose consistently across frames, we assume that those choices reflect their preferences, an assumption we
call the consistency principle. We also allow decision-makers to choose inconsistently across frames, but we
limit our analysis to choice situations in which the frames affect all decision-makers in a uniform direction, an
assumption we label frame monotonicity. Crucially, our approach does not require that an outside observer
can identify ex ante which individual decision-makers are optimizing, nor that one can observe an individual
decision-maker in multiple choice situations. Instead, we exploit the fact that frame monotonicity and the
consistency principle imply that decision-makers who choose “against the frame” prefer the options that
they choose. This insight, along with a statistical assumption concerning the assignment of decision-makers
to frames, allows us to recover the preferences of consistent decision-makers – the subset of the population
whose choices are unaffected by the frame.5
We next consider what can be learned about population preferences from the distribution of preferences
among consistent choosers. We first show that population preferences may be partially identified using
worst-case bounds. That is, once we recover the preferences of the consistent choosers, we can bound
population preferences by alternatively assuming that the inconsistent decision-makers have the most extreme
preferences possible (in either direction).
2 The observed choice data may still admit to an incomplete preference ordering [Bernheim and Rangel, 2009, Bernheim,
2009], useful for various normative applications [Fleurbaey and Schokkaert, 2013]. We formalize the claim that the optimal
frame depends on the preferences of the frame-sensitive decision-makers in Section 8.
3 Applications of this refinement approach can be found in the tax salience literature [Chetty et al., 2009, Goldin, 2014, Reck,
2014, Allcott and Taubinsky, 2013].
4 Another alternative is to turn from choice to survey data, either on hypothetical choice situations designed to elicit preference
parameters [Barsky et al., 1997], or from surveys about subjective well-being [Benjamin et al., 2012]. While useful, survey
approaches are also subject to numerous potential framing effects [Schwarz and Clore, 1987, Schwarz, 1987, Deaton, 2012]. A
useful discussion of other approaches to preference recovery is provided in Beshears et al. [2008].
5 An alternative interpretation of the empirical evidence concerning “preference reversals” is to conclude that inconsistent
decision-makers simply lack normatively-relevant preferences in the first place. For someone who takes that view as a starting
point, the contribution of our paper is that it provides a method for backing out the preferences of the consistent decision-makers
from the aggregate observed choice data.
Turning to full identification of population preferences, we show how, under our assumptions, the problem of population preference recovery under inconsistent choice boils down to understanding the empirical
relationship between decision-makers’ preferences over the menu items and their propensity to optimize.
When using the term “optimize” here and elsewhere in the paper, we mean that a decision-maker chooses
her preferred option in both frames, so that optimizing decision-makers and consistent decision-makers are
one and the same. When optimizing behavior is uncorrelated with variation in decision-makers’ preferences
– a condition which we refer to as decision-quality independence – population preferences may be recovered
by extrapolating the preferences of consistent decision-makers directly to those whose behavior is sensitive
to the frame. Whether decision-quality independence holds in a given context is an empirical question,
one whose answer will vary based on the specific decision being observed. For the many situations in which
decision-quality independence does not hold, we provide two types of tools for shedding light on the empirical
relationship between optimizing behavior and decision-makers’ preferences. This information can then be
used to make inferences about the aggregate preferences of the population. As in other empirical contexts,
the more structure one is willing to impose on this relationship, the weaker the informational requirements
are for making inferences about the parameter of interest.
First, rather than simply extrapolating from consistent to inconsistent decision-makers – an invalid approach when decision-quality independence fails – we can adjust for observable differences between the two
groups. That is, even though we cannot observe which individuals are consistent and which are inconsistent,
we are able to recover the aggregate preferences of the consistent choosers as well as the aggregate observable characteristics of each group. If decision-quality independence holds conditional on these observable
characteristics, we can recover population preferences by separately estimating the average preferences for
each group of decision-makers and re-weighting those cell averages based on the distribution of observable
characteristics among the inconsistent decision-makers. For example, it may be that the rich are more likely
to optimize than the poor, and also that the rich have on average different preferences than the poor, but
conditional on income, decision-makers’ optimizing behavior is uncorrelated with their preferences. As in
other empirical contexts, the plausibility of this matching-on-observables estimator depends on what information about decision-makers can be observed. The more potential determinants of optimizing behavior
that can be observed, the more likely it is that conditional decision-quality independence will hold.
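The re-weighting idea above can be sketched numerically. This is a minimal illustration, not the authors' estimator: the two income cells, their population shares, and their choice moments are hypothetical numbers chosen so that the pooled moments match the running example (40 percent under opt-in, 70 percent under opt-out). The function name `reweighted_preferences` is likewise an assumption. Within each cell, the share of consistent choosers and their average preferences are backed out from choices made against the frame, and conditional decision-quality independence is assumed to hold.

```python
from fractions import Fraction as F

# Hypothetical cells: population share and observed choice moments by frame.
cells = {
    "rich": {"share": F(1, 2), "e_y_dx": F(3, 5), "e_y_dy": F(4, 5)},
    "poor": {"share": F(1, 2), "e_y_dx": F(1, 5), "e_y_dy": F(3, 5)},
}

def reweighted_preferences(cells):
    """Return (E[phi | inconsistent], E[phi]) under conditional independence."""
    incons_mass = F(0)   # total mass of inconsistent choosers
    incons_pref = F(0)   # mass-weighted preferences of inconsistent choosers
    pop_pref = F(0)      # share-weighted preferences of the whole population
    for c in cells.values():
        p_consistent = c["e_y_dx"] + 1 - c["e_y_dy"]  # consistent share in cell
        y_a = c["e_y_dx"] / p_consistent              # consistent choosers' preferences
        mass = c["share"] * (1 - p_consistent)        # inconsistent mass in cell
        incons_mass += mass
        incons_pref += mass * y_a    # assumption: same cell average applies
        pop_pref += c["share"] * y_a
    return incons_pref / incons_mass, pop_pref

e_phi_inconsistent, e_phi_population = reweighted_preferences(cells)
print(e_phi_inconsistent, e_phi_population)  # prints "17/36 13/24"
```

In this hypothetical, the poor are over-represented among inconsistent choosers and prefer y less often, so the re-weighted population estimate (13/24, about 0.54) falls below the unadjusted extrapolation from consistent choosers (4/7, about 0.57).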
Second, we develop techniques that utilize variation in the decision-making environment through a
decision-quality instrument with the following properties: (1) the variation affects decision-makers’ propensity to optimize monotonically, and (2) conditional on whether an individual optimizes, the variation does
not affect choice behavior. For example, this variation could take the form of differences in the time pressure
under which a decision must be made, the costs of comparing the available options, or the presence or absence
of other drains on the decision-maker’s “cognitive load.” Under these assumptions, we show how one
may recover the preferences of decision-makers whose optimizing behavior varies between states.6 Imposing
additional structure on the relationship between preferences and the propensity to optimize allows for recovery of the full distribution of preferences and extrapolation to the preferences of decision-makers who never
optimize. We demonstrate this using a latent variable model imposing functional form assumptions on the
distribution of preferences and the propensity to optimize. Variation in a decision-quality instrument identifies the correlation between decision-makers’ preferences and their propensity to optimize, the key unknown
necessary for recovering population preferences. With sufficient observed variation in decision-quality instruments, one can also model the relationship between decision-makers’ preferences and their propensity to
optimize in a more flexible manner, which allows us to extrapolate to population preferences under weaker
functional form assumptions. Finally, we describe how variation in the decision-quality instrument lends
itself to an over-identification test of the decision-quality independence assumptions discussed above.
In describing these various approaches to preference identification under inconsistent choice, we do not
claim to have discovered a one-size-fits-all solution to this fundamental problem. Rather, we view our
primary contribution as showing how imposing modest structure can transform the problem into one that
is both more familiar and more tractable. The tools we introduce for recovering population preferences
require additional assumptions relative to Bernheim-Rangel; the payoff to this additional structure is that
our approach can be applied in a much broader range of situations – namely those in which only a subset
of decision-makers are optimizing and in which each individual’s choice is observed only once. Even with
this additional structure, our approach retains an important “reduced-form” flavor that allows us to draw
conclusions about welfare without specifying the exact positive model that generates behavior.
To illustrate the usefulness of our techniques, we formalize a planner’s choice of frame problem, such as
that faced by a government seeking to “nudge” its citizens in a beneficial direction by choosing which of two
options should be the default [Thaler and Sunstein, 2008]. We show that the solution depends on a weighted
average of the preferences of the consistent and the inconsistent decision-makers, so that determining the
optimal frame requires separately identifying the preferences of the consistent and the inconsistent decision-makers. The tools we provide allow one to estimate these quantities from observed choice data.
Limiting our analysis to binary choices simplifies the analysis considerably, but the tools we develop are
useful outside of that domain as well. Before concluding, we provide one generalization of our approach,
focusing on situations where decision-makers choose from a discrete, ordered set of options. We briefly
discuss additional generalizations to other choice situations.
6 As discussed below, these assumptions and results parallel the identification of a Local Average Treatment Effect using
instrumental variables [Imbens and Angrist, 1994].
The remainder of the paper proceeds as follows: Section 2 sets up the model, Section 3 shows how to estimate the preferences of consistent choosers, Section 4 discusses the identification of population preferences,
Section 5 describes the matching-on-observables approach, Section 6 describes the decision-quality instrumental variables approach, and Section 7 describes the recovery of inconsistent decision makers’ preferences
using the observed relationship between preferences and decision-quality. Section 8 shows how the optimal
choice of default depends on the parameters our methods identify from choice data, and Section 9 generalizes
the results beyond the binary choice setting. Section B in the Appendix shows how several positive models
of framing effects relate to our assumptions.
2 Setup
Consider a population of decision-makers of density 1, with individuals denoted by i. Each individual chooses
from a fixed menu of two items, X = {x, y}. A choice-situation g ∈ G consists of a menu X = {x, y} and a
frame d ∈ {dx , dy }. Choice behavior for agent i is described by choice function ci : G −→ X.
Each decision-maker’s choice over X is observed once, under one of the two possible frames (dx or dy ).
Let yi (d) indicate whether individual i would choose y from X under frame d, yi (d) = 1 ⇐⇒ ci (X, d) = y.
Let dij indicate whether individual i is observed under frame dj , for j ∈ {x, y}. Let E[yi |dj ] denote the
population average of choices observed under frame dj , E[yi |dj ] ≡ E[yi |dij = dj ]. We assume throughout
that these population moments are directly observable, putting aside issues of finite sample size. To illustrate
the notation using the privacy controls example described in the introduction, we may suppose that yi (dj )
indicates whether the individual allows a company to use her data when privacy controls
are opt-in (dx ) or opt-out (dy ). The hypothetical data from the introduction are E[yi |dy ] = 0.70 and
E[yi |dx ] = 0.40. Moments from this hypothetical example used to illustrate our techniques are contained in
Tables 1 through 3.
We assume that agents have well-defined preferences over the elements in X. The preferences of agent i
are represented by choice function mi : G −→ X. We assume that agents’ preferences are insensitive to the
frame, which we call frame separability:
\[ m_i(X, d_x) = m_i(X, d_y) \quad \forall i \tag{1} \]
As such, it will be convenient to write mi (X, d) = mi (X). Absent an assumption along these lines, observed
differences in choices at dx versus dy pose no problem for standard revealed preference welfare analyses;
that is, the fact that a decision-maker chooses differently in dx and dy could simply reflect the fact that her
preferences over x and y are contingent on information contained in the frame.7 Examples of frames might
include: (1) which option is framed as the default; (2) the order in which options are displayed; (3) whether
the consequence of selecting an option is framed as a loss or a gain; (4) whether the menu of options includes
an irrelevant alternative; (5) the point in time at which a decision is made; or (6) whether various features
of the choice are made salient.
Let φi indicate whether individual i prefers y to x, φi = 1 ⇐⇒ mi (X) = y. Let E[φi ] denote average
preferences for the population.
Because preferences are not sensitive to frames, an agent who optimizes over the elements in X (i.e.
according to m(.)) will make the same choice regardless of the frame. To accommodate the possibility
that choices may depend upon the frame, we do not impose (as is done in conventional revealed preference
analysis) that ci (X, d) = mi (X) ∀i, d. Rather, we allow agents to either choose consistently or to choose in
a way that is sensitive to the frame. We assume that, when individuals choose consistently, those choices
reveal their preferences – an assumption which we refer to as the consistency principle:
\[ c_i(X, d_x) = c_i(X, d_y) \implies m_i(X) = c_i(X, d_x) = c_i(X, d_y) \tag{2} \]
As in Bernheim and Rangel [2009], (2) represents a weakening of the standard instrumental rationality
assumption behind revealed preference analyses; to the extent that an agent makes sub-optimal choices
consistently across frames, our approach will incorrectly treat such choices as revealing the agent’s true
preferences.8 For example, if individuals choose whether or not to enroll in an individual retirement account,
and the frame manipulates whether the default is enrollment or non-enrollment, then some individuals might
consistently choose not to enroll in a retirement plan due to a “present bias” toward consuming in the present
rather than in the future. To the extent that we believe that this present bias causes the individual to act
against her own interests, the consistency principle would not be applicable.9
In addition to allowing the frame to affect choice, we will impose frame monotonicity, which requires that
when the frame affects choice, it does so in the same direction for each decision-maker:
7 For example, if an agent chooses hot chocolate from {hot chocolate, ice cream} under dx and ice cream from {hot chocolate,
ice cream} under dy , there would be no apparent deviation from rationality if the frame indicated whether the season was winter
or summer. This assumption is explicit in Salant and Rubinstein [2008] and implicit in Bernheim and Rangel [2009], who require
it for determining when two potentially conflicting choice situations differ in terms of the frame or in terms of the available
menu items. In this sense, frame separability is the property that distinguishes variation in frames from variation in menu
items.
8 Although our discussion of the results focuses on the case in which decision-makers’ inconsistency across frames represents
a failure of rational choice, our approach works equally well if framing effects are due to neoclassical factors such as the presence
of transaction costs associated with selecting an option other than the default. See Appendix Section B.
9 For additional criticism of the consistency principle, refer to Masatlioglu et al. [2012], whose relationship to our work we
describe in the Appendix.
\[ y_i(d_y) \geq y_i(d_x) \quad \forall i \tag{3} \]
where dx and dy are labeled without loss of generality. This assumption rules out the possibility that some
agents choose y if and only if x is the default. In conjunction with the consistency principle (2), frame
monotonicity (3) implies ci (X, dx ) ≠ ci (X, dy ) =⇒ ci (X, dx ) = x, ci (X, dy ) = y. With data on each individual’s choices under both frames,
which we assume are unavailable throughout this paper, this implication would be a testable hypothesis.10 In
the privacy controls example, frame monotonicity implies that when an individual is not consistent about
whether she allows a company to collect and use her data, she will always let the company use her data
under opt-out policies and never under opt-in policies.11
We can embed frame monotonicity and the consistency principle in the following expression: yi (dy ) ≥
φi ≥ yi (dx ). Intuitively, our assumptions imply that the two frames lead inconsistent decision-makers away
from their preferred choice in opposite directions.
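As a check on this embedding, the short sketch below enumerates every combination of choices and preferences and keeps those permitted by frame monotonicity (3) and the consistency principle (2). The function name `admissible_types` and the tuple encoding (y under dx, y under dy, φ) are illustrative choices, not notation from the paper.

```python
from itertools import product

def admissible_types():
    """Enumerate (y_dx, y_dy, phi) triples allowed by assumptions (2) and (3)."""
    allowed = []
    for y_dx, y_dy, phi in product((0, 1), repeat=3):
        monotone = y_dy >= y_dx                    # frame monotonicity (3)
        consistent = y_dx == y_dy
        reveals = (not consistent) or phi == y_dx  # consistency principle (2)
        if monotone and reveals:
            allowed.append((y_dx, y_dy, phi))
    return allowed

types = admissible_types()
for t in types:
    print(t)

# Every admissible type satisfies y(d_y) >= phi >= y(d_x).
assert all(y_dy >= phi >= y_dx for y_dx, y_dy, phi in types)
```

Four types survive: consistent choosers who prefer x, consistent choosers who prefer y, and inconsistent choosers with either preference, who choose x under dx and y under dy.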
Let ψi indicate whether agent i chooses consistently across frames, ψi = 1 ⇐⇒ ci (X, dx ) = ci (X, dy ).
Under (1), ψi = 0 implies that the decision-maker fails to choose her preferred option in at least one of the
available frames. In contrast, when ψi = 1, (2) allows us to conclude that the agent’s (consistent) choice
behavior is optimal. Since we wish to study situations where some individuals optimize, we assume that a
non-empty set of decision-makers choose consistently across frames:12
\[ \exists\, i \ \text{s.t.}\ \psi_i = 1 \tag{4} \]
Finally, in order to avoid conflating the effects of d with heterogeneity in decision-makers’ preferences,
it must be the case that individuals are not systematically assigned to choice-situations in ways that are
correlated with their preferences or their propensity to optimize (“unconfoundedness”). Thus we assume
that
\[ (\phi_i, \psi_i) \perp d_{ix} \tag{5} \]
Unconfoundedness might fail, for example, if new employees at a firm were presented with a different default
10 Revising our estimators to allow for the presence of a known fraction of non-conformist inconsistent choosers is a straightforward exercise.
11 One way that frame monotonicity would fail in this example is if using an opt-in privacy policy signaled to users that a
company was trustworthy in its respect for privacy, which could cause a user to allow data use under opt-in but not under opt-out.
However, this situation would also constitute a violation of frame separability, since the individual’s preferences over whether
she allows the company to use her data change when she learns about the company’s policies.
12 More formally, we assume that there is a subset i∗ of the population with strictly positive measure, such that ∀i ∈ i∗ , ψi = 1.
We write the assumption in terms of the existence of a consistent chooser for clarity. The same goes for assumptions 15 and 24
later on in the paper.
than more senior employees when choosing health plans. Unconfoundedness is guaranteed when individuals
are randomly assigned to frames.
3 Identification of Preferences of Consistent Choosers
Our first result shows how the above assumptions allow one to recover the preferences of the decision-makers
who choose consistently across frames, despite observing each individual’s choice under a single frame only.
Proposition 1 Let
\[ Y_A \equiv \frac{E[y_i \mid d_x]}{E[y_i \mid d_x] + 1 - E[y_i \mid d_y]}. \]
Frame separability (1), the consistency principle (2), frame monotonicity (3), the existence of consistent choosers (4), and unconfoundedness (5) imply the following:
(1.1) The fraction of the population that chooses consistently, E[ψi ], is given by E[ψi ] = E[yi |dx ] + 1 −
E[yi |dy ].
(1.2) The fraction of consistent choosers who prefer y, E[φi |ψi = 1], is given by E[φi |ψi = 1] = YA .
Proof of Proposition 1: The proof uses the fact that under our assumptions, choosing “against
the frame” reveals preferences. We first analyze the case where the frame is dx . Use the law of iterated
expectations to write:
\[ E[y_i(d_x) \mid d_{ij} = d_x] = E[y_i(d_x) \mid d_{ij} = d_x, \psi_i = 1]\, p(\psi_i = 1 \mid d_{ij} = d_x) + E[y_i(d_x) \mid d_{ij} = d_x, \psi_i = 0]\, p(\psi_i = 0 \mid d_{ij} = d_x) \tag{6} \]
By the consistency principle (assumption 2), we know ψi = 1 =⇒ yi (dx ) = φi , which implies
E[yi (dx )|dij = dx , ψi = 1] = E[φi |dij = dx , ψi = 1]. Unconfoundedness (5) then implies E[φi |dij = dx , ψi =
1] = E[φi |ψi = 1], and p(ψi = 1|dij = dx ) = p(ψi = 1).
Similarly, frame monotonicity (assumption (3)) and the definition of ψi jointly imply that individuals
who do not optimize will choose x under dx . Formally, ψi = 0 ⇒ yi (dx ) = 0. Hence we have E[yi (dx )|dij =
dx , ψi = 0] = 0.
Substituting these results into (6) yields:
\[ E[y_i \mid d_x] = E[\phi_i \mid \psi_i = 1]\, p(\psi_i = 1) \tag{7} \]
Now we apply a similar set of steps to E[yi |dy ] = E[yi (dy )|dij = dy ] to obtain that:13
\[ E[y_i \mid d_y] = E[\phi_i \mid \psi_i = 1]\, p(\psi_i = 1) + p(\psi_i = 0) \tag{9} \]
13 By the law of iterated expectations:
\[ E[y_i(d_y) \mid d_{ij} = d_y] = E[y_i(d_y) \mid d_{ij} = d_y, \psi_i = 1]\, p(\psi_i = 1 \mid d_{ij} = d_y) + E[y_i(d_y) \mid d_{ij} = d_y, \psi_i = 0]\, p(\psi_i = 0 \mid d_{ij} = d_y) \tag{8} \]
Solving (7) and (9) for E[ψi ] = p(ψi = 1), and applying the identity p(ψi = 1) = 1 − p(ψi = 0) yields
(1.1). By (4), p(ψi = 1) > 0, so we can substitute (7) and (9) into the expression for YA to obtain:
\[ Y_A \equiv \frac{E[y_i \mid d_x]}{E[y_i \mid d_x] + 1 - E[y_i \mid d_y]} = \frac{E[\phi_i \mid \psi_i = 1]\, p(\psi_i = 1)}{p(\psi_i = 1)} \tag{10} \]
The existence of consistent choosers (4) guarantees this quantity is well-defined. Simplifying the expression
yields (1.2).
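A small simulation can illustrate the logic of the proof. The sketch below is an illustration under stated assumptions rather than part of the paper: the population parameters, the 50/50 random assignment to frames, and the function names are all hypothetical. Each simulated individual is observed under one frame only, and the estimator uses nothing but the two observed choice frequencies.

```python
import random

def simulate_observed_moments(n=200_000, p_consistent=0.7,
                              p_y_consistent=4 / 7, p_y_inconsistent=0.3,
                              seed=0):
    """Return (E[y|d_x], E[y|d_y]) from one simulated cross-section."""
    rng = random.Random(seed)
    counts = {"dx": 0, "dy": 0}   # individuals observed under each frame
    chose_y = {"dx": 0, "dy": 0}  # of those, how many chose y
    for _ in range(n):
        psi = rng.random() < p_consistent             # consistent chooser?
        p_y = p_y_consistent if psi else p_y_inconsistent
        phi = rng.random() < p_y                      # prefers y?
        frame = "dy" if rng.random() < 0.5 else "dx"  # random assignment (5)
        # Consistent choosers pick their preferred option; inconsistent
        # choosers pick y under d_y and x under d_x (frame monotonicity).
        y = phi if psi else (frame == "dy")
        counts[frame] += 1
        chose_y[frame] += int(y)
    return chose_y["dx"] / counts["dx"], chose_y["dy"] / counts["dy"]

def proposition_1(e_y_dx, e_y_dy):
    """Apply (1.1) and (1.2) to the two observed moments."""
    share_consistent = e_y_dx + 1 - e_y_dy
    y_a = e_y_dx / share_consistent
    return share_consistent, y_a

e_y_dx, e_y_dy = simulate_observed_moments()
share, y_a = proposition_1(e_y_dx, e_y_dy)
print(round(share, 3), round(y_a, 3))
```

With these hypothetical parameters the two estimates should land close to 0.7 and 4/7. Note that `p_y_inconsistent` never enters the observed choices, which is why the preferences of inconsistent choosers cannot be recovered from these moments alone.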
Discussion of Proposition 1: An outside observer cannot identify exactly which individual decision-makers are optimizing because each individual’s choice is observed only under a single frame. However,
Proposition 1 follows from the insight that, under frame monotonicity, only consistent decision-makers choose
against the frame (i.e. choose x when confronted with dy or choose y when confronted with dx ). Formally,
the useful property of choices under frame monotonicity is:14
\[ c_i(X, d_x) = y \ \text{or}\ c_i(X, d_y) = x \iff \psi_i = 1 \tag{11} \]
By (11) and the consistency principle (2), we can conclude that individuals who choose against the
frame are revealing their preferences. Individuals’ frames are independent of preferences and optimizing by
unconfoundedness, so we can regard the set of consumers choosing against the frame as a representative
sample of all consistent choosers. The fraction of individuals choosing against the frame, summed across the two frames (E[yi |dx ] + 1 − E[yi |dy ]), yields the size of
the consistent population, and the share of that total accounted for by individuals who choose y under dx yields the fraction
of consistent choosers who prefer y. Note that both frame monotonicity and unconfoundedness would be
unnecessary (and testable) were we able to observe individual choice data on each decision-maker under each
frame.
Applying Proposition 1 to hypothetical choice data for our running example of opt-in versus opt-out
privacy policies, in Table 1, we have E[ψi ] = 0.7, so that 70 percent of the population chooses consistently
across frames. Similarly, we have YA = 0.4/(0.4 + 0.3) = 4/7, which implies that approximately 57 percent of the
As before, (2) guarantees ψi = 1 ⇒ yi (dy ) = φi , which implies E[yi (dy )|dij = dy , ψi = 1] = E[φi |dij = dy , ψi = 1]. Similarly,
(3) and (2) imply that ψi = 0 ⇒ yi (dy ) = 1, which allows us to write E[yi (dy )|dij = dy , ψi = 0] = 1. Finally, unconfoundedness
(5) guarantees E[φi |dij = dy , ψi = 1] = E[φi |ψi = 1], p(ψi = 1|dij = dy ) = p(ψi = 1), and p(ψi = 0|dij = dy ) = p(ψi = 0).
Substituting these results into (8) yields equation (9).
14 Proof : Suppose ci (X, dx ) = y, so yi (dx ) = 1. By frame monotonicity, we must also have yi (dy ) = 1. Now suppose instead
that ci (X, dy ) = x, so yi (dy ) = 0. By frame monotonicity, we must also have yi (dx ) = 0. In either case, we therefore have
yi (dy ) = yi (dx ). For the opposite direction, note that if yi (dx ) = yi (dy ), we must have that either ci (X, dx ) = ci (X, dy ) = x or
ci (X, dx ) = ci (X, dy ) = y. So either ci (X, dy ) = x or ci (X, dx ) = y.
consistent decision-makers prefer y to x. Just over half of consistent choosers in this example prefer to have
the company collect and use their data.
Table 1: Average Choices by Frame
Quantity                                                  Value
Fraction choosing y under dy , E[yi |dy ]                 0.70
Fraction choosing y under dx , E[yi |dx ]                 0.40
Fraction consistent, E[ψi ]                               0.70
Fraction of consistent that prefer y, E[φi |ψi = 1]       0.57
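The derived rows of Table 1 follow mechanically from the two observed moments. The snippet below is a minimal sketch using exact rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

e_y_dy = Fraction(70, 100)  # fraction choosing y under d_y (opt-out)
e_y_dx = Fraction(40, 100)  # fraction choosing y under d_x (opt-in)

share_consistent = e_y_dx + 1 - e_y_dy  # Proposition (1.1)
y_a = e_y_dx / share_consistent         # Proposition (1.2)

print(f"E[psi] = {float(share_consistent):.2f}")  # prints "E[psi] = 0.70"
print(f"Y_A = {y_a} ~ {float(y_a):.2f}")          # prints "Y_A = 4/7 ~ 0.57"
```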
It is important to note that identifying the preferences of the consistent decision-makers may be important
in its own right. First, as described in Section 8, the key parameter needed for implementing the optimal
frame is the average preferences of the inconsistent decision-makers. Thus even if one is able to successfully
implement a refinement approach (as described in the introduction) to recover the preferences of the aggregate
population, one must still have some technique for isolating the preferences of the consistent decision-makers
in order to back out the preferences of the inconsistent decision-makers (using Bayes’ rule). Second, contrary
to our starting point, one might assume that individuals whose choices are inconsistent across frames simply
lack normatively-relevant preferences over the available options.15 In that case, Proposition 1 provides a
method for isolating the preferences of those decision-makers who do choose in a consistent way.
4 Identification of Population Preferences
The remainder of the paper focuses on the question of how to recover the distribution of preferences of
individuals who choose inconsistently across frames. This section describes what barriers must be overcome
to solve this problem.
4.1 Bounds on Population Preferences
First, we note that we can partially identify population preferences using Proposition 1. Specifically, we can
identify upper and lower bounds of E[φi ] by measuring the revealed preferences of the subset of decision-makers who choose consistently and making the most extreme assumptions possible regarding the preferences
of those who do not.
Proposition 2 Frame separability (1), the consistency principle (2), frame monotonicity (3), and unconfoundedness (5) imply that E[φi ] ∈ [E[yi |dx ], E[yi |dy ]].
Proof: See Appendix.
15 For a thoughtful discussion of this issue, refer to Fischhoff [1991].
Discussion of Proposition 2 Proposition 2 offers a conservative approach for identifying the range of
possible population preference parameter values consistent with the observed choice data.
To illustrate, using the hypothetical choice data in Table 1, we have E[φi ] ∈ [0.4 , 0.7]. The bounds
themselves are quite intuitive; the present analysis highlights that interpreting these population moments as
bounds is correct only to the extent that our assumptions, notably including frame monotonicity, are satisfied.
When the fraction of decision-makers failing to optimize is large, the bounds will be relatively uninformative
and further assumptions or information will be required to shed light on population preferences.16
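As a minimal sketch of the bounds in Proposition 2, the snippet below evaluates them on the hypothetical Table 1 moments. The function name `preference_bounds` and its argument names are illustrative assumptions, not notation from the paper.

```python
def preference_bounds(share_y_under_dx, share_y_under_dy):
    """Bounds on E[phi] implied by Proposition 2: [E[y|dx], E[y|dy]].

    Illustrative helper (not from the paper). Frame monotonicity requires
    E[y|dx] <= E[y|dy], so the inputs are checked before use.
    """
    if not 0.0 <= share_y_under_dx <= share_y_under_dy <= 1.0:
        raise ValueError("expected 0 <= E[y|dx] <= E[y|dy] <= 1")
    # The width of the interval is the share of choosers whose choice
    # differs across frames, i.e. the inconsistent share.
    width = share_y_under_dy - share_y_under_dx
    return share_y_under_dx, share_y_under_dy, width


# Hypothetical moments from Table 1: E[y|dx] = 0.40, E[y|dy] = 0.70.
lower, upper, width = preference_bounds(0.40, 0.70)
print(f"E[phi] lies in [{lower:.2f}, {upper:.2f}]; width {width:.2f}")
```

The width of the interval equals one minus the fraction of consistent choosers, which is why the bounds become uninformative when few decision-makers optimize.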
4.2 Characterizing the Full Identification Problem
Proposition 3 characterizes the primary difficulty in fully identifying the preferences of two groups of interest:
the full population and the population of inconsistent choosers.
Proposition 3 Frame separability (1), the consistency principle (2), frame monotonicity (3), the existence
of consistent choosers (4), and unconfoundedness (5) imply the following:
(3.1) The fraction of the population who prefer y, E[φi ], is given by
$$E[\phi_i] = Y_A - \frac{\mathrm{cov}(\phi_i, \psi_i)}{E[\psi_i]} \tag{12}$$
(3.2) The fraction of inconsistent choosers who prefer y is given by
$$E[\phi_i \mid \psi_i = 0] = Y_A - \frac{\mathrm{cov}(\psi_i, \phi_i)}{E[\psi_i]\,(1 - E[\psi_i])} \tag{13}$$
Proof of Proposition 3
(3.1): This result follows directly from equation (10).
(3.2): By the law of iterated expectations, E[φi ] = p(ψi = 1)E[φi |ψi = 1] + p(ψi = 0)E[φi |ψi = 0].
Substituting this into (12) and applying (2.1) yields the result.
Discussion of Proposition 3 The key problem with extrapolating from the preferences of consistent
choosers to other populations is that preferences and optimizing behavior may be correlated. Individuals
who choose consistently might be more likely to prefer y to x than individuals who choose inconsistently.
Propositions 3.1 and 3.2 show that a sufficient statistic for the difference between the preferences of consistent
choosers (given by YA , by Proposition 1) and the preferences of other groups is the covariance between
preferences and optimizing behavior (together with the size of the consistent population, which we can recover
using Proposition 1). When this covariance is not negligible, consistent choosers are not representative of
the full population, so YA would be a biased estimate of population preferences, and the magnitude of the bias
is largest when very few individuals choose consistently. Similarly, YA would be a biased estimate of the
preferences of the inconsistent choosers alone, and the magnitude of the bias is even larger for
this group than for the full population.17
16 If choices are observed under multiple decision-quality states, as in Section 6, the bounds derived from the highest decision-quality state will yield the tightest bounds, because the preferences of a greater fraction of the population will be identified
from consistent choice behavior.
Crucially, any number of behavioral models could potentially be generating inconsistent choices across
frames in a particular application. But for the purposes of identifying E[φi ] and E[φi |ψi = 0] from population choice data, the underlying positive model matters only to the extent that it shapes cov(φi , ψi ). Given
knowledge of cov(φi , ψi ), one does not need to take a stance on the exact behavioral model explaining behavior. This result is important because it focuses the problem of preference identification on understanding
this quantity, which one can examine empirically with a variety of reduced-form methods.18
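To make the role of cov(φi , ψi ) concrete, the following sketch checks equations (12) and (13) numerically. The joint distribution of (φi , ψi ) used here is purely hypothetical (in practice the preferences of inconsistent choosers are not observed), and the names `moments` and `joint` are illustrative.

```python
def moments(joint):
    """Summary moments of two binary variables.

    joint[(phi, psi)] is the hypothetical population share with preference
    phi (1 = prefers y) and consistency psi (1 = chooses consistently).
    """
    e_phi = joint[(1, 1)] + joint[(1, 0)]
    e_psi = joint[(1, 1)] + joint[(0, 1)]
    cov = joint[(1, 1)] - e_phi * e_psi   # covariance of two indicators
    y_a = joint[(1, 1)] / e_psi           # E[phi | psi = 1]
    return e_phi, e_psi, cov, y_a


# Hypothetical joint distribution; shares sum to one.
joint = {(1, 1): 0.40, (0, 1): 0.30, (1, 0): 0.12, (0, 0): 0.18}
e_phi, e_psi, cov, y_a = moments(joint)

rhs_12 = y_a - cov / e_psi                      # right-hand side of (12)
rhs_13 = y_a - cov / (e_psi * (1 - e_psi))      # right-hand side of (13)
phi_inconsistent = joint[(1, 0)] / (1 - e_psi)  # E[phi | psi = 0], computed directly

print(f"E[phi] = {e_phi:.4f}, (12) gives {rhs_12:.4f}")
print(f"E[phi|psi=0] = {phi_inconsistent:.4f}, (13) gives {rhs_13:.4f}")
```

In this illustration the covariance is positive, so YA overstates both the population preference and, by a larger margin, the preference of the inconsistent choosers.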
4.3 Decision-Quality Independence
Proposition 3 highlights that in one special set of cases, population preferences may be recovered by simply
extrapolating the preferences of the consistent choosers. We refer to the necessary assumption as decision-quality independence. It states that the variation in whether decision-makers optimize is not systematically
related to the variation in decision-makers’ preferences over x and y:
$$\mathrm{cov}(\phi_i, \psi_i) = 0 \tag{14}$$
Decision-quality independence is a strong assumption, likely to be unrealistic in many important applications. When it holds, however, we can recover population preferences via a straightforward application of
Proposition 3:
17 Note that the denominator of (13) is equal to the variance of ψi . Using the definition of the correlation between φi and ψi ,
we could also write this expression as
$$E[\phi_i \mid \psi_i = 0] = Y_A - \mathrm{corr}(\psi_i, \phi_i)\,\sqrt{\frac{E[\phi_i]\,(1 - E[\phi_i])}{E[\psi_i]\,(1 - E[\psi_i])}}$$
which indicates that whenever preferences are highly variable in the population or the correlation between φi and ψi is large,
the bias from using YA as an estimate of the preferences of inconsistent choosers will be large.
18 As in other reduced form empirical contexts, understanding the underlying behavioral model is still important for determining which observables need to be included when matching on observables (Section 5) or for determining what type of variation
meets the requirements of a decision-quality state (Section 6).
Proposition 4: Assuming frame separability (1), the consistency principle (2), frame monotonicity (3), the
existence of consistent choosers (4), and unconfoundedness (5), decision-quality independence (14) implies
that YA = E[φi ] = E[φi |ψi = 0].
Proof: The result follows directly from Proposition 3 and the assumption of decision-quality independence (14).
Discussion of Proposition 4 Proposition 4 identifies sufficient conditions for the recovery of population
preferences in situations where some decision-makers systematically fail to optimize.
Intuitively, when individuals’ propensity to optimize is not systematically related to their underlying
preferences, the preferences of the inconsistent decision-makers can be inferred from the revealed preferences
of those decision-makers who are consistent, which we know from Proposition 1. Whether decision-quality
independence holds in a particular setting is an empirical question, one whose answer will vary depending
on the choice being made and the population of decision-makers. In Section B in the Appendix, we show
that in certain positive models of framing effects, decision-quality independence obtains only under strong
assumptions. For example, if variation in decision-making consistency across individuals stems only from
variation in cognitive costs, decision-quality independence holds only when that variation is uncorrelated with
variation in their preferences over the options being chosen. Decision-quality independence thus provides a
useful reference point for identifying population preferences, but as an assumption it should not be adopted
uncritically.
5 Preference Recovery By Matching on Observables
In some cases, decision-makers’ preferences may be systematically related to their propensity to optimize,
based on observable factors. For example, it could be that the rich are more likely to optimize than the
poor, and also more likely to prefer y to x.19 In such cases, decision-quality independence may hold after
conditioning on the factors that are related to both propensity to optimize and underlying preferences. That
is, it could be that the propensity to optimize is uncorrelated with one’s preferences, looking only within
the sub-population of rich decision-makers (and similarly if one conditions on the poor decision-makers).
When the factors that cause decision-quality independence to fail are observable, Proposition 5 shows how
population preferences may be recovered.
Suppose that decision-makers exhibit some observable characteristic wi ∈ {w0 , ..., wJ }, which is potentially associated with their preferences for x and y as well as their optimizing behavior. We assume that
some individuals with each realization of w choose consistently:
$$\forall w,\ \exists i \ \text{s.t.}\ w_i = w,\ \psi_i = 1 \tag{15}$$
We will condition our unconfoundedness assumption (5) on wi :
$$(\phi_i, \psi_i) \perp d_i \mid w_i \tag{16}$$
Note that for the above example, our approach will also be valid in the case where we are more likely to
observe the rich under dx than under dy . Finally, we assume conditional decision-quality independence, i.e.
that decision-quality independence holds conditional on the observable characteristic wi :
$$\mathrm{cov}(\phi_i, \psi_i \mid w_i) = 0 \tag{17}$$
Proposition 5 Let
$$Y_A(w) = \frac{E[y_i \mid d_x, w]}{E[y_i \mid d_x, w] + 1 - E[y_i \mid d_y, w]}.$$
Let Pj = Pr(w = j) for j ∈ {0, ..., J} and
$$S_w = \frac{E[y \mid d_y, w] - E[y \mid d_x, w]}{E[y \mid d_y] - E[y \mid d_x]}\, P_w.$$
Assume that frame separability (1), the consistency principle (2), frame monotonicity
(3), conditional unconfoundedness (16), the existence of consistent choosers (15), and conditional decision-quality independence (17) hold. Then
(5.1) E[φi ] = Σj Pj YA (j)
(5.2) E[φi |ψi = 0] = Σj Sj YA (j)
Proof: See Appendix.
19 For example, a rapidly growing literature investigates the relationship between income and decision-quality across a range
of contexts. See Spears [2011], Goldin and Homonoff [2013], and Syngjoo Choi and Silverman [2013] for some recent examples
and Mullainathan and Shafir [2013] for a comprehensive treatment of the subject.
Corollaries to Proposition 5: Distribution of Types Under Assumptions (1), (2), (3), (4), and (5),
the distribution of type j among the consistent and inconsistent decision-makers is as follows:
(5.3) The fraction of type j among the inconsistent decision-makers is given by
$$\Pr(w_i = j \mid \psi_i = 0) = \frac{E[y \mid d_y, w = j] - E[y \mid d_x, w = j]}{E[y \mid d_y] - E[y \mid d_x]}\, P_j.$$
(5.4) The fraction of type j among the consistent decision-makers is given by
$$\Pr(w_i = j \mid \psi_i = 1) = \frac{E[y \mid d_x, w = j] + 1 - E[y \mid d_y, w = j]}{E[y \mid d_x] + 1 - E[y \mid d_y]}\, P_j.$$
Proof: See Appendix.
Discussion of Proposition 5 Proposition 5 can be understood as applying the intuition of a matching
estimator from the program evaluation literature to the case of preference recovery of inconsistent decision-makers.20 For example, suppose that decision-makers are either rich or poor. Suppose that 70 percent of the
consistent decision-makers are rich, but that only 40 percent of the inconsistent decision-makers are rich. The
approach is to measure the revealed preferences of the consistent rich and poor decision-makers separately,
and then extrapolate that information to the associated group of inconsistent decision-makers. Because
we can identify the fraction of rich and poor decision-makers among the optimizers and non-optimizers
(respectively),21 we can re-weight the revealed preference information to recover the aggregate preferences for
the non-optimizers. The conditional independence assumption guarantees the validity of the extrapolation of
preferences from rich optimizers to rich non-optimizers (and from poor optimizers to poor non-optimizers).22
The estimates of E[φi ] and E[φi |ψi = 0] are both weighted averages of the estimated mean preferences for
the various w subgroups. The weighted averages differ to the extent that the distribution of w differs between
the non-optimizers and the aggregate population; Pw measures the distribution of w in the general population
whereas Sw measures the distribution of w in the population of inconsistent decision-makers. Intuitively,
when preferences are independent of w, we will have YA (0) = YA (1) and E[φi ] = E[φi |ψi = 0]. When the
propensity to optimize is independent of w, we will have Sw = Pw for each w, and again E[φi |ψi = 0] = E[φi ]. Consequently,
the matching estimator admits a test of the null hypothesis that the unconditional decision-quality
independence assumption is satisfied, provided that the conditional independence assumption is satisfied and
YA (0) ≠ YA (1). Specifically, one can test whether E[φi ] = E[φi |ψi = 0]. As discussed in the Appendix, the
types of variables that must be accounted for in order for conditional decision-quality independence to hold
will vary based on the underlying positive model that generates behavior.
20 Consider the methods proposed by Abadie [2003] and Angrist and Fernandez-Val [2010] for identifying the fraction of
compliers associated with an instrument and extrapolating the treatment effect for those compliers to a different population. In
that application, as in ours, extrapolation based on observables is made more difficult by the fact that observers cannot determine
whether any particular individual is a member of the relevant group (compliers in their context, consistent decision-makers in
ours).
21 To identify these fractions, we exploit the fact that only optimizers “pick against the frame,” and measure the fraction of
each w type among that group. This is an application of a conditional version of the corollary to Proposition 1. Because we
can observe the overall distribution of w in the population, we can use Bayes' rule to back out the distribution of w among the
inconsistent decision-makers.
22 In applications to survey questions, one can think of the weights as consistency weights which correct for inconsistent
response bias, applied exactly as one applies propensity score weights to correct for survey response bias. We explore this
problem in Goldin and Reck [in progress].
To illustrate the technique, consider the hypothetical choice data described in Table 2, in which individuals
are categorized based on whether they graduated high school. Because the population moments, aggregated
across education groups, are equal to the moments in Table 1, the average preferences for the consistent
decision-makers are the same as well, E[φi |ψi = 1] ≈ 0.57. However, applying Proposition 5 suggests that
preferences over x and y (e.g. preferences over whether a company is permitted to use their data) are strongly
correlated with education: E[φi |ψi = 1, HSi = 1] = 0.62 and E[φi |ψi = 1, HSi = 0] = 0.40. Additionally,
decision-makers’ propensity to choose consistently across frames (e.g. the propensity to choose the same
option regardless of whether the company adopts an opt-in or opt-out privacy policy) is strongly correlated
with education: E[ψi |HSi = 1] = 0.90 and E[ψi |HSi = 0] = 0.40, implying that individuals who did not graduate high school
constitute 80 percent of the inconsistent population.23 As a result, the fraction of inconsistent decision-makers who prefer y to x is estimated to be E[φi |ψi = 0] = 0.44. Intuitively, high school graduates constitute
a disproportionately large share of the consistent decision-makers, so their contribution is scaled down when
calculating the preferences of the inconsistent group. Note that re-weighting the estimates this way changes
the optimal policy in this example: the results now suggest that a majority of inconsistent choosers would
be better off under opt-out privacy controls (see Section 8), in contrast to what we would conclude if we
wrongly imposed decision-quality independence. Weighting in the manner suggested by Proposition 5, we
can see that aggregate preferences for the population are estimated to be E[φi ] = 0.53.
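The calculation behind these estimates can be sketched in a few lines of Python, using the hypothetical group-level moments reported in Table 2. The function `matching_estimates` and the dictionary layout are illustrative assumptions rather than the authors' implementation; the formulas are those of Proposition 5.

```python
def matching_estimates(groups):
    """Matching estimator of Proposition 5 (illustrative sketch).

    groups maps a label for w to a tuple (P_w, E[y|dy,w], E[y|dx,w]).
    Returns (E[phi], E[phi | psi = 0], per-group details).
    """
    # Aggregate frame effect E[y|dy] - E[y|dx]; it must be positive,
    # i.e. some decision-makers must be inconsistent.
    gap_total = sum(p * (y_dy - y_dx) for p, y_dy, y_dx in groups.values())
    if gap_total <= 0:
        raise ValueError("no inconsistent choosers: S_w is undefined")

    phi_population = 0.0
    phi_inconsistent = 0.0
    details = {}
    for label, (p, y_dy, y_dx) in groups.items():
        y_a = y_dx / (y_dx + 1 - y_dy)     # Y_A(w): consistent choosers preferring y
        s = (y_dy - y_dx) / gap_total * p  # S_w: share of w among the inconsistent
        phi_population += p * y_a          # (5.1)
        phi_inconsistent += s * y_a        # (5.2)
        details[label] = {"Y_A": y_a, "S": s}
    return phi_population, phi_inconsistent, details


# Hypothetical Table 2 moments: (P_w, E[y|dy,w], E[y|dx,w]).
table2 = {"HS=1": (0.60, 0.66, 0.56), "HS=0": (0.40, 0.76, 0.16)}
phi_pop, phi_inc, details = matching_estimates(table2)
print(f"E[phi] = {phi_pop:.2f}, E[phi | psi = 0] = {phi_inc:.2f}")
for label, d in details.items():
    print(f"{label}: Y_A = {d['Y_A']:.2f}, S = {d['S']:.2f}")
```

The weights Sw sum to one by construction, so the estimate for the inconsistent group is a proper weighted average of the group-level YA (w).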
Table 2: Average Choices by Frame and High School Education
                                                      HS = 1   HS = 0   Total
Fraction choosing y under dy , E[yi |dy ]              0.66     0.76     0.70
Fraction choosing y under dx , E[yi |dx ]              0.56     0.16     0.40
Fraction of population, P (w)                          0.60     0.40     1.00
Fraction consistent, E[ψ]                              0.90     0.40     0.70
Fraction of consistent population, P (w|ψi = 1)        0.77     0.23     1.00
Fraction of inconsistent population, P (w|ψi = 0)      0.20     0.80     1.00
Fraction of consistent who prefer y, E[φi |ψi = 1]     0.62     0.40     0.57
Fraction of inconsistent who prefer y, E[φ|ψi = 0]     0.62     0.40     0.44
Fraction of population who prefer y, E[φi ]            0.62     0.40     0.53
6 Exploiting Variation in Decision Quality to Identify Preferences
In many cases, neither decision-quality independence (14) nor its weakened version, conditional decision-quality independence (17), will obtain. In such cases, we can make inferences about population preferences
by directly examining the empirical relationship between agents’ preferences and their propensity to optimize.
This section develops tools for learning about that relationship when one can observe variation in decision-quality states. Intuitively, decision-quality states make individuals more or less likely to optimize, but are
not systematically related to individuals’ preferences. A change in decision-quality states allows us to observe
empirically the preferences of individuals induced to optimize by that change.
23 That is,
$$S_{HS=0} \equiv \frac{E[y \mid d_y, HS_i = 0] - E[y \mid d_x, HS_i = 0]}{E[y \mid d_y] - E[y \mid d_x]}\, P_{HS=0} = \frac{0.76 - 0.16}{0.3}\,(0.4) = 0.80.$$
6.1 Setup
We redefine a choice situation g ∈ G to consist of a menu X = {x, y} and a frame vector (d, z), where
d ∈ {dx , dy } is a frame like before, and z ∈ Z is a decision-quality state. For simplicity, we initially focus on
the case in which Z is binary, Z = {zh , zl }. Choice behavior for agent i is described by the choice function
ci : G −→ X.
As before, we assume that each decision-maker’s choice over X is observed only once, but now there
are four possible choice-situations (one for each d by z combination). Let dij and zki (respectively) indicate
whether individual i is observed under biasing frame dj and decision-quality state zk , for j ∈ {x, y} and
k ∈ {h, l}. Agents have well-defined preferences over the elements of X summarized by mi : G −→ X. We
will assume frame separability for both d and z:
$$m_i(X, d_x, z_k) = m_i(X, d_y, z_k) \quad \forall i, z_k \tag{18}$$
$$m_i(X, d_j, z_h) = m_i(X, d_j, z_l) \quad \forall i, d_j \tag{19}$$
Let φi denote whether individual i prefers option y, as before, and let yi (dj , zk ) denote whether individual
i chooses option y in choice situation (dj , zk ). We continue to employ the consistency principle and frame
monotonicity:
$$\forall z_k,\quad c_i(X, d_x, z_k) = c_i(X, d_y, z_k) \iff y_i(d_x, z_k) = y_i(d_y, z_k) = \phi_i \tag{20}$$
$$y_i(d_y, z_k) \ge y_i(d_x, z_k) \quad \forall i, z_k \tag{21}$$
Let ψi (zk ) indicate whether agent i chooses consistently when z = zk , ψi (zk ) = 1 ⇐⇒ yi (dx , zk ) =
yi (dy , zk ). The decision-quality state affects the propensity of agents to optimize but does not affect an agent’s
behavior conditional on whether she optimizes or not, which we call decision-quality state exclusivity:
$$y_i(d_j, z_h) \ne y_i(d_j, z_l) \iff \psi_i(z_h) \ne \psi_i(z_l) \tag{22}$$
Finally, we assume that the decision-quality state affects whether an individual optimizes monotonically,
decision-quality state monotonicity:
$$\psi_i(z_h) \ge \psi_i(z_l) \quad \forall i \tag{23}$$
Moreover, we assume that this inequality is strict for a non-zero subset of the population, which implies the
existence of contingent optimizers:
$$\exists\, i \ \text{s.t.}\ \psi_i(z_h) > \psi_i(z_l) \tag{24}$$
Examples of z include the time pressure for making a decision, the cost of obtaining or processing information
about the various available choices, the opportunity cost of cognitive resources at the time of decision-making,
or the degree to which one alternative is more salient than another. As discussed in the Appendix, the
specific forms of variation that will satisfy these assumptions depend on the underlying model of behavior
that generates framing effects in a particular application.
Before proceeding, it will be useful to simplify the notation. Because X is held fixed throughout, we
will typically suppress it as an argument in the various choice functions. As before, let φ ≡ E[φi ] denote
the fraction of individuals who prefer y. Let E[yi | dj , zk ] = E[yi |dij = dj , zki = zk ] denote the fraction of
individuals observed in situation (dj , zk ) who choose y.
Finally, we assume that individuals are not assigned to choice-situations in ways that are systematically
correlated with either their preferences or their propensity to optimize. Thus we assume that
$$(\phi_i, \psi_i(z_h), \psi_i(z_l)) \perp d_{ij}, z_{ki} \tag{25}$$
for j ∈ {x, y} and k ∈ {h, l}. As before, this unconfoundedness assumption would be satisfied if individuals
are randomly assigned to choice-situations.
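The setup can be illustrated with a small exact calculation. The sketch below assumes a hypothetical population described by shares of types (φi , ψi (zl ), ψi (zh )); the shares are invented for illustration and chosen so that the implied moments match the privacy-settings example used later in this section, and the split of preferences among never-optimizers is arbitrary. Optimizers choose according to φi , while inconsistent choosers follow the frame, as implied by the consistency principle (20) together with frame monotonicity (21).

```python
# Hypothetical type shares keyed by (phi, psi_l, psi_h). Decision-quality
# state monotonicity (23) rules out types with psi_l = 1 and psi_h = 0.
shares = {
    (1, 1, 1): 0.40, (0, 1, 1): 0.30,  # always-optimizers
    (1, 0, 1): 0.05, (0, 0, 1): 0.15,  # contingent optimizers
    (1, 0, 0): 0.04, (0, 0, 0): 0.06,  # never-optimizers (arbitrary split)
}


def chooses_y(phi, psi, frame):
    """Return 1 if y is chosen: optimizers follow phi, others follow the frame."""
    if psi == 1:
        return phi
    return 1 if frame == "dy" else 0


def share_choosing_y(shares, frame, state):
    """Population share choosing y in choice situation (frame, state).

    Under unconfoundedness (25) this equals the moment E[y | frame, state]
    observed among individuals assigned to that situation.
    """
    psi_index = 1 if state == "zl" else 2
    return sum(s * chooses_y(t[0], t[psi_index], frame) for t, s in shares.items())


for state in ("zl", "zh"):
    for frame in ("dx", "dy"):
        print(state, frame, round(share_choosing_y(shares, frame, state), 2))
```

Because inconsistent choosers never pick y under dx , the share choosing y under dx equals the share of optimizers who prefer y, which is the fact the identification results below exploit.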
6.2 Identifying the Preferences of Sometimes-Inconsistent Choosers
Although we cannot observe exactly which decision-makers optimize under each decision-quality state, Proposition 6 allows us to recover the average preferences of the group of decision-makers who are inconsistent at
one decision-quality state and consistent at the other.24 This information could be useful for three reasons.
First, it allows for a test of the decision-quality independence assumption, described in Section 6.3 below.
Second, it is useful for tracing out the relationship between preferences and propensity to optimize, which
can be used to make inferences about preferences for other groups in the population. Third, we show in
Section 8 that the preferences of this group will affect the optimal choice of the decision-quality environment.
24 There is a clear analogy between this result and that of Imbens and Angrist [1994], who show that under similar assumptions,
an IV identifies the average parameter of interest for compliers.
Proposition 6 Define
$$Y_C \equiv \frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}}.$$
Frame separability (18 and 19), the consistency principle (20), frame monotonicity (21), decision-quality state
exclusivity (22), decision-quality state monotonicity (23), the existence of contingent optimizers (24), and
unconfoundedness (25) imply that YC = E[φi | ψi (zh ) = 1, ψi (zl ) = 0].
Proof: See Appendix.
Discussion of Proposition 6 One way to understand Proposition 6 is by analogy to the technique of
instrumental variables in empirical economics. In particular, under our framework, z serves as an instrument
for optimizing. The “effect” of optimizing (i.e. the difference in aggregate choice behavior among decision-makers who optimize versus those who do not) is related to the fraction of the population that prefer y to
x.25 The assumption of frame separability for z (19) and decision-quality state exclusivity (22) correspond
to the standard exclusion restriction for instrumental variables: the instrument must affect the outcome
only through the desired channel. Similarly, unconfoundedness (25) corresponds to the assumption that
the instrument is uncorrelated with unobserved confounding variables. Finally, (23) corresponds to the
monotonicity assumption required for the IV estimator to recover the local average treatment effect [Imbens
and Angrist, 1994]. Given these similarities, it is not surprising that the estimator (YC ) itself corresponds
to a standard Wald statistic: the numerator measures the aggregate change in choice behavior induced by
the instrument, and the denominator scales this value by the change in the fraction of optimizers between
the two levels of z.26
If one were interested only in measuring the aggregate preferences of the largest possible group, applying
Proposition 1 to decisions observed under zh would accomplish that goal. In contrast, the primary value of
Proposition 6 is that it offers a method for shedding light on the empirical relationship between preferences
and propensity to optimize in the population. That is, rather than simply evaluating consistently revealed
preferences under a single decision-quality state, YC is identified from the change in aggregate choice behavior
between zl and zh . And as a result, it provides preference information about a group of decision-makers
selected based on the level of z at which they begin to optimize. Another use for Proposition 6 is motivated by
the optimal decision-quality model set out in Section 8. In that model, the welfare benefits associated with
moving from a lower to a higher decision-quality state (thus inducing more decision-makers to optimize)
depend on the preferences of the decision-makers whose optimizing behavior is affected by the change.
Proposition 6 shows that this quantity corresponds to YC .
25 That is, when no one optimizes, no one picks y under dx . When everyone optimizes, a fraction φ of decision-makers
choose y. The “effect” of optimizing (under frame dx ) corresponds to moving from the former situation to the latter.
26 Another way to understand the identification in Proposition 6 is as follows. If one could observe the effect on aggregate
behavior of moving from a state in which no one optimizes to a state in which everyone optimizes, it would be straightforward
to back out φ. The problem in practice is that states of the world in which all decision-makers optimize are rarely observed.
By observing one state that is “closer” to full optimization than another, we can recover preferences by scaling the difference
in aggregate behavior between states by the difference in optimizing behavior induced by the decision-quality instrument. We
analyze this method for recovering population preferences formally in Section 6.4.
Given decision-quality state monotonicity, we can divide the population of decision-makers into three
groups:27 Always-optimizers (A) who optimize at zl and zh , Never-optimizers (N) who do not optimize at zl
or zh , and Contingent-optimizers (C) who optimize at zh but not at zl . Let φj denote average preferences for
each group j ∈ {A, N, C}. That is, φA = E[φi |ψi (zl ) = 1], φN = E[φi |ψi (zh ) = 0], and φC = E[φi |ψi (zl ) =
0 , ψi (zh ) = 1]. Note that we can identify φA using Proposition 1 (restricted to choices observed at zh or zl ).
Thus the difficulty for welfare analysis lies in making inferences about φN . Although φN may never be directly
observed (since by assumption we do not observe that group of decision-makers choosing consistently), we
may make inferences about it by extrapolating the observed relationship between preferences and propensity
to optimize among decision-makers that we do observe. To take a simple example, if φA < φC , we may be
willing to assume that φN will similarly be greater than or equal to φC . If so, we can infer a lower bound for φ
by setting φN = φC . In contrast, if we had instead observed that φA > φC , then setting φN = φC would
generate an upper bound on φ instead.
Table 3 illustrates the technique described in Proposition 6 for hypothetical data for the privacy controls
example. We suppose that when privacy settings can be changed only by navigating through several web
pages, individuals choose according to the moments in Table 1. When privacy settings are one click away
from the home page and alterable when a user sets up her account, individuals are less susceptible to framing
effects, and E[yi |dy ] = 0.55 and E[yi |dx ] = 0.45.28 We can back out the fraction of consistent choosers and
the fraction of the consistent choosers at either zl or zh who prefer y using Proposition 1, as before. Note
that the fraction of consistent choosers who prefer y is lower under zh than zl , because E[yi |dy ] changed
by more than E[yi |dx ]. This observation would suggest, intuitively, that the 20 percent of choosers on the
margin of optimizing between zh and zl are less likely to prefer y than the typical individual optimizing at
either zh or zl . When we apply Proposition 6, we see that, indeed, the fraction of contingent optimizers who
prefer y is (0.45 − 0.40)/(0.90 − 0.70) = 0.25. In the privacy controls example, these results would imply that contingent optimizers are
far less likely than the always-optimizers to prefer that a company be allowed to collect and use their data.
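A short sketch of the Proposition 6 calculation on the hypothetical privacy-settings moments follows, together with the extrapolation described above for the never-optimizers. The function and variable names are illustrative, and the final bound relies on the additional assumption, stated in the text, that φN is no larger than φC when φA > φC .

```python
def consistent_share(y_dx, y_dy):
    """Fraction choosing consistently: E[y|dx] + (1 - E[y|dy])."""
    return y_dx + (1 - y_dy)


def contingent_preferences(low, high):
    """Y_C of Proposition 6; low and high are (E[y|dx,z], E[y|dy,z]) at z_l and z_h."""
    change_in_optimizers = consistent_share(*high) - consistent_share(*low)
    if change_in_optimizers <= 0:
        raise ValueError("no contingent optimizers: Y_C is undefined")
    return (high[0] - low[0]) / change_in_optimizers


# Hypothetical moments: (E[y|dx], E[y|dy]) when settings are hard / easy to change.
hard = (0.40, 0.70)  # z_l
easy = (0.45, 0.55)  # z_h

phi_c = contingent_preferences(hard, easy)
share_a = consistent_share(*hard)            # always-optimizers
share_c = consistent_share(*easy) - share_a  # contingent optimizers
share_n = 1 - consistent_share(*easy)        # never-optimizers
phi_a = hard[0] / share_a                    # Proposition 1 applied at z_l

# Here phi_A > phi_C, so setting phi_N = phi_C gives an upper bound on phi
# (valid only under the extrapolation assumption phi_N <= phi_C).
upper_bound_phi = easy[0] + share_n * phi_c

print(f"phi_A = {phi_a:.2f}, phi_C = {phi_c:.2f}")
print(f"shares: A = {share_a:.2f}, C = {share_c:.2f}, N = {share_n:.2f}")
print(f"upper bound on phi under phi_N = phi_C: {upper_bound_phi:.3f}")
```

With these moments the numerator is 0.05 and the denominator is 0.20, so the sketch returns φC = 0.25, and the extrapolated upper bound of 0.475 is tighter than the Proposition 2 upper bound of 0.55 obtained at zh alone.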
27 As mentioned above, our notation emphasizes the analogy to the Imbens and Angrist [1994] framework.
28 In this example, one can also imagine useful variation in decision quality state coming from the length and readability of
the privacy policy, whether the policy frames the company’s use of user data as a loss or a gain (i.e. whether this enhances or
mitigates the bias coming from the default), or the amount of extraneous information visible when users are manipulating their
privacy settings.
Table 3: Average Choices by Frame and Difficulty of Changing Privacy Settings
                                                       Hard to Change (zl )   Easy to Change (zh )
Fraction choosing y under dy , E[yi |dy ]               0.70                   0.55
Fraction choosing y under dx , E[yi |dx ]               0.40                   0.45
Fraction consistent, E[ψ(z)]                            0.70                   0.90
Fraction of consistent preferring y, E[φi |ψi (z) = 1]  0.57                   0.50
Fraction of contingent optimizers preferring y, φ̄C     0.25
As the above discussion reveals, any extrapolation from φA and φC will be quite coarse with just two data
points. The following corollary illustrates how observing choice behavior at a wider range of decision-quality
states can provide a more precise understanding of the relationship between preferences and propensity to
optimize, and hence, a more reliable basis for extrapolation to φ.
Corollary 6.1: Multiple Observed Decision-Quality States Suppose choices are observed under
decision-quality states z0 , z1 , ..., zN , such that for k = 0, ..., N − 1, we have ψi (zk ) ≤ ψi (zk+1 ) ∀i and
∃i s.t. ψi (zk ) < ψi (zk+1 ). Assume (φi , ψi (zk )) ⊥ dij , zmi for j ∈ {x, y}, k = 0, ..., N , m = 0, ..., N . Let
$$Y_{k,m} \equiv \frac{E[y_i \mid d_x, z_m] - E[y_i \mid d_x, z_k]}{E[y_i \mid d_x, z_m] + (1 - E[y_i \mid d_y, z_m]) - \{E[y_i \mid d_x, z_k] + (1 - E[y_i \mid d_y, z_k])\}}$$
for 0 ≤ k < m ≤ N and let
$$Y_0 \equiv \frac{E[y_i \mid d_x, z_0]}{E[y_i \mid d_x, z_0] + 1 - E[y_i \mid d_y, z_0]}.$$
Assumptions (18), (19), (20), (21), and (22) imply that Yk,m = φk,m and Y0 = φ0 , where φk,m ≡ E[φi |ψi (zk ) =
0 , ψi (zm ) = 1] and φ0 = E[φi |ψi (z0 ) = 1].
Proof: See Appendix.
Discussion of Corollary 6.1
Corollary 6.1 provides a method to trace out differences in the preferences
of decision-makers who begin to optimize at different decision-quality states. For example one could observe
qualitative features of the relationship between decision-makers’ propensity to optimize and their preferences
for x versus y, such as whether it appears linear, concave, etc. One could also estimate this relationship
econometrically in order to extrapolate it out of sample, to the preferences of those decision-makers who
were inconsistent at each observed z. Section 7 describes an approach along these lines.
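As a rough illustration of Corollary 6.1, the sketch below traces out the preferences of the decision-makers who begin to optimize between adjacent decision-quality states. The first two states reuse the hypothetical privacy-settings moments; the third state is an additional invented data point, and the function name `trace_preferences` is an assumption for illustration.

```python
def trace_preferences(moments):
    """Return Y_0 and the list [Y_{0,1}, Y_{1,2}, ...] from Corollary 6.1.

    moments is a list of (E[y|dx,z_k], E[y|dy,z_k]) ordered from the lowest
    decision-quality state z_0 to the highest z_N.
    """
    def consistent(m):
        # Fraction choosing consistently at this state.
        return m[0] + 1 - m[1]

    y0 = moments[0][0] / consistent(moments[0])
    steps = []
    for lower, higher in zip(moments, moments[1:]):
        newly_optimizing = consistent(higher) - consistent(lower)
        if newly_optimizing <= 0:
            raise ValueError("each state must induce additional optimizers")
        steps.append((higher[0] - lower[0]) / newly_optimizing)
    return y0, steps


# (E[y|dx], E[y|dy]) at z_0, z_1, z_2; the z_2 values are invented.
moments = [(0.40, 0.70), (0.45, 0.55), (0.47, 0.51)]
y0, steps = trace_preferences(moments)
print(f"Y_0 = {y0:.3f}")
for k, value in enumerate(steps):
    print(f"Y_{k},{k + 1} = {value:.3f}")
```

Plotting these values against the cumulative share of optimizers is one simple way to inspect the shape of the relationship before extrapolating it.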
6.3 Over-Identifying Test of Decision-Quality Independence
This section develops a test for whether decision-quality independence holds in a particular setting. The test
may be applied in situations in which one can observe exogenous variation in some factor that affects the
propensity of decision-makers to optimize. Later in the paper, Section 7 will show how such variation may
be exploited to estimate population preferences when decision-quality independence fails to hold.
Now that the decision-quality state has been added into the model, the decision-quality independence
assumption becomes:
$$\phi_i \perp \psi_i(z_k), \quad k = h, l \tag{26}$$
Assuming that Assumptions (18) – (25) are satisfied, decision-quality independence may be tested based
on the following condition:
Proposition 7 Under Assumptions (18) – (25), decision-quality independence (26) is satisfied only if the
following quantities are equal:
1.
$$\frac{E[y_i \mid d_x, z_h]}{E[y_i \mid d_x, z_h] + 1 - E[y_i \mid d_y, z_h]},$$
2.
$$\frac{E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_l] + 1 - E[y_i \mid d_y, z_l]},$$
and
3.
$$\frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}}.$$
Proof: See Appendix.
Discussion of Proposition 7
Proposition 7 states a necessary condition for decision-quality independence
to hold, which may be tested given observed variation in some factor that satisfies the assumptions to be
a decision-quality state. To understand the rationale behind the test, note that the numerator of the first
of the three quantities measures the fraction of decision-makers at zh who choose y consistently (i.e. under
both frames). Similarly, the denominator of the first quantity measures the fraction of decision-makers at zh
who consistently choose x or y. Thus the first quantity denotes the fraction of those who choose consistently
at zh who prefer y. The second quantity denotes the
corresponding fraction, evaluated at zl . Moving from zl to zh increases the number of decision-makers
who optimize. But if decision-quality independence holds, moving from zl to zh should not affect the
fraction of optimizing decision-makers preferring y to x. In contrast, when this assumption fails to hold, optimizing
behavior is systematically related to decision-makers’ underlying preferences for x versus y, causing these
two quantities to diverge. As discussed above, the third quantity is the preferences of individuals on the
margin of optimizing between zh and zl . Under decision-quality independence, these individuals should also
have the same preferences as all individuals optimizing at zh or zl . The comparison of the third quantity with
either of the first two is an analogue of Hausman’s [1978] test of endogeneity using instrumental variables
(see also Wu [1973]).
Note, however, that the condition is necessary but not sufficient. Even if decision-quality independence
fails to hold, it could be the case that individuals induced to optimize by a change to zh happen to have the
same preferences as those optimizing at zl . Also note that, from a statistical perspective, if the number of
individuals induced to optimize or the change in the fraction who prefer x to y at a different z is sufficiently
small compared to the overall size of the optimizing population, then we will not have sufficient statistical
power to test the decision-quality independence assumption by comparing the first two quantities in Proposition 7. In this sense, the test comparing the third quantity to the first two is the higher-power test, since
it requires only that the change in the population of optimizers be large enough to identify YC precisely, not
that the population of optimizers be much larger at zh than at zl .
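As a rough illustration of the comparison in Proposition 7, the following sketch computes the three quantities from four cell means. The function name and the numerical inputs are hypothetical and are not taken from the paper's data; a real application would also need standard errors to test equality formally.

```python
def independence_test_quantities(y_dx_h, y_dy_h, y_dx_l, y_dy_l):
    """Return the three quantities compared in Proposition 7.

    Each argument is the share choosing y in one cell, for example
    y_dx_h = E[y | d_x, z_h] and y_dy_l = E[y | d_y, z_l].
    """
    psi_h = y_dx_h + 1 - y_dy_h  # share choosing consistently at z_h
    psi_l = y_dx_l + 1 - y_dy_l  # share choosing consistently at z_l
    if psi_h <= psi_l:
        raise ValueError("z_h must raise the share of consistent choosers")
    q1 = y_dx_h / psi_h                        # consistent choosers at z_h
    q2 = y_dx_l / psi_l                        # consistent choosers at z_l
    q3 = (y_dx_h - y_dx_l) / (psi_h - psi_l)   # those induced to optimize
    return q1, q2, q3


# Hypothetical cell means in which all three quantities coincide.
q1, q2, q3 = independence_test_quantities(0.45, 0.55, 0.35, 0.65)

# Hypothetical cell means in which the quantities diverge.
r1, r2, r3 = independence_test_quantities(0.45, 0.55, 0.399, 0.699)
```

In the second call the marginal group (third quantity) prefers y far less often than consistent choosers at zl, which is the pattern the test is designed to detect.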
When decision-quality independence does not hold, recovering population preferences is harder because
one cannot impute the revealed preferences of those who optimize to the rest of the population. However,
the recovery of the preferences of several localized subsets of individuals, as in Propositions 6 and 6.1,
suggests a way forward even in the absence of decision-quality independence. The next section explores such
an approach, in which we impose more structure on the relationship between preferences and optimizing
behavior to overcome this difficulty.
7
Using the Observed Preference-Decision-Quality Relationship
to Estimate Preferences of Inconsistent Decision-Makers
In the previous sections, we sought to impose as little positive structure as possible on the relationship
between preferences and choices. We have shown that under relatively straightforward assumptions, it will
be possible to recover the distribution of preferences in any group of individuals so long as 1) we can infer
their preferences by comparing them to a subset of individuals who optimize (Sections 4.3 through 5), or 2)
we can induce those individuals to optimize in some environment (Section 6). In this section, we impose more
structure on the relationship between preferences and optimization, which yields an approach for recovering
population preferences using a decision-quality instrument.
7.1
Setup
Assume that for any individual i,
Pi = P + θzi + εi
ψi = 1 ⇐⇒ Pi > 0
where θ is a parameter and zi is an observable decision-quality variable, as before. The variable εi captures
idiosyncratic variation in the propensity to optimize, whose distribution is characterized below. An individual
optimizes only given a sufficiently high value of the propensity to optimize variable Pi . Intuitively, we could
think of zi as an instrument which increases cognitive costs, or increases the strength of the biasing frame.
When the cognitive costs are sufficiently low or the biasing frame sufficiently weak, or the individuals’ own
propensity to optimize sufficiently high, an individual optimizes. Note that Pi does not depend on the frame
d, which reflects that individuals optimize if and only if they are consistent across frames in our model (the
consistency principle). Note also that the decision-quality monotonicity assumption (assumption 23) and
(because εi will have strictly positive density for all εi ∈ R) the assumption that all changes in z affect some
individuals (assumption 24) will be satisfied here when θ ≠ 0.
Next, we assume that preferences are determined according to the following latent variable model.
Mi = M + νi
φi = 1 ⇐⇒ Mi > 0
where νi captures idiosyncratic variation in the preference for y over x. We could also add observables wi to
equation for Mi and/or Pi , which would allow for examination of conditional decision-quality independence
like in Section 5. The assumption that Mi does not depend on zi corresponds to the exclusion restriction
assumption, (19).
We will assume that εi and νi have a bi-variate standard normal distribution, where the normalization
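is without loss of generality. For all individuals i,
\[ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \]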
where ρ ∈ (−1, 1) is the correlation between the error terms. The independence of the error terms
from zi and the frame d embeds assumption (25) from before, and decision-quality independence is
satisfied if and only if ρ = 0. The assumption that changes in z are related to choices only
if they cause an individual to optimize, assumption (20), is also embedded here.
The final assumptions which close the model correspond directly to earlier assumptions on the relationship
between biased choices in different frames. We maintain our assumptions about frames: that frames
do not affect preferences (18) (since Mi does not depend on d), that frames affect all individuals in a known
fashion (21), and that consistent choices reveal preferences (20). Together these three justify the following
identifying assumption:
ci (X, dx , zi ) = y =⇒ ψi (zi ) = 1; φi = 1
ci (X, dy , zi ) = x =⇒ ψi (zi ) = 1; φi = 0
Re-formulating these in the form of the latent variables Pi and Mi , we have
ci (X, dx , zi ) = y =⇒ εi > −P − θzi ; νi > −M
ci (X, dy , zi ) = x =⇒ εi > −P − θzi ; νi < −M
7.2
Population Preference Recovery with Binary z
For each individual we have data on a binary choice, yi , under frame dij and decision-quality variable zki . We
wish to use this information to identify the parameters θ, M , P , and ρ. The following provides a reduced form
method for recovering these parameters and testing decision-quality independence using methods previously
described.
Proposition 8 Suppose ψi = 1 ⇐⇒ Pi > 0 and φi = 1 ⇐⇒ Mi > 0, where Pi = P + θzi + εi , Mi = M + νi ,
\[ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \]
and zk ∈ {0, 1}. Then under frame-monotonicity (21) and the
assumption that consistent choices reveal preferences (20):
(6.1) One may identify the parameters P , M , θ, and ρ.
(6.2) E[φi ] = Φ(M ), where Φ(.) is the standard normal cumulative distribution function.
(6.3)
\[ E[\phi_i \mid \psi_i(z_i) = 0] = \frac{1}{1 - \bar\psi(z)} \int_{-\infty}^{-P - \theta z} \int_{-M}^{\infty} \phi^{BVSN}(\varepsilon, \nu; \rho)\, d\nu\, d\varepsilon, \]
where φBVSN (a, b; ρ) is the density function for a bivariate standard normal with correlation coefficient ρ, evaluated at (a, b).
Proof: See Appendix.
Discussion of Proposition 8 Intuitively, we can think of the model here as an analogue of the Heckman
[1979] bi-variate normal model of selection bias. Instead of selecting in and out of the sample, or, in the most
common application, electing to work or not to work, individuals in our model end up choosing consistently
or not. We wish to restrict our analysis to an individual’s choices conditional on her choosing consistently,
because these choices are informative of preferences. Failing to account for selection induces bias if the
determinants of optimizing and the determinants of preferences are correlated. The restriction that z does
not affect preferences amounts to an exclusion restriction of the form necessary to avoid identifying the
parameters of the model on functional form alone [Puhani, 2000].
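The selection problem described above can be made concrete with a small simulation. The sketch below draws (εi , νi ) from the bivariate standard normal model of Section 7.1 and compares the share preferring y in the whole population with the share among optimizers. The function name, parameter values, sample size, and seed are illustrative assumptions, not estimates from the paper.

```python
import random


def simulate_shares(P, theta, M, rho, z, n=100_000, seed=0):
    """Simulate the latent-variable model and return two shares:
    (share preferring y in the population, share preferring y among optimizers).
    """
    rng = random.Random(seed)
    n_prefers_y = n_opt = n_opt_prefers_y = 0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # nu has unit variance and correlation rho with eps
        nu = rho * eps + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        optimizes = P + theta * z + eps > 0   # psi_i = 1
        prefers_y = M + nu > 0                # phi_i = 1
        n_prefers_y += prefers_y
        if optimizes:
            n_opt += 1
            n_opt_prefers_y += prefers_y
    return n_prefers_y / n, n_opt_prefers_y / n_opt


# With rho = 0 the optimizers look like the population; with rho > 0 they do not.
pop_indep, opt_indep = simulate_shares(P=0.0, theta=0.5, M=0.0, rho=0.0, z=1)
pop_corr, opt_corr = simulate_shares(P=0.0, theta=0.5, M=0.0, rho=0.6, z=1)
print(pop_indep, opt_indep, pop_corr, opt_corr)
```

When ρ = 0 the two shares agree up to sampling noise, so consistent choosers are representative. When ρ > 0 the share among optimizers overstates the population share, which is the bias the selection model is meant to correct.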
7.3
Population Preference Recovery with Multiple Decision-Quality States
With two values of z, the parameters of the latent variable model were exactly identified. With multiple
values of z, we may employ maximum likelihood estimation to recover the underlying parameters of the
model. In particular, suppose we observe decisions under N + 1 decision-quality states where N > 0;
label these z0 , z1 , ...zN . Suppose that each enters into the propensity to optimize equation but satisfies the
exclusion restriction with respect to preferences: ψi = 1 ⇐⇒ Pi > 0, where
\[ P_i = P + \sum_{m=1}^{N} \theta_m z_m^i + \varepsilon_i, \]
and z_m^i is an indicator equal to one if individual i chooses under decision-quality state m and zero otherwise. As
before, we assume φi = 1 ⇐⇒ Mi > 0, Mi = M + νi , and
\[ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right). \]
Under these assumptions, an individual’s likelihood contribution is
\[
\begin{aligned}
l_i = {} & I\{d_i = d_x;\, y_i = 1\} \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\; M + \nu_i > 0\Big) \\
& + I\{d_i = d_x;\, y_i = 0\} \Big(1 - \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\; M + \nu_i > 0\Big)\Big) \\
& + I\{d_i = d_y;\, y_i = 0\} \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\; M + \nu_i < 0\Big) \\
& + I\{d_i = d_y;\, y_i = 1\} \Big(1 - \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\; M + \nu_i < 0\Big)\Big)
\end{aligned}
\]
where
\[ \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\; \nu_i > -M\Big) = \int_{-P - \sum_{m=1}^{N} \theta_m z_m^i}^{\infty} \int_{-M}^{\infty} \phi^{BVSN}(e, v; \rho)\, dv\, de \]
\[ \Pr\Big(\varepsilon_i > -P - \sum_{m=1}^{N} \theta_m z_m^i;\; \nu_i < -M\Big) = \int_{-P - \sum_{m=1}^{N} \theta_m z_m^i}^{\infty} \int_{-\infty}^{-M} \phi^{BVSN}(e, v; \rho)\, dv\, de \]
Intuitively, the first and third terms in li represent individuals who optimize and choose against the frame,
and the second and fourth terms combine individuals who do not optimize with individuals who optimize
but do not choose against the frame.
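A minimal sketch of how this likelihood could be evaluated numerically is given below, using only the Python standard library. It assumes the reading in which choosing against the frame (y under dx , x under dy ) reveals an optimizer, and every other choice receives the complementary probability. The function names, the data layout, and the midpoint integration rule are assumptions made for illustration; they are not the authors' estimation code, and a real application would pass the log-likelihood to a numerical optimizer.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pr_optimize_and_prefer(cut_eps, M, rho, prefers_y, steps=2000, bound=8.0):
    """P(eps > cut_eps, nu > -M) if prefers_y, else P(eps > cut_eps, nu < -M).

    Integrates the density of eps times the conditional probability for nu
    given eps, using the midpoint rule on [max(cut_eps, -bound), bound].
    """
    lower = max(cut_eps, -bound)
    if lower >= bound:
        return 0.0
    cond_sd = (1 - rho ** 2) ** 0.5
    h = (bound - lower) / steps
    total = 0.0
    for k in range(steps):
        e = lower + (k + 0.5) * h
        p_nu_below = STD_NORMAL.cdf((-M - rho * e) / cond_sd)  # P(nu < -M | eps = e)
        total += STD_NORMAL.pdf(e) * ((1 - p_nu_below) if prefers_y else p_nu_below)
    return total * h


def log_likelihood(data, P, thetas, M, rho):
    """data: iterable of (frame, y, m) with frame in {"dx", "dy"}, y in {0, 1},
    and m in 0..N indexing the decision-quality state (state 0 is the baseline).
    thetas[m - 1] is the shift in the propensity to optimize in state m >= 1.
    """
    total = 0.0
    for frame, y, m in data:
        cut = -P - (thetas[m - 1] if m > 0 else 0.0)
        if frame == "dx":
            p = pr_optimize_and_prefer(cut, M, rho, prefers_y=True)
            total += log(p if y == 1 else 1 - p)
        else:
            p = pr_optimize_and_prefer(cut, M, rho, prefers_y=False)
            total += log(p if y == 0 else 1 - p)
    return total


# Hypothetical three-observation data set with one non-baseline state.
example = [("dx", 1, 0), ("dx", 0, 1), ("dy", 0, 1)]
ll = log_likelihood(example, P=0.0, thetas=[0.5], M=0.0, rho=0.3)
```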
Variation in choices and the consistency of choices identifies each parameter as follows: the parameter P
is implied by what fraction of the population are consistent at z0 . Each θk is identified by observing how the
population of consistent choosers, ψ̄(z), changes as z moves from z0 to zk . The key parameter ρ is identified
by the degree to which changes in z, which cause more individuals to be consistent, also cause individuals
to be more likely to choose y. Given a value of all other parameters, the parameter M is implied by the
fraction of consistent individuals choosing y over x at a given value of z. From these parameters, we can
identify E[φi ] and E[φi |ψi = 0] using Proposition 8.
7.4
Population Preference Recovery under Flexible Functional Form Assumptions
If we observe multiple decision-quality states z, we can measure the preferences of decision-makers who
optimize at each z using our earlier results. In this section, we illustrate a technique for out-of-sample
prediction of the preferences of the population, and the preferences of optimizers, under the assumption that
the preferences of optimizers can be written as a flexible polynomial in the fraction of optimizers.29
Formally, we assume, as in Corollary 6.1, that we observe behavior in one of N + 1 decision-quality
states, indexed z0 , z1 , ...zN . Suppose each of these decision-quality states comes from an ordered, continuous
set of decision-quality states, [z̲, z̄] ⊂ R. For each individual, let zi∗ denote the value of z at which they begin
to optimize, with probability density function f (z ∗ ) and cumulative distribution function F (z ∗ ). The decision-quality monotonicity assumption means that for any i, z > zi∗ =⇒ ψi (z) = 1, so ψ̄(z) = F (z). We will
also assume that f (z) > 0 for all z ∈ [z̲, z̄], so that F (z ∗ ) is strictly increasing and has a well-defined inverse
function. Finally, we will assume that F (z) and the preferences of marginal compliers φ̄C (z) = E[φi |zi∗ = z]
are D-times differentiable, so that we can use Taylor Series approximations of degree D.
Lemma 1 Under these assumptions, there exists a function g(ψ̄) such that given an accurate Taylor Series
expansion of g(ψ̄) of degree D, there exist constants b0 , b1 , ..., bD such that
\[ E[\phi_i \mid \psi_i(z_k) = 1] = b_0 + b_1 \bar\psi_k + b_2 (\bar\psi_k)^2 + \dots + b_D (\bar\psi_k)^D \tag{27} \]
Proof: See Appendix.
This lemma implies that we can define an out-of-sample prediction problem strictly in terms of ψ̄(zk ) ≡ ψ̄k .
29 Our approach here shares some similarity to the literature on marginal treatment effects and local average treatment
effects [Heckman and Vytlacil, 2005]. However, note that the non-parametric identification techniques in that literature require
instrumental variables that drive the propensity to participate in the treatment over a range from 0 to 1. But in our context, if
we were able to observe decisions at a decision-quality state variable that induced everyone to optimize, we could simply look
at preferences revealed in that state to recover average preferences for the population.
Note that in equation (27), when ψ̄(z) = 1 for some z, we have E[φi |ψi (z) = 1] = E[φi ], which can be
written simply as φ̄ = b0 + b1 + ... + bD . This insight forms the basis of the extrapolation procedure we
use. Intuitively, when N = D, we will have D + 1 equations in D + 1 unknowns, so that b0 , ..., bD are just
identified. When N > D, we will have more equations than unknowns, and we can use a best-fit technique
such as least squares to estimate b0 , ..., bD .
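For the just-identified case N = D, the extrapolation can be sketched without solving for b0 , ..., bD explicitly: the unique degree-D polynomial through the D + 1 observed points is evaluated at ψ̄ = 1 using Lagrange interpolation weights. The function name and the numbers below are hypothetical; the over-identified case N > D would instead require a least-squares fit, which this sketch does not cover.

```python
def extrapolate_population_share(psi_bars, optimizer_shares):
    """Evaluate at psi_bar = 1 the degree-D polynomial through D + 1 points.

    psi_bars[k] is the share of consistent choosers in state k and
    optimizer_shares[k] is the share preferring y among them.
    The returned value equals b_0 + b_1 + ... + b_D.
    """
    if len(psi_bars) != len(optimizer_shares):
        raise ValueError("need one optimizer share per decision-quality state")
    if len(set(psi_bars)) != len(psi_bars):
        raise ValueError("shares of consistent choosers must be distinct")
    total = 0.0
    for k, (psi_k, share_k) in enumerate(zip(psi_bars, optimizer_shares)):
        weight = 1.0
        for j, psi_j in enumerate(psi_bars):
            if j != k:
                weight *= (1.0 - psi_j) / (psi_k - psi_j)  # Lagrange weight at 1
        total += share_k * weight
    return total


# Hypothetical linear case (D = 1): two decision-quality states.
phi_bar_linear = extrapolate_population_share([0.5, 0.8], [0.60, 0.54])

# Hypothetical quadratic case (D = 2): points generated from
# g(psi) = 0.8 - 0.5 * psi + 0.1 * psi ** 2, so the target is g(1) = 0.4.
g = lambda psi: 0.8 - 0.5 * psi + 0.1 * psi ** 2
phi_bar_quadratic = extrapolate_population_share(
    [0.5, 0.7, 0.9], [g(0.5), g(0.7), g(0.9)]
)
```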
To illustrate the technique, we will solve analytically the extrapolation to population preferences when
N = D = 1. Equation (27) then becomes simply:30
\[ E[\phi_i \mid \psi_i(z) = 1, z] = \alpha + \beta \bar\psi(z) \tag{28} \]
30 This extrapolation will be exact if
\[ \frac{E[\phi_i \mid \psi_i(z_h) = 1] - E[\phi_i \mid \psi_i(z_l) = 1]}{\Delta\bar\psi(z_h)} = \frac{E[\phi_i] - E[\phi_i \mid \psi_i(z_l) = 1]}{1 - \bar\psi(z_l)} \]
Proof: Suppose the above condition is satisfied. Let β = (E[φi |ψi (zh ) = 1] − E[φi |ψi (zl ) = 1]) / ∆ψ̄(zh ), and let α = E[φi |ψi (zl ) = 1] − β ψ̄(zl ).
Solving these three conditions (the equations for α and β and the assumed condition) yields E[φi |ψi (zl ) = 1] = α + β ψ̄(zl ),
E[φi |ψi (zh ) = 1] = α + β ψ̄(zh ), and E[φi ] = α + β.
Proposition 9 Suppose choices are observed under decision-quality states zh and zl . For any z, let
\[ Y_A(z) \equiv \frac{E[y_i \mid d_x, z]}{E[y_i \mid d_x, z] + 1 - E[y_i \mid d_y, z]} \]
and let ψ̄(z) = E[yi |dx , z] + 1 − E[yi |dy , z]. Under assumptions (18) – (25) and (28),
1. Average preferences in the population as a function of YA (zh ) and YA (zl ) are given by
\[ E[\phi_i] \approx Y_A(z_h) + \frac{1 - \bar\psi(z_h)}{\bar\psi(z_h) - \bar\psi(z_l)} \left[ Y_A(z_h) - Y_A(z_l) \right] \tag{29} \]
2. At any level of ψ̄(z), conditional preferences of optimizers and non-optimizers are given by
\[ E[\phi_i \mid \psi_i(z) = 1] \approx \frac{\bar\psi(z_h) Y_A(z_l) - \bar\psi(z_l) Y_A(z_h)}{\bar\psi(z_h) - \bar\psi(z_l)} + \bar\psi(z)\, \frac{Y_A(z_h) - Y_A(z_l)}{\bar\psi(z_h) - \bar\psi(z_l)} \tag{30} \]
\[ E[\phi_i \mid \psi_i(z) = 0] \approx \frac{\bar\psi(z_h) Y_A(z_l) - \bar\psi(z_l) Y_A(z_h)}{\bar\psi(z_h) - \bar\psi(z_l)} + [1 + \bar\psi(z)]\, \frac{Y_A(z_h) - Y_A(z_l)}{\bar\psi(z_h) - \bar\psi(z_l)} \tag{31} \]
3. Average preferences of the population as a function of YC and YA (zh ) are given by
\[ E[\phi_i] \approx Y_A(z_h) + \frac{1 - \bar\psi(z_h)}{\bar\psi(z_l)} \left[ Y_C - Y_A(z_h) \right] \tag{32} \]
4. Average preferences of the population as a function of YC and YA (zl ) are given by
\[ E[\phi_i] \approx Y_A(z_l) + \frac{1 - \bar\psi(z_l)}{\bar\psi(z_h)} \left[ Y_C - Y_A(z_l) \right] \tag{33} \]
Proof: See Appendix.
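The population-level formulas in Proposition 9 can be checked against one another numerically. The sketch below assumes that YC is the third quantity of Proposition 7, which can be written as [ψ̄(zh )YA (zh ) − ψ̄(zl )YA (zl )] / [ψ̄(zh ) − ψ̄(zl )] because E[yi |dx , z] = YA (z)ψ̄(z). The function name and the input values are hypothetical.

```python
def linear_extrapolation(psi_h, psi_l, ya_h, ya_l):
    """Evaluate equations (29), (32), and (33) under the linear form (28).

    psi_h, psi_l: shares of consistent choosers at z_h and z_l.
    ya_h, ya_l:   shares preferring y among consistent choosers, Y_A(z).
    """
    d_psi = psi_h - psi_l
    if d_psi <= 0:
        raise ValueError("psi_h must exceed psi_l")
    beta = (ya_h - ya_l) / d_psi
    alpha = (psi_h * ya_l - psi_l * ya_h) / d_psi
    # Preferences of those induced to optimize between z_l and z_h (assumed Y_C).
    y_c = (psi_h * ya_h - psi_l * ya_l) / d_psi
    return {
        "alpha": alpha,
        "beta": beta,
        "Y_C": y_c,
        "eq29": ya_h + (1 - psi_h) / d_psi * (ya_h - ya_l),
        "eq32": ya_h + (1 - psi_h) / psi_l * (y_c - ya_h),
        "eq33": ya_l + (1 - psi_l) / psi_h * (y_c - ya_l),
    }


# Hypothetical moments: 50% consistent at z_l, 80% consistent at z_h.
result = linear_extrapolation(psi_h=0.8, psi_l=0.5, ya_h=0.54, ya_l=0.60)
```

With two decision-quality states the three expressions are algebraically identical, so agreement here is a check on the arithmetic rather than a test of the linearity assumption.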
Discussion of Proposition 9
Proposition 9 shows how to recover population preferences using a functional
form assumption on the relationship between the average preferences of consistent choosers and the
fraction of consistent choosers. Intuitively, changes in the decision-quality state increase the fraction of
consistent choosers, which allows us to recover the relationship between preferences and the propensity to
optimize. To facilitate its interpretation, we can rewrite equation (29) as follows
\[ \bar\phi = Y_A(z_h) + \frac{\pi_N}{\pi_C} \left[ Y_A(z_h) - Y_A(z_l) \right] \]
where πN and πC are the fraction of the population who are never optimizers and contingent optimizers,
respectively. When everyone optimizes at zh , πN = 0 and φ̄ = YA (zh ) as expected. The larger is πN relative
to πC , the more weight we put on the differences between the average preferences of those optimizing at zh
and those optimizing at zl .
Note that equations (30) and (31) apply at any value of ψ̄(z), even values that are not observed in the
data. For example, given data from decision-quality states where 30 percent and 80 percent of individuals
choose consistently, we could extrapolate to the preferences of optimizers in a state where
90 percent of individuals choose consistently (so
ψ̄(z) = 0.9). The proposition also shows that we can use the estimated preferences of the contingent-optimizers, YC , to identify population preference parameters given a functional form assumption like (28).
The third and fourth equations in the proposition provide intuitive formulas for extrapolation from the
average preferences of contingent-optimizers and optimizers in a given decision-quality state to population
preferences. In addition, Equations (32) and (33) formally justify our intuition that when individuals on the
margin of optimizing between two decision-quality states have substantially different preferences than the
average preference of all consistent individuals, we can obtain only an upper or lower bound on population
preferences φ̄ from YA (z) alone.31 From equation (49), we can see that YC = φ̄C tells us the average
preference of individuals on the margin between optimizing and not between zh and zl , and ∆ψ tells us how
many individuals are on that margin. With the additional functional form assumption embedded in (47),
this information will allow us to extrapolate to obtain the average preferences of never-optimizers, yielding
the average preferences in the population. Finally, equation (32) has intuitive properties. We can re-write
it as
\[ \bar\phi = Y_A(z_h) + \frac{\pi_N}{\pi_A} \left[ Y_C - Y_A(z_h) \right] \]
31 Note that the intuition here does rely on the monotonicity of equation 47.
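Figure 1: Extrapolation from Preferences of Marginal and Average Optimizers
[Figure omitted. Vertical axis: φ̄; horizontal axis: ψ̄. Plotted lines: E[φi |ψi = 1] (intercept α, slope β) and φ̄C (slope 2β), with the extrapolated value E[φ] ≈ 0.48 marked.]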
When there are no never-optimizers, equation (32) implies that we will have φ̄ = YA (zh ), as in the previous
proposition. When there are few individuals who always optimize, πA is small and φ̄ can differ
substantially from YA (zh ). When YC and YA (zh ) are very different, population preferences should be
expected to be very different from the average preferences of individuals who optimize at zh .
Figure 1 depicts the proposed extrapolation graphically, using moments from Table 3. When E[φi |ψi = 1]
is linear in ψ̄(z), the approximations suggested by Proposition 9 imply that the fraction of the population
preferring y is E[φi ] ≈ 0.48. We show in the appendix that φ̄C is also linear, with twice the slope of
E[φi |ψi = 1]. The figure shows that the preference of the average optimizers is pulled down by the preference
of the marginal optimizer at a given level of ψ̄, and we can use this information, or the change in E[φi |ψi = 1]
directly, to extrapolate to E[φi |ψi = 1] at ψ̄ = 1, which equals E[φi ] since everyone optimizes.
8
Application to the Optimal Default Problem
This section shows how the parameters we focus on in the previous section are relevant for the selection of
an optimal default and an optimal decision-quality state. We deliberately take a broad approach relative to
others who have examined the optimal default problem, such as Carroll et al. [2009], seeking to impose as
little positive structure as possible. We show how a planner can use choice data to maximize the planner’s
objective. Examples include a regulator deciding whether privacy policies should be opt-in or opt-out and
how clearly they must be written, or a benevolent employer selecting a default retirement plan and the
width of the enrollment window.
This section derives three key results. First, when decision-makers’ welfare depends only on the outcome
they end up selecting, the optimal frame depends solely on the average preferences of the inconsistent
decision-makers – that is, the quantity E[φi |ψi = 0]. Intuitively, the choice of frame does not affect the
outcomes experienced by the consistent decision-makers, and consequently, the planner should ignore the
preferences of that group when determining the optimal policy. Second, when decision-makers experience
transaction costs associated with selecting an option other than the default, the preferences of the consistent
decision-makers become relevant as well. In particular, the optimal frame depends on the weighted average
of preferences between the consistent and inconsistent decision-makers, where the weights depend on the
size of the transaction costs and the fraction of consistent choosers in the population. Third, we consider
the problem faced by a planner who must decide whether to adopt a (potentially more expensive) decision-quality state. We show that the benefits of doing so depend on the difference between the preferences of the
inconsistent decision-makers at the high decision-quality state, and the preferences of the decision-makers
who would be induced to optimize by the policy change. Intuitively, when this difference is large, more
sometimes-consistent decision-makers benefit from the increase in the decision quality state and the social
planner may be able to provide the never-optimizers with a better default.
8.1
Setup
Assume a continuum population of measure N chooses from a fixed menu X = {x, y}. A benevolent planner
chooses between two frames, dx , and dy . The decision environment is given by z ∈ Z, which could be fixed
(if Z is a singleton set) or chosen at some cost κ(z). For simplicity we assume that the planner cares only
about giving an individual the option she prefers, any transaction cost that individual incurs, and the cost
of implementing the decision environment z.32 The social planner seeks to maximize
\[ \max_{d \in \{d_x, d_y\},\; z \in Z} \int_i \Big[ I\{c_i(X, d, z) = \phi_i\} - \gamma\, I\{c_i(X, d, z) \neq d\} \Big]\, di - \kappa(z) \]
where I{} is an indicator function equal to 1 when the condition inside the brackets is satisfied and zero
otherwise; the first term indicates whether individual i chose her preferred option φi ; γ is a transaction
cost incurred if an individual deviates from the default option; ci (X, d, z) ≠ d when the individual does not
choose the default option (so d = x when x is the default and d = y when y is the default), and κ(z) is the
cost of implementing decision-quality state z. The units of γ and κ(z) are the number of individuals the
planner would need to give their preferred option to justify incurring a cost of γ or κ(z).
32 Alternatively, one could specify the planner’s objective function to account for the intensity of decision-makers’ preferences,
rather than just the ordinal preferences between x and y. That is, the planner would maximize a weighted sum of each individual’s (interpersonally-comparable) utility from her chosen option, ui (ci (X, d, z)). Implementing the solution to the planner’s
problem in this case would require an estimate of the distribution of relative valuation of y compared to x, Mi = ui (y) − ui (x),
which would require estimating the distribution of Mi as in the model in Section 7. In this case, the units of transactions
cost and decision-quality-environment costs would be the same as the units of the utility function. We could also estimate the
transactions cost directly from choice data in this model, rather than take it as a primitive parameter.
8.2
Results
We prove two simple propositions characterizing the solution to the planner’s problem. The first considers
the optimal default when the decision-quality state is fixed. The second considers the joint choice of the
optimal default and the optimal decision-quality state, assuming for simplicity that there are no transaction
costs.
Proposition 10
Suppose Z is a singleton, Z = {z}. Assume the planner observes choices and assumes frame separability, the
consistency principle, frame monotonicity, and unconfoundedness. Let φ̄N = E[φi |ψi (z) = 0], let φ̄A =
E[φi |ψi (z) = 1], and let ψ̄(z) = E[ψi (z)]. The planner should choose dy iff
\[ \bar\phi_N \left(1 - \bar\psi(z)\right) + \gamma\, \bar\phi_A\, \bar\psi(z) > \frac{1 + (\gamma - 1)\bar\psi(z)}{2} \tag{34} \]
Proof: See Appendix.
Discussion of Proposition 10
The optimal default will be option y when 1) the number of non-optimizers
who prefer y is large, and 2) the number of optimizers who prefer y is large. The first group is helped by
the default being y, since the default directly influences their choice. The second group is also helped by the
default being y, since they will not incur a transaction cost to receive their preferred option. The first term
of the left-hand side of equation (34) is the size of the first of these groups, and the second term is the size of
the second group, weighted by the transaction cost γ. The right-hand side tells us how large the number of individuals helped by the default’s
being y must be for dy to be optimal. Note that when γ = 0, the condition for optimality of dy simplifies to
simply φ̄N > 1/2. When the planner does not care about transaction costs borne by optimizers, she seeks only
to give as many of the non-optimizers as possible their optimal choice. The larger are γ or ψ̄(z), the more
weight the planner’s decision places on the welfare of optimizers. One limitation of the approach taken here
is that we assume the planner knows γ. None of our methods speak to how γ may or may not be revealed by
choice data. We discuss this issue and the related problem of cognitive costs in the conclusion to the paper.
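The default rule can also be illustrated by evaluating the planner's objective from Section 8.1 directly under each default and picking the larger value. The sketch below does this per capita with the decision-quality state held fixed, so κ(z) drops out. The function names and parameter values are hypothetical, and γ is taken as known, as in the text.

```python
def welfare(default, phi_n, phi_a, psi, gamma):
    """Per-capita value of the planner's objective under a given default.

    phi_n: share of non-optimizers who prefer y.
    phi_a: share of optimizers who prefer y.
    psi:   share of the population that optimizes.
    gamma: transaction cost of choosing the non-default option.
    """
    if default not in ("x", "y"):
        raise ValueError("default must be 'x' or 'y'")
    opt_prefers_default = phi_a if default == "y" else 1 - phi_a
    non_prefers_default = phi_n if default == "y" else 1 - phi_n
    # Optimizers always obtain their preferred option, but those who prefer
    # the non-default option pay gamma to get it.
    optimizers = psi * (1 - gamma * (1 - opt_prefers_default))
    # Non-optimizers receive the default, which suits only some of them.
    non_optimizers = (1 - psi) * non_prefers_default
    return optimizers + non_optimizers


def optimal_default(phi_n, phi_a, psi, gamma):
    w_y = welfare("y", phi_n, phi_a, psi, gamma)
    w_x = welfare("x", phi_n, phi_a, psi, gamma)
    return "y" if w_y > w_x else "x"


# With no transaction cost only the non-optimizers' preferences matter.
choice_no_cost = optimal_default(phi_n=0.6, phi_a=0.2, psi=0.8, gamma=0.0)

# With a transaction cost, optimizers who mostly prefer x can tip the default.
choice_with_cost = optimal_default(phi_n=0.6, phi_a=0.2, psi=0.8, gamma=0.5)
```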
Proposition 11
Suppose the planner has recourse to two decision environments, Z = {zh , zl }. Suppose ψi (zh ) ≥ ψi (zl ) for
all i, and ∃i, ψi (zh ) > ψi (zl ). Suppose κ(zh ) > κ(zl ), and let ∆κ = (1/N )[κ(zh ) − κ(zl )] be the change in
per-person cost of increasing the decision-quality state to zh . Assume frame separability over z and frame
exclusivity. Continue to assume frame irrelevance over d, the consistency principle, frame monotonicity,
and unconfoundedness. Suppose that γ = 0. Then the solution to the planner’s problem is given case-wise by
1. (dy , zl ) if
(a) φ̄N > 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) > 1/2, and ∆κ > (1 − φ̄C )πC , OR if
(b) φ̄N > 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) < 1/2 and ∆κ > φ̄C πC + (2φ̄N − 1)πN
2. (dx , zl ) if
(a) φ̄N < 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) < 1/2 and ∆κ > φ̄C πC , OR if
(b) φ̄N < 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) > 1/2 and ∆κ > πC (1 − φ̄C ) + (1 − 2φ̄N )πN
3. (dx , zh ) if
(a) φ̄N < 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) < 1/2 and ∆κ < φ̄C πC , OR if
(b) φ̄N > 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) < 1/2 and ∆κ < φ̄C πC + (2φ̄N − 1)πN
4. (dy , zh ) if
(a) φ̄N > 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) > 1/2 and ∆κ < (1 − φ̄C )πC , OR if
(b) φ̄N < 1/2 and (φ̄N πN + φ̄C πC )/(πN + πC ) > 1/2 and ∆κ < (1 − φ̄C )πC + (1 − 2φ̄N )πN
where φ̄A = E[φi |ψi (zl ) = ψi (zh ) = 1], φ̄N = E[φi |ψi (zl ) = ψi (zh ) = 0], φ̄C = E[φi |ψi (zl ) = 0, ψi (zh ) = 1],
πA = ψ̄(zl ), πN = 1 − ψ̄(zh ), and πC = ψ̄(zh ) − ψ̄(zl ).
Proof: See Appendix.
Discussion of Proposition 11 The planner should switch to zh from zl if the number of individuals
who receive their preferred option increases by enough to justify the increase in implementation cost ∆κ.
In general there are two possibilities for the solution to the problem in this proposition: the optimal choice
of default either depends on the choice of decision-quality environment or it does not. In the latter case,
switching to zh from zl helps only those individuals who optimize at zh but not zl (group C), and who prefer
the non-default option. Parts (1a), (2a), (3a), and (4a) correspond to this situation. In the second situation,
the optimal default changes as the planner increases from zl to zh . This occurs if the individuals who switch
to optimizing at zh (group C) have different average preferences from the group who never optimize (group
N). In this case, moving from zl to zh not only gives individuals in group C who prefer the non-default
option their preferred option, but it also allows the planner to set a better default for Group N. For example,
suppose the planner would want to set dx in zl but dy in zh . This corresponds to Parts (1b) and (3b) of the
proposition. Then we must have φ̄C < 1/2 and φ̄N > 1/2, and the preferences of the C group dominate when
determining optimal policy under zl , which would occur if there are more of them or their preferences are
more homogenous. In this case the benefit of switching to zh includes not only the benefit of giving those
in group C who prefer the non-default option their preferred option, but also the benefit of setting a default
which is more in accordance with the preferences of the remaining group who do not optimize, group N.
How large this benefit is depends on how far φ̄N is from 1/2 (i.e. how bad the previous default was for this
group) and the size of group N.
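The trade-off discussed above can be sketched by evaluating the planner's objective with γ = 0 for each of the four (default, decision-quality state) pairs and taking the largest value, rather than working through the case list. The function name and numbers below are hypothetical; welfare is expressed per capita, so the cost of the high state enters as ∆κ.

```python
def planner_choice(phi_n, phi_c, pi_a, pi_n, pi_c, delta_kappa):
    """Return the (default, state) pair that maximizes per-capita welfare.

    phi_n, phi_c: shares preferring y among never-optimizers (N) and among
                  those who optimize only at z_h (C).
    pi_a, pi_n, pi_c: population shares of always-, never-, and
                  contingent-optimizers.
    delta_kappa:  per-person cost of moving from z_l to z_h.
    """
    def served(default, state):
        fit = (lambda share_y: share_y) if default == "y" else (lambda share_y: 1 - share_y)
        if state == "l":
            # Groups C and N both follow the default at z_l.
            return pi_a + pi_c * fit(phi_c) + pi_n * fit(phi_n)
        # At z_h group C optimizes; only group N follows the default.
        return pi_a + pi_c + pi_n * fit(phi_n) - delta_kappa

    options = [(d, s) for d in ("x", "y") for s in ("l", "h")]
    return max(options, key=lambda option: served(*option))


# Hypothetical population in which y is the better default in both states.
# Moving to z_h serves the 40% of group C who prefer x: a gain of 0.12.
stay = planner_choice(0.7, 0.6, pi_a=0.5, pi_n=0.2, pi_c=0.3, delta_kappa=0.20)
switch = planner_choice(0.7, 0.6, pi_a=0.5, pi_n=0.2, pi_c=0.3, delta_kappa=0.05)
```

In this example the gain from switching is (1 − φ̄C )πC = 0.12, so the planner stays at zl when ∆κ = 0.20 and moves to zh when ∆κ = 0.05.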
9
Generalization to Ordered Choices with Two Frames
This section develops an approach for preference recovery over larger menus, generalizing the theory from
earlier in the paper, to illustrate that our results are useful outside the context of binary choices. There are
many interesting possibilities for generalizations, but we focus here on choice situations g ∈ G consisting of
a fixed, finite menu of ordered choices X = {x1 , ..., xK } and one of two frames, d ∈ {dh , dl }. Intuitively, one
can think of a “high” frame and a “low” frame. For example, we might suppose that an individual chooses
from a menu of insurance plans, ordered from low-cost, low-benefit plans to high-cost, high-benefit plans,
and the frame either emphasizes or de-emphasizes the individual’s risk of serious illness. We will assume
that we observe each individual i in exactly one frame, denoted dij as before, for j ∈ {h, l}. Recall that in the
binary case, the consistency principle and frame monotonicity imply that individuals who choose the “low”
option in the “high” frame prefer the low option. We will use this same intuition to develop an identification
strategy for the non-binary setting.
The preferences of agent i are represented by choice function mi : G −→ X. We continue to assume
frame separability:
∀i, mi (X, dh ) = mi (X, dl )
(35)
and we will suppress the irrelevant input, writing individual i’s optimal choice as mi (X). Define yi (dj ) as
follows for k = 1, ..., K:
yi (dj ) = k ⇐⇒ ci (X, dj ) = xk
We strengthen the frame monotonicity assumption as follows
\[ \forall i,\; y_i(d_h) \geq y_i(d_l) \tag{36} \]
The frame monotonicity assumption imposes an implicit ordering on the menu and assumes that all individuals are pushed in the same direction by the frames. We introduce notation to encode preferences in a
similar fashion to yi (.). Define yi∗ as follows for k = 1, ..., K :
yi∗ = k ⇐⇒ mi (X) = xk
We also strengthen the consistency principle with the following assumption, which we will call the
partition-consistency principle:
∀i, yi (dl ) ≥ k =⇒ yi∗ ≥ k
(37)
∀i, yi (dh ) ≤ k =⇒ yi∗ ≤ k
(38)
The name of this assumption comes from the following: suppose that we partition the menu into X′ =
{xJ , ...xK } and X″ = X \ X′, for some J and K ≥ J. If the individual consistently chooses within X′ across
both frames, so c(X, dh ) ∈ X′ and c(X, dl ) ∈ X′, then assumptions (37) and (38) imply that m(X) ∈ X′.
Note also that the partition consistency principle implies the consistency principle used in previous sections:
if ci (X, dh ) = ci (X, dl ), then assumption (38) implies that ci (X, dh ) = mi (X). Note also that the partition
consistency principle and frame monotonicity together imply that ∀i, yi (dh ) ≥ yi∗ ≥ yi (dl ).
Similarly to before, we will indicate whether individual i prefers option k by φki ≡ I{mi (X) = xk }, and
denote the fraction of the population preferring option k by φ̄k . For each k = 1, ..., K, we define partition
consistency at k, ψik , as follows
ψik ≡ I{yi (dh ) ≤ k and yi (dl ) ≤ k} + I{yi (dh ) > k and yi (dl ) > k}
Intuitively, ψik captures whether an individual consistently chooses an option above or below k. Note also
that frame monotonicity implies that one of the conditions inside each indicator function will be implied by
the other condition. We denote the fraction of individuals who are partition consistent at k by ψ̄ k ≡ E[ψik ].
Finally, we assume unconfoundedness, which here requires that frames are independent of preferences
and partition consistency at k for every k:
∀k = 1, .., K, φki , ψik ⊥ dij
(39)
Proposition 12 Let Gj (k) ≡ P (yi (dj ) ≤ k|dij = dj ) for k = 1, ..., K, j = h, l and let Gj (0) ≡ 0. Let
\[ Y_k \equiv \frac{G_h(k)}{G_h(k) + 1 - G_l(k)} \]
for k = 0, ..., K. Frame separability (35), frame monotonicity (36), partition consistency
(38), and unconfoundedness (39) imply that for k = 1, ..., K,
(12.1) The fraction of partition-consistent individuals at k with yi∗ ≤ k is given by
P (yi∗ ≤ k|ψik = 1) = Yk
(12.2) The fraction of partition-consistent individuals at k is given by
ψ̄ k = Gh (k) + 1 − Gl (k)
(12.3) The fraction of the population who prefer option k is bounded as follows:
$$\bar{\phi}^k \in [G_h(k) - G_l(k-1),\ G_l(k) - G_h(k-1)]$$
(12.4) If we additionally assume strong decision-quality independence:
$$\forall k, k': \quad \mathrm{cov}(\phi_i^k, \psi_i^{k'}) = 0,$$
then the fraction of the population who prefer option k is
$$\bar{\phi}^k = Y_k - Y_{k-1}$$
Proof: See Appendix.
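The estimators in (12.1), (12.2) and (12.4) can be computed directly from the two observed choice distributions. The following is a minimal sketch, not the authors' implementation: the function name `partition_estimates` and the illustrative population (a share of fully consistent choosers, with the remainder choosing the highest option under dh and the lowest under dl, independently of preferences) are assumptions made here so that strong decision-quality independence holds by construction.

```python
def partition_estimates(G_h, G_l):
    """Return (Y, psi_bar, phi) for options k = 1..K.

    G_h[k-1] and G_l[k-1] are the observed shares choosing an option <= k
    under frames d_h and d_l (hypothetical inputs; the last entry is 1).
    """
    K = len(G_h)
    Gh = [0.0] + list(G_h)  # prepend G_j(0) = 0
    Gl = [0.0] + list(G_l)
    # (12.2): share of partition-consistent individuals at each k
    psi_bar = [Gh[k] + 1.0 - Gl[k] for k in range(K + 1)]
    # (12.1): share of partition-consistent individuals at k with y* <= k
    Y = [Gh[k] / psi_bar[k] for k in range(K + 1)]
    # (12.4): population shares, valid only under strong
    # decision-quality independence
    phi = [Y[k] - Y[k - 1] for k in range(1, K + 1)]
    return Y[1:], psi_bar[1:], phi


# Illustrative population (an assumption, not data from the paper).
prefs = [0.5, 0.3, 0.2]       # true shares preferring options 1, 2, 3
cdf = [0.5, 0.8, 1.0]         # cumulative distribution of preferences
share_consistent = 0.6        # consistent choosers, independent of prefs
G_h = [share_consistent * cdf[0], share_consistent * cdf[1], 1.0]
G_l = [share_consistent * cdf[0] + 0.4, share_consistent * cdf[1] + 0.4, 1.0]

Y, psi_bar, phi = partition_estimates(G_h, G_l)
```

In this constructed example the recovered `phi` matches `prefs` up to floating-point error. The sketch assumes every `psi_bar` entry is strictly positive; with no partition-consistent individuals at some k the ratio is undefined.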
Discussion of Proposition 12
If we partition the menu of choices into options above and below some
option xk , then frame monotonicity and the partition-consistency principle transform the problem to a binary
problem, allowing us to use earlier propositions to identify individuals whose preferred choice is above or
below xk . The first two results, (12.1) and (12.2), are therefore the analogue of Proposition 1 in this setting.
With only two observed frames, we cannot identify the fraction of individuals who are consistent across
frames or the fraction of consistent individuals who prefer each option. An individual’s choices in this
setting will not indicate whether her choices are consistent across frames under the assumptions we make,
but in some cases choices indicate that an individual is partition-consistent. Namely, choosing in the lower
partition under dh or choosing an option in the upper partition under dl will imply that an individual is
partition-consistent with respect to that xk . As such, we can gain insight into preferences of several subsets
of the population using the alternative property of partition consistency.
Return to the insurance example described above, where the frame either emphasizes or de-emphasizes
the risk of serious illness. When some individuals choose a low-benefit, low-cost plan under the frame that
emphasizes the risk of serious illness, our assumptions imply that they prefer an option with costs and
benefits at least as low as the ones they chose. The first two results allow us to estimate the fraction of
decision-makers who consistently choose an insurance plan that is above or below some specified cost-benefit
level, and among those people, how many prefer the low-cost plan.
As before, we can also bound population preferences, reflected in (12.3). In this case, the many-options
problem has a new and interesting structure relative to the binary case. In particular, even if individuals
are highly susceptible to framing effects when they prefer some option far away from xk , our estimate for
the fraction of people preferring some option xk can still be precise because we are able to use the partition
consistency principle to ignore individuals highly subject to framing effects far away from k.
Finally, with a stronger version of the decision-quality independence assumption, we can recover the
distribution of preferences for the full population. Strong decision-quality independence guarantees that the
tendency to be partition consistent for any partition is unrelated to an individual's preferences. This assumption implies our earlier definition of decision-quality independence, since individuals who are consistent
across frames will be partition consistent for all partitions. However, the previous concept of decision-quality independence is insufficient for the recovery of population preferences in the two-frames situation because the
only useful notion of consistency in this setting is partition consistency. If we were to assume that individuals
are partition consistent only if they are fully consistent across frames, which is trivially true in the binary
case, then the two assumptions about decision-quality independence would be equivalent. Under strong
decision-quality independence, obtaining the preferences of partition-consistent individuals from (12.1) will
yield the cumulative distribution of optimal choices in the population. Using standard statistical techniques,
we can then recover the full distribution of population preferences.
In the insurance example, consider the individuals choosing the lowest-cost, lowest-benefit plan after
having the risk of serious illness emphasized, and individuals not choosing this plan after having the risk of
serious illness de-emphasized. The proportion of individuals in the first group to individuals in both groups
will be, under strong decision-quality independence, the fraction of the population who prefer the lowest-cost
plan. Proceeding similarly for any partition between relatively low-cost plans and relatively high-cost plans
yields the fraction of the population who prefer one of the low-cost plans, which is the cumulative distribution
of optimal choices. Because we can recover the cumulative distribution at every possible partition, we can
recover the distribution of optimal choices in the population.
The equivalence of this problem to the binary problem directly implies that we could generalize other
identification strategies from the binary case. For example, we can identify the preferences of the population
using observables via a conditional strong decision-quality independence assumption (the generalization of
Proposition 5), and in the absence of any decision-quality independence assumptions we can recover the
preferences of K groups of decision-makers who are contingently-consistent at k between two decision-quality
states zh and zl (Proposition 6).
10 Conclusion
Recovering preferences from choice data is a fundamental problem in behavioral economics; the presence
of systematic “choice-reversals” casts doubt on the revealed preference approach that underlies neoclassical
welfare analysis. We relax the standard revealed preference approach to accommodate the evidence that
decision-makers sometimes choose differently based on preference-irrelevant features of the choice situation.
As in Bernheim and Rangel [2009], there is a sense in which our relaxation of the standard approach is the
minimum required to accommodate the observed choice inconsistencies; that is, we assume that decision-makers who choose consistently across frames are revealing their true preferences. By imposing additional
structure on the problem in the form of a frame monotonicity assumption, the problem of preference recovery
is transformed into a problem of endogeneity: whether an individual reveals her preferences through choice
may depend on her preferences over the objects being chosen. In many ways, this transformed problem
is both more familiar and more tractable: over the last 50 years, economists have developed a wide range
of tools for dealing with endogeneity in the recovery of parameters of this sort. This paper shows how
many of these tools can be adapted to the problem of identifying preferences in the presence of inconsistent
decision-making.
An important feature of our approach is its reduced-form nature. Within the wide range of models
consistent with our frame-monotonicity assumption, the basic identification problem – i.e., understanding
the empirical correlation between decision-makers’ preferences and their optimizing behavior – is the same
regardless of the specific structural model generating behavior. On the other hand, our approach is not a
replacement for traditional behavioral models. As in other areas of empirical economics, the parameters
identified by reduced-form approaches depend on the underlying structural model that generates behavior.
In particular, understanding the underlying structural model provides guidance about which types of control
variables are needed for conditional decision-quality independence to hold and about which types of variation
constitute valid decision-quality state variables. The Appendix considers these questions within a range of
positive models that could explain framing effects in particular applications. The Appendix also shows,
intuitively, that the more rational are the decision-makers, the less likely it is that preferences and the
propensity to optimize will be independent.
The framework studied here can also be thought of as a special case of a more general approach, in which
an observer first identifies the preferences of a reference group of decision-makers whose choices, under some
assumptions, will reveal their preferences, and then extrapolates those preferences to the population. In
our approach, the reference group consists of those decision-makers who choose consistently across frames.
This choice of reference group allows us to avoid ex ante assumptions about which decision-makers are likely
to optimize.33 In other applications the reference group might consist of experts, experienced choosers,
or those thought to be immune to the framing effect in question [Johnson and Rehavi, 2013, Bronnenberg
et al., 2013].34 The approaches we have proposed may be utilized in such contexts; for example, one might
want to adjust the recovered preferences of experts based on observable characteristics before extrapolating
those preferences to the rest of the population. Similarly, given observed variation in a decision-quality
instrument, one could test the assumptions required for extrapolation in such applications along the lines
we have proposed.
Although our focus has been on choice data, the methods we propose here apply equally well to situations
in which survey response data reflect framing effects, such as sensitivity to question phrasing or the order
in which answers are displayed. For example, assuming such framing effects satisfy monotonicity, one could
apply Corollary 1 to Proposition 1 to recover the responses of those respondents whose answers do not vary
by frame. We explore such issues in Goldin and Reck [in progress].
The methods described here are subject to important limitations. First, in certain applications, contrary
to our assumptions, consistent choices may not in fact reveal preferences. For example, even decision-makers
who consistently choose one retirement plan over another, regardless of the default option, may
still be choosing sub-optimally because of, for example, present bias. Similarly, biases in judgment and
perception – such as over-optimism or a tendency to underweight low-risk events – may manifest themselves
consistently across frames. Accurately identifying preferences in such contexts requires moving further away
from observed choice behavior, along the lines of the models proposed in Rubinstein and Salant [2012].
33 Nevertheless, incorporating such ex ante information (when available) into the analysis may be desirable in applications
where there is reason to suspect that even consistent decision-makers are choosing sub-optimally, so that the consistency
principle fails.
34 Another interesting example is [Handel and Kolstad, 2013]. These authors explicitly make an assumption about the
relationship between risk preference and information about insurance which is a parametric version of what we call conditional
decision-quality independence.
Another limitation is that the ordinal preferences over menu objects that our approach identifies may
not be the only preferences that are welfare-relevant in a particular application. For example, the analysis
in Section 8 indicated that the optimal choice of frame may depend on the relative magnitude of utility costs
incurred by choosing “against the frame.” Without further assumptions, our approach cannot identify costs
of this nature from choice data. Put differently, we provide methods for identifying one normatively-relevant
type of preference; in some contexts, other types of preferences will be relevant as well.
Finally, we have focused on the binary choice setting to build intuition about preference identification
over relatively simple choices. Apart from the generalization to ordered discrete choices considered here,
more work remains to be done on preference identification in more complicated choice settings. For example,
with more than two frames, an outside observer would need to impose additional structure beyond frame
monotonicity to recover preferences from observed choices. Further work on harder identification problems
should explore what may be gained by imposing such structure, perhaps drawing from recent work on
revealed attention [Masatlioglu et al., 2012] or salience [Chetty et al., 2009]. Nonetheless, the basic approach
we outline here should provide guidance in more complicated settings as well: as long as an outside observer
can conclude that certain choices made under certain frames accurately reveal the preferences of a sub-population of decision-makers, outside observers may gain insight by examining endogenous selection into
that sub-population.
References
Alberto Abadie. Semiparametric instrumental variable estimation of treatment response models. Journal of
Econometrics, 113(2):231–263, 2003.
Hunt Allcott and Dmitry Taubinsky. The lightbulb paradox: Evidence from two randomized experiments.
Working paper, National Bureau of Economic Research, 2013.
Joshua Angrist and Ivan Fernandez-Val. Extrapolate-ing: External validity and overidentification in the late
framework. Working paper, National Bureau of Economic Research, 2010.
Robert B Barsky, F Thomas Juster, Miles S Kimball, and Matthew D Shapiro. Preference parameters and
behavioral heterogeneity: An experimental approach in the health and retirement study. The Quarterly
Journal of Economics, 112(2):537–579, 1997.
Daniel J Benjamin, Miles S Kimball, Ori Heffetz, and Alex Rees-Jones. What do you think would make you
happier? what do you think you would choose? The American economic review, 102(5):2083, 2012.
B Douglas Bernheim. Behavioral welfare economics. Journal of the European Economic Association, 7(2-3):
267–319, 2009.
B Douglas Bernheim and Antonio Rangel. Beyond revealed preference: choice-theoretic foundations for
behavioral welfare economics. The Quarterly Journal of Economics, 124(1):51–104, 2009.
John Beshears, James J Choi, David Laibson, and Brigitte C Madrian. How are preferences revealed?
Journal of Public Economics, 92:1787–1794, 2008.
Bart Bronnenberg, Jean-Pierre Dube, Matthew Gentzkow, and Jesse Shapiro. Do pharmacists buy bayer?
sophisticated shoppers and the brand premium. Working paper, Yale University, 2013.
Gabriel D Carroll, James J Choi, David Laibson, Brigitte C Madrian, and Andrew Metrick. Optimal defaults
and active decisions. The quarterly journal of economics, 124(4):1639–1674, 2009.
Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation: Theory and evidence. American Economic
Review, 99(4):1145–1177, 2009.
John Conlisk. Why bounded rationality? Journal of economic literature, 34(2):669–700, 1996.
Angus Deaton. The financial crisis and the well-being of americans. 2012.
Baruch Fischhoff. Value elicitation: is there anything in there? American Psychologist, 46(8):835, 1991.
Marc Fleurbaey and Erik Schokkaert. Behavioral welfare economics and redistribution. American Economic
Journal: Microeconomics, 5(3):180–205, 2013.
Jacob Goldin. Optimal tax salience. Unpublished working paper, SSRN, 2014.
Jacob Goldin and Tatiana Homonoff. Smoke gets in your eyes: Cigarette tax salience and regressivity.
American Economic Journal: Economic Policy, 2013.
Jacob Goldin and Daniel Reck. Survey response inconsistency. Technical report.
Benjamin R Handel and Jonathan T Kolstad. Health insurance for ”humans”: Information frictions, plan
choice, and consumer welfare. Working paper, National Bureau of Economic Research, 2013.
Jerry A Hausman. Specification tests in econometrics. Econometrica: Journal of the Econometric Society,
pages 1251–1271, 1978.
James J Heckman. Sample selection bias as a specification error. Econometrica, pages 153–161, 1979.
James J Heckman and Edward Vytlacil. Structural equations, treatment effects, and econometric policy
evaluation. Econometrica, 73(3):669–738, 2005.
Guido W Imbens and Joshua D Angrist. Identification and estimation of local average treatment effects.
Econometrica: Journal of the Econometric Society, pages 467–475, 1994.
Eric J Johnson, Steven Bellman, and Gerald L Lohse. Defaults, framing and privacy: Why opting in-opting
out. Marketing Letters, 13(1):5–15, 2002.
Erin M Johnson and M Marit Rehavi. Physicians treating physicians: Information and incentives in childbirth. Working paper, National Bureau of Economic Research, 2013.
Daniel Kahneman, Peter P Wakker, and Rakesh Sarin. Back to bentham? explorations of experienced utility.
The Quarterly Journal of Economics, 112(2):375–406, 1997.
Botond Köszegi and Matthew Rabin. A model of reference-dependent preferences. Quarterly journal of
economics, 121(4), 2006.
Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y Ozbay. Revealed attention. The American Economic
Review, 102(5):2183–2205, 2012.
Sendhil Mullainathan and Eldar Shafir. Scarcity. Times Books, 2013.
Patrick Puhani. The heckman correction for sample selection and its critique. Journal of economic surveys,
14(1):53–68, 2000.
Daniel Reck. Taxes and mistakes: What’s in a sufficient statistic? Unpublished working paper, SSRN, 2014.
Ariel Rubinstein and Yuval Salant. Eliciting welfare preferences from behavioural data sets. The Review of
Economic Studies, 79(1):375–387, 2012.
Yuval Salant and Ariel Rubinstein. (a, f): Choice with frames. The Review of Economic Studies, 75(4):
1287–1296, 2008.
Norbert Schwarz and Gerald Clore. Mood, misattribution, and judgments of well-being: Informative and
directive functions of affective states. Journal of Personality and Social Psychology, 45:512–523, 1987.
N. Schwarz, F. Strack, D. Kommer, and D. Wagner. Soccer, rooms and the quality of your life: Mood effects on
judgments of satisfaction with life in general and with specific life-domains. European Journal of Social
Psychology, 17:69–79, 1987.
Dean Spears. Economic decision-making in poverty depletes behavioral control. B.E. Journal of Economic
Analysis and Policy, 11, 2011.
Syngjoo Choi, Shachar Kariv, Wieland Muller, and Dan Silverman. Who is (more) rational? Working Paper,
2013.
Richard H Thaler and Cass R Sunstein. Nudge: Improving decisions about health, wealth, and happiness.
Yale University Press, 2008.
De-Min Wu. Alternative tests of independence between stochastic regressors and disturbances. Econometrica:
journal of the Econometric Society, pages 733–750, 1973.
A Proofs of Propositions
Proof of Proposition 2
By the law of iterated expectations, we can write:
E[φi ] = E[φi |ψi = 1] p(ψi = 1) + E[φi |ψi = 0] p(ψi = 0)
Because φi ∈ {0, 1} ∀i, we have E[φi |ψi = 0] ∈ [0, 1]. Consequently, p(ψi = 0) ≥ 0 implies
E[φi |ψi = 1] p(ψi = 1) ≤ E[φi ] ≤ E[φi |ψi = 1] p(ψi = 1) + p(ψi = 0)
From Proposition 1, we have $E[\phi_i \mid \psi_i = 1] = Y_A \equiv \frac{E[y_i \mid d_x]}{E[y_i \mid d_x] + 1 - E[y_i \mid d_y]}$ and $p(\psi_i = 1) = E[y_i \mid d_x] + 1 - E[y_i \mid d_y]$. Applying these results to the above equation yields
E[yi |dx ] ≤ E[φi ] ≤ E[yi |dx ] + p(ψi = 0)
The result then follows from noting that p(ψi = 0) = 1 − p(ψi = 1) = E[yi |dy ] − E[yi |dx ].
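As a small numerical illustration of this argument, the sketch below computes the consistent-chooser statistic, the share of consistent choosers, and the resulting bounds from the two observed choice shares. The function name `binary_estimates` and the input values are hypothetical.

```python
def binary_estimates(ey_dx, ey_dy):
    """ey_dx, ey_dy: shares choosing y under frames d_x and d_y."""
    psi_bar = ey_dx + 1.0 - ey_dy              # p(psi_i = 1)
    Y_A = ey_dx / psi_bar                      # E[phi_i | psi_i = 1]
    lower = Y_A * psi_bar                      # inconsistent choosers all prefer x
    upper = Y_A * psi_bar + (1.0 - psi_bar)    # inconsistent choosers all prefer y
    return Y_A, psi_bar, lower, upper


# Hypothetical shares: 30% choose y under d_x, 70% choose y under d_y.
Y_A, psi_bar, lower, upper = binary_estimates(0.3, 0.7)
```

The lower and upper bounds collapse to the two observed shares, E[yi|dx] and E[yi|dy], which is the content of the final step of the proof.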
Proof of Proposition 5
Proof of (5.1)
By the law of iterated expectations,
E(φ) = Σw E[φ|w] Pw
Repeating the proof of Proposition 1 while conditioning on w yields
$$Y_A(w) = \frac{E[y_i \mid d_x, w]}{E[y_i \mid d_x, w] + 1 - E[y_i \mid d_y, w]} = E[\phi_i \mid w_i] + \frac{\mathrm{cov}(\phi_i, \psi_i \mid w_i)}{E[\psi_i \mid w_i]} \tag{40}$$
Applying (17) and substituting the resulting expression into (40) yields the intended result.
Proof of (5.2)
First, note that the law of iterated expectations gives
E[φi |ψi = 0] = Σw E[φi |ψi = 0, w]p(w|ψi = 0)
(41)
We will complete the proof by showing that (1) YA (w) = E[φi |ψi = 0, w], and (2) Sj = p(wi = j|ψi = 0).
Lemma 1 : YA (w) = E[φi |ψi = 0, w]
By the definition of conditional probability,
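$$E[\phi_i \mid \psi_i = 0, w_i] = p(\phi_i = 1 \mid \psi_i = 0, w_i) = \frac{p(\psi_i = 0, \phi_i = 1 \mid w_i)}{p(\psi_i = 0 \mid w_i)}$$
$$E[\phi_i \mid \psi_i = 0, w_i] = \frac{E[(1 - \psi_i)\phi_i \mid w_i]}{E[1 - \psi_i \mid w_i]} \tag{42}$$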
First, we focus on the numerator of this expression. E[(1−ψi )φi |wi ] = E[φi |wi ]−E[ψi φi |wi ]. Using the identity that E[ψi φi |wi ] = E[ψi |wi ]E[φi |wi ]+cov(ψi , φi |wi ), along with the conditional decision-quality independence assumption (17), lets us write E[(1−ψi )φi |wi ] = E[φi |wi ]−E[ψi |wi ]E[φi |wi ] = E[φi |wi ] (1 − E[ψi |wi ]).
Substituting this result into (42) yields E(φi |ψi = 0, wi ) = E[φi |wi ]. From Prop 4.1, we also have that
WA (w) = E[φi |w], which completes the proof of Lemma 1.
Lemma 2: Sj = p(wi = j|ψi = 0)
First, using Bayes Rule, we have
$$p(w_i = j \mid \psi_i = 0) = \frac{p(\psi_i = 0 \mid w_i = j)\, P_j}{p(\psi_i = 0)} \tag{43}$$
From the Corollary to Proposition 1, we have p(ψi = 1) = E[yi |dx ] + 1 − E[yi |dy ]. Hence p(ψi = 0) =
1 − p(ψi = 1) = E[y|dy ] − E[y|dx ]. Additionally, repeating the proof of the Corollary while conditioning on
w yields p(ψi = 1|wi = j) = E[yi |dx , wi = j] + 1 − E[yi |dy , wi = j], so that p(ψi = 0|wi = j) = E[yi |dy , wi =
j] − E[yi |dx , wi = j]. Substituting these results into (43) yields
$$p(w_i = j \mid \psi_i = 0) = \frac{E[y_i \mid d_y, w_i = j] - E[y_i \mid d_x, w_i = j]}{E[y_i \mid d_y] - E[y_i \mid d_x]}\, P_j$$
which is the definition of Sj .
Proof of (5.3)
This statement is identical to Lemma 2.
Proof of (5.4)
First, write p(w) = p(w|ψ = 1)p(ψ = 1) + p(w|ψ = 0)p(ψ = 0)
Applying Proposition 1, (5.3), and the definition of $P_w$, we have:
$$P_w = p(w \mid \psi = 1)\,(E[y \mid d_x] + 1 - E[y \mid d_y]) + \frac{E[y_i \mid d_y, w] - E[y_i \mid d_x, w]}{E[y_i \mid d_y] - E[y_i \mid d_x]}\, P_w\, (E[y_i \mid d_y] - E[y_i \mid d_x])$$
Rearranging terms yields the intended result:
$$p(w \mid \psi = 1) = \frac{E[y \mid d_x, w] + 1 - E[y \mid d_y, w]}{E[y \mid d_x] + 1 - E[y \mid d_y]}\, P_w$$
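The reweighting logic in the proof of Proposition 5 can be sketched numerically as follows. This is an illustration under assumed inputs, not the authors' code: the function name `reweighting_estimates`, the tuple layout, and the two-cell example are hypothetical, and the population estimate relies on conditional decision-quality independence holding within each cell of w.

```python
def reweighting_estimates(cells):
    """cells: list of (P_w, E[y|d_x,w], E[y|d_y,w]) tuples, one per value of w."""
    ey_dx = sum(P * yx for P, yx, _ in cells)
    ey_dy = sum(P * yy for P, _, yy in cells)
    psi_bar = ey_dx + 1.0 - ey_dy                 # overall share of consistent choosers
    rows = []
    for P, yx, yy in cells:
        psi_w = yx + 1.0 - yy                     # share of consistent choosers in cell w
        rows.append({
            "Y_A": yx / psi_w,                    # preferences of consistent choosers in w
            "S": (yy - yx) * P / (ey_dy - ey_dx), # p(w | psi = 0), Lemma 2
            "p_w_given_consistent": psi_w * P / psi_bar,  # p(w | psi = 1), (5.4)
        })
    # (5.1): weight cell-level estimates by population shares P_w
    phi_bar = sum(r["Y_A"] * P for r, (P, _, _) in zip(rows, cells))
    # (5.2): weight cell-level estimates by the shares S_w among inconsistent choosers
    phi_inconsistent = sum(r["Y_A"] * r["S"] for r in rows)
    return psi_bar, phi_bar, phi_inconsistent, rows


# Two hypothetical cells of the observable w.
cells = [(0.4, 0.2, 0.6), (0.6, 0.5, 0.7)]
psi_bar, phi_bar, phi_inconsistent, rows = reweighting_estimates(cells)
```

The two sets of weights each sum to one, and mixing them with weights p(ψ = 1) and p(ψ = 0) returns the population shares P_w, which is the decomposition used at the start of the proof of (5.4). The sketch assumes some choosers are inconsistent, so that E[y|dy] differs from E[y|dx].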
Proof of Proposition 3
Note that we can re-write the numerator of (10) as E[φi |ψi = 1] p(ψi =
1) = p(φi = 1|ψi = 1) p(ψi = 1) = p(φi = 1 , ψi = 1)= E[φi ψi ]. Exploiting the identity E[φi ψi ] =
E[φi ] E[ψi ] + cov(φi , ψi ) and re-writing yields (2.1).
By the law of iterated expectations, E[φi ] = p(ψi = 1)E[φi |ψi = 1] + p(ψi = 0)E[φi |ψi = 0]. Substituting
this into (12) and applying Proposition 1 yields (2.2).
Proof of Proposition 6
Step 0: Decision-quality monotonicity (23) allows us to partition the population
into three groups based on whether they optimize: the always-optimizers (A), the never-optimizers (N),
and the contingent-optimizers (C). Group A will have ψi (zh ) = ψi (zl ) = 1. Group N will have ψi (zh ) =
ψi (zl ) = 0. Group C will have ψi (zl ) = 0 and ψi (zh ) = 1. Denote the share of each group in the population
by πA , πN , and πC . Note that (23) rules out the possibility that ψi (zl ) = 1 but ψi (zh ) = 0. Note also that
πC = p(ψi (zh ) = 1, ψi (zl ) = 0) = E[ψi (zh )(1 − ψi (zl ))] = E[ψi (zh )] − E[ψi (zh )ψi (zl )] = E[ψi (zh )] − E[ψi (zl )],
where the last equality follows from (23).35 By the existence of contingent optimizers (24), πC > 0.
Step 1: Assume the biasing frame is dx . By frame separability (18), frame monotonicity (21) and the
consistency principle (20), for any z ∈ {zh , zl },
ci (X, dx , z) = y ⇐⇒ ψi (z) = 1, φi = 1
Consequently,
p(yi = 1|dx , z) = E[yi |dx , z] = E[φi ψi (z)|z]
By the law of iterated expectations,
E[yi |dx , z] = E[φi ψi (z)|ψi (zh ) = 1, ψi (zl ) = 1]πA +E[φi ψi (z)|ψi (zh ) = 1, ψi (zl ) = 0]πC +E[φi ψi (z)|ψi (zh ) = 0, ψi (zl ) = 0]πN .
Evaluating this formula at z = zh , where groups A and C optimize, and at z = zl , where only group A optimizes, we have that
E[yi |dx , zh ] = φ̄A πA + φ̄C πC
E[yi |dx , zl ] = φ̄A πA
where E[φi |ψi (zh ) = 1, ψi (zl ) = 1] ≡ φ̄A and E[φi |ψi (zh ) = 1, ψi (zl ) = 0] ≡ φ̄C .
35 Specifically, (23) guarantees that ψ(zh ) = 1 and ψ(zl ) = 1 occur if and only if ψ(zl ) = 1.
Step 2: Assume the frame is dy . We proceed similarly to the previous step. By (18), (21) and (20), for
any z ∈ {zh , zl },
ci (X, dy , z) = x ⇐⇒ ψi (z) = 1, φi = 0
Consequently,
p(yi = 0|dy , z) = 1 − E[yi |dy , z] = E[(1 − φi )ψi (z)|z]
Using the same approach as in the previous step, we use the law of iterated expectations and the
definitions of groups A, C, and N to write
1 − E[yi |dy , zh ] = E[ψ(zh )] − φ̄A πA − φ̄C πC
1 − E[yi |dy , zl ] = E[ψ(zl )] − φ̄A πA
Step 3: Now we construct the statistic YC ,
$$Y_C \equiv \frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}}$$
Substituting the expressions for E[yi |dx , z] and 1 − E[yi |dy , z] into this expression and simplifying, we obtain
$$Y_C = \frac{\bar{\phi}_C \pi_C}{E[\psi(z_h)] - E[\psi(z_l)]}$$
Using the previously derived fact that πC = E[ψ(zh )] − E[ψ(zl )], we have YC = φ̄C .
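The three steps can be checked with a short calculation. The sketch below builds the four observable choice shares from the choice rules in Steps 1 and 2 (under dx an individual chooses y only if she optimizes and prefers y; under dy she chooses x only if she optimizes and prefers x) and then forms YC. The function names and the group shares are hypothetical values chosen for illustration.

```python
def choice_shares(pi_A, pi_C, phi_A, phi_C, high_quality):
    """Return (E[y|d_x,z], E[y|d_y,z]) for z = z_h (True) or z = z_l (False).

    Always-optimizers (A) optimize in both states; contingent-optimizers (C)
    optimize only in the high decision-quality state z_h.
    """
    c = pi_C if high_quality else 0.0
    prefer_y_and_optimize = phi_A * pi_A + phi_C * c
    prefer_x_and_optimize = (1.0 - phi_A) * pi_A + (1.0 - phi_C) * c
    ey_dx = prefer_y_and_optimize            # y chosen against the frame d_x
    ey_dy = 1.0 - prefer_x_and_optimize      # x chosen against the frame d_y
    return ey_dx, ey_dy


def contingent_statistic(pi_A, pi_C, phi_A, phi_C):
    yx_h, yy_h = choice_shares(pi_A, pi_C, phi_A, phi_C, True)
    yx_l, yy_l = choice_shares(pi_A, pi_C, phi_A, phi_C, False)
    psi_h = yx_h + 1.0 - yy_h                # share of optimizers at z_h
    psi_l = yx_l + 1.0 - yy_l                # share of optimizers at z_l
    return (yx_h - yx_l) / (psi_h - psi_l)


# Hypothetical groups: 30% always-optimizers, 20% contingent-optimizers.
Y_C = contingent_statistic(pi_A=0.3, pi_C=0.2, phi_A=0.6, phi_C=0.25)
```

With these inputs the statistic returns the preference share of the contingent-optimizers (0.25 here), regardless of the value chosen for the always-optimizers. The calculation requires πC > 0, as in assumption (24).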
Proof of Corollary 6.1:
Fix some zk and zm . Note that the first three assumptions here will imply that
assumptions (23), (24), and (25) for Proposition 6 obtain. Proposition 6 then implies that Wk,m = φ̄k,m .
The result that φ0 = E[φi |ψi (z0 ) = 1] follows directly from Proposition 1.
Proof of Proposition 7 Suppose that decision-quality independence is satisfied. Then by Proposition 1,
$$\frac{E[y_i \mid d_x, z_h]}{E[y_i \mid d_x, z_h] + 1 - E[y_i \mid d_y, z_h]} = E[\phi_i] = \bar{\phi} \quad \text{and} \quad \frac{E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_l] + 1 - E[y_i \mid d_y, z_l]} = \bar{\phi}.$$
By Proposition 6,
$$\frac{E[y_i \mid d_x, z_h] - E[y_i \mid d_x, z_l]}{E[y_i \mid d_x, z_h] + (1 - E[y_i \mid d_y, z_h]) - \{E[y_i \mid d_x, z_l] + (1 - E[y_i \mid d_y, z_l])\}} = E[\phi_i \mid \psi_i(z_h) = 1, \psi_i(z_l) = 0].$$
Under decision-quality independence, E[φi |ψi (zh ) = 1, ψi (zl ) = 0] = φ̄.
Proof of Proposition 8: First, note that for any z ∈ {0, 1}, E[ψi |z] = p(εi > −P −θz) = 1−Φ(−P −θz) =
Φ(P + θz). Note that our setup satisfies the conditions in Proposition 1 for each z, so that we can measure
E[ψi |z] by ψ(z) ≡ E[yi |dx , z] + 1 − E[yi |dy , z]. Thus we have ψ(z) = Φ(P + θz) for z ∈ {0, 1}. This implies
that
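$$P = \Phi^{-1}(\psi(0)), \qquad \theta = \Phi^{-1}(\psi(1)) - \Phi^{-1}(\psi(0)).$$
Similarly, note that
$$E[\phi_i \mid \psi_i(z) = 1] = p(\nu_i > -M \mid \varepsilon_i > -P - \theta z_i) = \frac{p(\varepsilon_i > -P - \theta z_i,\ \nu_i > -M)}{p(\varepsilon_i > -P - \theta z_i)} = \frac{1}{\psi(z)} \int_{-P - \theta z}^{\infty} \int_{-M}^{\infty} \phi_{BVSN}(\varepsilon, \nu; \rho)\, \partial\nu\, \partial\varepsilon$$
where φBV SN (a, b; ρ) is the density function for a bivariate standard normal with correlation coefficient ρ,
evaluated at (a, b). By applying Proposition 1 for each z we also have that
$$Y_A(z) \equiv \frac{E[y \mid d_x, z]}{E[y \mid d_x, z] + 1 - E[y \mid d_y, z]} = E[\phi_i \mid \psi_i(z) = 1]$$
Combining these results yields
$$Y_A(z) = \frac{1}{\psi(z)} \int_{-P - \theta z}^{\infty} \int_{-M}^{\infty} \phi_{BVSN}(\varepsilon, \nu; \rho)\, \partial\nu\, \partial\varepsilon$$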
for z = 0, 1. With two equations, we can solve for the two remaining unknowns (M and ρ).
(6.2) and (6.3) follow directly from the distributions of normally distributed and jointly normally distributed random variables.
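The first step of this proof, recovering P and θ from the measured shares of optimizers ψ(0) and ψ(1), amounts to inverting the standard normal distribution function twice. A minimal sketch using Python's standard library follows; the function name `recover_selection_params` and the parameter values in the round-trip check are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def recover_selection_params(psi0, psi1):
    """Invert psi(z) = Phi(P + theta * z) at z = 0 and z = 1.

    psi0 and psi1 must lie strictly between 0 and 1.
    """
    P = STD_NORMAL.inv_cdf(psi0)
    theta = STD_NORMAL.inv_cdf(psi1) - P
    return P, theta


# Round trip with hypothetical true values P = 0.3 and theta = 0.5.
psi0 = STD_NORMAL.cdf(0.3)
psi1 = STD_NORMAL.cdf(0.3 + 0.5)
P_hat, theta_hat = recover_selection_params(psi0, psi1)
```

Recovering M and ρ from the two remaining equations requires numerically evaluating the bivariate normal integral and solving a two-equation system, which is beyond this sketch.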
Proof of Lemma 1:
Let ni (ψ) indicate whether individual i optimizes when the overall fraction of optimizers is ψ. That is,
$$n_i(\psi) = 1 \iff \psi_i(F^{-1}(\psi)) = 1 \iff z_i^* \le F^{-1}(\psi)$$
Let h(z) ≡ E[φi |zi∗ = z] = p(φi = 1|zi∗ = z) be the average preferences for the marginal optimizers at z.
The object of interest is E[φi |ψi (z) = 1] = E[φi |ni (ψ) = 1], which we can write as
$$E[\phi_i \mid z_i^* \le F^{-1}(\psi)] = \frac{\int_{z' = \underline{z}}^{z' = F^{-1}(\psi)} p(\phi_i = 1,\ z_i^* = z')\, \partial z'}{p(z_i^* \le F^{-1}(\psi))}$$
Using the definition of conditional probability and the definition of F , this becomes
$$= \frac{\int_{z' = \underline{z}}^{z' = F^{-1}(\psi)} h(z') f(z')\, \partial z'}{\psi} \tag{44}$$
Now we can use a change of variables, using F (z′) = ψ̄′ and f (z′)dz′ = dψ̄′, to write the numerator as
$$\int_{z' = \underline{z}}^{z' = F^{-1}(\psi)} h(z') f(z')\, \partial z' = \int_{\psi' = 0}^{\psi' = \psi} g(\psi')\, \partial\psi'$$
where g(ψ̄) = h(F −1 (ψ̄)). Now, we can approximate g(ψ) as a polynomial of degree D using Taylor's theorem:
$g(\psi) \approx a_0 + a_1 \psi + \ldots + a_D \psi^D$.36 Substituting this into equation (44) and evaluating the integral yields
$$E[\phi_i \mid \psi_i(z) = 1] \approx a_0 + \frac{1}{2} a_1 \psi + \frac{1}{3} a_2 \psi^2 + \ldots + \frac{1}{D+1} a_D \psi^D$$
Letting $b_j \equiv \frac{a_j}{j+1}$, we obtain the desired result.
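The final integration step, which maps the coefficients of the marginal-preference polynomial into the coefficients of the average among optimizers, can be verified numerically. The sketch below is illustrative only; the helper names and the coefficient values are assumptions, and the midpoint rule stands in for the exact integral.

```python
def average_coeffs(a):
    """Map marginal coefficients a_j to average coefficients b_j = a_j / (j + 1)."""
    return [a_j / (j + 1) for j, a_j in enumerate(a)]


def poly(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1] * x + ... at x."""
    return sum(c * x ** j for j, c in enumerate(coeffs))


def average_by_midpoint(a, psi, steps=10000):
    """Approximate (1 / psi) * integral of g over [0, psi] with the midpoint rule."""
    width = psi / steps
    total = sum(poly(a, (i + 0.5) * width) for i in range(steps))
    return total * width / psi


a = [0.2, 0.5, -0.1]   # hypothetical marginal-preference polynomial g
psi = 0.6              # hypothetical share of optimizers
closed_form = poly(average_coeffs(a), psi)
numeric = average_by_midpoint(a, psi)
```

The closed-form value and the numerical average agree to several decimal places, confirming that dividing each coefficient by j + 1 is exactly what the integral in (44) does to a polynomial.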
Proof of Proposition 9:
By Proposition 1, given any z,
ψ̄(z) = E[ψi (z)] = E[yi |dx , z] + 1 − E[yi |dy , z]
YA (z) = E[φi |ψi (z) = 1]
By equation (28) evaluated at zh and then at zl
E[φi |ψi (zh ) = 1] = α + β ψ̄(zh )
(45)
φ̄A = E[φi |ψi (zl ) = 1] = α + β ψ̄(zl )
(46)
36 We know these constants a0 , ..., aD exist and Taylor's Theorem applies by the assumption that F (z) and φC (z) = E[φi |zi∗ = z]
are D-times differentiable, along with familiar properties of the derivatives of inverse functions and composite functions. Taylor's
theorem also indicates that this approximation will be accurate as ψ̄ becomes large, which is intuitive. We will for the moment
ignore the issue of bounding the accuracy of the approximation, which for practical purposes may be relevant.
Derivation of 7.1
Solving (45) and (46) for α and β yields
$$\beta = \frac{Y_A(z_h) - Y_A(z_l)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)}$$
$$\alpha = \frac{\bar{\psi}(z_h) Y_A(z_l) - \bar{\psi}(z_l) Y_A(z_h)}{\bar{\psi}(z_h) - \bar{\psi}(z_l)}$$
Note that, since ψi = 1 for all individuals when ψ̄ = 1,
E[φ] = α + β. (47)
Derivation of 7.2
Equation (28) also implies37
E[φ|ψi (z) = 0, z] = α + β[1 − ψ̄(z)] (48)
Substituting for α and β in equations (47), (28), and (48) and re-arranging yield the desired results.
Derivation of 7.3 and 7.4
Divide the population into always-optimizers, never-optimizers and contingent-optimizers as in Proposition 6. By the law of iterated expectations on E[φi |ψi (zh ) = 1] we can write
(πC + πA )E[φi |ψi (zh ) = 1] = φ̄C πC + φ̄A πA
Substituting what we know about πA and πC , (see the proof of Proposition 4), we can write this as
E[φi |ψi (zh ) = 1]ψ̄(zh ) = φ̄C ∆ψ̄ + φ̄A ψ̄(zl )
(49)
where ∆ψ̄ = ψ̄(zh ) − ψ̄(zl ). Plugging (45) and (46) into (49) yields
[α + β ψ̄(zh )]ψ̄(zh ) = φ̄C ∆ψ̄ + [α + β ψ̄(zl )]ψ̄(zl ) (50)
37 Proof: By the law of iterated expectations, φ̄ = E[φ|ψi (z) = 1]ψ̄(z) + E[φ|ψi (z) = 0](1 − ψ̄(z)).
Equation (28) and the fact that φ̄ = α + β imply equation (48).
If we use (50) and (45) to solve for α and β, noting that E[φ|ψi (zh ) = 1] = YA (zh ) and φ̄C = YC , we obtain
$$\alpha = Y_A(z_h) - \frac{\bar{\psi}(z_h)}{\bar{\psi}(z_l)}\,[Y_C - Y_A(z_h)]$$
$$\beta = \frac{Y_C - Y_A(z_h)}{\bar{\psi}(z_l)}$$
As in the previous proposition, we use that E[φ] = α + β to arrive at equation (32).
If instead we use (50) and (46) to solve for α and β, we obtain
$$\alpha = Y_A(z_l) - \frac{\bar{\psi}(z_l)}{\bar{\psi}(z_h)}\,[Y_C - Y_A(z_l)]$$
$$\beta = \frac{Y_C - Y_A(z_l)}{\bar{\psi}(z_h)}$$
and by adding these two together we obtain equation (33).
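The three ways of solving for α and β can be cross-checked numerically. In the sketch below the function names and all parameter values are hypothetical; the statistics are generated from a linear relationship E[φi|ψi(z) = 1] = α + βψ̄(z), with YC obtained from the decomposition in (49), so each solution should return the same α and β.

```python
def solve_two_points(YA_h, YA_l, psi_h, psi_l):
    """Solve (45) and (46) for (alpha, beta)."""
    beta = (YA_h - YA_l) / (psi_h - psi_l)
    alpha = (psi_h * YA_l - psi_l * YA_h) / (psi_h - psi_l)
    return alpha, beta


def solve_with_contingent_high(Y_C, YA_h, psi_h, psi_l):
    """Solve (50) and (45) for (alpha, beta)."""
    beta = (Y_C - YA_h) / psi_l
    alpha = YA_h - (psi_h / psi_l) * (Y_C - YA_h)
    return alpha, beta


def solve_with_contingent_low(Y_C, YA_l, psi_h, psi_l):
    """Solve (50) and (46) for (alpha, beta)."""
    beta = (Y_C - YA_l) / psi_h
    alpha = YA_l - (psi_l / psi_h) * (Y_C - YA_l)
    return alpha, beta


# Hypothetical true parameters and shares of optimizers.
alpha, beta, psi_h, psi_l = 0.2, 0.4, 0.8, 0.5
YA_h = alpha + beta * psi_h
YA_l = alpha + beta * psi_l
# Contingent-optimizer preferences implied by (49).
Y_C = (YA_h * psi_h - YA_l * psi_l) / (psi_h - psi_l)

solutions = [
    solve_two_points(YA_h, YA_l, psi_h, psi_l),
    solve_with_contingent_high(Y_C, YA_h, psi_h, psi_l),
    solve_with_contingent_low(Y_C, YA_l, psi_h, psi_l),
]
```

When the linear relationship holds exactly, all three entries of `solutions` coincide. In applications the three estimates of α + β will generally differ, and the size of the gap is informative about how well the linear extrapolation fits.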
Proof of Proposition 10 Note that the planner's problem above is equivalent to the following
$$\max_{d \in \{d_x, d_y\},\ z \in Z}\ p(c_i(X, d, z) = \phi_i) - \gamma\, p(c_i(X, d, z) \ne d) - \frac{1}{N}\kappa(z)$$
Since z is fixed by assumption, the solution to the planner's problem simplifies to the comparison of the
objective function evaluated at dx and dy . We will have that dy is superior if and only if
$$\underbrace{p(c_i(.) = \phi_i \mid d = d_y)}_{1} - \underbrace{\gamma\, p(c_i(.) \ne y \mid d = d_y)}_{2} > \underbrace{p(c_i(.) = \phi_i \mid d = d_x)}_{3} - \underbrace{\gamma\, p(c_i(.) \ne x \mid d = d_x)}_{4} \tag{51}$$
We next derive each of these probabilities.
Term 1 of Equation (51)
By the law of iterated expectations,
p(ci (.) = φi |d = dy ) = p(ci (.) = φi |d = dy , ψi = 1)p(ψi = 1|d = dy )+p(ci (.) = φi |d = dy , ψi = 0)p(ψi = 0|d = dy )
When ψi = 1, ci (X, dy , z) = φi always, by the consistency principle. So,
p(ci (.) = φi |d = dy , ψi = 1) = 1
When ψi = 0, and d = dy , ci (X, dy , z) = φi ⇐⇒ φi = 1 by frame monotonicity. So
p(ci (.) = φi |d = dy , ψi = 0) = p(φi = 1|dy , ψi = 0)
By unconfoundedness and frame separability, p(φi = 1|dy , ψi = 0) = p(φi = 1|ψi = 0). By unconfoundedness, p(ψi = 1|d = dy ) = p(ψi = 1) and p(ψi = 0|d = dy ) = p(ψi = 0). Collecting terms, we have that the
first term of (51) is
p(ci (.) = φi |d = dy ) = p(ψi = 1) + p(φi = 1|ψi = 0)p(ψi = 0)
Term 3 of Equation (51)
We obtain this term symmetrically to the first term. By the law of iterated expectations,
p(ci (.) = φi |d = dx ) = p(ci (.) = φi |d = dx , ψi = 1)p(ψi = 1|d = dx )+p(ci (.) = φi |d = dx , ψi = 0)p(ψi = 0|d = dx )
When ψi = 1, ci (.) = φi always, by the consistency principle. When ψi = 0 and d = dx , ci (.) = φi ⇐⇒
φi = 0 by frame monotonicity, thus:
p(ci (.) = φi |d = dx , ψi = 0) = p(φi = 0|dx , ψi = 0)
By unconfoundedness and frame separability, p(φi = 0|dx , ψi = 0) = p(φi = 0|ψi = 0). Finally, we apply
unconfoundedness to p(ψi = 1|d = dx ) and p(ψi = 0|d = dx ) to obtain that the third term of (51) is:
p(ci (.) = φi |d = dx ) = p(ψi = 1) + p(φi = 0|ψi = 0)p(ψi = 0)
Term 2 of Equation (51)
By the law of iterated expectations,
γp(ci (.) ≠ y|d = dy ) = γ[p(ci (.) ≠ y|d = dy , ψi = 1)p(ψi = 1|d = dy )+p(ci (.) ≠ y|d = dy , ψi = 0)p(ψi = 0|d = dy )]
Since ci (X, dy , z) = φi when ψi = 1 by the consistency principle, we have ci (X, dy , z) = y ⇐⇒ φi = 1.
So
p(ci (.) ≠ d|d = dy , ψi = 1) = p(φi = 0|ψi = 1, d = dy )
Since ψi = 0 =⇒ ci (X, dy , z) = y by frame monotonicity, we know that p(ci (.) ≠ d|d = dy , ψi = 0) = 0.
By unconfoundedness and frame separability, p(φi = 0|d = dy , ψi = 1) = p(φi = 0|ψi = 1) and p(ψi =
1|d = dy ) = p(ψi = 1). So the second term of (51) becomes
γp(ci (.) ≠ y|d = dy ) = γp(φi = 0|ψi = 1)p(ψi = 1)
Term 4 of Equation (51)
We obtain this term symmetrically to the second term. By the law of iterated expectations
γp(ci (.) ≠ x|d = dx ) = γ[p(ci (.) ≠ x|d = dx , ψi = 1)p(ψi = 1|d = dx )+p(ci (.) ≠ x|d = dx , ψi = 0)p(ψi = 0|d = dx )]
The second term in brackets will be zero by frame monotonicity. By the consistency principle,
p(ci (.) ≠ x|d = dx , ψi = 1) = p(φi = 1|d = dx , ψi = 1)
By unconfoundedness and frame separability, p(φi = 1|d = dx , ψi = 1) = p(φi = 1|ψi = 1) and p(ψi =
1|d = dx ) = p(ψi = 1). So the fourth term becomes
γp(ci (.) ≠ x|d = dx ) = γp(φi = 1|ψi = 1)p(ψi = 1)
Last Step
Combining terms and simplifying, we have that dy is optimal if and only if
\[
p(\phi_i = 1 \mid \psi_i = 0)\, p(\psi_i = 0) + \gamma\, p(\phi_i = 0 \mid \psi_i = 1)\, p(\psi_i = 1) > \frac{1 + (\gamma - 1)\, p(\psi_i = 1)}{2}
\]
Now note that by definition p(φi = 1|ψi = 0) = φ̄N , p(φi = 1|ψi = 1) = φ̄A , p(ψi = 1) = ψ̄(z), and
p(ψi = 0) = 1 − ψ̄(z). Substituting these terms yields the desired result.
Proof of Proposition 11
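When γ = 0 the planner’s objective evaluated at each of the four possible d by
z combinations is
\[
\begin{aligned}
d_y, z_l &:\quad p(c_i(\cdot) = \phi_i \mid d = d_y, z = z_l) - \tfrac{1}{N}\kappa(z_l) \\
d_x, z_l &:\quad p(c_i(\cdot) = \phi_i \mid d = d_x, z = z_l) - \tfrac{1}{N}\kappa(z_l) \\
d_y, z_h &:\quad p(c_i(\cdot) = \phi_i \mid d = d_y, z = z_h) - \tfrac{1}{N}\kappa(z_h) \\
d_x, z_h &:\quad p(c_i(\cdot) = \phi_i \mid d = d_x, z = z_h) - \tfrac{1}{N}\kappa(z_h)
\end{aligned}
\]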
In the proof of Proposition 10, we showed that these four expressions can be re-written as
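\[
\begin{aligned}
d_y, z_l &:\quad p(\psi_i(z_l) = 1 \mid z_l) + p(\phi_i = 1 \mid \psi_i(z_l) = 0, z_l)\, p(\psi_i(z_l) = 0 \mid z_l) - \tfrac{1}{N}\kappa(z_l) \\
d_x, z_l &:\quad p(\psi_i(z_l) = 1 \mid z_l) + p(\phi_i = 0 \mid \psi_i(z_l) = 0, z_l)\, p(\psi_i(z_l) = 0 \mid z_l) - \tfrac{1}{N}\kappa(z_l) \\
d_y, z_h &:\quad p(\psi_i(z_h) = 1 \mid z_h) + p(\phi_i = 1 \mid \psi_i(z_h) = 0, z_h)\, p(\psi_i(z_h) = 0 \mid z_h) - \tfrac{1}{N}\kappa(z_h) \\
d_x, z_h &:\quad p(\psi_i(z_h) = 1 \mid z_h) + p(\phi_i = 0 \mid \psi_i(z_h) = 0, z_h)\, p(\psi_i(z_h) = 0 \mid z_h) - \tfrac{1}{N}\kappa(z_h)
\end{aligned}
\]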
By decision-quality monotonicity and the existence of contingent optimizers, we can divide the population
into always optimizers (A), never optimizers (N) and sometimes optimizers (C), exactly as in Proposition 6.
The average preferences in each population are given by φA , φN , and φC , respectively, and the size of each
population is given by πA , πN , and πC , respectively.
By unconfoundedness with respect to z, p(ψi (zl ) = 1|zl ) = p(ψi (zl ) = 1) = πA , p(ψi (zh ) = 0|zh ) =
p(ψi (zh ) = 0) = πN , and p(ψi (zl ) = 0|zl ) = p(ψi (zl ) = 0) = 1 − πA = πN + πC . By the law of iterated
expectations
\[
\begin{aligned}
p(\phi_i = 1 \mid \psi_i(z_l) = 0, z_l) ={}& p(\phi_i = 1 \mid \psi_i(z_l) = 0, \psi_i(z_h) = 0)\, p(\psi_i(z_h) = 0 \mid \psi_i(z_l) = 0) \\
&+ p(\phi_i = 1 \mid \psi_i(z_l) = 0, \psi_i(z_h) = 1)\, p(\psi_i(z_h) = 1 \mid \psi_i(z_l) = 0)
\end{aligned}
\]
which using the definition of conditional probability and various π’s and φ’s will yield
\[
p(\phi_i = 1 \mid \psi_i(z_l) = 0, z_l) = \frac{\phi_N \pi_N + \phi_C \pi_C}{\pi_N + \pi_C},
\qquad
p(\phi_i = 0 \mid \psi_i(z_l) = 0, z_l) = \frac{(1 - \phi_N)\pi_N + (1 - \phi_C)\pi_C}{\pi_N + \pi_C}
\]
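Our four conditions simplify to
\begin{align}
d_y, z_l &:\quad \pi_A + \phi_N \pi_N + \phi_C \pi_C - \tfrac{1}{N}\kappa(z_l) \tag{52} \\
d_x, z_l &:\quad \pi_A + (1 - \phi_N)\pi_N + (1 - \phi_C)\pi_C - \tfrac{1}{N}\kappa(z_l) \tag{53} \\
d_y, z_h &:\quad \pi_A + \pi_C + \phi_N \pi_N - \tfrac{1}{N}\kappa(z_h) \tag{54} \\
d_x, z_h &:\quad \pi_A + \pi_C + (1 - \phi_N)\pi_N - \tfrac{1}{N}\kappa(z_h) \tag{55}
\end{align}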
Note that the first two terms in each of these will be the total number of individuals who receive their
preferred option when the planner chooses that (d, z) combination.
First, consider situations where the planner chooses dy regardless of z. This requires (52) > (53) and
(54) > (55), which simplify to the first two conditions in (1a) and (4a). The planner will set zh if (54) > (52),
which simplifies to ∆κ < (1 − φC )πC , which yields (4a). With the inequality reversed, we get (1a).
Second, consider situations where the planner chooses dx regardless of z. This requires (52) < (53) and
(54) < (55), which simplify to the first two conditions in (2a) and (3a). Then the planner chooses zh if
(55) > (53), which simplifies to ∆κ < φC πC . This yields the final condition in (2a) and (3a).
Third, consider the situation where the planner would want to choose dy under zh and dx under zl . This
requires (52) < (53) and (54) > (55), which provides the first two conditions in (1b) and (3b). In this situation,
the planner chooses zh if (54) > (53) and zl otherwise. Performing this comparison, we have that the planner
chooses zh if ∆κ < φC πC + (2φN − 1)πN , which is the final condition in (3b). When the inequality is reversed,
we obtain the final condition in (1b).
Finally, consider the situation where the planner would want to choose dx under zh and dy under zl .
This requires (52) > (53) and (54) < (55), which provide the first two conditions in (2b) and (4b). In this
situation, the planner chooses zh if (55) > (52). Comparing these, we see that the planner chooses zh if
∆κ < (1 − φC )πC + (1 − 2φN )πN , which is the final condition in (4b). When the inequality is reversed, we
obtain the final condition in (2b).
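The case analysis above amounts to picking the largest of expressions (52) to (55). A minimal sketch of that comparison follows; the function and parameter names are hypothetical, and the cost arguments are assumed to be already expressed per capita, i.e. κ(z)/N.

```python
def planner_objective(pi_A, pi_N, pi_C, phi_N, phi_C, kappa_l, kappa_h):
    """Return expressions (52)-(55), keyed by (default, decision-quality state).

    kappa_l and kappa_h stand for the per-capita costs kappa(z_l)/N and kappa(z_h)/N.
    """
    return {
        ("dy", "zl"): pi_A + phi_N * pi_N + phi_C * pi_C - kappa_l,              # (52)
        ("dx", "zl"): pi_A + (1 - phi_N) * pi_N + (1 - phi_C) * pi_C - kappa_l,  # (53)
        ("dy", "zh"): pi_A + pi_C + phi_N * pi_N - kappa_h,                      # (54)
        ("dx", "zh"): pi_A + pi_C + (1 - phi_N) * pi_N - kappa_h,                # (55)
    }

def best_policy(**params):
    """Pick the (default, state) pair with the highest objective value."""
    values = planner_objective(**params)
    return max(values, key=values.get)
```

With πA = 0.2, πN = 0.5, πC = 0.3, φN = 0.8 and φC = 0.6, the default dy dominates in both states, and the planner moves to zh only when the per-capita cost difference is below (1 − φC)πC = 0.12, in line with the first case of the proof.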
Proof of Proposition 12
Proof of (12.1) and (12.2)
Fix some k ∈ {1, ..., K − 1}. Let X′ = {x1 , ..., xk } and X″ = {xk+1 , ..., xK }. Note that we can rewrite the
many-choices problem as a binary menu choice problem between X′ and X″. Similarly, note that frame
separability (35), frame monotonicity (36), partition consistency (37)/(38), and partition unconfoundedness
(39) imply the binary analogues to these assumptions: (1), (3), (2), and (5). As such, (12.1) and (12.2)
follow directly from the application of Proposition 1 to this problem.
Proof of (12.3)
First suppose that k = 1. Applying Proposition 2 to the binary menu choice problem with X′ = {x1 }
and X″ = {x2 , ..., xK } implies that
E[φ1 ] ∈ [Gl (1), Gh (1)]    (56)
Note that this confirms the desired result for k = 1 since Gh (0) = Gl (0) = 0 by definition. Next, applying
the same proposition for k = 2, we have φ1 + φ2 ∈ [Gl (2), Gh (2)]. Combined with (56), this implies
φ2 ∈ [Gl (2) − Gh (1), Gh (2) − Gl (1)]    (57)
Similarly with k = 3, we have that φ1 + φ2 + φ3 ∈ [Gl (3), Gh (3)], and applying (56) and (57) implies
that φ3 ∈ [Gl (3) − Gh (2), Gh (3) − Gl (2)]. Proceeding recursively, suppose that for some k, we know that for
k′ < k,
φk′ ∈ [Gl (k′) − Gh (k′ − 1), Gh (k′) − Gl (k′ − 1)]    (58)
Then application of Proposition 3 to the binary menu choice problem with X′ = {x1 , ..., xk } yields
φ1 + φ2 + ... + φk ∈ [Gl (k), Gh (k)], so φk ∈ [Gl (k) − (φ1 + φ2 + ... + φk−1 ), Gh (k) − (φ1 + φ2 + ... + φk−1 )].
Applying the lower and upper bounds from (58) and simplifying yields the desired result.
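The recursion in this proof reduces to differencing the cumulative bounds. A short sketch under stated assumptions: the inputs are taken to be sequences indexed from 0, with G_l[0] = G_h[0] = 0 as in the text, and the function name is hypothetical.

```python
def preference_bounds(G_l, G_h):
    """Bounds on each phi_k implied by bounds on the cumulative sums.

    G_l[k] and G_h[k] bound phi_1 + ... + phi_k for k = 0, ..., len(G_l) - 1.
    Returns a list of (lower, upper) pairs for k = 1, 2, ...
    """
    if len(G_l) != len(G_h):
        raise ValueError("G_l and G_h must have the same length")
    return [(G_l[k] - G_h[k - 1], G_h[k] - G_l[k - 1])
            for k in range(1, len(G_l))]
```

For k = 1 the pair collapses to (G_l[1], G_h[1]), which matches (56). The sketch does not clip the bounds to the unit interval, since the proof does not do so either.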
Proof of (12.4)
Along with (12.1), strong decision-quality independence implies that for any k,
P (yi∗ ≤ k|ψik = 1) = P (yi∗ ≤ k) = Yk    (59)
Applying (59) at k = 1 yields
φ̄1 = Y1    (60)
Applying (59) at k = 2 yields φ̄1 + φ̄2 = Y2 and substituting equation (60) yields
φ̄2 = Y2 − Y1
As in the Proof of (12.3), we proceed recursively to obtain the desired result. Given some k, suppose
that for any k′ < k we have
φ̄k′ = Yk′ − Yk′−1    (61)
Applying (59) at k yields φ̄1 + φ̄2 + ... + φ̄k = Yk . Applying (61) for φ̄1 , ..., φ̄k−1 and simplifying yields the
desired result.
B Positive Models of Framing Effects
In this section, we describe several different positive models of frame-sensitivity, and discuss how the various
methods described in the body of the paper apply to each. We proceed roughly from models imposing the
least rationality on choices, such as models where the variation in whether individuals optimize depends
solely on individual characteristics unrelated to the choice at hand, to models imposing complete rationality,
in which framing effects stem from the presence of neoclassical transaction costs.
Any of these models could potentially explain observed framing effects. Because each model satisfies
the assumptions of frame-monotonicity, the irrelevance of frames for preferences over menu items, and that
consistent choices reveal preferences, the methods described in the paper may be applied to each as well. But
as is generally the case, the parameters identified by our reduced form techniques depend on which model is
generating behavior.
In all cases, we assume each decision-maker (DM) i chooses from a menu {x, y}. The DM’s valuations of
the two options are given by ui (x) and ui (y), and we write ui (y) − ui (x) ≡ ūi .³⁸ We continue to denote
φi = I{ui (y) > ui (x)} and ψi = I{c(X, dx ) = c(X, dy )}. We denote the frame facing DM i by dij ∈ {dx , dy }.
B.1 Optimization Based on Individual Characteristics
In this model, decision-makers optimize whenever the costs of optimizing Ci are below a threshold value
C̄. Decision-makers who optimize make the same (optimal) choice regardless of the frame whereas decision-makers who do not optimize choose according to the frame (they select x under dx and y under dy ).³⁹ What
we call the costs of optimizing are very general: Ci may reflect the decision-maker’s expertise (or ignorance)
in the choice being made, the opportunity cost of attention, the cognitive cost of expending mental effort
on the decision, or psychological susceptibility to the frame. In contrast to later models, the variation in
whether decision-makers optimize (i.e. the variation in Ci ) is driven by variation in individual characteristics
among decision-makers as opposed to the specific benefits to optimizing in the particular decision at hand.
Assuming that Ci is distributed in the population with a cumulative distribution function G(.), we will have
that E[ψi ] = G(C̄).
Whether decision-quality independence holds in this model depends on the empirical correlation between
the determinants of optimization behavior, Ci , and individuals’ preferences, represented by ui . In particular,
we will have cov(ψi , φi ) = 0 ⇐⇒ E[Ci |ūi ≥ 0] = E[Ci |ūi < 0]. Thus a sufficient condition for decision-quality independence is that Ci is distributed independently of ūi .⁴⁰ Assessing decision-quality independence
in the context of such models thus requires considering the individual characteristics associated with frame-sensitivity and whether those same characteristics are also associated with preferences over x and y. Whether
these conditions hold will depend upon the application. For example, with regard to choices over retirement
savings plans, preferences over savings may be correlated with financial literacy, which may also be correlated
with the latent characteristics driving the variation in optimization (such as cognitive ability). In such
settings, the matching approach of Proposition 2 is most likely to succeed when one can observe the individual
characteristics driving the endogeneity problem, such as financial literacy in the previous example.
38 The statistics in the body of the paper are concerned only with ordinal preferences of individuals, but in some models
differences in relative utility between individuals will drive some individuals to optimize. Comparing optimizing individuals to
frame-sensitive individuals in these models requires a utility concept comparable across individuals.
39 Note that this directly imposes the assumption of frame monotonicity and that consistent choices reveal preferences.
40 Note that this condition is sufficient but not necessary. For example, decision-quality independence will hold when the
joint distribution of ūi and Ci is symmetric around ūi = 0. This may hold, for example, when Ci is correlated with the utility
“stakes” of the decision, |ūi |, as explored in the next model.
When a decision-quality instrument, such as a treatment offering information on the choice, is available,
this setting readily admits testing of decision-quality independence or conditional decision-quality independence using decision-quality state variation, as in Proposition 7. Furthermore, when ūi and Ci are correlated
and appropriate observables for a matching estimator are unavailable, models like the one in Section 7 provide a natural way to examine the joint distribution of the two variables. With a joint normal distribution
of ūi and log(Ci ) and a homogeneous effect of a change in z on log(Ci ), this becomes exactly identical to the
latent variable model outlined in that section, so we can trace out the propensity to optimize as a function
of the fraction of optimizers at a given level of z, and extrapolate to recover the full distribution of ūi and
Ci .
B.2 Revealed Attention
Using the approach of Masatlioglu et al. [2012], we show here how the assumptions of frame monotonicity and
the consistency principle are implied by an intuitive assumption in a revealed attention framework. Assume
that an individual pays attention only to some subset of the menu X, but that she maximizes her preferences
over the alternatives she notices. In order to incorporate framing effects from variation in the choice situation
(which does not come from variation in X itself), we must specify an attention filter Γ which depends on dj .
Denote the attention filter by Γ(X, dj ).⁴¹ Given a utility function representation of individual i’s preferences,
ui (.), we could write the consumer’s choice as the solution to the utility-maximization problem restricted to
Γi (X, dij ):
\[
\max_{c \in \Gamma_i(X, d_{ij})} u_i(c) \tag{62}
\]
Claim: When X = {x, y} is binary, frame monotonicity and the consistency principle will be satisfied if the
individual always pays attention to the option favored by the frame. Formally,
∀i, x ∈ Γi (X, dx ) and y ∈ Γi (X, dy )    (63)
Proof: Suppose condition (63) is satisfied. We proceed in two cases. First, suppose that y(dy ) = 0, i.e.
c(X, dy ) = x. Since c(X, dy ) ∈ Γ(X, dy ) by (62), we must have x ∈ Γ(X, dy ), which together with condition
(63) implies Γ(X, dy ) = {x, y}. By (62), u(x) > u(y). Given that x ∈ Γ(X, dx ) ⊆ {x, y} by (63) again, we
know that y(dx ) = 0. Second, suppose that y(dx ) = 1. Then we must have y ∈ Γ(X, dx ). Similar to before,
we know that Γ(X, dx ) = {x, y}. So u(y) > u(x). Then since y ∈ Γ(X, dy ) ⊆ {x, y}, y(dy ) = 1. These two
conditions are sufficient to prove frame monotonicity, ∀i, y(dy ) ≥ y(dx ).⁴² Note that whenever c(X, dx ) = y,
we also have c(X, dy ) = y and u(y) > u(x), and whenever c(X, dy ) = x, we also have c(X, dx ) = x and
u(x) > u(y). This guarantees that consistent choices reveal preferences.
41 It will have the property that ∀X, Γi (X, d) = Γi (X\x, d) whenever x ∉ Γi (X, d). Masatlioglu et al.’s assumption is not
directly relevant in our setting because we examine binary choices. However, in the non-binary case it will place additional
restrictions on when preferences are revealed by choices in the presence of frame monotonicity.
Granted this assumption (and unconfoundedness and frame separability), the results of this paper will all
obtain in the revealed attention framework of Masatlioglu et al. [2012]. Intuitively, when individuals choose
y under dx , they are “revealing” that they pay attention to y under dx , since an individual cannot choose
an alternative not in the attention set Γ(.). Given the assumption that all individuals also pay attention to
the favored choice in a given frame, by choosing y under dx a DM reveals that she prefers y to x. Variation in which
individuals are consistent is equivalent to variation across individuals in Γi (X, dj ) for each frame dj , which
could be endogenized using approaches similar to those in other sections of this appendix. Proposition
1 will apply. If Γi is independent of preferences φi , we will have that decision-quality independence is
satisfied, and if we allow Γi to depend on individual characteristics or if we allow Γi to expand based on a
decision-quality instrument, later propositions in the paper may be applied. Finally, note that, like frame
monotonicity, property (63) could be tested if we are able to observe individuals’ choices across frames. In
addition, note that when we move beyond the binary case, assumption (63) would justify the assumption
that active choices reveal a preference for the chosen option over the default option, but not the stronger
assumption that active choices reveal preferences over the entire menu.
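The claim of this subsection can also be checked by brute force for the binary menu, since only a handful of attention filters satisfy (63). The sketch below is not part of the original argument: it assumes strict preferences, represents attention sets as Python sets, and uses hypothetical function names.

```python
from itertools import product

X, Y = "x", "y"

def choose(attention, prefers_y):
    """Maximize preferences over the attended subset of {x, y}, as in (62)."""
    if attention == {X, Y}:
        return Y if prefers_y else X
    return next(iter(attention))  # singleton: only one option is noticed

def check_claim():
    """Enumerate every binary attention filter satisfying (63)."""
    results = []
    filters_dx = [{X}, {X, Y}]   # x is always noticed under dx
    filters_dy = [{Y}, {X, Y}]   # y is always noticed under dy
    for g_dx, g_dy, prefers_y in product(filters_dx, filters_dy, [False, True]):
        c_dx, c_dy = choose(g_dx, prefers_y), choose(g_dy, prefers_y)
        monotone = (c_dy == Y) >= (c_dx == Y)              # y(dy) >= y(dx)
        preferred = Y if prefers_y else X
        reveals = (c_dx != c_dy) or (c_dx == preferred)    # consistent choice is the preferred one
        results.append(monotone and reveals)
    return results
```

Each entry of the returned list corresponds to one combination of attention sets and preference ordering; frame monotonicity and the consistency principle hold for that combination when the entry is true.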
B.3 Bounded Rationality Models
The models in this section work towards stronger and stronger notions of bounded rationality, by which
we mean that whether decision-makers optimize depends (in some form) on the gains to doing so. The
“meta-optimization” problems used in these models create an infinite regress problem [Conlisk, 1996]. It is
likely costly to perform a cost-benefit calculation to decide whether to incur some cost of optimizing, which
may lead us to wonder how the individual acquires and processes information to solve the meta-optimization
problem. Assumptions on what exogenous knowledge the individuals possess (and account for costlessly)
about costs and benefits of optimizing in this and later models bypass the infinite regress issue in an ad
hoc fashion. These assumptions are common in the literature, but we take no stance on which, if any, are
appropriate. Note that these conceptual difficulties grow more severe the more that decision-makers are
assumed to optimize or not based on the true utilities associated with the available menu items in the choice
decision at hand.
42 The other two cases, where c(X, dx ) = x and c(X, dy ) = y, will trivially satisfy y(dy ) ≥ y(dx ).
B.3.1 Stakes-Based Optimization
Assume the DM knows the utility “stakes” of the meta-decision of whether to optimize, |ūi |, and must
decide whether to incur the cost of optimizing, Ci . As before, we assume that individuals who pay the cost
to optimize select their most-preferred option under both frames whereas individuals who do not optimize
select the option associated with the frame under which they choose. To motivate the inclusion of utility
stakes into the decision of whether to optimize, consider an employee selecting a retirement savings plan. The
employee may know how much selecting the right retirement savings plan matters to her and how costly it is
to learn about the menu of plans, but she may not actually know which plan is best for her without incurring
the optimization cost. Similarly, an individual may have a general sense that accounting for low-salience
taxes when making purchasing decisions is more important for large purchases than for small purchases.
Suppose that individuals believe that x is best with probability ωx , which is homogeneous for simplicity.⁴³
In the frame dx , the DM decides whether to optimize based on the solution to the following problem:
\[
\max_{\psi_i \in \{0,1\}} \; \psi_i \left[ |\bar{u}_i| - C_i \right] + (1 - \psi_i) \left[ \omega_x |\bar{u}_i| - (1 - \omega_x) |\bar{u}_i| \right]
\]
The individual pays the cost to learn the best option only if
Ci < 2(1 − ωx )|ūi |
Symmetrically, when the frame is dy , the individual pays the cost only if
Ci < 2ωx |ūi |
The individual is consistent if and only if
ψi = 1 ⇐⇒ Ci < min {2(1 − ωx )|ūi |, 2ωx |ūi |}    (64)
and whenever the individual is consistent, she chooses her preferred option, so the consistency principle applies. The assumption of frame monotonicity is embedded in the optimization problem. As such Propositions
1, 2, and 3 are readily applied to this situation.
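The frame-specific thresholds and the consistency condition (64) can be written down directly. This is a minimal sketch with hypothetical function names; `stakes` stands for |ūi| and `omega_x` for the common prior ωx.

```python
def optimizes(cost, stakes, omega_x, frame):
    """Whether the DM pays the optimization cost under a given frame.

    Under dx the threshold is 2 (1 - omega_x) |u_bar|; under dy it is 2 omega_x |u_bar|.
    """
    if frame == "dx":
        return cost < 2 * (1 - omega_x) * stakes
    if frame == "dy":
        return cost < 2 * omega_x * stakes
    raise ValueError("frame must be 'dx' or 'dy'")

def is_consistent(cost, stakes, omega_x):
    """Condition (64): the cost is below the smaller of the two thresholds."""
    return cost < min(2 * (1 - omega_x) * stakes, 2 * omega_x * stakes)
```

By construction, `is_consistent` holds exactly when the DM optimizes under both frames, so the two functions can be cross-checked against each other on any grid of parameter values.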
Decision-quality independence will not hold in general in this setting, since individuals with high |ūi | tend
to choose consistently and may also be more likely to prefer y. More formally, the event in equation (64) will
not be independent of the event ūi > 0 without strong assumptions. One set of such assumptions is that Ci
is independent of ūi , and the distribution of ūi is symmetric about 0. The latter is tantamount to assuming
that E[φi ] = 0.5 ex ante.
43 Letting ωx depend on ūi would bring this model closer in line with stronger versions of bounded rationality presented in
later sections.
In spite of the likely failure of decision-quality independence, conditional decision-quality independence
will hold when one can observe sufficient characteristics to control for both Ci and |ūi |, in which case
Proposition 5 will allow us to recover population preferences. That is, under stakes-based optimization
models, the observer should control for variation among decision-makers associated with the utility stakes in
the underlying decision. For example, in the retirement savings plan context, one could solicit and control
for 1) the individual’s knowledge of the definitions of various aspects of retirement plans, and 2) for the
self-reported importance of the savings decision to this person. Propositions 1, 2, and 3 are applicable in
general for this model. Valid decision-quality instruments consist of any variation in the choice environment
that changes the cost of optimizing or the perceived stakes of the decision monotonically for all individuals.
B.3.2 Optimal Decision Rule
Assume DM chooses whether to optimize in a given situation, or a given class of situations she may encounter
multiple times. DM knows her preferences ui (x) and ui (y) when she decides whether to optimize, but not
when she actually chooses from the menu {x, y}. This phenomenon may arise because when she actually
chooses there are some environmental influences that she cannot avoid without paying some cost ex ante
(e.g. additional mental effort to stay focused or exercise self-control). Suppose the DM decides whether to
optimize given her beliefs about how likely she is to encounter a given frame dj and the cost of optimizing
Ci . The likelihood of encountering frame dj is homogeneous across decision-makers and denoted αj .
The choice of whether to optimize is given by:
\[
\max_{\psi_i \in \{0,1\}} \; (1 - \psi_i)\left\{ \alpha_x u_i(x) + (1 - \alpha_x) u_i(y) \right\} + \psi_i \left[ \max\{u_i(x), u_i(y)\} - C_i \right]
\]
Note that frame monotonicity is embedded inside the first term in curly brackets. Optimizing raises expected
utility by αx ūi when ūi > 0 and by −(1 − αx )ūi when ūi < 0, so the solution of this
optimization problem yields that
ψi = 1 ⇐⇒ Ci < max {−(1 − αx )ūi , αx ūi }
Decision quality independence will not in general obtain in this model, for similar reasons to the previous model: underlying utility over x and y, ū, affects both ψi and φi . We will not have in general that
Cov(φi , ψi ) = 0 without strong assumptions like the symmetry assumption discussed in the previous section.
When αx = 0.5, this model becomes exactly like the previous model (with ωx = 0.5), in which case controlling for variation in costs Ci and stakes |ūi | may yield conditional decision-quality independence. Without
reason to believe that the DM considers herself equally likely to face either frame, i.e. that αx = 0.5, conditional
decision-quality independence is unlikely to be satisfied in this case, since “controlling” for how ū enters the
optimization decision requires knowing whether ū is positive or negative, i.e. whether φi = 1 or φi = 0.
However, Proposition 2 can be applied in this setting to recover bounds on E[φi ]. We could apply Proposition 6 in this setting via a treatment affecting the cost of optimizing Ci monotonically for all individuals,
and recover the fraction of individuals who switch to optimizing who prefer y.
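The decision rule of this subsection can be sketched by comparing the two branches of the DM's maximization problem directly. The function names below are hypothetical, and `alpha_x` is the probability of facing frame dx, assumed known to the DM.

```python
def optimizes_ex_ante(u_x, u_y, cost, alpha_x):
    """Solve the DM's problem: optimize iff doing so beats following the frame."""
    follow_frame = alpha_x * u_x + (1 - alpha_x) * u_y   # x under dx, y under dy
    optimize = max(u_x, u_y) - cost
    return optimize > follow_frame

def gain_from_optimizing(u_bar, alpha_x):
    """Expected gain from optimizing, by cases on the sign of u_bar = u(y) - u(x)."""
    if u_bar > 0:
        return alpha_x * u_bar             # avoids choosing x when the frame is dx
    return -(1 - alpha_x) * u_bar          # avoids choosing y when the frame is dy
```

The DM optimizes exactly when the cost is below `gain_from_optimizing`. The gain depends on the sign of ūi unless αx = 0.5, which illustrates why controlling for stakes alone is not enough in this model.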
B.3.3 Framing Effects as Pure Transaction Costs
In this model, we assume a fully rational DM is affected by the default option due to a transaction cost.
We will analyze this model in somewhat more detail to illustrate the relationship between the structural
approach commonly used in the literature and our thinking, as well as the two-sided selection issue alluded
to in the body of the paper.
We assume that choosing the option that is not the default incurs a transaction cost γi . DM’s choice ci
solves
\[
\max_{c_i \in \{x, y\}} \; u_i(c_i) - \gamma_i I\{c_i \neq d_{ij}\}
\]
When dij = dx , the solution to this problem is given by ci = y ⇐⇒ ūi > γi . When dij = dy , the solution is
given by ci = y ⇐⇒ −ūi < γi . Note that individuals who choose x under dy prefer x, and they will also
choose x when x is the default, so consistent choices reveal preferences and frame monotonicity is satisfied.
Note also that although we focus on the case where γi is a real transactions cost, we could also think of this
model as a model of bounded rationality similar to the previous one, but where the individual can condition
her choice of whether to optimize on the frame.⁴⁴
We can summarize the three distinct possibilities for the choices of individual i as follows:
\[
(c_i(X, d_x), c_i(X, d_y)) =
\begin{cases}
(x, x) & \text{if } -\bar{u}_i > \gamma_i \\
(x, y) & \text{if } -\bar{u}_i < \gamma_i,\ \bar{u}_i < \gamma_i \\
(y, y) & \text{if } \bar{u}_i > \gamma_i
\end{cases}
\tag{65}
\]
44 Formally, we would assume that the individual i updates αx in the previous model to αx |dij = I{dij = dx }.
Note that the above implies the consistency principle and frame monotonicity obtain. The two statistics
studied in the bulk of the paper will be given in this model by
φi = 1 ⇐⇒ ūi > 0
ψi = 1 ⇐⇒ ūi ∈ (−∞, −γi ) ∪ (γi , ∞)
Decision-quality independence will not generally be satisfied:⁴⁵
cov(φi , ψi ) = p(ūi > γi ) − p(ūi > 0)p(ūi < −γi or ūi > γi )
which will not generally equal zero.⁴⁶ Proposition 4 cannot be applied in a model such as this one, which
is not surprising: whether an individual is consistent in this model depends strongly on her preferences.
However, all the assumptions of Proposition 1-3 are satisfied, so, for example, we can apply Proposition 1:
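\[
E[\psi_i] = E[y_i \mid d_x] + 1 - E[y_i \mid d_y] = p(\bar{u}_i > \gamma_i) + 1 - p(-\bar{u}_i < \gamma_i)
\]
\[
E[\phi_i \mid \psi_i = 1] = \frac{E[y \mid d_x]}{E[y \mid d_x] + 1 - E[y \mid d_y]} = \frac{p(\bar{u}_i > \gamma_i)}{p(\bar{u}_i > \gamma_i) + 1 - p(-\bar{u}_i < \gamma_i)}.
\]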
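To illustrate how these two statistics are computed from choice shares in the transaction-cost model, the sketch below builds a small hypothetical population of (ūi, γi) pairs and applies the formulas above. The population values and function names are invented for illustration, and ties (ūi exactly equal to ±γi) are resolved in favor of the default.

```python
from fractions import Fraction as F

def choices(u_bar, gamma):
    """Return (choice under dx, choice under dy) as in (65); 1 = y, 0 = x."""
    y_under_dx = 1 if u_bar > gamma else 0     # leave default x only if the gain exceeds the cost
    y_under_dy = 0 if -u_bar > gamma else 1    # leave default y only if the gain exceeds the cost
    return y_under_dx, y_under_dy

def proposition_1_statistics(population):
    """population: list of (u_bar, gamma) pairs, equally weighted."""
    n = len(population)
    picks = [choices(u, g) for u, g in population]
    mean_y_dx = F(sum(c_dx for c_dx, _ in picks), n)   # E[y | dx]
    mean_y_dy = F(sum(c_dy for _, c_dy in picks), n)   # E[y | dy]
    share_consistent = mean_y_dx + 1 - mean_y_dy       # E[psi]
    pref_consistent = mean_y_dx / share_consistent     # E[phi | psi = 1]
    return share_consistent, pref_consistent
```

The second statistic is undefined when no one chooses consistently, in which case the division raises `ZeroDivisionError`; a fuller implementation would handle that case explicitly.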
It is unlikely that conditioning on observables will help us identify population preferences, for the same
reason as in the last section. One exception occurs when variation in benefits is negligible compared to variation in costs, which transforms this model (to some approximation) into the first model presented,
in which optimization is a latent characteristic. This transformation is especially useful in situations
where x and y are non-transparently the same good, such as the example of store-brand versus
brand name pharmaceuticals with identical chemical components in Bronnenberg et al. [2013].
We can use Proposition 2, here, because when γi ≥ 0 for all i,
φ̄ = p(ūi > 0) ∈ [p(ūi > γi ), p(ūi > −γi )] = [E[y|dx ], E[y|dy ]].
However, we can identify the preferences of a larger number of decision-makers in the presence of varying
decision-quality environments, and changes in transactions cost provide a natural example of such variation.
Reductions could be obtained by easing the administrative requirements (such as paperwork) for choosing
the non-default option. Suppose that transactions costs change from γi to γi′ ≤ γi , with γi′ < γi for some i.
Then the conditions for Proposition 6 are satisfied and we will have:
\[
(c(X, d_x, \gamma), c(X, d_x, \gamma'), c(X, d_y, \gamma'), c(X, d_y, \gamma)) =
\begin{cases}
(x, x, x, x) & \text{if } \bar{u}_i < -\gamma_i \\
(x, x, x, y) & \text{if } \bar{u}_i \in [-\gamma_i, -\gamma_i'] \\
(x, x, y, y) & \text{if } \bar{u}_i \in [-\gamma_i', \gamma_i'] \\
(x, y, y, y) & \text{if } \bar{u}_i \in [\gamma_i', \gamma_i] \\
(y, y, y, y) & \text{if } \bar{u}_i > \gamma_i
\end{cases}
\tag{66}
\]
45 cov(φi , ψi ) = E[φi ψi ] − E[φi ]E[ψi ] = p(φi = ψi = 1) − p(φi = 1) p(ψi = 1).
46 In particular, this condition holds if and only if p(ūi > γi |ūi > 0) = p(ūi < −γi |ūi < 0).
The second and fourth cases correspond to the contingent optimizers whose ordinal preferences are
captured by the statistic in Proposition 6. However, the two-sided nature of selection into observables in
this model suggests that we might attempt to recover more meaningful parameters, such as those governing
the distribution of ūi and γi . The remainder of this section describes such a model.
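Assume that transaction costs are described by
\[
\log(\gamma_i) = \mu_\nu + \theta I\{z = z_h\} + \nu_i
\]
where I{z = zh } is an indicator for being in the high-quality decision state zh . Suppose also that
\[
\bar{u}_i = \mu_u + \varepsilon_i,
\qquad
\begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix}
\sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)
\]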
Note the similarity between the setup of this model and the one in Section 7. We can use equation (66)
to calculate likelihood contributions, replacing γi by exp(µν + νi ) and γi′ by exp(µν + θ + νi ).
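As a rough illustration of how (66) maps latent draws into observable choice patterns, the sketch below classifies a decision-maker by her four choices and approximates the share of each pattern by simulation. It is a simulated stand-in rather than the analytic likelihood: the function names, parameter values and number of draws are all assumptions, and θ ≤ 0 is assumed so that the cost is lower in state zh.

```python
import math
import random

def choice_pattern(u_bar, gamma_hi, gamma_lo):
    """Classify a DM into one of the five rows of (66).

    gamma_hi is the original cost gamma_i, gamma_lo the reduced cost gamma_i'.
    Returns choices at (dx, gamma), (dx, gamma'), (dy, gamma'), (dy, gamma).
    """
    return (
        "y" if u_bar > gamma_hi else "x",
        "y" if u_bar > gamma_lo else "x",
        "y" if u_bar > -gamma_lo else "x",
        "y" if u_bar > -gamma_hi else "x",
    )

def simulate_pattern_shares(mu_u, mu_nu, theta, rho, n_draws=10_000, seed=0):
    """Monte Carlo shares of each pattern under the bivariate normal model."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_draws):
        eps = rng.gauss(0.0, 1.0)
        # nu has unit variance and correlation rho with eps.
        nu = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        gamma_hi = math.exp(mu_nu + nu)            # cost in state z_l
        gamma_lo = math.exp(mu_nu + theta + nu)    # cost in state z_h (theta <= 0)
        pattern = choice_pattern(mu_u + eps, gamma_hi, gamma_lo)
        counts[pattern] = counts.get(pattern, 0) + 1
    return {p: c / n_draws for p, c in counts.items()}
```

In an estimation routine, the simulated shares would be replaced by the corresponding bivariate normal probabilities of each region in (66).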