Rational Inattention and State Dependent Stochastic Choice
Andrew Caplin and Mark Dean
February 22 2013
Preliminary and Incomplete - Please Do Not Circulate
Abstract
Economists are increasingly interested in how attention impacts behavior. Rational inattention theory models the allocation of attention in an optimizing framework. We characterize
patterns of stochastic choice consistent with a general model of rational inattention, extending results based on Shannon information costs (Matejka and McKay [2011], Caplin and Dean
[2013]). We experimentally elicit “state dependent” stochastic choice data of the form required
to test the model. Rational inattention theory does a qualitatively better job of matching this
data than do standard stochastic choice models that ignore the link between incentives and
attention.
1 Introduction
Limits on attention impact choice. Shoppers may buy unnecessarily expensive products due to
their failure to notice whether or not sales tax is included in stated prices (Chetty et al. [2009]).
Buyers of second-hand cars focus their attention on the left-most digit of the odometer when
evaluating the available alternatives (Lacetera et al. [2011]). Purchasers limit their attention to a
relatively small number of websites when buying over the internet (Santos et al. [2012]).
Given its effect on choice, the forces that determine attentional effort are being intensively studied. The rational inattention model (Sims [1998], Sims [2003]) is particularly influential, capturing as it does the balancing act between the improved decision quality that attention produces and the entailed costs in terms of time and effort. It is being applied to model a wide array of phenomena, from price stickiness to consumption dynamics to portfolio choice.¹ Yet understanding the implications of rational inattention can be challenging. The information that a decision maker obtains is not directly observable, making it difficult to differentiate between rationally inattentive choices and internally inconsistent mistakes.
In this paper we develop a simple characterization of the choice behavior consistent with rational inattention, and implement the corresponding tests in a laboratory experiment. We make no assumption on the nature of informational costs or constraints, so that our model represents a significant generalization of those in the current literature.² The necessary and sufficient conditions for rational inattention that we identify are simple and intuitive. A “no improving action switch” (NIAS) condition ensures that choices are optimal given what was learned about the state of the world (see Caplin and Martin [2011]). A “no improving attention cycles” (NIAC) condition ensures that total utility cannot be raised by reassigning attentional strategies across decision problems. These conditions are robust. One can insist that costs associated with more informative attentional strategies (in the sense of Blackwell) are no lower than those associated with less informative strategies. One can insist on the feasibility of mixed attentional strategies. One can set inattention to be costless. Even with these additional restrictions, the NIAS and NIAC conditions fully characterize rationally inattentive behavior.

We thank Roland Benabou, Federico Echenique, Andrew Ellis, Daniel Martin, Stephen Morris, Pietro Ortoleva and Mike Woodford for their constructive contributions. We also thank Samuel Brown for his exceptional research assistance and Severine Toussaert and Isabel Trevino for their help in running the experiments.
¹ See Sims [2006], Tutino [2008], Woodford [2008], Nieuwerburgh and Veldkamp [2009], Mackowiak and Wiederholt [2009], Mackowiak and Wiederholt [2010], Matejka [2010] and Paciello and Wiederholt [2011] for applications of rational inattention. See Marschak [1971] for an earlier formulation of similar ideas.
As with many existing models of rational inattention, our model allows for stochastic choice. The data set we consider consists of “state dependent” stochastic choice data that allows choice probabilities to depend on an underlying state of the world.³ We assume that, while the true state is perfectly observable to the researcher, it is potentially costly for a decision maker to learn.⁴ For example, the econometrician may know whether or not sales taxes are included in stated prices even if consumers do not. State dependent data of this form is readily available in the experimental laboratory and in many field settings.⁵

Current models of stochastic choice are predominantly based on assumed randomness in the utility function.⁶ Matejka and McKay [2011] (henceforth MM) establish an important link between rational inattention and such random utility models. When attention costs are proportional to Shannon mutual information, demand functions in the rational inattention model are of a generalized logit form.⁷ Yet there are also important differences, since random utility models treat attentional effort as independent of incentives. In section 5 we dig further into the differences between rational inattention and random utility. On the one hand, we show that random utility models can violate NIAS and NIAC. On the other hand, as again pointed out by MM, rational inattention can lead to violations of a monotonicity condition central to random utility models. Specifically, when a newly available act incentivizes additional attention, the resulting knowledge may induce choice of acts that were previously unchosen due to the lower level of attention. Another important distinction arises when acts have “state dependent” dominance properties, so that choice is simple once the state of the world is known. Unlike rational inattention theory, standard random utility theory does not allow attentional incentives to impact the quality of the signals on the basis of which decisions are made. As a result, stochastic choice is similarly non-responsive to incentives.
² Following Sims [2003], much of the literature assumes that costs are based on the Shannon mutual information between prior and posterior information states. Woodford [2012] suggests an alternative information cost function, based on the concept of a Shannon capacity, which is consistent with certain psychological experiments. Gul et al. [2012] and Ellis [2012] take a different approach, assuming limits on the partitional structures of information available to the decision maker.
³ While little studied in economics, state dependent stochastic choice data has featured in psychometric experiments on perception dating back at least to Weber (see Murray [1993]).
⁴ This inverts the standard interpretation, in which stochasticity in choice arises from the opposite asymmetry: factors that are unobserved by the econometrician yet observed by the decision maker (Manski [1977]).
⁵ Other authors have used different data sets to capture the implications of rational inattention. Ellis [2012] uses state dependent deterministic choice, while Mihm and Ozbek [2012] use choice over menus.
⁶ See for example Luce and Suppes [1965], Falmagne [1978], McFadden and Richter [1990], Clark [1996], McFadden [2005] and Gul and Pesendorfer [2006].
⁷ Caplin and Dean [2013] characterize stochastic choice behavior for a broad class of entropy-based cost functions.

In order to test whether behavior can be modeled using rational inattention theory, we experimentally elicit state dependent stochastic choice data. We present subjects with a screen containing only red and blue dots, with the proportion of red dots determining the state of the world. They then choose from a set of available acts, the payoffs to which depend on the state. By repeatedly presenting subjects with the same decision problem, we estimate their probability of choosing each act in each state. Unlike most psychology experiments (for example Navalpakkam and Perona [2009]), our experiment has no time limit, so that subjects can in principle perfectly determine the state of the world. The fact that they choose not to results precisely from their internal trade-off between additional input of cognitive effort and time on the one hand and monetary reward on the other.
We use our experimental data to conduct two primary sets of tests. First we determine the
validity of the NIAS and NIAC conditions. Aggregate data aligns well with both conditions, and
losses due to violations are small at the individual level. Second, we look to discriminate between
rational inattention theory and random utility theory. We identify qualitative violations of the
latter models. Overall, we conclude that rational inattention theory is a reasonable starting point
for modeling stochasticity in choice when attentional effort is chosen.
Section 2 introduces our formulation of the rational inattention model with an unrestricted cost
function. Section 3 derives the testable implications of this general model for state dependent choice
data. Section 4 shows that this characterization is unchanged by the addition of natural restrictions
on the cost function. Section 5 contrasts rational inattention theory with more familiar theories
of stochastic choice. Section 6 details our experimental design and presents experimental results.
Section 7 relates our work to the prior literature and to further ongoing work. Section 8 concludes.
2 A General Model of Rational Inattention
2.1 Model
We consider a decision making environment comprising a set of possible states of the world and a set of acts, the payoffs of which are state dependent. A given decision problem consists of a prior distribution over states of the world and the subset of acts from which the decision maker (DM) must choose.⁸ The key assumption underlying models of rational inattention is that the true state of the world is knowable in practice, but obtaining (or processing) information is costly. Thus, prior to choosing what act to take, the DM chooses an attentional strategy. In line with the rational inattention literature, an attentional strategy is described in an abstract manner: as a stochastic mapping from states of the world to subjective posterior information states (equivalently, as a set of signals, the probabilities of which depend on the state of the world). Subject only to Bayes’ rule, the DM is free to choose any attentional strategy they wish, but will face a utility cost of doing so. Having selected an attentional strategy, the DM can condition choice of act only on the subjective state.⁹ Their payoff is then given by the expected utility of the chosen acts less the costs of attention.
⁸ Both of which are assumed known to the DM.
⁹ And possibly an independent randomizing device, by which we allow for mixed strategies.

Figure 1 illustrates an attentional strategy of a firm that is choosing which of two prices to charge (high or low). Profits depend on the price chosen and the underlying state of the economy, which can be good, medium or bad (G, M and B) with probabilities $\frac{1}{6}$, $\frac{1}{2}$ and $\frac{1}{3}$ respectively.
Figure 1: An Example of an Attentional Strategy
In this example, the firm has chosen an attentional strategy which produces two subjective states, or signals, R and S. If demand is good the firm receives signal R for sure, while if demand is bad it receives signal S for sure. If demand is medium, it receives each signal with probability $\frac{1}{2}$. Upon receiving signal R, the firm sets prices high. Conditional on receiving this signal, the probability of state G is $\frac{2}{5}$ and of state M is $\frac{3}{5}$. The expected utility of action H is therefore $\frac{2}{5}$ times the utility that H gives in state G plus $\frac{3}{5}$ times the utility that H gives in M. The firm sets price L upon receipt of signal S, where the conditional probabilities are $\frac{3}{7}$ on M and $\frac{4}{7}$ on B. The net benefit to the firm from this attentional strategy is the expected “gross benefit” of the acts chosen net of attentional costs.
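The Bayesian arithmetic behind these conditional probabilities can be checked mechanically. The Python sketch below is purely illustrative. The helper `posteriors` and the dictionary encoding of the prior and of the signal probabilities are our own hypothetical choices. Only the numbers (prior $\frac{1}{6}$, $\frac{1}{2}$, $\frac{1}{3}$ and the signal structure described above) come from the example.

```python
from fractions import Fraction as Fr

def posteriors(prior, likelihood):
    """Apply Bayes' rule to a finite signal structure (illustrative helper).

    prior: dict state -> prior probability.
    likelihood: dict state -> {signal: P(signal | state)}.
    Returns (unconditional signal probabilities, posterior at each signal).
    """
    signals = {s for probs in likelihood.values() for s in probs}
    p_signal, post = {}, {}
    for s in signals:
        # Unconditional probability of the signal.
        p_s = sum(prior[m] * likelihood[m].get(s, 0) for m in prior)
        p_signal[s] = p_s
        # Posterior probability of each state given the signal.
        post[s] = {m: prior[m] * likelihood[m].get(s, 0) / p_s for m in prior}
    return p_signal, post

prior = {"G": Fr(1, 6), "M": Fr(1, 2), "B": Fr(1, 3)}
likelihood = {
    "G": {"R": Fr(1)},                    # good demand: signal R for sure
    "M": {"R": Fr(1, 2), "S": Fr(1, 2)},  # medium demand: each signal w.p. 1/2
    "B": {"S": Fr(1)},                    # bad demand: signal S for sure
}
p_signal, post = posteriors(prior, likelihood)
print(post["R"]["G"], post["R"]["M"])  # 2/5 3/5
print(post["S"]["M"], post["S"]["B"])  # 3/7 4/7
```

Exact fractions are used so that the output matches the hand calculation without rounding. Signal R arrives with probability $\frac{5}{12}$ and S with probability $\frac{7}{12}$.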
Formally, a decision making environment is identified by a finite set of states of the world, $\Omega = \{m \in \mathbb{N} \mid 1 \le m \le M\}$, and a finite set of acts $F$, with $\mathcal{F} = 2^F \setminus \emptyset$ comprising all non-empty subsets. We take as given a state dependent utility function $U: \Omega \times F \to \mathbb{R}$ and let $U_m^f$ denote the utility of act $f$ in state $m$. Define $\Gamma = \Delta(\Omega)$ as the set of probability distributions over states. Given $\gamma \in \Gamma$, $\gamma_m$ denotes the probability of state $m$. A decision problem consists of a prior belief $\mu \in \Gamma$ over states of the world and a non-empty set of acts $A \in \mathcal{F}$ from which the decision maker (DM) must choose. For any prior $\mu$, define $\Omega(\mu) = \{m \in \Omega \mid \mu_m > 0\}$ as its associated support.
An attentional strategy maps each state of the world to a probability distribution over subjective information states. Since we will be characterizing expected utility maximizers, we identify a subjective information state with its associated posterior beliefs $\gamma \in \Gamma$. For a particular prior, the set of feasible attentional strategies is the set of stochastic mappings from objective to subjective information states that satisfy Bayes’ law.
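Definition 1 Given $\mu \in \Gamma$, the set of feasible attentional strategies $\Pi(\mu)$ comprises all mappings $\pi: \Omega(\mu) \to \Delta(\Gamma)$ that have finite support $\Gamma(\pi) \subset \Gamma$ and that satisfy Bayes’ law, so that for all $m \in \Omega(\mu)$ and $\gamma \in \Gamma(\pi)$,
$$\gamma_m = \frac{\mu_m \pi_m(\gamma)}{\sum_{j=1}^{M} \mu_j \pi_j(\gamma)},$$
where $\pi_m(\gamma) \equiv \pi(m)(\{\gamma\})$. Define $\Pi = \bigcup_{\mu \in \Gamma} \Pi(\mu)$.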
Note that $\pi_m(\gamma)$ can be interpreted as the probability of information state $\gamma$ conditional on true state $m$. One can also identify attentional strategies with compound or temporal lotteries, as analyzed in models that capture preferences over the timing of the resolution of uncertainty (Kreps and Porteus [1978]).
Rational inattention assumes that subjects choose attentional strategies to maximize gross payoffs net of information costs. The gross payoff associated with using an attentional strategy $\pi \in \Pi$ in decision problem $(\mu, A)$ is calculated assuming that acts are chosen optimally at each posterior state. Let $G: \Gamma \times \mathcal{F} \times \Pi \to \mathbb{R}$ denote the gross payoff of using a particular attentional strategy in a particular decision problem:
$$G(\mu, A, \pi) = \sum_{\gamma \in \Gamma(\pi)} \left[ \sum_{m=1}^{M} \mu_m \pi_m(\gamma) \right] g(\gamma, A),$$
where
$$g(\gamma, A) = \max_{f \in A} \sum_{m=1}^{M} \gamma_m U_m^f.$$
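The following Python sketch evaluates $G$ for a two-state problem to make the definition concrete. It is an illustration under stated assumptions, not part of the formal model. A strategy is encoded as a list of pairs (posterior $\gamma$, vector of $\pi_m(\gamma)$), and acts are encoded as tuples of state dependent utilities. The payoffs $(0, 10)$ and $(20, 0)$ are borrowed from the example in section 3.2. The helper names `g_value` and `gross_payoff` are hypothetical.

```python
from fractions import Fraction as Fr

def g_value(gamma, acts):
    """g(gamma, A): highest expected utility over acts at posterior gamma.
    acts maps an act name to its utilities (U_1^f, ..., U_M^f)."""
    return max(sum(p * u for p, u in zip(gamma, utils)) for utils in acts.values())

def gross_payoff(prior, strategy, acts):
    """G(mu, A, pi): sum over posteriors gamma of
    [sum_m mu_m * pi_m(gamma)] * g(gamma, A).
    strategy is a list of (gamma, (pi_1(gamma), ..., pi_M(gamma))) pairs."""
    total = 0
    for gamma, cond in strategy:
        prob_gamma = sum(mu * c for mu, c in zip(prior, cond))  # P(gamma)
        total += prob_gamma * g_value(gamma, acts)
    return total

prior = (Fr(1, 2), Fr(1, 2))
acts = {"f": (0, 10), "g": (20, 0)}
# Full information: state 1 always yields posterior (1, 0), state 2 yields (0, 1).
full_info = [((1, 0), (1, 0)), ((0, 1), (0, 1))]
# Inattention: a single posterior equal to the prior, reached in every state.
no_info = [(prior, (1, 1))]
print(gross_payoff(prior, full_info, acts))  # 15
print(gross_payoff(prior, no_info, acts))    # 10
```

The gap of 5 between the two strategies is the gross value of full information in this problem, before any attention costs are subtracted.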
An attentional cost function maps priors and attentional strategies to the corresponding level of disutility. We allow costs to be infinite for some strategies for two reasons. As a technical convenience, it enables us to ignore the Bayesian constraint by setting the cost of strategies inconsistent with the specified prior to be infinitely high.¹⁰ More importantly, it means that our model nests those that rule out certain types of information acquisition, for example by putting a hard limit on the mutual information between priors and posteriors (Sims [2003]), by allowing only partitional information structures (Ellis [2012]), or by allowing only normal signals (Mondria [2010]). These models can be accommodated in our framework by setting to infinity the cost of infeasible strategies.
To avoid triviality we assume that feasible attention strategies exist for all priors.
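Definition 2 An attentional cost function is a mapping $K: \Gamma \times \Pi \to \mathbb{R} \cup \{\infty\}$ with $K(\mu, \pi) = \infty$ for $\pi \notin \Pi(\mu)$ and $K(\mu, \pi) \in \mathbb{R}$ for some $\pi \in \Pi(\mu)$. We let $\mathcal{K}$ denote the class of such functions.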
We make the standard assumption that attention costs are additively separable from the prize-based utility derived from the acts taken. We let $\hat{\Pi}: \mathcal{K} \times \Gamma \times \mathcal{F} \to 2^{\Pi}$ map cost functions and decision problems into rationally inattentive strategies. These are the strategies that maximize gross payoffs minus attention costs,
$$\hat{\Pi}(K, \mu, A) = \arg\sup_{\pi \in \Pi} \{ G(\mu, A, \pi) - K(\mu, \pi) \}.$$
¹⁰ As in Mihm and Ozbek [2012].
2.2 State Dependent Stochastic Choice Data
The data that we use to test the model of rational inattention is state dependent stochastic choice data. For a given decision problem $(\mu, A)$ we observe the probability of choosing each act $f \in A$ in each state of the world $m \in \Omega(\mu)$. It is the fact that we observe choice probabilities in each state that differentiates this from standard stochastic choice data.
Formally, let $q: \Omega(\mu) \to \Delta(A)$ denote a state dependent stochastic choice function for a decision problem $(\mu, A)$, and let $Q_\mu$ be the set of such mappings, with $Q = \bigcup_{\mu \in \Gamma} Q_\mu$. $q_m(f)$ is the probability of the DM choosing act $f$ in state $m$. We denote as $F(q) \subseteq F$ the set of acts chosen with non-zero probability in some state of the world under stochastic choice function $q$,
$$F(q) = \{ f \in A \mid q_m(f) > 0 \text{ for some } m \in \Omega(\mu) \}.$$
We assume that data has been gathered on some finite set $D$ of decision problems:
Definition 3 A state dependent stochastic choice data set is a pair $(D, q)$ with $D \subset \Gamma \times \mathcal{F}$ finite and $q: D \to Q$, with $q^{(\mu,A)} \in Q_\mu$ and $F^{(\mu,A)} \equiv F(q^{(\mu,A)}) \subseteq A$.
Data of this general form is standard in psychometric research and is gathered in the experiment
detailed in section 6.
The main theoretical aim of the paper is to characterize the conditions under which state
dependent stochastic choice data is consistent with rational inattention. By this we mean that we
can find an attentional cost function and resulting optimal attentional strategy that would generate
the pattern of stochastic choice that we in fact observe.
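Definition 4 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$, an attentional strategy $\pi \in \Pi(\mu)$ is consistent with $q \in Q_\mu$ if there exists a choice strategy $Y: \Gamma(\pi) \to \Delta(A)$ such that:
1. Final choices are optimal: letting $Y_\gamma(f)$ denote the probability of choosing act $f$ at $\gamma \in \Gamma(\pi)$, then for all $\gamma \in \Gamma(\pi)$ and $f \in A$,
$$Y_\gamma(f) > 0 \implies \sum_{m=1}^{M} \gamma_m U_m^f \ge \sum_{m=1}^{M} \gamma_m U_m^g \quad \text{for all } g \in A.$$
2. The attention and choice functions match the data: for all $m \in \Omega(\mu)$ and $f \in A$,
$$q_m(f) = \sum_{\gamma \in \Gamma(\pi)} \pi_m(\gamma) Y_\gamma(f).$$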
Note that we do not restrict the DM to pure strategies at any posterior. However, any act they
choose with strictly positive probability must be utility maximizing given beliefs.
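Definition 5 Data set $(D, q)$ has a rational inattention representation $(\tilde{K}, \tilde{\pi})$ if there exist $\tilde{K} \in \mathcal{K}$ and $\tilde{\pi}: D \to \Pi$ such that $\tilde{\pi}(\mu, A) \in \hat{\Pi}(\tilde{K}, \mu, A)$ and $\tilde{\pi}(\mu, A)$ is consistent with $q^{(\mu,A)}$ for all $(\mu, A) \in D$.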
The existence of such a representation implies that the cost function is well-behaved, in the
sense that an optimal attentional strategy exists.
Throughout this paper, we assume that the DM’s expected utility function and prior beliefs over objective states are known to the researcher; only attention costs are not directly observable. This assumption is in line with the focus of the paper, but is not central to our approach. By enriching the data set, we could recover beliefs and preferences from the choice data of the DM, and use these as a starting point for our representation. In order to recover utility, we could replace the “Savage style” acts we use in this paper (which map deterministically from states of the world to prizes) with “Anscombe-Aumann” acts that map states of the world to probability distributions over the prize space. Assuming the DM does maximize expected utility, $U$ could then be recovered by observing choices over degenerate acts (i.e. acts whose payoffs are state independent).¹¹ If we further add to our data set the choices of the DM over acts before the state of the world is determined (or at least in a situation in which they cannot exert any effort to determine that state) then we can also recover the DM’s prior over objective states (again assuming expected utility maximization).
3 The Characterization
The main theorem of this section establishes two intuitive conditions as necessary and sufficient for $(D, q)$ to have a rational inattention representation. We use simple examples to illustrate the role of these conditions before stating the main theorem, the proof of which is in appendix A. The first condition, which establishes optimality of final choice given an attentional strategy, applies to each decision problem separately. The second, which establishes optimality of the attentional strategy, applies to all decision problems that share a specific prior.
3.1 Minimal Attentional Strategies
The key to our approach is the observation that, if a DM is rationally inattentive, then one can
learn much about their attentional strategy from state dependent stochastic choice. To begin with,
one can identify the average posterior beliefs that a subject must have had when choosing each act.
Definition 6 Given $\mu \in \Gamma$ and $q \in Q_\mu$, define the revealed posteriors $r^{(\mu,q)}: F(q) \to \Gamma$ by
$$r_m^{(\mu,q)}(f) = \frac{\mu_m q_m(f)}{\sum_{j=1}^{M} \mu_j q_j(f)}.$$
If the DM chooses each act in at most one subjective state then the revealed posteriors will in fact be the posteriors that define their attentional strategy. If they choose the same act in more than one subjective state then the revealed posterior will be equivalent to the weighted average of beliefs across all posteriors at which that act is chosen.
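Definition 6 translates directly into code. The sketch below uses the hypothetical helper `revealed_posteriors` and exact fractions to avoid rounding. It computes revealed posteriors from a prior and state dependent choice probabilities. The numbers reuse the data from the NIAS example in section 3.2. Encoding $q$ as a dictionary from acts to tuples of choice probabilities is our own assumption.

```python
from fractions import Fraction as Fr

def revealed_posteriors(prior, q):
    """r_m(f) = mu_m q_m(f) / sum_j mu_j q_j(f) for each act chosen with
    positive probability. q maps an act to (q_1(f), ..., q_M(f))."""
    out = {}
    for f, probs in q.items():
        weight = sum(mu * p for mu, p in zip(prior, probs))  # P(f is chosen)
        if weight > 0:  # acts outside F(q) have no revealed posterior
            out[f] = tuple(mu * p / weight for mu, p in zip(prior, probs))
    return out

prior = (Fr(1, 2), Fr(1, 2))
q = {"f": (Fr(1, 2), Fr(2, 3)), "g": (Fr(1, 2), Fr(1, 3))}
r = revealed_posteriors(prior, q)
print(r["f"])  # (Fraction(3, 7), Fraction(4, 7))
print(r["g"])  # (Fraction(3, 5), Fraction(2, 5))
```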
¹¹ This is the approach taken by Ellis [2012]. Caplin and Martin [2011] present an alternative approach that allows for an unknown utility function.
We can use the revealed posteriors to construct a possible attentional strategy for each decision
problem. We do so by treating the revealed posteriors as identifying all possible posteriors the
attentional strategy can produce.
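Definition 7 Given $\mu \in \Gamma$ and $q \in Q_\mu$, define the minimal attentional strategy $\pi^{(\mu,q)} \in \Pi(\mu)$ to satisfy
$$\pi_m^{(\mu,q)}(\gamma) = \sum_{\{f \in F(q) \mid r^{(\mu,q)}(f) = \gamma\}} q_m(f),$$
for all $m \in \Omega(\mu)$ and $\gamma \in \Gamma$.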
If a rationally inattentive DM chose each act in at most one subjective state, and did not use mixed strategies at any posterior, then the minimal attentional strategy would be their true strategy. More generally, any attentional strategy consistent with the data must be more informative than the minimal attentional strategy, in the sense of statistical sufficiency. Intuitively, this means that the minimal attentional strategy can be obtained by “adding noise” to the true attentional strategy.
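The minimal attentional strategy of Definition 7 pools all chosen acts that share a revealed posterior. The function name `minimal_strategy` and the three-act data below are hypothetical, chosen only for illustration. In this Python sketch, acts a and b have the same revealed posterior $(\frac{3}{4}, \frac{1}{4})$, so they are merged into a single information state.

```python
from collections import defaultdict
from fractions import Fraction as Fr

def minimal_strategy(prior, q):
    """Minimal attentional strategy: pi_m(gamma) = sum of q_m(f) over chosen
    acts f whose revealed posterior equals gamma.
    Returns a dict gamma -> (pi_1(gamma), ..., pi_M(gamma))."""
    M = len(prior)
    strategy = defaultdict(lambda: [Fr(0)] * M)
    for probs in q.values():
        weight = sum(mu * p for mu, p in zip(prior, probs))
        if weight == 0:
            continue  # act never chosen, so it is not in F(q)
        gamma = tuple(mu * p / weight for mu, p in zip(prior, probs))  # revealed posterior
        for m in range(M):
            strategy[gamma][m] += probs[m]
    return {gamma: tuple(cond) for gamma, cond in strategy.items()}

prior = (Fr(1, 2), Fr(1, 2))
q = {
    "a": (Fr(1, 2), Fr(1, 6)),
    "b": (Fr(1, 4), Fr(1, 12)),
    "c": (Fr(1, 4), Fr(3, 4)),
}
ms = minimal_strategy(prior, q)
# a and b share the revealed posterior (3/4, 1/4) and are pooled.
```

Here `ms` has two information states. The first, $(\frac{3}{4}, \frac{1}{4})$, is reached with probability $\frac{3}{4}$ in state 1 and $\frac{1}{4}$ in state 2. The second, $(\frac{1}{4}, \frac{3}{4})$, is reached with probabilities $\frac{1}{4}$ and $\frac{3}{4}$.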
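Definition 8 Attentional strategy $\pi \in \Pi$ is sufficient for attentional strategy $\rho \in \Pi$ (equivalently, $\rho$ is a garbling of $\pi$) if there exists a $|\Gamma(\pi)| \times |\Gamma(\rho)|$ stochastic matrix $B \ge 0$ with $\sum_{j \in \Gamma(\rho)} b^{ij} = 1$ for all $i$, and such that, for all $j \in \Gamma(\rho)$ and $m \in \Omega$,
$$\rho_m(\gamma^j) = \sum_{i \in \Gamma(\pi)} b^{ij} \pi_m(\gamma^i).$$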
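The garbling in Definition 8 is a matrix product: each row of $B$ redistributes the probability of one posterior of $\pi$ across the posteriors of $\rho$. The Python sketch below uses the hypothetical helpers `garble` and `posterior` and an illustrative matrix $B$. It garbles full information in a two-state problem with a uniform prior, and the resulting strategy has posteriors $(\frac{3}{4}, \frac{1}{4})$ and $(\frac{1}{4}, \frac{3}{4})$.

```python
from fractions import Fraction as Fr

def garble(pi_cond, B):
    """rho_m(gamma^j) = sum_i b_ij * pi_m(gamma^i).
    pi_cond[m][i] = pi_m(gamma^i); B[i][j] = b_ij, with rows summing to one."""
    if any(min(row) < 0 or sum(row) != 1 for row in B):
        raise ValueError("B must be a stochastic matrix")
    n_in, n_out = len(B), len(B[0])
    return [[sum(B[i][j] * row[i] for i in range(n_in)) for j in range(n_out)]
            for row in pi_cond]

def posterior(prior, cond, j):
    """Bayes' law: posterior belief at information state j of a strategy
    given in state-conditional form cond[m][j]."""
    weights = [mu * row[j] for mu, row in zip(prior, cond)]
    total = sum(weights)
    return [w / total for w in weights]

prior = [Fr(1, 2), Fr(1, 2)]
full_info = [[1, 0], [0, 1]]  # each state produces its own information state
B = [[Fr(3, 4), Fr(1, 4)],
     [Fr(1, 4), Fr(3, 4)]]    # keep the signal w.p. 3/4, flip it w.p. 1/4
rho = garble(full_info, B)    # [[3/4, 1/4], [1/4, 3/4]]
print(posterior(prior, rho, 0), posterior(prior, rho, 1))
```

By Blackwell's theorem, `full_info` yields weakly higher gross payoff than `rho` in any decision problem with this prior.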
Lemma 1 establishes that any consistent attentional strategy must be sufficient for the minimal attentional strategy.
Lemma 1 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$ and $q \in Q_\mu$, if $\pi \in \Pi(\mu)$ is consistent with these data, then $\pi$ is sufficient for $\pi^{(\mu,q)}$.
Blackwell’s theorem establishes the equivalence of the statistical notion of “more informative than” (sufficiency) and the economic notion “more valuable than”. If attentional strategy $\pi$ is sufficient for strategy $\rho$, then it yields (weakly) higher gross payoffs in any decision problem. See Cremer [1982] for a statement and simple proof of Blackwell’s theorem. This result plays a significant role in our characterization.
Remark 1 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$ and $\pi, \rho \in \Pi(\mu)$ with $\pi$ sufficient for $\rho$,
$$G(\mu, A, \pi) \ge G(\mu, A, \rho).$$
3.2 No Improving Action Switches
Our first condition ensures that the choice of act at any given posterior is optimal. Consider the trivial case in which only one decision problem $(\mu, A)$ is observed. Suppose that there are two states with $\mu_1 = \mu_2 = 0.5$, two available acts, $A = \{f, g\}$, and state dependent payoffs $(U_1^f, U_2^f) = (0, 10)$ and $(U_1^g, U_2^g) = (20, 0)$. Finally suppose that the behavioral data set specifies:
$$q_1(f) = q_1(g) = \tfrac{1}{2}; \qquad q_2(f) = \tfrac{2}{3}, \quad q_2(g) = \tfrac{1}{3}.$$
One can readily confirm that there is no attentional strategy consistent with this behavioral data. This is because, at the revealed posterior when $f$ is chosen, it would be optimal to choose $g$. The posterior probability that the true state is 1 when $f$ is chosen is $\frac{0.5 \cdot 0.5}{0.5 \cdot 0.5 + 0.5 \cdot \frac{2}{3}} = \frac{3}{7}$. Given these beliefs, the payoff of taking act $g$ is $\frac{3}{7} \cdot 20 + \frac{4}{7} \cdot 0 = \frac{60}{7}$, while the payoff of act $f$ is $\frac{3}{7} \cdot 0 + \frac{4}{7} \cdot 10 = \frac{40}{7}$.
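The numbers in this example can be reproduced with a few lines of Python. This is an illustrative check only. The variable names are hypothetical, and the data are those given above.

```python
from fractions import Fraction as Fr

mu = (Fr(1, 2), Fr(1, 2))
q_f = (Fr(1, 2), Fr(2, 3))           # P(choose f | state 1), P(choose f | state 2)
U = {"f": (0, 10), "g": (20, 0)}     # state dependent utilities

# Revealed posterior when f is chosen (Bayes' rule applied to the choice data).
p_f = sum(m * p for m, p in zip(mu, q_f))
r_f = tuple(m * p / p_f for m, p in zip(mu, q_f))

# Expected utility of each act at that posterior.
payoff = {a: sum(r * u for r, u in zip(r_f, U[a])) for a in U}
print(r_f[0])                      # 3/7
print(payoff["g"], payoff["f"])    # 60/7 40/7
```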
If the minimal attentional strategy was in fact employed, the revealed posterior associated with each act would be the only posterior at which this act was chosen. In this case existence of such an improving switch would be a direct violation of the rationally inattentive model. With a more general attentional strategy, rational inattention implies that $f$ must be weakly preferred to $g$ at each posterior at which $f$ is chosen. This means that $f$ must also be weakly preferred to $g$ at the weighted average of these posteriors which, by the requirement of consistency, forms the revealed posterior.¹²
The NIAS condition rules out cases in which there are improving switches of this form. It specifies that, when one identifies in the data the revealed posterior associated with any chosen act, this act must be optimal at that posterior.
Condition D1 (No Improving Action Switches) Data set $(D, q)$ satisfies NIAS if, for every $(\mu, A) \in D$ and $f \in F^{(\mu,A)}$,
$$\sum_{m=1}^{M} r_m^{(\mu,q)}(f) U_m^f \ge \sum_{m=1}^{M} r_m^{(\mu,q)}(f) U_m^g \quad \text{for all } g \in A,$$
where $r_m^{(\mu,q)}(f)$, with $q = q^{(\mu,A)}$, is the revealed posterior belief of state $m$ when $f$ is chosen in decision problem $(\mu, A) \in D$.
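Condition D1 can be checked directly for a single decision problem. The Python sketch below is one possible implementation under our own encoding assumptions. The name `satisfies_nias` and the dictionary layout are hypothetical. The function computes each revealed posterior and looks for an act that does strictly better there. It flags the example of section 3.2 and accepts an illustrative variant in which $f$ is chosen more often in state 2.

```python
from fractions import Fraction as Fr

def satisfies_nias(prior, q, acts):
    """Check NIAS for one decision problem (mu, A).
    prior: (mu_1, ..., mu_M); q: act -> (q_1(f), ..., q_M(f));
    acts: act -> (U_1^f, ..., U_M^f) for every act in A.
    Returns (True, None), or (False, (f, g)) for an improving switch f -> g."""
    for f, probs in q.items():
        weight = sum(mu * p for mu, p in zip(prior, probs))
        if weight == 0:
            continue  # f is never chosen, so NIAS places no restriction on it
        r = [mu * p / weight for mu, p in zip(prior, probs)]  # revealed posterior
        eu = {a: sum(rm * u for rm, u in zip(r, utils)) for a, utils in acts.items()}
        for g, value in eu.items():
            if value > eu[f]:
                return False, (f, g)
    return True, None

prior = (Fr(1, 2), Fr(1, 2))
acts = {"f": (0, 10), "g": (20, 0)}
bad = {"f": (Fr(1, 2), Fr(2, 3)), "g": (Fr(1, 2), Fr(1, 3))}   # data from section 3.2
good = {"f": (Fr(1, 3), Fr(2, 3)), "g": (Fr(2, 3), Fr(1, 3))}  # illustrative variant
print(satisfies_nias(prior, bad, acts))   # (False, ('f', 'g'))
print(satisfies_nias(prior, good, acts))  # (True, None)
```

In the passing variant, $f$ and $g$ tie at the revealed posterior for $f$. NIAS only rules out strict improvements, so the tie is allowed.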
3.3 No Improving Attention Cycles
Our second condition restricts choice of attentional strategy across decision problems that share
the same prior. Essentially, it cannot be the case that the total gross utility can be increased by
reassigning attentional strategies across decision problems that share the same prior. The following
example illustrates a violation of this condition.
Consider again the decision problem above with two equiprobable states and two available acts, $A = \{f, g\}$, and with the state dependent payoffs,
$$(U_1^f, U_2^f) = (0, 10); \quad (U_1^g, U_2^g) = (20, 0).$$
¹² See the proof of theorem 1 for details of this argument.
Suppose now that the observed choice behavior is as follows (using the choice set $A$ as the identifying superscript),
$$q_1^A(f) = \tfrac{1}{3} = 1 - q_1^A(g); \qquad q_2^A(f) = \tfrac{2}{3} = 1 - q_2^A(g).$$
Now consider a second decision problem differing only in that the act set is $B = \{f, h\}$, with $(U_1^h, U_2^h) = (10, 0)$, with the corresponding state dependent data set,
$$q_1^B(f) = \tfrac{1}{4} = 1 - q_1^B(h); \qquad q_2^B(f) = \tfrac{3}{4} = 1 - q_2^B(h).$$
The specified data looks problematic with respect to rational inattention. Act set $A$ provides greater reward for discriminating between states, yet the DM is more discerning under act set $B$. To crystallize the resulting problem, note that, for behavior to be consistent with rational inattention for some cost function $K$ it must be the case that,
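$$G(\mu, A, \pi^A) - K(\mu, \pi^A) \ge G(\mu, A, \pi^B) - K(\mu, \pi^B);$$
$$G(\mu, B, \pi^B) - K(\mu, \pi^B) \ge G(\mu, B, \pi^A) - K(\mu, \pi^A),$$
where $\pi^A$ and $\pi^B$ denote the attentional strategies chosen in the two decision problems.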
While we do not observe attentional strategies directly, letting $\bar{\pi}^i$ denote the minimal attentional strategy in decision problem $(\mu, i)$, it is immediate that $G(\mu, i, \pi^i) = G(\mu, i, \bar{\pi}^i)$ for $i \in \{A, B\}$. Furthermore, as $\pi^j$ is sufficient for $\bar{\pi}^j$, Blackwell’s theorem tells us that $G(\mu, i, \pi^j) \ge G(\mu, i, \bar{\pi}^j)$ for $i, j \in \{A, B\}$ (see Remark 1). Thus we can insert the minimal attentional strategies in the calculation of gross benefits and the above inequalities must still hold. Substituting and combining the two conditions therefore yields
$$G(\mu, A, \bar{\pi}^A) + G(\mu, B, \bar{\pi}^B) \ge G(\mu, A, \bar{\pi}^B) + G(\mu, B, \bar{\pi}^A),$$
indicating that total gross benefit across the two decision problems must be maximized by the assignment of minimal attentional strategies observed in the data. Plugging in the minimal attentional strategies from our example data, we find that $G(\mu, A, \bar{\pi}^A) + G(\mu, B, \bar{\pi}^B) = 17\frac{1}{2}$, while $G(\mu, A, \bar{\pi}^B) + G(\mu, B, \bar{\pi}^A) = 17\frac{11}{12}$. Thus, there is no cost function that can be used to rationalize this data.
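The two totals can be verified numerically. The Python sketch below encodes each minimal attentional strategy as a list of (probability, posterior) pairs. This is equivalent to the state-conditional form because $\sum_m \mu_m \pi_m(\gamma)$ is the unconditional probability of $\gamma$. Under the data above, each chosen act has probability $\frac{1}{2}$, so each revealed posterior has probability $\frac{1}{2}$. The helper name `gross_payoff` is hypothetical.

```python
from fractions import Fraction as Fr

def gross_payoff(strategy, acts):
    """G for a strategy given as (probability, posterior) pairs."""
    return sum(
        p * max(sum(g * u for g, u in zip(gamma, utils)) for utils in acts.values())
        for p, gamma in strategy
    )

half = Fr(1, 2)
A = {"f": (0, 10), "g": (20, 0)}
B = {"f": (0, 10), "h": (10, 0)}
# Minimal strategies implied by the data: each posterior arises with probability 1/2.
pi_A = [(half, (Fr(1, 3), Fr(2, 3))), (half, (Fr(2, 3), Fr(1, 3)))]
pi_B = [(half, (Fr(1, 4), Fr(3, 4))), (half, (Fr(3, 4), Fr(1, 4)))]

observed = gross_payoff(pi_A, A) + gross_payoff(pi_B, B)
swapped = gross_payoff(pi_B, A) + gross_payoff(pi_A, B)
print(observed, swapped)  # 35/2 215/12
```

The printed values are $17\frac{1}{2}$ and $17\frac{11}{12}$. The swapped assignment does better, which is the cycle that NIAC rules out.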
The following general condition ensures that there are no such cycles of gross utility improving strategy switches.
Condition D2 (No Improving Attention Cycles) Data set $(D, q)$ satisfies NIAC if, for any $\mu \in \Gamma$ and any set of decision problems $(\mu, A_1), (\mu, A_2), \ldots, (\mu, A_K) \in D$ with $A_K = A_1$,
$$\sum_{k=1}^{K-1} G(\mu, A_k, \bar{\pi}^k) \ge \sum_{k=1}^{K-1} G(\mu, A_k, \bar{\pi}^{k+1}),$$
where $\bar{\pi}^k \equiv \pi^{(\mu, q^{(\mu,A_k)})}$ is the minimal attentional strategy for decision problem $(\mu, A_k)$.
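For a small data set, NIAC can be checked by brute force. Every permutation of a finite set of decision problems splits into disjoint cycles, and the gross payoff of a reassignment is the sum over those cycles. By this standard argument (our own gloss, not the paper's), requiring D2 for all cycles is equivalent to requiring that no permutation of the observed minimal strategies across problems with a common prior raises total gross payoff. The Python sketch below uses hypothetical helper names to implement that check. Its cost grows factorially, so it is only meant for a handful of problems.

```python
from fractions import Fraction as Fr
from itertools import permutations

def gross_payoff(strategy, acts):
    """G for a strategy given as (probability, posterior) pairs."""
    return sum(
        p * max(sum(g * u for g, u in zip(gamma, utils)) for utils in acts.values())
        for p, gamma in strategy
    )

def satisfies_niac(problems):
    """problems: list of (acts, minimal strategy) for decision problems that
    share one prior. Returns False if some permutation of the minimal
    strategies across problems raises total gross payoff."""
    n = len(problems)
    observed = sum(gross_payoff(s, acts) for acts, s in problems)
    for perm in permutations(range(n)):
        # Problem k is paired with the minimal strategy observed in problem perm[k].
        reassigned = sum(gross_payoff(problems[perm[k]][1], problems[k][0]) for k in range(n))
        if reassigned > observed:
            return False
    return True

half = Fr(1, 2)
A = {"f": (0, 10), "g": (20, 0)}
B = {"f": (0, 10), "h": (10, 0)}
pi_A = [(half, (Fr(1, 3), Fr(2, 3))), (half, (Fr(2, 3), Fr(1, 3)))]
pi_B = [(half, (Fr(1, 4), Fr(3, 4))), (half, (Fr(3, 4), Fr(1, 4)))]
print(satisfies_niac([(A, pi_A), (B, pi_B)]))  # False: the example above
print(satisfies_niac([(A, pi_B), (B, pi_A)]))  # True: strategies swapped
```

In the second call the DM is more discerning under $A$, where discrimination pays more. No reassignment then improves total gross payoff.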
3.4 Characterization
Our main result is that NIAC and NIAS together are necessary and su¢ cient for a data set to have
a rational inattention representation.
Theorem 1 Data set (D; q) has a rational inattention representation if and only if it satis…es
NIAS and NIAC.
The key step in the proof that the NIAS and NIAC conditions are sufficient for (D, q) to have
a rational inattention representation is connecting the model with the linear allocation problem
analyzed by Koopmans and Beckmann [1957]. The cost function that we introduce is based on the
shadow prices that decentralize the optimal allocation in that model. Our result is also strongly
related to rationalizability conditions for quasi-linear preferences in the mechanism design literature
(see Rochet [1987]).
4 Monotonicity, Mixtures and Normalization
Theorem 1 states only that, if NIAS and NIAC hold, we can find some attentional cost function that rationalizes the data. No restrictions are placed on the form of the function. In this section we consider three natural restrictions on attentional cost functions: weak monotonicity with respect to sufficiency; feasibility of mixed strategies; and costless inattention. In principle these restrictions might place further conditions on stochastic choice data for it to be rationalizable, because they imply that costs of unchosen strategies may be constrained by those assigned to chosen strategies. Theorem 2 establishes that this is not the case: if state dependent stochastic choice is rationalizable, then it is rationalizable by a cost function that satisfies these three conditions.
4.1 Weak Monotonicity
A partial ranking of the informativeness of attentional strategies is provided by the notion of statistical sufficiency (see definition 8). A natural condition for an attentional cost function is that more information is (weakly) more costly.
Condition K1 $K \in \mathcal{K}$ satisfies weak monotonicity in information if, for any $\mu \in \Gamma$ and $\pi, \rho \in \Pi(\mu)$ with $\pi$ sufficient for $\rho$,
$$K(\mu, \pi) \ge K(\mu, \rho).$$
Free disposal of information would imply this property, as would a ranking based on Shannon mutual information (see also Mihm and Ozbek [2012] and Ming [2013]).¹³
¹³ While in many ways intuitively attractive, this assumption may not be universally valid. In a world with discrete signals it may be very costly or even impossible to generate continuous changes in information. Moreover the DM may be restricted to some collection of partitions (Ellis [2012], Gul et al. [2011]), in which case less informative structures are essentially disallowed. It may not be possible to automatically and freely dispose of information once learned.
4.2 Mixture Feasibility
In addition to using pure attentional strategies, it may be feasible for the DM to mix these strategies
using some randomizing device.
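Definition 9 Given $\mu \in \Gamma$, attentional strategies $\pi, \rho \in \Pi(\mu)$, and $\alpha \in [0, 1]$, the mixture strategy $\alpha\pi + (1-\alpha)\rho \in \Pi(\mu)$ is defined by
$$(\alpha\pi + (1-\alpha)\rho)_m(\gamma) = \alpha \pi_m(\gamma) + (1-\alpha) \rho_m(\gamma),$$
all $1 \le m \le M$ and $\gamma \in \Gamma(\pi) \cup \Gamma(\rho)$.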
The definition implies that the mixing is not of the posteriors themselves, but of the probabilities with which the given posteriors arise. To illustrate, consider again a case with two equiprobable states. Let attentional strategy $\pi$ be equally likely to produce posteriors $(0.3, 0.7)$ and $(0.7, 0.3)$, with $\rho$ equally likely to produce posteriors $(0.1, 0.9)$ and $(0.9, 0.1)$. Then the mixture strategy $0.5\pi + 0.5\rho$ is equally likely to produce all four posteriors.
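The four-posterior claim can be checked with the hypothetical helpers `mixture` and `unconditional` below, a minimal Python sketch of Definition 9. With equiprobable states and two equally likely posteriors, Bayes’ law gives $\pi_m(\gamma) = \gamma_m$. That identity is how the two strategies are encoded here, and the dictionary encoding is our own choice.

```python
from fractions import Fraction as Fr

def mixture(alpha, pi, rho):
    """(alpha*pi + (1 - alpha)*rho)_m(gamma) = alpha*pi_m(gamma) + (1 - alpha)*rho_m(gamma).
    Strategies are dicts gamma -> (pi_1(gamma), ..., pi_M(gamma)); a posterior
    missing from one strategy has probability zero under it."""
    M = len(next(iter(pi.values())))
    zero = (0,) * M
    return {
        gamma: tuple(alpha * a + (1 - alpha) * b
                     for a, b in zip(pi.get(gamma, zero), rho.get(gamma, zero)))
        for gamma in set(pi) | set(rho)
    }

def unconditional(prior, strategy):
    """Probability of each posterior: sum_m mu_m pi_m(gamma)."""
    return {gamma: sum(mu * c for mu, c in zip(prior, cond))
            for gamma, cond in strategy.items()}

prior = (Fr(1, 2), Fr(1, 2))
# With equiprobable states and two equally likely posteriors, pi_m(gamma) = gamma_m.
pi = {g: g for g in [(Fr(3, 10), Fr(7, 10)), (Fr(7, 10), Fr(3, 10))]}
rho = {g: g for g in [(Fr(1, 10), Fr(9, 10)), (Fr(9, 10), Fr(1, 10))]}
mixed = mixture(Fr(1, 2), pi, rho)
probs = unconditional(prior, mixed)
# Each of the four posteriors now has probability 1/4.
```

Applying Bayes’ law to `mixed` recovers each original posterior unchanged. This reflects the point above: mixing reweights how often posteriors occur but does not average the posteriors themselves.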
A natural assumption is that a DM can choose to mix between two attentional strategies in this way, and pay the expected cost of such a mixture. For example, they could construct a strategy which involved flipping a coin, then choosing strategy $\pi$ if the coin comes down heads and strategy $\rho$ if it comes down tails. In expectation the cost of this strategy would be half that of $\pi$ and half that of $\rho$. Allowing such mixtures puts an upper bound on the cost of the strategy $0.5\pi + 0.5\rho$. However, it does not pin down the cost precisely, because we do not rule out the possibility that there is a more efficient way of constructing the mixed attentional strategy.
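Condition K2 Mixture Feasibility: for all $\mu \in \Gamma$, any two strategies $\pi, \rho \in \Pi(\mu)$ and $\alpha \in (0, 1)$, the cost of the mixture strategy $\alpha\pi + (1-\alpha)\rho \in \Pi(\mu)$ satisfies
$$K(\mu, \alpha\pi + (1-\alpha)\rho) \le \alpha K(\mu, \pi) + (1-\alpha) K(\mu, \rho).$$
4.3 Normalization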
It is typical in the applied literature to allow inattention at no cost, and otherwise to have costs be
non-negative. Given weak monotonicity, non-negativity of the entire function follows immediately
if one ensures that inattention is costless.
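Condition K3 Given $\mu \in \Gamma$, define $I^\mu \in \Pi(\mu)$ as the strategy in which $\pi_m(\mu) = 1$ for $1 \le m \le M$, so that the posterior always equals the prior. Attentional cost function $K \in \mathcal{K}$ satisfies normalization if it is non-negative where real-valued, with $K(\mu, I^\mu) = 0$ for all $\mu \in \Gamma$.
4.4 Theorem 2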
Theorem 2 states that, whenever a rational inattention representation exists, one also exists in
which the cost function satis…es conditions K1 through K3. Whatever one thinks of the above
assumptions on intuitive grounds, even if any one or all of them are in fact false, any data set that
can be rationalized can equally be rationalized by a function that satis…es all of these conditions.
Theorem 2 Data set $(D, q)$ satisfies NIAS and NIAC if and only if it has a rational inattention representation with conditions K1 to K3 satisfied.
This result has the flavor of the Afriat characterization of rationality of choice from budget sets, which states that choices can be rationalized by some utility function if and only if they can be rationalized by a non-satiated, continuous, monotone, and concave utility function.
Note that the representation need not be unique. Conditions for recoverability are left for future
research.
4.5
No Strong Blackwell
Not all restrictions on the form of the cost function can be so readily absorbed as K1 through
K3. For example, there are data sets satisfying NIAS and NIAC yet for which there exists no cost
function that produces a rational inattention representation while respecting strict monotonicity, whereby, if $\pi$ is sufficient for $\pi'$ but $\pi'$ is not sufficient for $\pi$, then $K(\mu, \pi) > K(\mu, \pi')$.
A simple example with data on only one decision problem in which there are two equally likely states illustrates that one cannot further strengthen the result in this dimension. Suppose that there are three available acts $A = \{f, g, h\}$ with corresponding utilities,
$$(U_1^f, U_2^f) = (10, 0); \quad (U_1^g, U_2^g) = (0, 10); \quad (U_1^h, U_2^h) = (7.5, 7.5).$$
Consider the following state dependent stochastic choice data in which the only two chosen acts are f and g,
$$q_1(f) = q_2(g) = \tfrac{3}{4} = 1 - q_1(g) = 1 - q_2(f).$$
Note that this data satisfies NIAS: given posterior beliefs when f is chosen, f is superior to g and indifferent to h, and when g is chosen it is superior to f and indifferent to h. It trivially satisfies NIAC since there is only one decision problem observed. We know from Theorem 2 that it has a rational inattention representation with the cost of the minimal attention strategy $K(\mu, \bar{\pi}) \ge 0$ and that of the inattentive strategy being zero, $K(\mu, I_\mu) = 0$. Note that $\bar{\pi}$ is sufficient for $I_\mu$ but not vice versa, hence any strictly monotone cost function would have to satisfy $K(\mu, \bar{\pi}) > 0$. In fact it is not possible to find a representation with this property. To see this, note that both strategies have the same gross utility,
$$G(\mu, A, \bar{\pi}) = \tfrac{1}{2}\cdot\tfrac{3}{4}\cdot 10 + \tfrac{1}{2}\cdot\tfrac{3}{4}\cdot 10 = 1 \cdot 7.5 = G(\mu, A, I_\mu),$$
where we use the fact that the inattentive strategy involves picking act h for sure. In order to rationalize selection of $\bar{\pi}$ rather than the inattentive strategy, it must therefore be that $\bar{\pi}$ is no more expensive than $I_\mu$, contradicting strict monotonicity.
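The arithmetic in this example can be checked with a short Python sketch using exact fractions. The names `q`, `choice_prob`, `revealed_posterior`, `satisfies_nias` and the two gross-utility variables are illustrative assumptions; the data are those of the example, with the revealed posterior of an act obtained by Bayes' rule from the state dependent choice probabilities.

```python
from fractions import Fraction as Fr

prior = (Fr(1, 2), Fr(1, 2))
# Utilities U_m^a from the example: state 1 first, state 2 second
U = {"f": (10, 0), "g": (0, 10), "h": (Fr(15, 2), Fr(15, 2))}
# Observed data: q[m][a] is the probability of choosing a in state m + 1
q = [{"f": Fr(3, 4), "g": Fr(1, 4)}, {"f": Fr(1, 4), "g": Fr(3, 4)}]
chosen = ("f", "g")


def expected_utility(act, gamma):
    return sum(p * u for p, u in zip(gamma, U[act]))


def choice_prob(act):
    """Unconditional probability that act is chosen."""
    return sum(prior[m] * q[m][act] for m in range(2))


def revealed_posterior(act):
    """Bayes' rule applied to the event 'act was chosen'."""
    return tuple(prior[m] * q[m][act] / choice_prob(act) for m in range(2))


def satisfies_nias():
    """Each chosen act must be optimal at its revealed posterior."""
    for act in chosen:
        gamma = revealed_posterior(act)
        if expected_utility(act, gamma) < max(expected_utility(b, gamma) for b in U):
            return False
    return True


# Gross utility of the data-generating strategy vs. the inattentive one (pick h)
gross_attentive = sum(choice_prob(a) * expected_utility(a, revealed_posterior(a))
                      for a in chosen)
gross_inattentive = max(expected_utility(b, prior) for b in U)
print(satisfies_nias(), gross_attentive, gross_inattentive)
```

Both gross utilities come out at 15/2, so any cost function that rationalizes the observed strategy must assign it a cost no higher than that of $I_\mu$.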
5
Rational Inattention and Random Utility
In this section we compare the rational inattention model with alternative models in which stochasticity in choice stems from randomness in the utility function (e.g. McFadden [1974], Loomes and
Sugden [1995]).14 Such random utility models (RUMs) take as given a probability measure over
some family of utility functions. Prior to making a choice, one utility function gets drawn from this
set. The DM then chooses to maximize this utility function.15
There is a first order difference between standard RUMs and the rational inattention model. While the latter produces choice behavior that differs across states of the world, the former typically conditions out all observable states, as a result giving rise to state independent choice (e.g. Falmagne [1978], McFadden and Richter [1990], Clark [1996], McFadden [2005] and Gul and Pesendorfer [2006]). In this section we consider three distinct approaches to translating RUMs to settings in which there is an underlying state that is observable to the econometrician. We begin by considering an “uninformed” RUM in which the DM gathers no information other than their prior belief about the state of the world. We then consider the other extreme case of a “perfectly informed” RUM in which the DM knows the true state perfectly. Finally, we consider a “partially informed” RUM in which the DM receives an exogenous signal about the state of the world before maximizing their randomly selected utility function.
All three RUM variants are distinct from the rational inattention model, in the sense that there is behavior which is consistent with rational inattention but not the RUM and vice versa. As formalized below, the Uninformed and Perfectly Informed RUMs place fundamental restrictions on the data that are not implied by rational inattention. The case of the Partially Informed RUM is more subtle, but (along with the other two variants) it implies a monotonicity property which is not required by rational inattention. In the interests of parsimony, we relegate specific examples of RUMs that violate NIAC and NIAS to appendix B. In section 6.6 we discuss how our experimental data helps us to differentiate between rational inattention and RUMs.
5.1
Uninformed RUM
Consider a DM who chooses according to a RUM without learning anything beyond the prior about the true state of the world. To maintain generality, define the class of utility functions $\mathcal{U}$ to be all functions $v: \Gamma \times F \to \mathbb{R}$, with $v(\mu, f)$ the utility assigned by $v \in \mathcal{U}$ to act $f \in A$ at prior $\mu \in \Gamma$. Note that random expected utility (in the manner of Gul and Pesendorfer [2006]) is a special case in which the utility can be computed by weighting some underlying utilities on a prize space according to the prior probabilities. Letting $\lambda$ denote the probability measure over $\mathcal{U}$, a data set $(D, q)$ generated by a prior only RUM is defined by,
$$q_m^{(\mu, A)}(f) = \lambda\left(\{v \in \mathcal{U} \mid v(\mu, f) > v(\mu, g)\ \forall g \in A \setminus \{f\}\}\right),$$
for every $(\mu, A) \in D$ and $m \in \Omega$.16
The key behavior that is allowed by rational inattention but not by the Uninformed RUM is
state dependence: in this prior only formulation, the stochastic choice function is state invariant,
because the beliefs of the agent are state invariant. Clearly rational inattention allows for state
dependence in the choice function.
14 Random choice models have been used in psychology since the work of Thurstone [1927] and Zermelo [1929].
15 In the case of choice over lotteries, the family of utility functions can be over the lotteries themselves or, following Gul and Pesendorfer [2006], over the underlying prize space, with the utility of a lottery equal to its expectation according to the selected utility function.
16 In the case of utility functions in which the utility maximizing elements of A are not unique, a tie breaking rule is needed. See Gul and Pesendorfer [2010].
On the other hand, as we demonstrate in appendix B, the Uninformed RUM allows for violations of NIAS. This is clear from the fact that such a DM has a single posterior belief, yet may choose several different acts, which need not all be optimal at that belief. The Uninformed RUM will, however, generate data that is consistent with NIAC: as the DM is always uninformed, their choice of attentional strategy can be rationalized by a cost function that puts arbitrarily high costs on any informative attention strategy.
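To make the NIAS failure concrete, here is a hypothetical two-state illustration in Python; the payoffs and random-utility weights are invented for the sketch and are not the examples of appendix B. Utility draws are evaluated at the prior only, so choice probabilities cannot depend on the state, and the revealed posterior of every chosen act equals the prior.

```python
# Hypothetical two-state example: objective utilities U_m^a (state 1, state 2)
payoffs = {"f": (10, 0), "g": (0, 10), "h": (4, 4)}
prior = (0.5, 0.5)
# Hypothetical finite random utility model: (weight, utility of each act at the prior)
rum = [(0.5, {"f": 1.0, "g": 0.0, "h": 0.5}),   # draws that rank f first
       (0.5, {"f": 0.0, "g": 0.2, "h": 1.0})]   # draws that rank h first


def uninformed_q(state):
    """Choice probabilities in a given state; the state never enters the rule."""
    q = {a: 0.0 for a in payoffs}
    for weight, v in rum:
        q[max(payoffs, key=v.get)] += weight
    return q


def nias_violations():
    """Chosen acts that are not optimal at their revealed posterior."""
    bad = []
    for a in payoffs:
        joint = [prior[m] * uninformed_q(m)[a] for m in range(2)]
        if sum(joint) == 0:
            continue  # act never chosen
        gamma = [j / sum(joint) for j in joint]  # equals the prior here
        eu = {b: sum(g * u for g, u in zip(gamma, payoffs[b])) for b in payoffs}
        if eu[a] < max(eu.values()):
            bad.append(a)
    return bad


print(uninformed_q(0), uninformed_q(1), nias_violations())
```

Act h is chosen half the time although, at the prior, it is worth 4 while f and g are each worth 5, so NIAS fails for h.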
5.2
Perfectly Informed RUM
An alternative and possibly more natural interpretation of the RUM is that the DM is fully informed about the state of the world, then chooses in each state according to a RUM. Letting $\gamma^m \in \Gamma$ be the degenerate probability distribution on state m, a Perfectly Informed RUM would generate data of the following form:
$$q_m^{(\mu, A)}(f) = \lambda\left(\{v \in \mathcal{U} \mid v(\gamma^m, f) > v(\gamma^m, g)\ \forall g \in A \setminus \{f\}\}\right).$$
MM identify an intriguing connection between the Perfectly Informed RUM and rational inattention. One heavily used variant of the RUM is the logit model, which assumes that $\lambda$ is constructed using a fixed “base” function $\bar{v} \in \mathcal{U}$ which is then perturbed by an error term distributed according to an extreme value type-I distribution. Applying this assumption to the Perfectly Informed RUM leads to a state dependent stochastic choice function of the form,
$$q_m^{(\mu, A)}(f) = \frac{e^{\bar{v}(\gamma^m, f)}}{\sum_{g \in A} e^{\bar{v}(\gamma^m, g)}}.$$
MM show that one can provide an analogous characterization of the state dependent stochastic choice associated with the rational inattention model when attentional costs are linear in Shannon mutual information,
$$\tilde{q}_m^{(\mu, A)}(f) = \frac{P^f e^{U_m^f / k}}{\sum_{g \in A} P^g e^{U_m^g / k}},$$
where $P^f$ is the unconditional probability of choosing act f and $k > 0$ scales the disutility of attention relative to prizes. Thus, rational inattention with Shannon costs looks in certain respects like a state dependent logistic RUM.
Despite the apparent similarity, there are major qualitative differences between a Perfectly Informed RUM and the rational inattention model. As the above formula shows, the Perfectly Informed RUM implies that the probability of choosing an act in a particular state depends only on its payoff in that state, and is independent of the payoffs of any act in other states of the world and of prior beliefs. For rationally inattentive models, this is generally not the case as the DM will choose not to be perfectly informed about the state, which implies that choice probabilities will be impacted by the relative performance of all acts in all states as well as prior beliefs. In the MM formulation, this dependence is reflected in the unconditional probabilities $P^f$.
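The contrast between the two formulas can be sketched numerically in Python. Two assumptions are made here that do not come from MM: the logit base utility $\bar{v}(\gamma^m, a)$ is set equal to $U_m^a$, and the unconditional probabilities $P^a$ are approximated by a Blahut-Arimoto style fixed-point iteration, $P^a \leftarrow \sum_m \mu_m \tilde{q}_m(a)$. Utilities are stored as `U[m][a]`; all function names are hypothetical.

```python
import math


def logit_rum(U, state):
    """Perfectly Informed logit RUM, assuming the base utility of act a is U[state][a]."""
    w = [math.exp(u) for u in U[state]]
    s = sum(w)
    return [x / s for x in w]


def mm_conditional(U, P, k):
    """MM choice rule: q_m(a) is proportional to P^a * exp(U_m^a / k)."""
    q = []
    for row in U:
        w = [p * math.exp(u / k) for p, u in zip(P, row)]
        s = sum(w)
        q.append([x / s for x in w])
    return q


def shannon_ri(U, prior, k, iters=2000):
    """Approximate P by the fixed-point update P^a <- sum_m prior_m * q_m(a).

    This Blahut-Arimoto style iteration is an assumption of the sketch.
    """
    n_acts = len(U[0])
    P = [1.0 / n_acts] * n_acts
    for _ in range(iters):
        q = mm_conditional(U, P, k)
        P = [sum(mu * q_m[a] for mu, q_m in zip(prior, q)) for a in range(n_acts)]
    return P, mm_conditional(U, P, k)


# U[m][a]: utility of act a (0 = f, 1 = g) in state m (0 = state 1, 1 = state 2)
U = [[1.0, 0.0], [0.0, 1.0]]
P_sym, q_sym = shannon_ri(U, [0.5, 0.5], k=1.0)
P_skew, q_skew = shannon_ri(U, [0.8, 0.2], k=1.0)
print(q_sym[1][0], q_skew[1][0], logit_rum(U, 1)[0])
```

With a symmetric prior and $k = 1$ the two rules coincide, both giving $1/(1+e)$ for choosing f in state 2. Shifting the prior toward state 1 pushes $P^f$ toward one and raises the probability of choosing f in state 2 far above the logit value, which depends only on state 2 payoffs.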
In appendix B we show how Perfectly Informed RUMs can violate both NIAS and NIAC. In
the former case this is because the DM can still choose inferior options with positive probability
under the RUM. In the latter, it is because the perfectly informed RUM involves the DM acquiring
a large amount of information even when that information is not instrumental for choice.
5.3
Partially Informed RUM, Monotonicity, and Stochastic Dominance
The third and final RUM we consider involves a DM who receives an exogenous and imperfect signal about the state of the world before a utility function is randomly drawn. In formal terms, starting with prior $\mu \in \Gamma$, the DM receives a fixed informative signal $\pi: \Omega \to \Delta(\Gamma)$ that determines the information at the point that the random utility function is drawn. In this case the generated data set for any $(\mu, A) \in D$ will be of the form,
$$q_m(f) = \sum_{\gamma \in \Gamma(\pi)} \pi_m(\gamma)\, \lambda\left(\{v \in \mathcal{U} \mid v(\gamma, f) > v(\gamma, g)\ \forall g \in A \setminus \{f\}\}\right),$$
where $\Gamma(\pi)$ is the set of possible posteriors associated with signal $\pi$.
The Partially Informed RUM does not exhibit either of the obvious restrictions on the data that we have so far discussed: choices can vary with the underlying state, and the payoffs of an act in one state can affect behavior in another. This makes differentiating between the Partially Informed RUM and rational inattention more subtle. We pursue two approaches in the experiment that follows, as now detailed.
Our first approach involves testing a simple monotonicity axiom that is an essentially universal feature of all RUMs, including those in which signals are received prior to making a decision. This axiom states that the addition of a new act to the set of available choices cannot increase the probability that one of the pre-existing options will be chosen (Gul and Pesendorfer [2006], see also Luce and Suppes [1965]).
Monotonicity Axiom Given $(\mu, A) \in \Gamma \times \mathcal{F}$, $h \in F \setminus A$, $f \in A$ and $m \in \Omega$,
$$q_m^{(\mu, A)}(f) \ge q_m^{(\mu, A \cup \{h\})}(f).$$
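A minimal Python sketch of the Partially Informed RUM, using a hypothetical two-state example, computes $q_m(f)$ from the formula given earlier in this subsection and checks the monotonicity axiom by adding act h to the menu {f, g}. The signal, the act-specific utility shocks and their equal weights are assumptions chosen so that no ties arise; a tie raises an error rather than invoking a tie-breaking rule.

```python
from fractions import Fraction as Fr

# Hypothetical two-state example with prior (1/2, 1/2)
U = {"f": (10, 0), "g": (0, 10), "h": (6, 6)}
# Exogenous signal: posterior -> probability of that posterior in each state
signal = {(Fr(3, 4), Fr(1, 4)): [Fr(3, 4), Fr(1, 4)],
          (Fr(1, 4), Fr(3, 4)): [Fr(1, 4), Fr(3, 4)]}
# Random utility: expected utility plus a hypothetical act-specific shock, equal weights
shocks = [{"f": 0, "g": 0, "h": 0}, {"f": 3, "g": 0, "h": 0}, {"f": 0, "g": 0, "h": 2}]
weights = [Fr(1, 3)] * len(shocks)


def v(gamma, act, shock):
    return sum(p * u for p, u in zip(gamma, U[act])) + shock[act]


def partially_informed_q(menu, state):
    """q_m(f): sum over posteriors of pi_m(gamma) * lambda(f is the unique maximizer)."""
    q = {f: Fr(0) for f in menu}
    for gamma, pi in signal.items():
        for shock, w in zip(shocks, weights):
            scores = {f: v(gamma, f, shock) for f in menu}
            best = max(scores.values())
            winners = [f for f in menu if scores[f] == best]
            if len(winners) != 1:
                raise ValueError("tie: a tie-breaking rule would be needed")
            q[winners[0]] += pi[state] * w
    return q


def monotone(menu, extra):
    """True if adding `extra` never raises the probability of an existing act."""
    for m in range(2):
        small = partially_informed_q(menu, m)
        large = partially_informed_q(menu + [extra], m)
        if any(large[f] > small[f] for f in menu):
            return False
    return True


print(partially_informed_q(["f", "g", "h"], 0), monotone(["f", "g"], "h"))
```

Because any draw that selects f or g from {f, g, h} also selects it from {f, g}, adding h can only move probability away from existing acts, which is why monotonicity holds for every RUM of this kind.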
As MM show, the rational inattention model can lead to robust violations of monotonicity. Their canonical example involves two states of the world, with prior $\mu_1$ on state 1. There are three acts that may be available, with the following payoffs in expected utility units,
$$(U_1^f, U_2^f) = (0, 1); \quad (U_1^g, U_2^g) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right); \quad \text{and} \quad (U_1^h, U_2^h) = (Y, -Y),$$
where $Y > 0$. Assuming information costs based on Shannon mutual information, MM show that it is optimal to pay no attention and choose the safe act g when only f and g are available, provided $\mu_1$ is high enough. It is simply too expensive to bother trying to overturn the prior. However, with h available also, it becomes more important to learn the true state, increasingly so the higher is Y. The rationally inattentive agent may therefore select a more informative attention strategy. If this learning suggests to the DM that state 2 is more likely, then it is optimal to choose act f, producing a violation of monotonicity. In section 6.6 we report on an experiment designed to capture the intuition of the MM example.
The second method for distinguishing the Partially Informed RUM from the rational inattention model involves identifying cases in which all utility functions have the same ranking. This can produce patterns whereby stochastic choice fails to respond to attentional incentives. To make this precise, note that in many refinements of the general RUM (such as the random expected utility model of Gul and Pesendorfer [2006]), all possible utility functions have the property that choice between stochastically ranked alternatives is deterministic. Moreover, many simple decision problems generate posteriors in which acts are always so ranked. Consider simple cases with two states and two acts $\{f, g\}$ with variation in the incentive to learn: $(U_1^f, U_2^f) = (Y, 0)$ and $(U_1^g, U_2^g) = (0, Y)$ for $Y > 0$. For any posterior belief $\gamma_1 > 0.5$, act f stochastically dominates g, while for $\gamma_1 < 0.5$, act g dominates f, regardless of the reward value Y. Thus, if all utility functions obey stochastic dominance, any randomness in choice must be due to the signal which, by assumption, is invariant to Y. Thus data generated by a Partially Informed RUM should not respond, for example, to doubling the value of Y. In general we would expect a rationally inattentive agent to respond to such changes.
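The invariance argument can be illustrated with a hypothetical Partially Informed RUM in Python, taking f to pay Y in state 1 and g to pay Y in state 2. Each utility draw applies an increasing transform to money, so every draw respects stochastic dominance; the signal, transforms and weights are assumptions made for the sketch.

```python
from fractions import Fraction as Fr

# Hypothetical exogenous signal (prior 1/2 on each state):
# posterior -> [probability of that posterior in state 1, in state 2]
signal = {(Fr(7, 10), Fr(3, 10)): [Fr(7, 10), Fr(3, 10)],
          (Fr(3, 10), Fr(7, 10)): [Fr(3, 10), Fr(7, 10)]}
# Hypothetical random utility: equally likely increasing transforms of money,
# so every draw respects stochastic dominance
transforms = [lambda x: x, lambda x: x ** 2, lambda x: 3 * x + 1]
weights = [Fr(1, 3)] * len(transforms)


def q_f(Y, state):
    """Probability of choosing f in `state` when f pays (Y, 0) and g pays (0, Y)."""
    payoffs = {"f": (Y, 0), "g": (0, Y)}
    total = Fr(0)
    for gamma, pi in signal.items():
        for u, w in zip(transforms, weights):
            value = {a: sum(p * u(x) for p, x in zip(gamma, payoffs[a])) for a in payoffs}
            if value["f"] > value["g"]:
                total += pi[state] * w
    return total


print([q_f(Y, 0) for Y in (5, 10, 20)], [q_f(Y, 1) for Y in (5, 10, 20)])
```

The probability of choosing f is 7/10 in state 1 and 3/10 in state 2 for every value of Y tried, because it is pinned down by the signal alone.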
6
An Experimental Test of Rational Inattention
We introduce an experiment that produces state dependent stochastic choice data at the subject level. We use the resulting data to implement our axiomatic tests of the rational inattention model. One goal of the experimental design is to provide clear separation between the predictions of the rational inattention model and standard random utility models. The experiments indeed confirm that it is possible to discriminate between models.
6.1
Design Overview
In a typical question in the experiment, a subject is shown a screen on which there are displayed 100 balls, some of which are red and some of which are blue. The state of the world is determined by the number of red balls on the screen. Prior to seeing the screen, subjects are informed of the probability distribution over such states. Having seen the screen they choose from a number of different acts whose payoffs are state dependent. A decision problem is defined by this prior information and the set of available acts, as it is in section 2.1. A subject faces each decision problem 50 times, allowing us to approximate their state dependent stochastic choice function. In any given experiment, the subject faces 4 different problems. All occurrences of the same problem are grouped, but the order of the problems is block-randomized.
There are several things to note about our experimental design. First, there is no external limit (such as a time constraint) on any subject’s ability to collect information about the state of the world. If they so wished, subjects could perfectly determine the state on each question; a very small number of subjects do just this. We are therefore not studying limits to any subject’s perceptual ability to determine the state, as is traditional in many psychology experiments. At the same time, there is no extrinsic cost to the subject of gathering information. Therefore the extent to which subjects fail to discern the true state of the world is due to their unwillingness to trade cognitive effort for monetary reward.
Second, in order to estimate the state dependent stochastic choice function we treat the 50 times that a subject faces the same decision making environment as 50 independent repetitions of the same event. To prevent subjects from learning to recognize patterns, we randomize the position of the balls. The implicit assumption is that the perceptual cost of determining the state is the same for each possible configuration of balls. We discuss evidence for order effects in our results in section 6.3.
Third, we assume that utility is a linear function of money. This may not be the case if subjects
are risk averse over the amounts available in this experiment. One approach to this problem would
be to measure risk aversion using the multiple price list method of Holt and Laury [2002], then
to use the estimated curvature to assign utility numbers to monetary prizes. We make use of
this approach in Caplin and Dean [2013] when estimating the elasticity of information acquisition
with respect to rewards. In the current context, the rational inattention model does a good job of
explaining the data even without accounting for possible risk aversion.
6.2
Description of Experiments
We run four di¤erent experiments. The …rst three are primarily designed to test NIAS and NIAC,
while the fourth is designed to test for violations of monotonicity as described in section 5.3.
Experiments 1 and 2 consist of decision problems with two states (48 and 52 red balls), a prior
belief of 1 = 0:5 and two acts ff; gg. Act f is always superior in state 48 while g is superior in
state 52. Experiment 1 examines the e¤ect of asymmetric changes in reward - changing only the
value of choosing the correct option in state 48, while experiment 2 examines the e¤ect of symmetric
changes in reward. Tables 1 and 2 describes the decision problems in these experiments (payo¤s
are in US$).
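Table 1: Experiment 1
Decision problem | Prior $\mu_{48}$ | $u^f_{48}$ | $u^f_{52}$ | $u^g_{48}$ | $u^g_{52}$
---|---|---|---|---|---
1 | 0.5 | 10 | 0 | 0 | 10
2 | 0.5 | 20 | 0 | 0 | 10
3 | 0.5 | 10 | 0 | 5 | 10
4 | 0.5 | 30 | 0 | 0 | 10

Table 2: Experiment 2
Decision problem | Prior $\mu_{48}$ | $u^f_{48}$ | $u^f_{52}$ | $u^g_{48}$ | $u^g_{52}$
---|---|---|---|---|---
5 | 0.5 | 10 | 0 | 0 | 10
6 | 0.5 | 2 | 0 | 0 | 2
7 | 0.5 | 20 | 0 | 0 | 20
8 | 0.5 | 30 | 0 | 0 | 30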
Experiment 3 studies whether subjects can adjust the states between which they more finely differentiate based on the available rewards. All decision problems in this experiment involve four equally likely states (29, 31, 69 and 71 red balls). There are four decision problems with two possible acts (d and e). The decision problems differ according to whether it is important to differentiate between 29 and 31 or 69 and 71 red balls.
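Table 3: Experiment 3
Decision problem | $u^d_{29}$ | $u^d_{31}$ | $u^d_{69}$ | $u^d_{71}$ | $u^e_{29}$ | $u^e_{31}$ | $u^e_{69}$ | $u^e_{71}$
---|---|---|---|---|---|---|---|---
9 | 1 | 0 | 10 | 0 | 0 | 1 | 0 | 10
10 | 10 | 0 | 1 | 0 | 0 | 10 | 0 | 1
11 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1
12 | 10 | 0 | 10 | 0 | 0 | 10 | 0 | 10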
Experiment 4 performs a test of monotonicity inspired by the example of MM. Once again
there are two equiprobable states (49 and 51 red balls). In decision problem 13 the DM must
choose between a safe option a which pays out 23 in each state and an option b which pays out
25 if there are 51 red balls, but only 20 if there are 49. Decision problem 14 introduces a third
alternative c which pays out 30 if there are 49 red balls and 10 if there are 51 red balls. Decision
problems 15 and 16 increase the payoff of act c in state 49 but decrease it in state 51, increasing the value of attention.
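Table 4: Experiment 4
Decision problem | Prior $\mu_{49}$ | $u^a_{49}$ | $u^a_{51}$ | $u^b_{49}$ | $u^b_{51}$ | $u^c_{49}$ | $u^c_{51}$
---|---|---|---|---|---|---|---
13 | 0.5 | 23 | 23 | 20 | 25 | n/a | n/a
14 | 0.5 | 23 | 23 | 20 | 25 | 30 | 10
15 | 0.5 | 23 | 23 | 20 | 25 | 35 | 5
16 | 0.5 | 23 | 23 | 20 | 25 | 40 | 0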
Each experiment was run on between 23 and 33 subjects.17 Each subject answers 200 questions as well as 1 practice question. At the end of the experiment, one question is selected at random for payment, in addition to a show-up fee of $10.
6.3
Overview of Data
In this section, we provide an overview of the data for experiments 1 and 2. In these experiments there are only ever two acts available and two states, and the correct act to take in each state is clearly defined. This allows us to establish a number of key features of our data.
First, subjects make a significant number of mistakes: they chose the wrong act on 35% of the trials overall. Second, subjects make significantly different choices in the two different states, that is, they do make use (at least partially) of the information available to them. Averaging across all individuals and decision problems, act f was chosen 68% of the time when the true state was 48 red balls and 38% of the time when there were 52 red balls (the hypothesis that choice behavior is the same in both states can be rejected at the 0.001% level).18 These patterns hold true at the individual level. Of the 62 subjects that took part in experiments 1 and 2, only 6% made mistakes in less than 10% of questions, while 84% had choice behavior that was significantly different between the two states at the 10% level. These results suggest that our subjects are absorbing some information about the state of the world, but are not fully informed when they make their choice.
We also use the overview data to test for order effects due to, for example, learning or fatigue. Averaging over decision problems, the percentage of correct responses in blocks 1-4 was 68%, 65%, 66%, and 65% respectively. Regressing a binary variable indicating whether the correct choice was made on decision problem dummies and dummies indicating the block in which the treatment was seen by the subject suggests that these differences are not significant: a test of the linear restriction that the block dummies are simultaneously equal to zero fails to reject the null hypothesis at the 17% level. We ignore order effects in the remaining analysis.
6.4
Testing NIAS
In the two state, two act setup of experiments 1 and 2, NIAS implies existence of a cutoff posterior probability of state 48 that determines the optimal act. For higher such posteriors, act f is chosen,
17 29 subjects took part in experiment 1, 33 in experiment 2, 24 in experiment 3 and 23 in experiment 4. Each subject took part only in one experiment.
18 Estimated using a linear probability model with individual-level fixed effects and standard errors clustered at the individual level. All statistical tests reported use this method.
while for lower posteriors, act g is chosen. Tables 5 and 6 summarize these cutoffs, and the extent to which posteriors that are revealed in the experiments are consistent with them.19
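Table 5: Experiment 1
Decision problem | Cutoff | Aggregate $\gamma^f_{48}$ | Aggregate $\gamma^g_{48}$ | % subjects rational
---|---|---|---|---
1 | 50% | 67% | 31% | 90%
2 | 33% | 58% | 34% | 59%
3 | 67% | 78% | 46% | 82%
4 | 25% | 61% | 21% | 76%

Table 6: Experiment 2
Decision problem | Cutoff | Aggregate $\gamma^f_{48}$ | Aggregate $\gamma^g_{48}$ | % subjects rational
---|---|---|---|---
5 | 50% | 62% | 34% | 82%
6 | 50% | 63% | 33% | 85%
7 | 50% | 66% | 30% | 85%
8 | 50% | 68% | 32% | 88%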
These tables show that subjects in these experiments by and large satisfy the NIAS conditions. The aggregate data (treating all data as if it was generated by a single subject) satisfies NIAS in all but one case (the probability of state 1 is 1% too high when g is chosen in problem 2). While there are some violations at the individual level, the losses associated with these violations are low. Figure c1 in appendix C shows the distribution of costs of NIAS failures for each subject (i.e., for each subject it calculates the actual expected value of their choice minus the expected value of the optimal choices given their posterior beliefs) in experiments 1 through 4. As a benchmark, these losses are compared to those that would have been observed from a population of decision makers choosing at random.20 The use of random benchmarks has been discussed by, for example, Beatty and Crawford [2011]. In each case, the observed distribution is significantly different from the simulated distribution at the 0.01% level.
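The cutoffs in Tables 5 and 6 follow from equating expected payoffs: f is optimal at posterior $\gamma$ on state 48 when $\gamma u^f_{48} + (1-\gamma)u^f_{52} \ge \gamma u^g_{48} + (1-\gamma)u^g_{52}$. The Python sketch below (helper names hypothetical) computes these cutoffs from the payoffs in Tables 1 and 2, implements the Bayes-rule posterior described in footnote 19 with a prior of 0.5, and applies the NIAS check to the rounded aggregate posteriors reported in Table 5.

```python
from fractions import Fraction as Fr

# (u^f_48, u^f_52, u^g_48, u^g_52) in US$, from Tables 1 and 2
problems = {1: (10, 0, 0, 10), 2: (20, 0, 0, 10), 3: (10, 0, 5, 10), 4: (30, 0, 0, 10),
            5: (10, 0, 0, 10), 6: (2, 0, 0, 2), 7: (20, 0, 0, 20), 8: (30, 0, 0, 30)}


def cutoff(f48, f52, g48, g52):
    """Posterior on state 48 at which f and g have equal expected payoff."""
    return Fr(g52 - f52, (f48 - g48) + (g52 - f52))


def revealed_posterior(q48, q52, prior48=Fr(1, 2)):
    """Posterior on state 48 given that an act chosen w.p. q48 (state 48), q52 (state 52) was picked."""
    return prior48 * q48 / (prior48 * q48 + (1 - prior48) * q52)


def nias_ok(problem, post_f, post_g):
    """NIAS: f chosen only at or above the cutoff, g only at or below it."""
    c = cutoff(*problems[problem])
    return post_f >= c and post_g <= c


# Aggregate revealed posteriors on state 48 (f chosen, g chosen), from Table 5
table5 = {1: (Fr(67, 100), Fr(31, 100)), 2: (Fr(58, 100), Fr(34, 100)),
          3: (Fr(78, 100), Fr(46, 100)), 4: (Fr(61, 100), Fr(21, 100))}
print({p: nias_ok(p, *table5[p]) for p in table5})
```

Only decision problem 2 fails, because the revealed posterior when g is chosen (34%) lies just above the cutoff of 1/3.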
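Experiment 3 provides a somewhat different test of NIAS. In this experiment, act d is always superior in states 29 and 69 and act e is superior in states 31 and 71. Writing $\gamma^d_m$ for the revealed posterior probability of state m when d is chosen, for it to be optimal to choose option d, it must be the case that,
$$u^d_{29}\gamma^d_{29} + u^d_{69}\gamma^d_{69} \ge u^e_{31}\gamma^d_{31} + u^e_{71}\gamma^d_{71}.$$
As, in experiment 3, $u^d_{29} = u^e_{31}$ and $u^d_{69} = u^e_{71}$, and assuming that when d is chosen $\gamma^d_{29} > \gamma^d_{31}$, it must be the case that, for d to be optimal,
$$\frac{\gamma^d_{71} - \gamma^d_{69}}{\gamma^d_{29} - \gamma^d_{31}} \le \frac{u^d_{29}}{u^d_{69}}.$$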
In fact, this condition is sufficient to ensure that the choice of act e is also rational.21 Table 7 shows the values of this cutoff for each decision problem, and the extent to which aggregate and
19 In order to calculate posterior beliefs we combine the conditional probabilities of choosing each act from each state estimated from the data with prior beliefs about the likelihood of each state, rather than the empirical likelihood of each state.
20
The procedure to construct the random behavior is as follows: for each decision problem and for each state, a
random number is drawn for each available act. The probability of choosing each act from that state is then calculated
as the value of the random number associated with that act over the sum of all random numbers.
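21 This follows from the fact that
$$\frac{\gamma^d_{71} - \gamma^d_{69}}{\gamma^d_{29} - \gamma^d_{31}} = \frac{\left(\mu_{71}\pi_{71}(d) - \mu_{69}\pi_{69}(d)\right)/P(d)}{\left(\mu_{29}\pi_{29}(d) - \mu_{31}\pi_{31}(d)\right)/P(d)} = \frac{\pi_{71}(d) - \pi_{69}(d)}{\pi_{29}(d) - \pi_{31}(d)} = \frac{(1 - \pi_{71}(e)) - (1 - \pi_{69}(e))}{(1 - \pi_{29}(e)) - (1 - \pi_{31}(e))} = \frac{\pi_{71}(e) - \pi_{69}(e)}{\pi_{29}(e) - \pi_{31}(e)} = \frac{\gamma^e_{71} - \gamma^e_{69}}{\gamma^e_{29} - \gamma^e_{31}},$$
where the second and last equalities use the fact that all four states are equally likely.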
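individual level data is consistent with NIAS.

Table 7: Experiment 3
Decision problem | Cutoff $u^d_{29}/u^d_{69}$ | Aggregate $(\gamma^d_{71} - \gamma^d_{69})/(\gamma^d_{29} - \gamma^d_{31})$ | % subjects rational
---|---|---|---
9 | 1/10 | -1.3 | 88%
10 | 10 | -0.6 | 96%
11 | 1 | -0.9 | 80%
12 | 1 | -0.8 | 92%

Table 8: Experiment 4
Decision problem | a optimal for $\gamma_{49}$ in | b optimal for $\gamma_{49}$ in | c optimal for $\gamma_{49}$ in | Aggregate $\gamma^a_{49}$ | Aggregate $\gamma^b_{49}$ | Aggregate $\gamma^c_{49}$
---|---|---|---|---|---|---
13 | [40%, 100%] | [0, 40%] | n/a | 50% | 49% | n/a
14 | [40%, 65%] | [0, 40%] | [65%, 100%] | 50% | 44% | 63%
15 | [40%, 60%] | [0, 40%] | [60%, 100%] | 50% | 43% | 64%
16 | [40%, 57.5%] | [0, 40%] | [57.5%, 100%] | 51% | 41% | 65%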
Experiment 4 provides a sterner test of NIAS because generally there are more acts available to the subject. For example, in decision problem 14, there are 3 available acts, and 3 regions of the posterior probability space in which each of the different acts is optimal. Despite this, in the aggregate data act a is always the optimal choice at its revealed posterior. Act c is optimal in all but problem 14, in which the posterior belief is too low (by 2%). Posterior probabilities of state 1 are generally slightly too high when b is chosen, by a maximum of 8% in decision problem 13.
6.5
Testing NIAC
We next use our data to test whether subjects’ choices of information strategy are rationalizable by some cost function, in other words, whether they satisfy NIAC. For experiments 1 and 2, the fact that total surplus cannot be increased by switching information strategies between any two decision problems implies the following condition (assuming that both acts are chosen with positive probability in each case),
$$\Delta\pi_{48}(f)\,\Delta(f_{48} - g_{48}) + \Delta\pi_{52}(g)\,\Delta(g_{52} - f_{52}) \ge 0, \qquad (1)$$
where $\Delta(x)$ indicates the change in x between the two decision problems. This expression has a natural interpretation. The first term is the change in the probability of choosing the right act in state 48 multiplied by the change in the benefit of choosing the right act, i.e. the difference between the payoff of act f and g in that state. The second term is the change in the probability of choosing the right act in state 52 multiplied by the change in the benefit of so doing.
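Condition (1) and the rankings it implies can be sketched in Python. Payoffs are taken from Tables 1 and 2; the function names are hypothetical, and the correct-choice probabilities passed to `niac_pair` must be supplied by the user (none are taken from the experimental data here).

```python
# Payoffs (u^f_48, u^f_52, u^g_48, u^g_52) from Tables 1 and 2
problems = {1: (10, 0, 0, 10), 2: (20, 0, 0, 10), 3: (10, 0, 5, 10), 4: (30, 0, 0, 10),
            5: (10, 0, 0, 10), 6: (2, 0, 0, 2), 7: (20, 0, 0, 20), 8: (30, 0, 0, 30)}


def benefits(f48, f52, g48, g52):
    """Benefit of choosing the right act in state 48 and in state 52."""
    return (f48 - g48, g52 - f52)


def niac_pair(a, b, correct):
    """Condition (1) for problems a and b.

    correct[p] = (pi_48(f), pi_52(g)) in problem p. Returns True when the sum over
    states of (change in correct-choice probability) * (change in benefit) is >= 0.
    """
    ba, bb = benefits(*problems[a]), benefits(*problems[b])
    return sum((ca - cb) * (xa - xb)
               for ca, cb, xa, xb in zip(correct[a], correct[b], ba, bb)) >= 0


def implied_ranking(group, key):
    """Problems from highest to lowest benefit; NIAC predicts the same ordering
    of correct-choice probabilities."""
    return sorted(group, key=lambda p: key(benefits(*problems[p])), reverse=True)


print(implied_ranking([1, 2, 3, 4], key=lambda b: b[0]),   # experiment 1: state 48 only
      implied_ranking([5, 6, 7, 8], key=lambda b: sum(b)))  # experiment 2: both states
```

Sorting experiment 1 problems by the state 48 benefit gives 4, 2, 1, 3, and sorting experiment 2 problems by the total benefit gives 8, 7, 5, 6, matching the rankings the text derives from condition (1).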
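21 (continued) Thus the condition described implies that
$$\frac{\gamma^e_{71} - \gamma^e_{69}}{\gamma^e_{29} - \gamma^e_{31}} \le \frac{u^d_{29}}{u^d_{69}}.$$
By assumption $\gamma^e_{29} - \gamma^e_{31}$ is negative, so rearranging tells us,
$$u^d_{29}\gamma^e_{29} + u^d_{69}\gamma^e_{69} \le u^e_{31}\gamma^e_{31} + u^e_{71}\gamma^e_{71}.$$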
Experiment 1 is designed so that $\Delta(g_{52} - f_{52}) = 0$ for every pair of decision problems. Thus this condition implies a ranking of $\pi_{48}(f^i)$ (the probability of correct choice in state 48) across the different decision problems,
$$\pi_{48}(f^4) \ge \pi_{48}(f^2) \ge \pi_{48}(f^1) \ge \pi_{48}(f^3).$$
Figure 4 shows that this ranking holds true in the aggregate.
In experiment 2, for every pair of decision problems we have that $\Delta(f_{48} - g_{48}) = \Delta(g_{52} - f_{52})$. Thus this condition implies a ranking on $\pi_{48}(f) + \pi_{52}(g)$, or the total probability of choosing the right action across both states. The implied ranking is,
$$\pi_{48}(f^8) + \pi_{52}(g^8) \ge \pi_{48}(f^7) + \pi_{52}(g^7) \ge \pi_{48}(f^5) + \pi_{52}(g^5) \ge \pi_{48}(f^6) + \pi_{52}(g^6).$$
Figure 4 shows that these implications broadly hold in the aggregate data. The total proportion of correct responses in decision problem 8 is higher than in decision problem 7 which is in turn higher than in decision problem 5. The total proportion of errors in decision problem 6 is slightly higher than that in problem 5 (by 0.5%) but the difference is not statistically significant. While the ordering is in line with the theory, it is clear that the elasticity in the response of information gathering to monetary incentives is quite low: the probability of making a correct choice rises from 65% in the $2 treatment to 70% in the $30 treatment (significant at the 10% level). We consider this issue in the context of Shannon mutual information cost functions in ?.
Figure 4 (panels: Experiment 1, Experiment 2): % of correct responses in state 48 (experiment 1) and in both states (experiment 2).22
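The equivalent of condition (1) for experiment 3 is
$$\Delta\pi_{29}(d)\,\Delta(d_{29} - e_{29}) + \Delta\pi_{69}(d)\,\Delta(d_{69} - e_{69}) + \Delta\pi_{31}(e)\,\Delta(e_{31} - d_{31}) + \Delta\pi_{71}(e)\,\Delta(e_{71} - d_{71}) \ge 0.$$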
Thus we find that, comparing decision problems 9 and 10, we have,
$$\left[\pi_{29}(d^{10}) + \pi_{31}(e^{10})\right] - \left[\pi_{29}(d^{9}) + \pi_{31}(e^{9})\right] \ge \left[\pi_{69}(d^{10}) + \pi_{71}(e^{10})\right] - \left[\pi_{69}(d^{9}) + \pi_{71}(e^{9})\right], \qquad (C1)$$
meaning that the increase in the proportion of correct choices in states 29 and 31 must be bigger than the increase in the proportion of correct choices in states 69 and 71 when shifting from decision problem 9 to 10. This makes sense, as relative to problem 9, problem 10 has higher rewards for correct decisions in the former two states than the latter two states.
22 Error bars take into account clustering at the subject level.
Comparing problems 9 and 10 to 11 implies,
$$\pi_{69}(d^{9}) + \pi_{71}(e^{9}) \ge \pi_{69}(d^{11}) + \pi_{71}(e^{11}), \qquad (C2)$$
$$\pi_{29}(d^{10}) + \pi_{31}(e^{10}) \ge \pi_{29}(d^{11}) + \pi_{31}(e^{11}). \qquad (C3)$$
Relative to problem 11, there should be a higher proportion of correct choices in states $\omega_{69}$ and $\omega_{71}$ in problem 9 and a higher proportion of correct choices in $\omega_{29}$ and $\omega_{31}$ in problem 10.
Comparing problems 9 and 10 to 12 gives,
$$\pi_{29}(d^{12}) + \pi_{31}(e^{12}) \ge \pi_{29}(d^{9}) + \pi_{31}(e^{9}), \qquad (C4)$$
$$\pi_{69}(d^{12}) + \pi_{71}(e^{12}) \ge \pi_{69}(d^{10}) + \pi_{71}(e^{10}). \qquad (C5)$$
Comparing problems 11 and 12 tells us that the total number of correct choices should be higher in the latter than the former,
$$\pi_{29}(d^{12}) + \pi_{69}(d^{12}) + \pi_{31}(e^{12}) + \pi_{71}(e^{12}) \ge \pi_{29}(d^{11}) + \pi_{69}(d^{11}) + \pi_{31}(e^{11}) + \pi_{71}(e^{11}). \qquad (C6)$$
Table 9 shows the extent to which the aggregate data satisfies these conditions.

Table 9
Condition | Left hand side | Right hand side | p-value
---|---|---|---
C1 | 27.5 | -3.7 | 0.01
C2 | 132.7 | 127.4 | 0.36
C3 | 155.7 | 131.5 | 0.05
C4 | 151.1 | 128.2 | 0.03
C5 | 138.3 | 132.7 | 0.34
C6 | 289.4 | 258.9 | 0.17
In all cases, the left hand side values are higher than the right hand side, as required. These differences are significant at the 5% level except for conditions C3, C5 and C6.
Applying bilateral NIAC to experiment 4 is slightly more complex. Comparing decision problems 14-16, the relevant condition is
$$\Delta\left(\pi_{49}(c) - \pi_{51}(c)\right)\Delta u^c_{49} \ge 0.$$
This implies the following ranking on $\pi_{49}(c) - \pi_{51}(c)$,
$$\pi_{49}(c^{16}) - \pi_{51}(c^{16}) \ge \pi_{49}(c^{15}) - \pi_{51}(c^{15}) \ge \pi_{49}(c^{14}) - \pi_{51}(c^{14}).^{23}$$
In the aggregate data this ordering holds, though the differences are small. The values of $\pi_{49}(c) - \pi_{51}(c)$ are 8.2, 7.4 and 5.5 for decision problems 16, 15 and 14 respectively.
23 If it were the case that posterior beliefs when a is chosen in decision problem 13 are such that it would be preferable to choose $c^{14}$ (if available), we additionally have the restriction,
$$\pi_{49}(c^{14}) - \pi_{51}(c^{14}) \ge \pi_{49}(c^{13}) - \pi_{51}(c^{13}).$$
However, this is not the case in our aggregate data.
The tests described consider only bilateral comparisons of attention strategies. The NIAC condition requires more than this: it must be impossible to increase total surplus through any reassignment of attention strategies between problems. This condition holds on the aggregate data for experiments 3 and 4. In experiment 2 it is violated due to the slightly higher accuracy in decision problem 5 than in decision problem 4. In experiment 1 it holds conditional on actual choice at each posterior, but not given optimal choice at each posterior. Given that NIAS is violated in decision problem 2, it would in fact be optimal for act f to be chosen at both posteriors in this problem. However, if the strategies of decision problems 1 and 2 were swapped, it would be optimal to make different choices at different posteriors in both decision problems, which would in turn improve gross surplus.
In order to test the full NIAC condition at the individual level, Figure C2 plots the distribution
of actual surplus minus the maximal surplus possible by reassigning attention strategies to decision
problems. The NIAC condition requires this number to be zero. As a comparator, we show the
distribution obtained from random choice.
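The individual-level statistic just described can be computed directly when the number of decision problems is small. The sketch below is a minimal illustration, not the authors' code: it assumes a hypothetical matrix `G`, where `G[l][k]` is the gross surplus from applying the attention strategy observed in decision problem `k` to decision problem `l`, and it enumerates every reassignment by brute force.

```python
from itertools import permutations


def niac_gap(G):
    """Actual gross surplus minus the maximum over all reassignments.

    G[l][k]: gross surplus in decision problem l when using the attention
    strategy observed in decision problem k (hypothetical input format).
    NIAC requires the returned value to be zero; it is never positive,
    because the identity assignment is one of the permutations tried.
    """
    n = len(G)
    actual = sum(G[l][l] for l in range(n))
    best = max(sum(G[l][p[l]] for l in range(n)) for p in permutations(range(n)))
    return actual - best


print(niac_gap([[10, 8], [7, 9]]))  # 0: keeping the observed strategies is best
print(niac_gap([[5, 8], [7, 6]]))   # -4: swapping the two strategies gains 4
```

Enumeration grows as L!, which is harmless for a handful of decision problems per experiment; for larger L an assignment solver such as SciPy's `linear_sum_assignment` (with `maximize=True`) would be the natural replacement.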
6.6
Comparison to Random Utility
In section 5 we discussed three variants of RUM: Uninformed, Perfectly Informed and Partially
Informed. All of these models have serious problems in explaining our data. Uninformed RUMs
imply that choice probabilities should be the same in each state. As illustrated in section 6.3 this
is clearly not the case in our data. Perfectly Informed RUMs imply that choice probabilities in any
state should be a function only of payoffs in that state. This implies that, in experiment 1,
choice probabilities in state 52 should be identical across decision problems 1-4. The probability of
choosing act f in these four decision problems is 46%, 69%, 15% and 74% respectively, significantly
different at the 0.01% level.
We illustrated above two important distinctions between the Partially Informed RUM and
rational inattention. First, if subjects do not violate stochastic dominance in their choice, then
in decision problems 1-3 choice should in fact be deterministic conditional on the signal received.
We tested 24 subjects in experiment 1 for violations of stochastic dominance. We asked subjects a
set of 10 questions in which they were asked to choose between act f, which paid $10 if there were
49 red balls, and act g, which paid $10 if there were 51 red balls. The prior probability of 49 red
balls varied from 10% to 90%. However, for these questions, subjects did not get to see the screen
with the balls, and had to make their decisions based purely on the prior. Thus, subjects obeyed
stochastic dominance if and only if they chose act f when the prior on state 49 was greater than or
equal to 50%, and act g otherwise. We found that 83% of subjects obeyed stochastic dominance, suggesting that
for the majority of subjects all randomness we observe would have to be due to the exogenous
signal. This cannot be squared with the increase in attentiveness as rewards increase in experiment
2: by assumption, the signal in the Partially Informed RUM is exogenous.
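The dominance criterion above amounts to a simple per-subject check. In this sketch the function names and the `(prior_on_49, chosen_act)` encoding are illustrative assumptions; the rule itself follows the text (act f when the prior on state 49 is at least 50%, act g otherwise).

```python
def obeys_dominance(choices):
    """choices: list of (prior_on_49, act) pairs with act in {"f", "g"}.

    Act f pays $10 with 49 red balls and act g pays $10 with 51, so the
    dominance rule is: f when the prior on 49 is at least 0.5, g otherwise.
    """
    return all(act == ("f" if prior_49 >= 0.5 else "g") for prior_49, act in choices)


def share_obeying(subjects):
    """Fraction of subjects whose every answer follows the dominance rule."""
    return sum(obeys_dominance(s) for s in subjects) / len(subjects)


consistent = [(0.1, "g"), (0.5, "f"), (0.9, "f")]
violator = [(0.1, "f"), (0.9, "f")]
print(share_obeying([consistent, violator]))  # 0.5
```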
Second, we argued that all RUMs, including the hybrid, must obey monotonicity. Experiment
4 shows that monotonicity can be violated, as suggested by MM. Table 10 shows the probability of
choosing option b in the four decision problems in this experiment.
Table 10: Probability of choosing act b
Decision problem    13     14     15     16
State 49            25%    27%    29%    26%
State 51            26%    34%    39%    38%
The introduction of act c increases the probability of choosing act b in state 51 from 26% to an
average of 37% across decision problems 14-16. While small, this increase is significant at the 10%
level.
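The monotonicity comparison can be made mechanical with the Table 10 figures. The sketch below uses hypothetical names and compares point estimates only (no significance test): it flags every state and decision problem in which adding act c raised the probability of choosing b relative to decision problem 13, which monotonicity rules out for any RUM.

```python
# Probability of choosing act b, by state and decision problem (Table 10).
TABLE_10 = {
    49: {13: 0.25, 14: 0.27, 15: 0.29, 16: 0.26},
    51: {13: 0.26, 14: 0.34, 15: 0.39, 16: 0.38},
}


def monotonicity_violations(table, base=13, expanded=(14, 15, 16)):
    """(state, problem, increase) wherever adding act c raised P(b) above base."""
    out = []
    for state, probs in table.items():
        for j in expanded:
            increase = round(probs[j] - probs[base], 2)
            if increase > 0:
                out.append((state, j, increase))
    return out


def average(values):
    return sum(values) / len(values)


print(monotonicity_violations(TABLE_10))
print(round(average([TABLE_10[51][j] for j in (14, 15, 16)]), 2))  # 0.37
```

The check also flags the small point increases in state 49; the significance claim in the text concerns state 51.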
7
Existing Literature
Much of the work on rational inattention in economics can be traced back to Sims [1998] and
Sims [2003], which characterized the behavioral impact of constraints on information processing in
linear quadratic control problems.24 The rate of information flow is measured using the Shannon
mutual information.25 Sims [2003] shows that such a constraint generates behavior similar to that
of assuming that an agent observes the state of the world only noisily. However, the type of
noise is determined endogenously, based on the incentives in the environment. Following this
paper, mutual information constraints have been incorporated in an increasing number of economic
settings, including consumption-savings problems (Sims [2006], Tutino [2008], and Mackowiak and
Wiederholt [2010]), pricing problems (Mackowiak and Wiederholt [2009], Matejka [2010], Martin
[2013]), monetary policy (Paciello and Wiederholt [2011]), and portfolio choice (Mondria [2010]).
In part, the focus on mutual information as the measure of information is justified by its central
position in the information theory literature. The Shannon mutual information of two random
variables is related to the expected length in bits of the optimally encoded signal needed to generate
one from the other. It also has an axiomatic characterization which shows that information costs
must be of this form if they are to obey certain intuitive properties (see for example Csiszár [2008]).
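To make the quantity concrete, the sketch below computes the Shannon mutual information, in bits, of a finite joint distribution. It is a textbook computation rather than anything specific to this paper; the function name and the list-of-lists input format are assumptions.

```python
import math


def mutual_information_bits(joint):
    """I(X;Y) in bits, where joint[x][y] is P(X = x, Y = y)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # terms with zero probability contribute nothing
                total += p * math.log2(p / (px[i] * py[j]))
    return total


print(mutual_information_bits([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: signal reveals a fair coin
print(mutual_information_bits([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
```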
Shannon mutual information costs also have interesting properties from an economic standpoint.
As discussed in the text, Matejka and McKay [2011] demonstrate a strong relationship between
mutual information based rational inattention and logit-style random choice. Cabrales et al. [2011]
demonstrate a further interesting link between mutual information and economic behavior. They
consider a ranking of information structures according to a “ruin averse” investor facing a class of
no-arbitrage investment problems and show that the ranking of information structures based on
willingness to pay is equivalent to that provided by mutual information.
While much of the rational inattention literature has focussed on mutual information costs, a
variety of other cost functions and constraints have been studied. Woodford [2012] points out that
mutual information does not imply that less attention will be paid to rare events (as such attention
is cheap in expectation), in violation of experimental results by Shaw and Shaw [1977]. He therefore
proposes an alternative measure in which the cost of an information structure is evaluated according
24 Although the study of costly information acquisition in economics goes back much further: see, for example, Stigler [1961], Marschak [1971], and Milgrom [1981].
to the related concept of the Shannon capacity. Gul et al. [2012] consider the behavior of households
who are restricted to having “crude” consumption plans i.e. plans that are restricted to having at
most n realizations. Nieuwerburgh and Veldkamp [2009] consider a more general information cost
function, based on the distance between prior and posterior variance. Saint-Paul [2011] considers
the case in which decision makers face Shannon cost, but can only choose discrete policy functions
(essentially combining the approaches of Sims [2003] and Gul et al. [2012]). Reis [2006] considers the
case of a binary information choice: in any given period either attention can be paid, and the state
is fully revealed, or not, in which case no information is gathered. Even many of the articles that
ostensibly use mutual information costs effectively restrict the decision maker to choose Gaussian
signals, implying additional constraints (see Sims [2006] for a discussion).
A key strength of our approach is that our model nests all of the above cost functions. The costs
of allowable attentional strategies can be captured by K, while the cost of inadmissible strategies
can be set to infinity. The NIAS and NIAC conditions therefore provide a test of the entire class
of rational inattention models currently in use.26
A recent wave of decision theoretic literature has attempted to capture the observable implications of inattention, both rational and otherwise. Closest in spirit to our work is Ellis [2012],
who works with a data set similar to ours: state dependent choice functions. Ellis [2012] initially
asks under what conditions such choice data can be rationalized by a model in which the DM’s
information is a partition on the underlying state space, with choices optimal given this partition. The characterization is based on the identification of cells in the partition by all objective
states in which the same choice is made, an approach similar to that taken in our paper. This
allows for the identification of the preferences revealed by choices. The basic condition that Ellis
[2012] applies is that revealed preference information has to be consistent with independence and
dynamic consistency. Under these conditions (along with continuity and monotonicity) the coarsest partition that can rationalize the data is uniquely identified. Ellis [2012] then goes on to make
inattention “rational” by requiring that the partition in use for a particular set of acts is optimal in
the set of partitions available to the DM. The additional observable implications of this are (i)
the irrelevance of independent acts, a version of the independence of irrelevant alternatives; and (ii) a
further application of the independence axiom, which states that, if more information is used when
choosing from choice set A than from set B, then independence must hold.
There are two key differences between the theoretical section of our paper and Ellis [2012]. On
the one hand, Ellis [2012] places weaker requirements on the data: unlike in our approach, the DM’s
utility function and prior beliefs are derived from behavior rather than being directly observable. On the
other hand, Ellis [2012] considers a more restrictive class of information constraints: the DMs in the Ellis
[2012] model effectively face a cost function which is zero for allowable partitions and infinite for all
other information structures. This restriction rules out any stochasticity in choice, as well as many
commonly used information cost functions (such as those based on Shannon mutual information).
A second decision theoretic approach to identifying rational attention is to examine choice over
menus. Ergin and Sarver [2010] consider a model in which a decision maker makes choices over
choice sets by optimally selecting a partition on (subjective and unobservable) states of the world,
then choosing the best action conditional on that partition. They characterize the implications of
such a model for choices between choice sets. Costly contemplation is characterized by an aversion
to contingent planning: an agent would prefer to find out which set they are choosing from and
then choose from that set, rather than have to make contingent plans. Mihm and Ozbek [2012]
extend this approach to the case in which there are observable states of the world, resulting in a
representation similar to that considered in this paper.
26 Note that we consider only the instrumental value of information, not any intrinsic value that information might have as in Grant et al. [1997].
Our work is related to an ongoing project in which we aim to characterize choice behavior when
the internal information state of the agent is not directly observable. Van Zandt [1994] provides an
early negative result in this regard, showing that any choice behavior is rationalizable in a model
that allows for hidden costly information acquisition if the state of the world is not observable.
Caplin and Dean [2011] and Caplin et al. [2011] consider the case of sequential information search,
using an extended data set to derive behavioral restrictions of search of this kind as well as of satisficing behavior. Caplin and Martin [2011] introduce the NIAS condition to characterize subjective
rationality in a single decision problem. Masatlioglu et al. [2012] characterize “revealed attention”,
using the identifying assumption that removing an unattended item from the choice set does not
affect attention. Dillenberger et al. [2012] consider a dynamic problem in which the DM receives
information in each period, characterizing the resulting preference over menus.
In the psychology literature, theories to which we are close in spirit are signal detection theory
(Green and Swets [1966]) and categorization theory. Unlike our approach, these models tend to
assume that the attention strategy is fixed (commonly it is assumed that the DM gets a normal
signal). A common feature is that the DM must choose the optimal action at each posterior. Based
in part on these theories, there is an enormous experimental literature on signal detection and
categorization in psychology (much of which uses state dependent stochastic choice data).27
Despite the psychological precedents, there is little experimental work on state dependent stochastic choice data within economics, and no work in either field that tests NIAC and NIAS directly.
One related paper is Cheremukhin et al. [2011], which uses a formulation similar to Matejka and
McKay [2011] to estimate a rationally inattentive model on subjects’ choices over lotteries. They
do not analyze the state dependence in the resulting stochastic choice data.
8
Conclusions
As economists increasingly focus on attentional constraints, the importance of rational inattention theory has grown. We characterize a general model of rational inattention which encompasses
all models currently in the literature. The necessary and sufficient conditions are simple and readily testable. We find that the model does a qualitatively good job of explaining subject behavior in a
simple experimental implementation. In contrast, traditional random utility models fail to capture
important data features.
In addition to further investigating the comparison with random utility models, we are currently
exploring the behavioral content of more structured models of attention costs, in particular the
Shannon model.28 We are also continuing to explore the implications of the model for behavior in
important economic domains.
27 We do not attempt to summarize the literature here; see Verghese [2003] for a review.
28 Caplin and Dean [2013] makes a start in this direction.
References
Timothy K. M. Beatty and Ian A. Crawford. How demanding is the revealed preference approach
to demand? American Economic Review, 101(6):2782–95, October 2011.
Antonio Cabrales, Olivier Gossner, and Roberto Serrano. Entropy and the value of information
for investors. Economics Working Papers we1104, Universidad Carlos III, Departamento de
Economía, March 2011.
Andrew Caplin and Mark Dean. Search, choice, and revealed preference. Theoretical Economics,
6(1), January 2011.
Andrew Caplin and Mark Dean. Rational inattention, entropy, and choice: The posterior-based
approach. Mimeo, Center for Experimental Social Science, New York University, 2013.
Andrew Caplin and Daniel Martin. A testable theory of imperfect perception. NBER Working
Papers 17163, National Bureau of Economic Research, Inc, June 2011.
Andrew Caplin, Mark Dean, and Daniel Martin. Search and satisficing. American Economic
Review, 101(7):2899–2922, December 2011.
Anton Cheremukhin, Anna Popova, and Antonella Tutino. Experimental evidence on rational
inattention. Technical report, 2011.
Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation: Theory and evidence. American
Economic Review, 99(4):1145–77, September 2009.
Stephen A. Clark. The random utility model with an infinite choice space. Economic Theory,
7:179–189, 1996.
Jacques Cremer. A simple proof of Blackwell’s “comparison of experiments” theorem.
Journal of Economic Theory, 27(2):439–443, August 1982.
Imre Csiszár. Axiomatic Characterizations of Information Measures. Entropy, 10:261–273, 2008.
David Dillenberger, Juan Sebastian Lleras, Philipp Sadowski, and Norio Takeoka. A theory of
subjective learning. Technical report, 2012.
Andrew Ellis. Foundations for optimal attention. Mimeo, Boston University, 2012.
Haluk Ergin and Todd Sarver. A Unique Costly Contemplation Representation. Econometrica,
78(4):1285–1339, 2010.
J C Falmagne. A Representation Theorem for Random Finite Scale Systems. Journal of Mathematical Psychology, 18:52–72, 1978.
S Grant, A Kajii, and B Polak. Intrinsic preference for information. Technical report, 1997.
D. M. Green and J. A. Swets. Signal detection theory and psychophysics. Wiley, New York, 1966.
Faruk Gul and Wolfgang Pesendorfer. Random Expected Utility. Econometrica, 74(1):121–146,
2006.
Faruk Gul, Wolfgang Pesendorfer, and Tomasz Strzalecki. Behavioral competitive equilibrium and
extreme prices. Mimeo, Princeton University, 2012.
C.A. Holt and S.K. Laury. Risk aversion and incentive effects. American Economic Review,
92(5):1644–1655, 2002.
Tjalling C. Koopmans and Martin Beckmann. Assignment problems and the location of economic
activities. Econometrica, 25(1):53–76, 1957.
Nicola Lacetera, Devin G. Pope, and Justin R. Sydnor. Heuristic thinking and limited attention
in the car market. NBER Working Papers 17030, National Bureau of Economic Research, Inc,
May 2011.
Graham Loomes and Robert Sugden. Incorporating a stochastic element into decision theories.
European Economic Review, 39(3-4):641–648, April 1995.
Bartosz Mackowiak and Mirko Wiederholt. Optimal sticky prices under rational inattention. American Economic Review, 99(3):769–803, June 2009.
Bartosz Adam Mackowiak and Mirko Wiederholt. Business cycle dynamics under rational inattention. CEPR Discussion Papers 7691, C.E.P.R. Discussion Papers, February 2010.
Charles F. Manski. The structure of random utility models. Theory and Decision, 8:229–254, 1977.
Jacob Marschak. Economics of information systems. Journal of the American Statistical Association, 66(333):192–219, March 1971.
Daniel Martin. Strategic pricing and rational inattention to quality. Mimeo, New York University,
2013.
Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y. Ozbay. Revealed attention. American
Economic Review, 102(5):2183–2205, August 2012.
Filip Matejka and Alisdair McKay. Rational inattention to discrete choices: A new foundation
for the multinomial logit model. CERGE-EI Working Papers wp442, The Center for Economic
Research and Graduate Education - Economic Institute, Prague, June 2011.
Filip Matejka. Rationally inattentive seller: Sales and discrete pricing. CERGE-EI Working Papers
wp408, The Center for Economic Research and Graduate Education - Economic Institute, Prague,
March 2010.
Daniel McFadden. Revealed stochastic preference: a synthesis. Economic Theory, 26(2):245–264,
08 2005.
Maximilian Mihm and M. Kemal Ozbek. Decision making with rational inattention. Working
paper, Social Science Research Network, 2012.
Paul R Milgrom. Rational expectations, information acquisition, and competitive bidding. Econometrica, 49(4):921–43, June 1981.
Jordi Mondria. Portfolio choice, attention allocation, and price comovement. Journal of Economic
Theory, 145(5):1837–1864, September 2010.
David J. Murray. A perspective for viewing the history of psychophysics. Behavioral and Brain
Sciences, 16:115–137, 2 1993.
V. Navalpakkam, C. Koch, and P. Perona. Homo economicus in visual search. Journal of Vision,
9(1):1–16, January 2009.
Stijn Van Nieuwerburgh and Laura Veldkamp. Information immobility and the home bias puzzle.
Journal of Finance, 64(3):1187–1215, 06 2009.
Luigi Paciello and Mirko Wiederholt. Exogenous information, endogenous information and optimal
monetary policy. Technical report, 2011.
Ricardo Reis. Inattentive Consumers. Journal of Monetary Economics, 53(8):1761–1800, 2006.
Jean-Charles Rochet. A necessary and sufficient condition for rationalizability in a quasi-linear
context. Journal of Mathematical Economics, 16(2):191–200, April 1987.
Gilles Saint-Paul. A "quantized" approach to rational inattention. TSE Working Papers 10-144,
Toulouse School of Economics (TSE), 2011.
Babur De Los Santos, Ali Hortacsu, and Matthijs R. Wildenbeest. Testing models of consumer
search using data on web browsing and purchasing behavior. American Economic Review,
102(6):2955–80, October 2012.
M. L. Shaw and P. Shaw. Optimal allocation of cognitive resources to spatial locations. J Exp
Psychol Hum Percept Perform, 3(2):201–211, May 1977.
Christopher A. Sims. Stickiness. Carnegie-Rochester Conference Series on Public Policy, 49(1):317–
356, December 1998.
Chris Sims. Implications of Rational Inattention. Journal of Monetary Economics, 50(3):665–690,
2003.
Chris Sims. Rational Inattention: A Research Agenda. 2006.
George J. Stigler. The economics of information. Journal of Political Economy, 69:213, 1961.
Antonella Tutino. The rigidity of choice: Lifecycle savings with information-processing limits.
Technical report, 2008.
Timothy Van Zandt. Hidden information acquisition and static choice. CORE Discussion Papers
1994017, Université catholique de Louvain, Center for Operations Research and Econometrics
(CORE), 1994.
Preeti Verghese. Visual search and attention: A signal detection theory approach. Neuron,
31(4):523–535, 2003.
Michael Woodford. Information-constrained state-dependent pricing. NBER Working Papers 14620,
National Bureau of Economic Research, Inc, December 2008.
Michael Woodford. Inattentive valuation and reference-dependent choice. Mimeo, Columbia University, 2012.
9
Appendix A: Proofs
9.1
Lemma 1
Lemma 1 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$ and $q \in Q$, if $\pi \in \Pi$ is consistent with these data, then $\pi$ is sufficient for $\bar{\pi}^{(\mu,q)}$.
Proof. Let $\pi \in \Pi$ be an attention strategy that is consistent with $q \in Q$ in decision problem $(\mu, A)$. First, we list in order all distinct posteriors $\gamma^i \in \Gamma(\pi)$ for $1 \le i \le I \equiv |\Gamma(\pi)|$. Given that $\pi$ is consistent with $q$, there exists a corresponding optimal choice strategy $Y : \{1, \ldots, I\} \to \Delta(A)$, with $Y^i(f)$ denoting the probability of choosing act $f \in F(q)$ with posterior $\gamma^i$, such that the attention and choice functions match the data,
$$q_m(f) = \sum_{i=1}^{I} \pi_m(\gamma^i)\, Y^i(f).$$
We also list in order all possible posteriors $\bar{\gamma}^j \in \Gamma(\bar{\pi}^{(\mu,q)})$, $1 \le j \le J \equiv |\Gamma(\bar{\pi}^{(\mu,q)})|$, and identify all chosen acts that are associated with posterior $\bar{\gamma}^j$ as $F^j$,
$$F^j \equiv \{ f \in F \mid r^{(\mu,q)}(f) = \bar{\gamma}^j \}.$$
The garbling matrix $b^{ij}$ sets the probability of $\bar{\gamma}^j \in \Gamma(\bar{\pi}^{(\mu,q)})$ given $\gamma^i \in \Gamma(\pi)$ as the probability of all choices associated with acts $f \in F^j$,
$$b^{ij} = \sum_{f \in F^j} Y^i(f).$$
Note that this is indeed a $|\Gamma(\pi)| \times |\Gamma(\bar{\pi}^{(\mu,q)})|$ stochastic matrix $B \ge 0$ with $\sum_{j=1}^{J} b^{ij} = 1$ for all $i$. Given $\bar{\gamma}^j \in \Gamma(\bar{\pi}^{(\mu,q)})$ and $1 \le m \le M$, note that
$$\sum_{i=1}^{I} b^{ij}\, \pi_m(\gamma^i) = \sum_{i=1}^{I} \pi_m(\gamma^i) \sum_{f \in F^j} Y^i(f) = \sum_{f \in F^j} q_m(f),$$
by the data matching property. It is definitional that $\bar{\pi}^{(\mu,q)}_m(\bar{\gamma}^j)$ is precisely equal to this, as the observed probability of all acts associated with posterior $\bar{\gamma}^j$. Hence,
$$\bar{\pi}^{(\mu,q)}_m(\bar{\gamma}^j) = \sum_{i=1}^{I} b^{ij}\, \pi_m(\gamma^i),$$
as required for sufficiency.
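The garbling constructed in this proof can be checked numerically. In the sketch below the inputs and names are hypothetical: `Y[i]` maps acts to their choice probabilities at posterior γ^i, and `groups[j]` is the set F^j of acts sharing the revealed posterior γ̄^j. The code builds b^{ij} and applies it to one state's posterior probabilities π_m(γ^i).

```python
def garbling_matrix(Y, groups):
    """b[i][j] = sum of Y[i][f] over acts f in groups[j] (the b^{ij} above)."""
    return [[sum(Yi.get(f, 0.0) for f in Fj) for Fj in groups] for Yi in Y]


def garble(pi_m, B):
    """Apply B to one state's posterior probabilities: sum_i b^{ij} pi_m(gamma^i)."""
    return [sum(B[i][j] * pi_m[i] for i in range(len(pi_m))) for j in range(len(B[0]))]


# Three posteriors; acts b and c share one revealed posterior, act a has its own.
Y = [{"a": 1.0}, {"a": 0.5, "b": 0.5}, {"c": 1.0}]
B = garbling_matrix(Y, [{"a"}, {"b", "c"}])
print(B)                          # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(garble([0.2, 0.4, 0.4], B))  # approx. [0.4, 0.6] = [q_m(a), q_m(b) + q_m(c)]
```

Each row of `B` sums to one because, at every posterior, the choice probabilities over chosen acts sum to one.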
9.2
Theorem 1
Theorem 1 Data set $(D, q)$ has a rational inattention representation if and only if it satisfies NIAS and NIAC.
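Proof of Necessity. To confirm that existence of a rational inattention representation $(K, \pi)$ of $(D, q)$ implies that the NIAS condition is satisfied, consider $(\mu, A) \in D$ and a corresponding choice strategy $Y : \Gamma(\pi^{(\mu,A)}) \to \Delta(A)$ such that final choices are optimal and that matches the data,
$$q_m(f) = \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \pi_m(\gamma)\, Y^{\gamma}(f).$$
Given $f \in F(q^{(\mu,A)})$, consider all posteriors $\gamma \in \Gamma(\pi^{(\mu,A)})$ for which $Y^{\gamma}(f) > 0$,
$$\Gamma(f) = \{\gamma \in \Gamma(\pi^{(\mu,A)}) \mid Y^{\gamma}(f) > 0\}.$$
Optimality of final choices requires that
$$\sum_{m=1}^{M} \gamma_m U^f_m \ge \sum_{m=1}^{M} \gamma_m U^g_m \quad \text{for all } g \in A \text{ and all posteriors } \gamma \in \Gamma(f).$$
Note also that we can rewrite the revealed posterior beliefs as a weighted average of the true posterior beliefs,
$$\begin{aligned}
r^{(\mu,q)}_m(f) &\equiv \frac{\mu_m q_m(f)}{\sum_{j=1}^{M} \mu_j q_j(f)} = \frac{\sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \mu_m \pi_m(\gamma) Y^{\gamma}(f)}{\sum_{j=1}^{M} \mu_j q_j(f)} \\
&= \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \frac{\sum_{j=1}^{M} \mu_j \pi_j(\gamma) Y^{\gamma}(f)}{\sum_{j=1}^{M} \mu_j q_j(f)} \cdot \frac{\mu_m \pi_m(\gamma)}{\sum_{j=1}^{M} \mu_j \pi_j(\gamma)} = \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} P(\gamma \mid f)\, \gamma_m,
\end{aligned}$$
where the second line is obtained by dividing and multiplying each term in the sum by $\sum_{j=1}^{M} \mu_j \pi_j(\gamma) Y^{\gamma}(f)$, the joint probability of posterior $\gamma$ and choice $f$. Here $P(\gamma \mid f)$ indicates the probability of posterior $\gamma$ given that $f$ was chosen, and $\gamma_m = \mu_m \pi_m(\gamma) / \sum_{j=1}^{M} \mu_j \pi_j(\gamma)$ by Bayes' rule.
The value of choosing act $f$ at its revealed posterior is therefore
$$\sum_{m=1}^{M} r^{(\mu,q)}_m(f)\, U^f_m = \sum_{\gamma} P(\gamma \mid f) \sum_{m=1}^{M} \gamma_m U^f_m \ge \sum_{\gamma} P(\gamma \mid f) \sum_{m=1}^{M} \gamma_m U^g_m = \sum_{m=1}^{M} r^{(\mu,q)}_m(f)\, U^g_m \quad \forall g \in A,$$
where the middle inequality stems from the fact that $\sum_{m=1}^{M} \gamma_m U^f_m \ge \sum_{m=1}^{M} \gamma_m U^g_m$ for all $g \in A$ and every posterior $\gamma$ with $P(\gamma \mid f) > 0$, all of which lie in $\Gamma(f)$.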
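The revealed posterior and the resulting NIAS inequality are straightforward to compute from state dependent choice data. The sketch below is illustrative only: it assumes a prior list `mu`, data `q[m]` mapping acts to choice probabilities in state m, and utilities `U[f][m]` whose keys form the menu A; all names are hypothetical.

```python
def revealed_posterior(mu, q, f):
    """r_m(f) = mu_m q_m(f) / sum_j mu_j q_j(f) for a chosen act f."""
    weights = [mu[m] * q[m].get(f, 0.0) for m in range(len(mu))]
    total = sum(weights)
    return [w / total for w in weights]


def expected_utility(belief, payoffs):
    return sum(b * u for b, u in zip(belief, payoffs))


def satisfies_nias(mu, q, U, tol=1e-12):
    """True if every chosen act is optimal at its own revealed posterior."""
    for f in U:
        if sum(mu[m] * q[m].get(f, 0.0) for m in range(len(mu))) == 0:
            continue  # never chosen: no revealed posterior to check
        r = revealed_posterior(mu, q, f)
        best = max(expected_utility(r, U[g]) for g in U)
        if expected_utility(r, U[f]) < best - tol:
            return False
    return True


mu = [0.5, 0.5]
U = {"f": [10, 0], "g": [0, 10]}  # f pays in state 0, g pays in state 1
q_good = [{"f": 0.8, "g": 0.2}, {"f": 0.3, "g": 0.7}]
q_bad = [{"f": 0.3, "g": 0.7}, {"f": 0.8, "g": 0.2}]
print(satisfies_nias(mu, q_good, U), satisfies_nias(mu, q_bad, U))  # True False
```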
Next we show that, if there exists a rational inattention representation $(K, \pi)$ of $(D, q)$, then NIAC must hold. For any sequence $(\mu, A_1), (\mu, A_2), \ldots, (\mu, A_J) \in D$ with $A_J = A_1$, optimality of each $\pi^{(\mu,A_j)}$ in its own decision problem implies
$$\sum_{j=1}^{J-1} \left[ G(\mu, A_j, \pi^{(\mu,A_j)}) - K(\mu, \pi^{(\mu,A_j)}) \right] \ge \sum_{j=1}^{J-1} \left[ G(\mu, A_j, \pi^{(\mu,A_{j+1})}) - K(\mu, \pi^{(\mu,A_{j+1})}) \right],$$
and hence
$$\sum_{j=1}^{J-1} \left[ G(\mu, A_j, \pi^{(\mu,A_j)}) - G(\mu, A_j, \pi^{(\mu,A_{j+1})}) \right] \ge \sum_{j=1}^{J-1} \left[ K(\mu, \pi^{(\mu,A_j)}) - K(\mu, \pi^{(\mu,A_{j+1})}) \right] = 0,$$
where the last equality stems from the fact that the sum telescopes to $K(\mu, \pi^{(\mu,A_1)}) - K(\mu, \pi^{(\mu,A_J)})$, which is zero because $A_J = A_1$. This implies that
$$\sum_{j=1}^{J-1} G(\mu, A_j, \pi^{(\mu,A_j)}) \ge \sum_{j=1}^{J-1} G(\mu, A_j, \pi^{(\mu,A_{j+1})}).$$
It is immediate that $G(\mu, A_j, \bar{\pi}^{(\mu,A_j)}) = G(\mu, A_j, \pi^{(\mu,A_j)})$ for all $j$, as the minimal attention strategy implies the same pattern of stochastic choice as does the original attention strategy. Moreover, by Lemma 1, we know that $\pi^{(\mu,A_j)}$ is sufficient for $\bar{\pi}^{(\mu,A_j)}$ for all $j$, and so, by Blackwell's theorem, $G(\mu, A_j, \pi^{(\mu,A_{j+1})}) \ge G(\mu, A_j, \bar{\pi}^{(\mu,A_{j+1})})$ for all $j$ (see Remark 1). Thus the true attention strategies $\pi^{(\mu,A_j)}$ can be replaced by the minimal attention strategies $\bar{\pi}^{(\mu,A_j)}$ in the above expression and the inequality will still hold, implying NIAC.
Proof of Sufficiency. There are three steps in the proof that the NIAS and NIAC conditions are sufficient for $(D, q)$ to have a rational inattention representation. The first step is to establish that the NIAC condition ensures that there is no global reassignment of the minimal attention strategies observed in the data to decision problems $(\mu, A) \in D$ that raises total gross surplus. The second step is to use this observation to define a candidate cost function on attentional strategies, $K$, taking values in $\mathbb{R} \cup \{\infty\}$. The key is to note that, as the solution to the classical allocation problem of Koopmans and Beckmann [1957], this assignment is supported by “prices” set in expected utility units. It is these prices that define the proposed cost function. The final step is to apply the NIAS conditions to show that $(K, \bar{\pi})$ is a rational inattention representation of $(D, q)$, where $\bar{\pi}$ comprises minimal attention strategies.
Consider any prior $\mu \in \Gamma$ such that there exist two or more sets $A$ such that $(\mu, A) \in D$. Enumerate these sets as $A_l$ for $1 \le l \le L$. Define the corresponding minimal attention strategies $\pi^l$ for $1 \le l \le L$ as each is revealed in the corresponding data $q^{(\mu,A_l)}$. Note that the minimal attention strategies may not all be distinct. In cases in which the same strategy appears more than once, one retains all copies so that the cardinality of the set of strategies precisely matches that of the set of underlying decision problems at $L$. With this, one can consider the set $\mathcal{M}$ of all matchings of minimal attention strategies, as identified by their index $1 \le l \le L$, with correspondingly indexed decision problems. Formally, each such matching is a 1-1 function $m : \{1, \ldots, L\} \to \{1, \ldots, L\}$ in which strategy $\pi^{m(l)}$ is applied with decision problem $(\mu, A_l)$. Given each such matching, one can define the corresponding sum of gross utilities as
$$S^G(m) = \sum_{l=1}^{L} G(\mu, A_l, \pi^{m(l)}).$$
The claim is that the NIAC condition implies that the identity map $m^I(l) = l$ maximizes this sum over all matching functions $m \in \mathcal{M}$. Suppose to the contrary that there exists some alternative matching function that achieves a strictly higher sum, and denote this match $m \in \mathcal{M}$. In this case construct a first sub-cycle as follows: start with the lowest index $l_1$ such that $m(l_1) \ne l_1$. Define $l_2 = m(l_1)$ and now find $m(l_2)$, noting by construction that $m(l_2) \ne l_2$. Given that the domain is finite, this process returns to $l_1$ after at most $L$ steps; write the resulting cycle as $l_1, l_2, \ldots, l_J$ with $l_J = l_1$, so that $m(l_j) = l_{j+1}$ for $1 \le j \le J-1$. If it is the case that $m(l) = l$ outside of the set $\{l_1, \ldots, l_J\}$, then we know the increase in the value of the sum is associated only with this cycle, hence
$$\sum_{j=1}^{J-1} G(\mu, A_{l_j}, \pi^{l_j}) < \sum_{j=1}^{J-1} G(\mu, A_{l_j}, \pi^{l_{j+1}}),$$
directly in contradiction to NIAC. If instead there exists some $l'$ outside of the set $\{l_1, \ldots, l_J\}$ such that $m(l') \ne l'$, this inequality need not hold strictly for the first cycle. We can therefore iterate the process, knowing that the above strict inequality must be true for at least one such cycle to explain the strict increase in overall gross utility. Hence the identity map $m^I(l) = l$ indeed maximizes $S^G(m)$ amongst all matching functions $m \in \mathcal{M}$.
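The sub-cycle construction in this argument is the standard decomposition of a permutation into disjoint cycles. The sketch below (hypothetical names, with a matching represented as a Python list so that `m[l]` is the strategy index assigned to problem `l`, and `G[l][k]` standing in for G(μ, A_l, π^k)) performs that decomposition, starting each cycle at its lowest index as in the text, and attributes the change in gross surplus relative to the identity to each cycle.

```python
def cycles(m):
    """Non-trivial cycles of the permutation m, each starting at its lowest index."""
    seen, out = set(), []
    for start in range(len(m)):
        if start in seen or m[start] == start:
            continue  # already covered, or a fixed point (m(l) = l)
        cycle, l = [], start
        while l not in seen:
            seen.add(l)
            cycle.append(l)
            l = m[l]
        out.append(cycle)
    return out


def cycle_gains(G, m):
    """Change in gross surplus contributed by each cycle, relative to the identity."""
    return [sum(G[l][m[l]] - G[l][l] for l in c) for c in cycles(m)]


print(cycles([1, 2, 0, 3, 5, 4]))  # [[0, 1, 2], [4, 5]]
G = [[5, 8, 0], [7, 6, 0], [0, 0, 1]]
print(cycle_gains(G, [1, 0, 2]))   # [4]: this cycle alone violates NIAC
```

Because the gains add up to S^G(m) − S^G(m^I), a strictly positive total forces at least one cycle with a strictly positive gain, which is the step used above.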
The fact that $m^I(l) = l$ maximizes $S^G(m)$ enables us to apply the results of Koopmans and Beckmann [1957], who used linear programming techniques to solve allocation problems of this form. Their results directly imply that any solution to this problem of optimally matching production functions (the attentional strategies) with locations (the decision problems) can be decentralized by charging prices (not necessarily positive) for either resource, and leaving the owners of the other resource to maximize their profits. To apply this to our problem, consider any $\mu \in \Gamma$ such that there exist sets $(\mu, A_l) \in D$ for $1 \le l \le L$ with $L \ge 2$, and define $\bar{\Pi}(\mu)$ as the set of all minimal attention strategies,
$$\bar{\Pi}(\mu) \equiv \bigcup_{(\mu,A) \in D} \{\bar{\pi}^{(\mu,A)}\}.$$
The result of Koopmans and Beckmann directly implies existence of a real-valued function $K_\mu : \bar{\Pi}(\mu) \to \mathbb{R}$ that decentralizes the problem from the viewpoint of the owner of the decision problems, who seeks to identify surplus maximizing attentional strategies to match to their particular problems. The defining characteristic of these costs is optimality of the revealed minimal attention strategies,
$$G(\mu, A_l, \pi^l) - K_\mu(\pi^l) \ge G(\mu, A_l, \pi) - K_\mu(\pi) \quad \text{for all } \pi \in \bar{\Pi}(\mu).$$
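The cited result is an existence statement. For finite L, one concrete way to obtain such costs, swapped in here for illustration rather than taken from the paper, is to read the optimality inequalities as difference constraints K_μ(π^l) − K_μ(π^k) ≤ G(μ, A_l, π^l) − G(μ, A_l, π^k) and solve them with Bellman-Ford shortest paths, in the spirit of Rochet [1987]. A negative cycle in the constraint graph corresponds exactly to a NIAC violation. `G` is a hypothetical matrix with `G[l][k]` = G(μ, A_l, π^k).

```python
def decentralizing_costs(G):
    """Costs K with G[l][l] - K[l] >= G[l][k] - K[k] for all l, k, or None.

    Each constraint K[l] - K[k] <= G[l][l] - G[l][k] becomes an edge k -> l
    with that weight; shortest distances from a virtual source (all starting
    at 0) satisfy every constraint. None signals a negative cycle, i.e. a
    cyclic reassignment that raises gross surplus (NIAC fails)."""
    n = len(G)
    K = [0] * n
    edges = [(k, l, G[l][l] - G[l][k]) for l in range(n) for k in range(n) if k != l]
    for _ in range(n + 1):  # n + 1 nodes incl. source: n rounds, plus one check
        changed = False
        for k, l, w in edges:
            if K[k] + w < K[l]:
                K[l] = K[k] + w
                changed = True
        if not changed:
            return K
    return None


print(decentralizing_costs([[10, 12], [5, 9]]))  # [-2, 0]
print(decentralizing_costs([[5, 8], [7, 6]]))    # None: swapping raises surplus
```

The resulting costs may be negative, consistent with the remark that the decentralizing prices need not be positive; adding a constant to all of them preserves every inequality.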
To complete the definition of the cost function, consider all $\mu \in \Gamma$ such that there exists a unique decision problem $(\mu, A) \in D$, and set the cost of the corresponding minimal attentional strategy to zero, $K(\mu, \bar{\pi}^{(\mu,A)}) = 0$. For $\mu \in \Gamma$ with two or more decision problems in $D$, set $K(\mu, \pi) = K_\mu(\pi)$ for $\pi \in \bar{\Pi}(\mu)$. Now complete the function across all $\mu \in \Gamma$ such that there exist one or more $(\mu, A) \in D$ by setting $K(\mu, \pi) = \infty$ for $\pi \notin \bar{\Pi}(\mu)$. Finally, for all $\mu \in \Gamma$ such that there is no $(\mu, A) \in D$, set $K(\mu, \pi) = 0$ for all $\pi \in \Pi$ and $K(\mu, \pi) = \infty$ for $\pi \notin \Pi$.
Note that we have now completed construction of a qualifying cost function $K$, taking values in $\mathbb{R} \cup \{\infty\}$, that satisfies $K(\mu, \pi) = \infty$ for $\pi \notin \Pi$ and $K(\mu, \pi) \in \mathbb{R}$ for some $\pi \in \Pi$. The entire construction was aimed at ensuring that the observed attentional strategy choices were always maximal, $\bar{\pi}^{(\mu,A)} \in \hat{\Pi}(K, \mu, A)$ for all $(\mu, A) \in D$. It remains to prove that $\bar{\pi}^{(\mu,A)}$ is consistent with $q^{(\mu,A)}$ for all $(\mu, A) \in D$. This requires us to show that, for all $(\mu, A) \in D$, the choice rule that associates with each $\gamma \in \Gamma(\bar{\pi}^{(\mu,A)})$ the certainty of choosing the associated act $f \in F(\mu, A)$ as observed in the data is both optimal and matches the data. That it is optimal is the precise content of the NIAS constraint,
$$\sum_{m=1}^{M} r^{(\mu,A)}_m(f)\, U^f_m \ge \sum_{m=1}^{M} r^{(\mu,A)}_m(f)\, U^g_m$$
for all $g \in A$. That this choice rule and the corresponding minimal attention function match the data holds by construction.
9.3
Theorem 2
Theorem 2 Data set $(D, q)$ satisfies NIAS and NIAC if and only if it has a rational inattention representation with conditions K1 to K3 satisfied.
Proof. The proof of necessity is immediate from Theorem 1. The proof of sufficiency proceeds in four steps, starting with a rational inattention representation $(K, \bar{\pi})$ of $(D, q)$ of the form produced in Theorem 1 based on satisfaction of the NIAS and NIAC conditions. A key feature of this function is that, given $\mu \in \Gamma$ such that $(\mu, A) \in D$ for some $A \in \mathcal{F}$, the function $K$ is real-valued only on the minimal information strategies $\bar{\Pi}(\mu) = \{\bar{\pi}^{(\mu,A)} \mid (\mu, A) \in D\}$ associated with all corresponding decision problems, otherwise being infinite. The first step in the proof is to construct for each such $\mu \in \Gamma$ a larger domain $\tilde{\Pi}(\mu)$ on which the cost function will be real-valued, satisfying four properties: to include all minimal attention strategies, $\bar{\Pi}(\mu) \subseteq \tilde{\Pi}(\mu)$; to include the inattentive strategy, $\pi^I \in \tilde{\Pi}(\mu)$; to be closed under mixtures, so that $\pi, \pi' \in \tilde{\Pi}(\mu)$ and $\alpha \in (0, 1)$ imply $\alpha\pi + (1 - \alpha)\pi' \in \tilde{\Pi}(\mu)$; and to be “closed under garbling,” so that if $\pi \in \tilde{\Pi}(\mu)$ is sufficient for attentional strategy $\pi' \in \Pi$, then $\pi' \in \tilde{\Pi}(\mu)$. The second step is, for each $\mu \in \Gamma$, to define a new function $\tilde{K}_\mu$ that preserves the essential elements of $K$ while being real-valued on the larger domain $\tilde{\Pi}(\mu)$, and thereby to construct the full candidate cost function $\tilde{K}$, taking values in $\mathbb{R} \cup \{\infty\}$. The third step is to confirm that $\tilde{K} \in \mathcal{K}$ and that $\tilde{K}$ satisfies the required conditions K1 through K3. The final step is to confirm that $(\tilde{K}, \bar{\pi})$ forms a rational inattention representation of $(D, q)$.
Given $\mu \in \Gamma$ such that $(\mu, A) \in D$ for some $A \in \mathcal{F}$, we construct the domain $\tilde{\Pi}(\mu)$ in two stages. First, we define the set of all attention strategies for which some minimal attentional strategy $\pi \in \bar{\Pi}(\mu)$ is sufficient:
$$S_\mu = \{\pi' \in \Pi \mid \exists\, \pi \in \bar{\Pi}(\mu) \text{ sufficient for } \pi'\}.$$
Note that this is a superset of $\bar{\Pi}(\mu)$ and that it contains $\pi^I$. The second stage is to identify $\tilde{\Pi}(\mu)$ as the smallest mixture set containing $S_\mu$: this is well defined since the arbitrary intersection of mixture sets is itself a mixture set.
By construction, $\tilde{\Pi}(\mu)$ has three of the four desired properties: it is closed under mixing, it contains $\bar{\Pi}(\mu)$, and it contains the inattentive strategy. The only condition that needs to be checked is that it retains the property of being closed under sufficiency:
$$\pi \in \tilde{\Pi}(\mu) \text{ sufficient for } \pi' \in \Pi \implies \pi' \in \tilde{\Pi}(\mu).$$
To establish this, it is useful …rst to establish certain properties of
S is closed under garbling:
2
S
su¢ cient for
35
2
=)
2
.
S
S.
and of
. The …rst is that
Intuitively, this is because the garbling of a garbling is a garbling. In technical terms, the product of
the corresponding garbling matrices is itself a garbling matrix. The second is that one can explicity
express
as the set of all …nite mixtures of elements on S ,
8
9
J
<
=
X
j
J 1
j
=
=
jJ 2 N; ( 1 ; :: J ) 2 S
; 2 S ;
j
:
;
j=1
where S J 1 is the unit simplex in RJ: To make this identi…cation, note that the set as de…ned on
the RHS certainly contains S and is a mixture set, hence is a superset of
. Note moreover that
all elements in the RHS set are necessarily contained in any mixture set containing S by a process
of iteration, making is also a subset of
, hence …nally one and the same set.
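The claim that a garbling of a garbling is a garbling can be checked numerically. The sketch below is illustrative only. It assumes a representation not used in the text: an attention strategy is stored as a matrix whose row $m$ lists the probabilities $\pi_m(\gamma^i)$ of each posterior in state $m$, so that garbling by a row-stochastic matrix $B$ is the product $PB$. The matrices `P`, `B1` and `B2` are hypothetical.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][k] for t in range(len(Y))) for k in range(len(Y[0]))]
            for i in range(len(X))]


def is_stochastic(B):
    """True if all entries are non-negative and every row sums to one."""
    return all(all(b >= 0 for b in row) and sum(row) == 1 for row in B)


def garble(P, B):
    """rho_m(gamma^k) = sum_i b_ik * pi_m(gamma^i), i.e. the matrix product P B."""
    return matmul(P, B)


# Hypothetical two-state strategy with three possible posteriors (rows = states).
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 8), F(3, 8), F(1, 2)]]
B1 = [[F(1), F(0)], [F(1, 2), F(1, 2)], [F(0), F(1)]]  # 3 posteriors -> 2
B2 = [[F(2, 3), F(1, 3)], [F(1, 4), F(3, 4)]]          # 2 posteriors -> 2

twice = garble(garble(P, B1), B2)   # garbling of a garbling
once = garble(P, matmul(B1, B2))    # single garbling by the product matrix
```

Because matrix multiplication is associative, `twice` and `once` agree exactly. Rational arithmetic via `fractions.Fraction` is used only to keep the comparison free of floating-point noise.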
We now show that if $\rho \in \Pi$ is a garbling of some $\pi \in \Pi^\mu$, then $\rho \in \Pi^\mu$. The first step is to write $\pi \in \Pi^\mu$ as a convex combination of elements of $\mathcal{S}^\mu$, which we now know is possible:
$$\pi = \sum_{j=1}^J \alpha_j \pi^j,$$
with all weights strictly positive, $\alpha_j > 0$ for all $j$.

Given any such expression, we can generate another set of elements $\tilde\pi^j \in \mathcal{S}^\mu$ that all have exactly the same support as $\pi$, $\Gamma(\tilde\pi^j) = \Gamma(\pi)$. To see this, define each $\tilde\pi^j$ as a mixture of $\pi^j$ and $\pi$ itself,
$$\tilde\pi^j = \lambda \pi^j + (1-\lambda)\pi,$$
with a weight $\lambda \in (0,1)$ that does not depend on $j$. The mixture property $\pi = \sum_{j=1}^J \alpha_j \tilde\pi^j$ is preserved, since $\sum_{j} \alpha_j(\lambda\pi^j + (1-\lambda)\pi) = \lambda\pi + (1-\lambda)\pi = \pi$. At the same time, the possible posteriors of each $\tilde\pi^j$ become the common set $\Gamma(\pi)$. As a mixture of elements from the set $\mathcal{S}^\mu$, $\tilde\pi^j \in \mathcal{S}^\mu$.

To keep notation from proliferating, we assume that the strategies $\pi^j$ in the above expression already satisfy $\Gamma(\pi^j) = \Gamma(\pi)$. Lemma 2 below shows that in this case there exist garblings $\rho^j$ of $\pi^j \in \mathcal{S}^\mu$ such that
$$\rho = \sum_{j=1}^J \alpha_j \rho^j.$$
This establishes that $\rho \in \Pi^\mu$. Because $\mathcal{S}^\mu$ is closed under garbling, $\pi^j \in \mathcal{S}^\mu$ and $\rho^j$ a garbling of $\pi^j$ together imply $\rho^j \in \mathcal{S}^\mu$.
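Here is a small numerical check of the support-alignment step. It makes one illustrative assumption: all strategies are stored as matrices over a shared, labelled list of candidate posteriors, so that mixing is entrywise. The strategies `pi1` and `pi2` are hypothetical. In a two-state example, `pi1` is fully informative and `pi2` is inattentive, so the posterior in column 2 equals the prior.

```python
from fractions import Fraction as F


def mix(weights, strategies):
    """Entrywise convex combination of strategies stored as M x N matrices over a
    shared list of N labelled posteriors (an illustrative assumption)."""
    M, N = len(strategies[0]), len(strategies[0][0])
    return [[sum(w * S[m][i] for w, S in zip(weights, strategies)) for i in range(N)]
            for m in range(M)]


def support(P):
    """Indices of posteriors that occur with positive probability in some state."""
    return {i for i in range(len(P[0])) if any(row[i] > 0 for row in P)}


def align_supports(alphas, components, lam):
    """Replace each pi^j by lam * pi^j + (1 - lam) * pi, where pi = sum_j alpha_j pi^j."""
    pi = mix(alphas, components)
    return [mix([lam, 1 - lam], [Pj, pi]) for Pj in components]


# Hypothetical columns: posterior certain of state 1, certain of state 2, equal to prior.
pi1 = [[1, 0, 0], [0, 1, 0]]   # fully informative
pi2 = [[0, 0, 1], [0, 0, 1]]   # inattentive
alphas = [F(1, 2), F(1, 2)]
pi = mix(alphas, [pi1, pi2])
tilde = align_supports(alphas, [pi1, pi2], F(1, 2))
```

The aligned components `tilde` reproduce `pi` with the original weights, and each of them has the full support $\{0, 1, 2\}$ of `pi`, even though `pi1` and `pi2` have disjoint supports.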
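Given $\mu \in \Gamma$ such that $(\mu, A) \in D$ for some $A \in \mathcal{F}$, we define the function $K^\mu$ on $\Pi^\mu$ in three stages. First we define the function $K_S^\mu$ on the domain $\mathcal{S}^\mu$. For any $\rho \in \mathcal{S}^\mu$, we identify the set of minimal attention strategies $\pi \in \bar\Pi^\mu$ of which $\rho$ is a garbling and assign to $\rho$ the lowest such cost. Formally, given $\rho \in \mathcal{S}^\mu$,
$$K_S^\mu(\rho) \equiv \min_{\{\pi \in \bar\Pi^\mu \,\mid\, \pi \text{ sufficient for } \rho\}} K(\pi).$$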
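The definition of $K_S^\mu$ is a minimum over a finite set, which the following sketch makes concrete. As a simplification, strategies are represented by labels, and sufficiency is supplied as a hypothetical relation rather than computed from garbling matrices. The labels, costs and the relation `SUFFICIENT` are illustrative, not taken from the paper's data. Returning `math.inf` when no minimal strategy is sufficient is an added convention for inputs outside $\mathcal{S}^\mu$.

```python
import math


def k_s(rho, minimal, cost, is_sufficient):
    """K_S(rho): lowest cost K(pi) among minimal strategies pi sufficient for rho.
    Returns math.inf if none is sufficient (rho then lies outside S^mu)."""
    costs = [cost[pi] for pi in minimal if is_sufficient(pi, rho)]
    return min(costs) if costs else math.inf


# Hypothetical data: strategies are labels, and sufficiency is a given relation.
SUFFICIENT = {("full", s) for s in ("full", "partial", "coarse", "inattentive")}
SUFFICIENT |= {("partial", s) for s in ("partial", "coarse", "inattentive")}
minimal = ["full", "partial"]
cost = {"full": 3.0, "partial": 1.0}


def suff(pi, rho):
    """Hypothetical sufficiency predicate backed by the relation above."""
    return (pi, rho) in SUFFICIENT
```

In this example `k_s` agrees with `cost` on the two minimal strategies and is weakly monotone. For instance, `full` is sufficient for `coarse` and receives the higher value.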
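Note that $K_S^\mu(\pi) = K(\pi)$ for all $\pi \in \bar\Pi^\mu$. To see this, consider $(\mu, A), (\mu, A') \in D$ with $\bar\pi_{(\mu,A')}$ sufficient for $\bar\pi_{(\mu,A)}$. By the Blackwell property, expected utility is at least as high using $\bar\pi_{(\mu,A')}$ as using $\bar\pi_{(\mu,A)}$, for which it is sufficient:
$$G(\mu, A, \bar\pi_{(\mu,A')}) \ge G(\mu, A, \bar\pi_{(\mu,A)}).$$
At the same time, $K$ is part of a rational inattention representation of $(D, q)$. We therefore know that $\bar\pi_{(\mu,A)} \in \hat\Lambda(K, \mu, A)$, so that
$$G(\mu, A, \bar\pi_{(\mu,A)}) - K(\bar\pi_{(\mu,A)}) \ge G(\mu, A, \bar\pi_{(\mu,A')}) - K(\bar\pi_{(\mu,A')}).$$
Together these imply that $K(\bar\pi_{(\mu,A')}) \ge K(\bar\pi_{(\mu,A)})$. This in turn implies that $K_S^\mu(\pi) = K(\pi)$ for all $\pi \in \bar\Pi^\mu$.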
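The Blackwell property invoked above can be illustrated with a small computation of gross value. The sketch assumes a matrix representation with `P[m][i]` $= \pi_m(\gamma^i)$ and computes $G$ as the expected utility from choosing the best act after each posterior. The prior, the utilities and the garbling matrix are hypothetical. The resulting ordering is one instance of the property, not a proof of it.

```python
from fractions import Fraction as F


def gross_value(mu, P, U):
    """G(mu, A, pi): expected utility when the best act is chosen at each posterior.
    mu[m] = prior of state m, P[m][i] = pi_m(gamma^i), U[a][m] = utility of act a in m.
    Column i contributes max_a sum_m mu[m] * P[m][i] * U[a][m]."""
    return sum(max(sum(mu[m] * P[m][i] * u[m] for m in range(len(mu))) for u in U)
               for i in range(len(P[0])))


def garble(P, B):
    """Garbling rho = P B with a row-stochastic matrix B."""
    return [[sum(P[m][i] * B[i][k] for i in range(len(B))) for k in range(len(B[0]))]
            for m in range(len(P))]


# Hypothetical problem: two equally likely states, bet on the state (utility 1 if right).
mu = [F(1, 2), F(1, 2)]
U = [[1, 0], [0, 1]]
full = [[1, 0], [0, 1]]                                # perfectly informative
noisy = garble(full, [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]])
inattentive = [[1], [1]]                               # one posterior equal to the prior
```

Each strategy in the chain `full`, `noisy`, `inattentive` is sufficient for the next, and the gross values fall from 1 to 3/4 to 1/2.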
Note that $K_S^\mu$ also satisfies weak monotonicity on this domain. Suppose $\pi, \rho \in \mathcal{S}^\mu$ with $\pi$ sufficient for $\rho$. Then any strategy $\eta \in \bar\Pi^\mu$ that is sufficient for $\pi$ is also sufficient for $\rho$. The minimum defining $K_S^\mu(\pi)$ can therefore be no lower than the minimum defining $K_S^\mu(\rho)$.
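The second stage of the construction extends the domain of the cost function from $\mathcal{S}^\mu$ to $\Pi^\mu$. As noted above, this set comprises all finite mixtures of elements of $\mathcal{S}^\mu$,
$$\Pi^\mu = \left\{ \sum_{j=1}^J \alpha_j \pi^j \;\middle|\; J \in \mathbb{N},\ (\alpha_1, \ldots, \alpha_J) \in S^{J-1},\ \text{and } \pi^j \in \mathcal{S}^\mu \right\}.$$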
Given $\pi \in \Pi^\mu$, we consider all such mixtures that generate it and define $K^\mu(\pi)$ as the corresponding infimum,
$$K^\mu(\pi) = \inf \left\{ \sum_{j=1}^J \alpha_j K_S^\mu(\pi^j) \;\middle|\; J \in \mathbb{N},\ \alpha \in S^{J-1},\ \pi^j \in \mathcal{S}^\mu,\ \pi = \sum_{j=1}^J \alpha_j \pi^j \right\}.$$
This function is well defined, since $K_S^\mu$ is bounded below by the cost of inattentive strategies and the feasible set is non-empty by definition of $\Pi^\mu$. We establish in Lemma 3 that the infimum is achieved. Hence, given $\pi \in \Pi^\mu$, there exist $J \in \mathbb{N}$, $\alpha \in S^{J-1}$, and elements $\pi^j \in \mathcal{S}^\mu$ with $\pi = \sum_{j=1}^J \alpha_j \pi^j$ such that
$$K^\mu(\pi) = \sum_{j=1}^J \alpha_j K_S^\mu(\pi^j).$$
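We now show that $K^\mu$ satisfies K2, mixture feasibility. Consider distinct strategies $\pi \neq \rho \in \Pi^\mu$. By Lemma 3, there exist $J_\pi, J_\rho \in \mathbb{N}$, probability weights $\alpha \in S^{J_\pi - 1}$ and $\beta \in S^{J_\rho - 1}$, and elements $\pi^j, \rho^j \in \mathcal{S}^\mu$ with $\pi = \sum_{j=1}^{J_\pi} \alpha_j \pi^j$ and $\rho = \sum_{j=1}^{J_\rho} \beta_j \rho^j$, such that
$$K^\mu(\pi) = \sum_{j=1}^{J_\pi} \alpha_j K_S^\mu(\pi^j), \qquad K^\mu(\rho) = \sum_{j=1}^{J_\rho} \beta_j K_S^\mu(\rho^j).$$
Given $\lambda \in (0,1)$, consider the mixture strategy that takes each strategy $\pi^j$ with probability $\lambda\alpha_j$ and each strategy $\rho^j$ with probability $(1-\lambda)\beta_j$. By construction, this mixture strategy generates $\lambda\pi + (1-\lambda)\rho \in \Pi^\mu$. By the infimum property of $K^\mu$,
$$K^\mu(\lambda\pi + (1-\lambda)\rho) \le \lambda \sum_{j=1}^{J_\pi} \alpha_j K_S^\mu(\pi^j) + (1-\lambda) \sum_{j=1}^{J_\rho} \beta_j K_S^\mu(\rho^j) = \lambda K^\mu(\pi) + (1-\lambda) K^\mu(\rho),$$
which confirms mixture feasibility.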
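The bookkeeping in this argument, pooling the two cost-minimizing decompositions, can be sketched with weights and $K_S^\mu$ values alone. The strategies themselves are not modelled here. All numbers are hypothetical and chosen so that $K^\mu(\pi) = 1$, $K^\mu(\rho) = 6$ and $\lambda = 1/4$.

```python
from fractions import Fraction as F


def combine(lam, alpha, beta):
    """Weights of the pooled decomposition: lam*alpha_j for each pi^j, then
    (1 - lam)*beta_j for each rho^j."""
    return [lam * a for a in alpha] + [(1 - lam) * b for b in beta]


def weighted_cost(weights, costs):
    """sum_j w_j * K_S(component j)."""
    return sum(w * c for w, c in zip(weights, costs))


# Hypothetical cost-minimizing decompositions of pi and rho.
alpha, costs_pi = [F(1, 3), F(2, 3)], [3, 0]   # K^mu(pi)  = 1
beta, costs_rho = [F(1)], [6]                  # K^mu(rho) = 6
lam = F(1, 4)

w = combine(lam, alpha, beta)
pooled = weighted_cost(w, costs_pi + costs_rho)
bound = lam * weighted_cost(alpha, costs_pi) + (1 - lam) * weighted_cost(beta, costs_rho)
```

The pooled weights form a probability vector, and the weighted cost of the pooled list equals the bound $\lambda K^\mu(\pi) + (1-\lambda)K^\mu(\rho)$. The infimum defining $K^\mu$ at the mixture can only be lower.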
We also show that $K^\mu$ satisfies K3, weak monotonicity in information. Consider $\pi, \rho \in \Pi^\mu$ with $\pi$ sufficient for $\rho$. By Lemma 3, there exist $J \in \mathbb{N}$, $\alpha \in S^{J-1}$, and elements $\pi^j \in \mathcal{S}^\mu$ with fixed range $\Gamma(\pi^j) = \Gamma(\pi)$, such that $\pi = \sum_{j=1}^J \alpha_j \pi^j$ and
$$K^\mu(\pi) = \sum_{j=1}^J \alpha_j K_S^\mu(\pi^j).$$
By Lemma 2, we can also construct $\rho^j \in \mathcal{S}^\mu$, $1 \le j \le J$, such that $\rho = \sum_{j=1}^J \alpha_j \rho^j$ and each $\rho^j$ is a garbling of the corresponding $\pi^j$. Since $K_S^\mu$ satisfies weak monotonicity on its domain $\mathcal{S}^\mu$,
$$K_S^\mu(\pi^j) \ge K_S^\mu(\rho^j).$$
By the infimum property of $K^\mu(\rho)$,
$$K^\mu(\rho) \le \sum_{j=1}^J \alpha_j K_S^\mu(\rho^j) \le \sum_{j=1}^J \alpha_j K_S^\mu(\pi^j) = K^\mu(\pi),$$
which confirms weak monotonicity.
We now show that the construction retains the properties that made $K$ a rational inattention representation of $(D, q)$ for prior $\mu \in \Gamma$. It is immediate that the choice function that picks act $f \in F_{(\mu,A)}$ for sure at revealed posterior $r_{(\mu,A)}(f)$ is consistent with the data, since this was part of the initial definition. It remains only to confirm that the revealed minimal attention strategies are optimal. Suppose to the contrary that there exists $(\mu, A) \in D$ such that
$$G(\mu, A, \pi) - K^\mu(\pi) > G(\mu, A, \bar\pi_{(\mu,A)}) - K^\mu(\bar\pi_{(\mu,A)})$$
for some $\pi \in \Pi^\mu$. By Lemma 3, there exist $J \in \mathbb{N}$, a strictly positive vector $\alpha \in S^{J-1}$, and elements $\pi^j \in \mathcal{S}^\mu$ such that $\pi = \sum_{j=1}^J \alpha_j \pi^j$ and
$$K^\mu(\pi) = \sum_{j=1}^J \alpha_j K_S^\mu(\pi^j).$$
Since $\pi = \sum_{j=1}^J \alpha_j \pi^j$, the construction of the mixture strategy gives
$$G(\mu, A, \pi) = \sum_{j=1}^J \alpha_j G(\mu, A, \pi^j),$$
so that
$$\sum_{j=1}^J \alpha_j \left[ G(\mu, A, \pi^j) - K_S^\mu(\pi^j) \right] > G(\mu, A, \bar\pi_{(\mu,A)}) - K^\mu(\bar\pi_{(\mu,A)}).$$
It follows that there exists $j$ such that
$$G(\mu, A, \pi^j) - K_S^\mu(\pi^j) > G(\mu, A, \bar\pi_{(\mu,A)}) - K^\mu(\bar\pi_{(\mu,A)}).$$
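The two facts used in this step can be illustrated numerically. The first is that gross value is linear in mixtures. The second is that if a weighted average exceeds a threshold, then some term exceeds it. The sketch assumes strategies are stored as matrices over a shared list of genuine posteriors. Linearity relies on this, since a column must mean the same posterior in every component (Bayes consistency); with arbitrary column labels it can fail. The prior, utilities and net values are hypothetical.

```python
from fractions import Fraction as F


def gross_value(mu, P, U):
    """G(mu, A, pi) with P[m][i] = pi_m(gamma^i) and U[a][m] the utility of act a in m."""
    return sum(max(sum(mu[m] * P[m][i] * u[m] for m in range(len(mu))) for u in U)
               for i in range(len(P[0])))


def mix(weights, strategies):
    """Entrywise mixture over a shared list of labelled posteriors."""
    return [[sum(w * S[m][i] for w, S in zip(weights, strategies))
             for i in range(len(strategies[0][0]))]
            for m in range(len(strategies[0]))]


def best_component(net_values):
    """Index j maximizing G(pi^j) - K_S(pi^j); its value is at least any weighted mean."""
    return max(range(len(net_values)), key=lambda j: net_values[j])


# Hypothetical columns: posterior certain of state 1, certain of state 2, equal to prior.
mu, U = [F(1, 2), F(1, 2)], [[1, 0], [0, 1]]
full = [[1, 0, 0], [0, 1, 0]]
blind = [[0, 0, 1], [0, 0, 1]]
alpha = [F(1, 2), F(1, 2)]
mixed = mix(alpha, [full, blind])
```

Here the mixture's gross value, 3/4, equals the weighted average of 1 and 1/2. The component chosen by `best_component` attains at least the weighted mean of the net values.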
Each $\pi^j \in \mathcal{S}^\mu$ inherits its cost $K_S^\mu(\pi^j)$ from some $\eta^j \in \bar\Pi^\mu$. This $\eta^j$ is the lowest-cost minimal attention strategy, according to $K$ on $\bar\Pi^\mu$, that is sufficient for $\pi^j$:
$$K_S^\mu(\pi^j) = K_S^\mu(\eta^j) = K(\eta^j),$$
where the last equality uses the fact, established above, that $K_S^\mu(\pi) = K(\pi)$ for $\pi \in \bar\Pi^\mu$. By the Blackwell property, each strategy $\eta^j \in \bar\Pi^\mu$ offers at least as high a gross value as the strategy $\pi^j$ for which it is sufficient. Overall,
$$G(\mu, A, \eta^j) - K(\eta^j) \ge G(\mu, A, \pi^j) - K_S^\mu(\pi^j) > G(\mu, A, \bar\pi_{(\mu,A)}) - K^\mu(\bar\pi_{(\mu,A)}).$$
To complete the proof it is therefore sufficient to show that
$$K^\mu(\pi) = K(\pi)$$
for $\pi \in \bar\Pi^\mu$. This yields
$$G(\mu, A, \eta^j) - K(\eta^j) > G(\mu, A, \bar\pi_{(\mu,A)}) - K(\bar\pi_{(\mu,A)}),$$
which contradicts the assumption that $K$ formed a rational inattention representation of $(D, q)$.
To establish that $K^\mu(\pi) = K(\pi)$ for $\pi \in \bar\Pi^\mu$, recall that we already know $K_S^\mu(\pi) = K(\pi)$ for $\pi \in \bar\Pi^\mu$. If this equality did not extend to $K^\mu(\pi)$, we could identify a mixture strategy $\rho \in \Pi^\mu$ sufficient for $\bar\pi_{(\mu,A)}$ with strictly lower cost, $K^\mu(\rho) < K(\bar\pi_{(\mu,A)})$. To see that this is not possible, note first from Lemma ?? that all strategies consistent with $(\mu, A)$ and $q_{(\mu,A)}$ are sufficient for $\bar\pi_{(\mu,A)}$. Weak monotonicity of $K^\mu$ on $\Pi^\mu$ then implies that any mixture strategy $\rho$ sufficient for $\bar\pi_{(\mu,A)}$ satisfies $K^\mu(\rho) \ge K(\bar\pi_{(\mu,A)})$, as required.
The final and most trivial stage of the proof is to ensure that normalization holds. We first normalize each function $K^\mu$ for those $\mu \in \Gamma$ for which $(\mu, A) \in D$ for some $A \in \mathcal{F}$. In such cases $I_\mu \in \mathcal{S}^\mu$, so that $K_S^\mu(I_\mu) \in \mathbb{R}$ by the rule above. Suppose we renormalize by subtracting $K_S^\mu(I_\mu)$ from the cost of every attention strategy associated with this prior. This affects no margin of choice. It also leaves mixture feasibility, weak monotonicity, and the existence of a rational inattention representation unchanged. We can therefore avoid needless complication by assuming that $K^\mu(I_\mu) = 0$ from the outset, so that this normalization is vacuous. We have now fully specified a cost function $K^\mu : \Pi^\mu \to \mathbb{R}$ for all $\mu \in \Gamma$ such that $(\mu, A) \in D$ for some $A \in \mathcal{F}$.
To complete the definition of the candidate cost function $\bar K$, it remains to extend the domain to the inattentive strategies for the irrelevant priors, those $\mu \in \Gamma$ with no corresponding $(\mu, A) \in D$. In such cases we define $\Pi^\mu = \{I_\mu\}$ and set $K^\mu(I_\mu) = 0$. This single-element domain is trivially closed under garbling and under mixtures. We then define the candidate cost function $\bar K : \Pi \to \mathbb{R} \cup \{\infty\}$ by patching together the prior-based cost functions defined above,
$$\bar K(\mu, \pi) = \begin{cases} K^\mu(\pi) & \text{if } \pi \in \Pi^\mu, \\ \infty & \text{if } \pi \notin \Pi^\mu. \end{cases}$$
Weak monotonicity implies that this function is non-negative on its entire domain, since every attention strategy is sufficient for the inattentive strategy, whose cost is zero.

It is immediate that $\bar K \in \mathcal{K}$. Indeed, $\bar K(\mu, \pi) = \infty$ for $\pi \notin \Pi^\mu$, and every domain $\Pi^\mu$, $\mu \in \Gamma$, contains the corresponding inattentive strategy $I_\mu$, on which $\bar K(\mu, \cdot)$ is real-valued. It is also immediate that $\bar K$ satisfies K1, since $\bar K(\mu, I_\mu) = 0$ by construction. It also satisfies K2 and K3 and forms a rational inattention representation, which completes the proof.
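Lemma 2 Suppose $\pi = \sum_{j=1}^J \alpha_j \pi^j$ with $J \in \mathbb{N}$, $\alpha \in S^{J-1}$ with $\alpha_j > 0$ for all $j$, and $\Gamma(\pi^j) = \Gamma(\pi)$ for all $j$. Then for any garbling $\rho$ of $\pi$ there exist garblings $\rho^j$ of $\pi^j$, $1 \le j \le J$, such that
$$\rho = \sum_{j=1}^J \alpha_j \rho^j.$$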
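Lemma 2 can be checked on a small example before reading the proof. The sketch uses the same illustrative matrix representation as before (rows are states, columns are a shared list of posteriors). It applies one garbling matrix $B$ both to $\pi$ and to each component $\pi^j$. The components and $B$ are hypothetical.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][k] for t in range(len(Y))) for k in range(len(Y[0]))]
            for i in range(len(X))]


def mix(weights, strategies):
    """Entrywise mixture over a shared list of labelled posteriors."""
    return [[sum(w * S[m][i] for w, S in zip(weights, strategies))
             for i in range(len(strategies[0][0]))]
            for m in range(len(strategies[0]))]


# Hypothetical components with the common support {0, 1, 2}.
pi1 = [[F(3, 4), 0, F(1, 4)], [0, F(3, 4), F(1, 4)]]
pi2 = [[F(1, 4), 0, F(3, 4)], [0, F(1, 4), F(3, 4)]]
alpha = [F(1, 2), F(1, 2)]
pi = mix(alpha, [pi1, pi2])

# Hypothetical garbling matrix B (rows sum to one), applied to pi and to each pi^j.
B = [[1, 0], [F(1, 2), F(1, 2)], [0, 1]]
rho = matmul(pi, B)
rho_parts = [matmul(Pj, B) for Pj in (pi1, pi2)]
```

Mixing the garbled components with the original weights reproduces `rho` exactly, which is the linearity identity used in the proof.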
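Proof. By assumption, there exists a $|\Gamma(\pi)| \times |\Gamma(\rho)|$ matrix $B$ with non-negative entries and $\sum_k b_{ik} = 1$ for all $i$, such that, for all $\gamma^k \in \Gamma(\rho)$,
$$\rho_m(\gamma^k) = \sum_{\gamma^i \in \Gamma(\pi)} b_{ik}\, \pi_m(\gamma^i).$$
Since $\Gamma(\pi^j) = \Gamma(\pi)$ for all $j$, the same matrix can be applied to every $\pi^j$ to generate a garbling $\rho^j$ of $\pi^j$,
$$\rho^j_m(\gamma^k) = \sum_{\gamma^i \in \Gamma(\pi)} b_{ik}\, \pi^j_m(\gamma^i).$$
These garblings satisfy the required condition $\rho = \sum_{j=1}^J \alpha_j \rho^j$, since
$$\rho_m(\gamma^k) = \sum_{\gamma^i \in \Gamma(\pi)} b_{ik}\, \pi_m(\gamma^i) = \sum_{\gamma^i \in \Gamma(\pi)} b_{ik} \sum_{j=1}^J \alpha_j \pi^j_m(\gamma^i) = \sum_{j=1}^J \alpha_j \sum_{\gamma^i \in \Gamma(\pi)} b_{ik}\, \pi^j_m(\gamma^i) = \sum_{j=1}^J \alpha_j \rho^j_m(\gamma^k).$$

Lemma 3 Given $\pi \in \Pi^\mu$, there exist $J \in \mathbb{N}$, $\alpha \in S^{J-1}$, and elements $\pi^j \in \mathcal{S}^\mu$ with $\pi = \sum_{j=1}^J \alpha_j \pi^j$ such that
$$K^\mu(\pi) = \sum_{j=1}^J \alpha_j K_S^\mu(\pi^j).$$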
Proof. By definition, $K^\mu(\pi)$ is the infimum of $\sum_{j=1}^J \alpha_j K_S^\mu(\pi^j)$ over all lists $\{\pi^j\}_{j=1}^J \subset \mathcal{S}^\mu$ and weights $\alpha \in S^{J-1}$ such that $\pi = \sum_{j=1}^J \alpha_j \pi^j$. Consider a sequence of such lists $\{\pi^j(n)\}_{j=1}^{J(n)}$, with the position in the sequence indicated in parentheses. In every case there are corresponding weights $\alpha(n) \in S^{J(n)-1}$ with
$$\pi = \sum_{j=1}^{J(n)} \alpha_j(n) \pi^j(n),$$
and the values converge to the infimum,
$$\lim_{n \to \infty} \sum_{j=1}^{J(n)} \alpha_j(n) K_S^\mu(\pi^j(n)) = K^\mu(\pi).$$
The first issue to rule out is unbounded growth in the cardinality $J(n)$. The key observation is that, by Carathéodory's theorem, the number of strictly positive weights in a convex combination $\pi = \sum_{j=1}^{J(n)} \alpha_j(n) \pi^j(n)$ can be reduced to $J(n) \le M + 1$. We now confirm that this can be done without raising the corresponding cost, $\sum_{j=1}^{J(n)} \alpha_j(n) K_S^\mu(\pi^j(n))$.

Suppose that for some integer $n$ the original list of attention strategies has strictly higher cardinality, $J(n) > M + 1$. Consider a first selection of $J^1(n) \le M + 1$ of these strategies with strictly positive probability weights $\alpha^1_j(n)$ such that $\pi = \sum_{j=1}^{J^1(n)} \alpha^1_j(n) \pi^j(n)$. Without loss of generality, these are the first $J^1(n)$ attention strategies in the original list. Suppose further that this selection has a higher cost than the original list. It is convenient to set $\alpha^1_j(n) = 0$ for $J^1(n) + 1 \le j \le J(n)$, so that this inequality takes the simple form
$$\sum_{j=1}^{J(n)} \alpha^1_j(n) K_S^\mu(\pi^j(n)) > \sum_{j=1}^{J(n)} \alpha_j(n) K_S^\mu(\pi^j(n)).$$
This inequality sets up an iteration. We first take the smallest scalar $\lambda^1 \in (0,1)$ such that
$$\lambda^1 \alpha^1_j(n) = \alpha_j(n)$$
for some $1 \le j \le J^1(n)$. Such a scalar exists because $\sum_{j=1}^{J^1(n)} \alpha^1_j(n) = \sum_{j=1}^{J(n)} \alpha_j(n) = 1$, all components in both sums are strictly positive, and $J(n) > J^1(n)$. We then define a second set of probability weights,
$$\alpha^2_j(n) = \frac{\alpha_j(n) - \lambda^1 \alpha^1_j(n)}{1 - \lambda^1}$$
for $1 \le j \le J(n)$. These weights satisfy $\pi = \sum_{j=1}^{J(n)} \alpha^2_j(n) \pi^j(n)$ and
$$\sum_{j=1}^{J(n)} \alpha^2_j(n) K_S^\mu(\pi^j(n)) = \sum_{j=1}^{J(n)} \left[ \frac{\alpha_j(n) - \lambda^1 \alpha^1_j(n)}{1 - \lambda^1} \right] K_S^\mu(\pi^j(n)) < \sum_{j=1}^{J(n)} \alpha_j(n) K_S^\mu(\pi^j(n)).$$
By construction, the number of strictly positive weights $\alpha^2_j(n)$ has fallen by at least one, to $J(n) - 1$ or fewer. Iterating the process shows that there exist at most $M + 1$ attention strategies whose mixture produces $\pi$ at a weighted average cost no higher than that of the original mixture. There is therefore no loss of generality in assuming that $J(n) \le M + 1$ in the original sequence.
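One iteration of the weight-reduction step can be sketched as follows. To keep the arithmetic visible, the strategies $\pi^j(n)$ are replaced by hypothetical scalar stand-ins (points on a line), and `costs` plays the role of $K_S^\mu(\pi^j(n))$. The same update applies coordinate by coordinate to attention strategies, since $\pi$ enters only linearly. All weights, points and costs are illustrative.

```python
from fractions import Fraction as F


def reduction_step(alpha, alpha1):
    """One step of the weight-reduction argument in the proof of Lemma 3.

    alpha  : strictly positive weights of the current decomposition;
    alpha1 : weights of an alternative decomposition of the same strategy,
             zero outside its first J^1 entries.
    lam is the smallest scalar with lam * alpha1_j = alpha_j for some j, i.e. the
    minimum ratio alpha_j / alpha1_j over entries with alpha1_j > 0.
    """
    lam = min(a / a1 for a, a1 in zip(alpha, alpha1) if a1 > 0)
    alpha2 = [(a - lam * a1) / (1 - lam) for a, a1 in zip(alpha, alpha1)]
    return lam, alpha2


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Hypothetical 1-D stand-ins for the strategies pi^j and their K_S costs.
points = [0, 4, 1, 2]
costs = [5, 5, 0, 0]
alpha = [F(1, 4)] * 4                      # represents the point 7/4
alpha1 = [F(9, 16), F(7, 16), 0, 0]        # same point, using only two entries

lam, alpha2 = reduction_step(alpha, alpha1)
```

In this example the update gives $\lambda^1 = 4/9$ and zeroes the first weight. It preserves the represented point 7/4 and lowers the weighted cost from 5/2 to 1/2, matching the inequality in the text.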
With this bound on cardinality, we can find a subsequence of attention strategies that all have exactly the same cardinality, $J(n) = J \le M + 1$ for all $n$. We can then impose further properties on the $J$ corresponding sequences $\{\pi^j(n)\}_{n=1}^\infty$. These selections are possible because posteriors lie in the compact set $\Gamma$ and probabilities and weights lie in $[0,1]$.

First, we select subsequences in which the ranges of all corresponding attention functions have a cardinality that does not depend on $n$,
$$|\Gamma(\pi^j(n))| = K^j$$
for $1 \le j \le J$. We can then index the possible posteriors $\gamma^{jk}(n) \in \Gamma(\pi^j(n))$ in order, $1 \le k \le K^j$, and select further subsequences in which these posteriors converge to limit posteriors,
$$\gamma^{jk}(L) = \lim_{n \to \infty} \gamma^{jk}(n) \in \Gamma.$$
We also ensure that both the associated state dependent probabilities $\pi^{jk}_m(n) \equiv \pi^j_m(\gamma^{jk}(n))$ and the weights $\alpha_j(n)$ in the expression $\pi = \sum_{j=1}^{J} \alpha_j(n) \pi^j(n)$ converge:
$$\lim_{n \to \infty} \pi^{jk}_m(n) = \pi^{jk}_m(L), \qquad \lim_{n \to \infty} \alpha_j(n) = \alpha_j(L).$$
A final selection of subsequence ensures the following for each $1 \le j \le J$. Every $\pi^j(n) \in \mathcal{S}^\mu$ takes its value from exactly the same minimal attention strategy $\eta^j \in \bar\Pi^\mu$, namely the least expensive one sufficient for it, whose cost it was assigned in $K_S^\mu$. Technically, for each $1 \le j \le J$,
$$K_S^\mu(\pi^j(n)) = K(\eta^j)$$
for $1 \le n < \infty$. This is possible because the data set, and hence the number of minimal attention strategies, is finite.
We first use these limit properties to construct limit attention strategies $\pi^j(L) \in \mathcal{S}^\mu$ for $1 \le j \le J$. Strategy $\pi^j(L)$ has range
$$\Gamma(\pi^j(L)) = \bigcup_{k=1}^{K^j} \{\gamma^{jk}(L)\},$$
with state dependent probabilities
$$\pi^j_m(\gamma^{jk}(L)) = \pi^{jk}_m(L).$$
The construction ensures that $\pi = \sum_{j=1}^J \alpha_j(L) \pi^j(L)$. To complete the proof, it remains only to establish that
$$K^\mu(\pi) = \sum_{j=1}^J \alpha_j(L) K_S^\mu(\pi^j(L)).$$
By construction, for each $n$,
$$\sum_{j=1}^J \alpha_j(n) K_S^\mu(\pi^j(n)) = \sum_{j=1}^J \alpha_j(n) K(\eta^j).$$
The result therefore follows provided that
$$K_S^\mu(\pi^j(L)) \le K(\eta^j).$$
In that case $\sum_{j=1}^J \alpha_j(L) K_S^\mu(\pi^j(L)) \le \lim_{n \to \infty} \sum_{j=1}^J \alpha_j(n) K(\eta^j) = K^\mu(\pi)$, while the reverse inequality holds by the definition of the infimum.

This inequality holds provided that $\eta^j$, being sufficient for every $\pi^j(n)$, is also sufficient for the limit vector $\pi^j(L)$. To see that it is, consider the $|\Gamma(\eta^j)| \times K^j$ stochastic matrices $B^j(n) = [b_{ik}(n)]^j$, which have the defining property of sufficiency,
$$\pi^j_m(\gamma^{jk}(n)) = \sum_{\gamma^i \in \Gamma(\eta^j)} [b_{ik}(n)]^j\, \eta^j_m(\gamma^i)$$
for all $\gamma^{jk}(n) \in \Gamma(\pi^j(n))$ and $1 \le m \le M$. Let $B^j(L) = [b_{ik}(L)]^j$ be the limit of any convergent subsequence of these matrices. The equality clearly holds in the limit, so $\eta^j$ is sufficient for the limit vector $\pi^j(L)$. This confirms that $K_S^\mu(\pi^j(L)) \le K(\eta^j)$ and establishes the Lemma.
10 Appendix B: Examples of NIAS and NIAC Violations from RUMs
To be completed
11 Appendix C: Individual Costs of NIAS and NIAC Violations
Figure C1: Dollar losses due to NIAS violations, actual and simulated subjects (panels: Experiments 1, 2, 3 and 4)
Figure C2: Dollar losses due to NIAC violations, actual and simulated subjects (panels: Experiments 1, 2, 3 and 4)