Rational Inattention and State Dependent Stochastic Choice

Andrew Caplin and Mark Dean

February 22, 2013

Preliminary and Incomplete - Please Do Not Circulate

Abstract

Economists are increasingly interested in how attention impacts behavior. Rational inattention theory models the allocation of attention in an optimizing framework. We characterize patterns of stochastic choice consistent with a general model of rational inattention, extending results based on Shannon information costs (Matejka and McKay [2011], Caplin and Dean [2013]). We experimentally elicit "state dependent" stochastic choice data of the form required to test the model. Rational inattention theory does a qualitatively better job of matching this data than do standard stochastic choice models that ignore the link between incentives and attention.

1 Introduction

Limits on attention impact choice. Shoppers may buy unnecessarily expensive products due to their failure to notice whether or not sales tax is included in stated prices (Chetty et al. [2009]). Buyers of second-hand cars focus their attention on the leftmost digit of the odometer when evaluating the available alternatives (Lacetera et al. [2011]). Purchasers limit their attention to a relatively small number of websites when buying over the internet (Santos et al. [2012]).

Given its effect on choice, the forces that determine attentional effort are being intensively studied. The rational inattention model (Sims [1998], Sims [2003]) is particularly influential, capturing as it does the balancing act between the improved decision quality that attention produces and the entailed costs in terms of time and effort. It is being applied to model a wide array of phenomena, from price stickiness to consumption dynamics to portfolio choice.1 Yet understanding the implications of rational inattention can be challenging.
The information that a decision maker obtains is not directly observable, making it difficult to differentiate between rationally inattentive choices and internally inconsistent mistakes. In this paper we develop a simple characterization of the choice behavior consistent with rational inattention, and implement the corresponding tests in a laboratory experiment. We make no assumption on the nature of informational costs or constraints, so that our model represents a significant generalization of those in the current literature.2

We thank Roland Benabou, Federico Echenique, Andrew Ellis, Daniel Martin, Stephen Morris, Pietro Ortoleva and Mike Woodford for their constructive contributions. We also thank Samuel Brown for his exceptional research assistance and Severine Toussaert and Isabel Trevino for their help in running the experiments.

1 See Sims [2006], Tutino [2008], Woodford [2008], Nieuwerburgh and Veldkamp [2009], Mackowiak and Wiederholt [2009], Mackowiak and Wiederholt [2010], Matejka [2010] and Paciello and Wiederholt [2011] for applications of rational inattention. See Marschak [1971] for an earlier formulation of similar ideas.

The necessary and sufficient conditions for rational inattention that we identify are simple and intuitive. A "no improving action switch" (NIAS) condition ensures that choices are optimal given what was learned about the state of the world (see Caplin and Martin [2011]). A "no improving attention cycles" (NIAC) condition ensures that total utility cannot be raised by reassigning attentional strategies across decision problems. These conditions are robust. One can insist that costs associated with more informative attentional strategies (in the sense of Blackwell) are no lower than those associated with less informative strategies. One can insist on the feasibility of mixed attentional strategies. One can set inattention to be costless.
Even with these additional restrictions, the NIAS and NIAC conditions fully characterize rationally inattentive behavior.

As with many existing models of rational inattention, our model allows for stochastic choice. The data set we consider consists of "state dependent" stochastic choice data that allows choice probabilities to depend on an underlying state of the world.3 We assume that, while the state is perfectly observable to the researcher, it is potentially costly for a decision maker to learn it.4 For example, the econometrician may know whether or not sales taxes are included in stated prices even if consumers do not. State dependent data of this form is readily available in the experimental laboratory and in many field settings.5

Current models of stochastic choice are predominantly based on assumed randomness in the utility function.6 Matejka and McKay [2011] (henceforth MM) establish an important link between rational inattention and such random utility models. When attention costs are proportionate to Shannon mutual information, demand functions in the rational inattention model are of a generalized logit form.7 Yet there are also important differences, since random utility models treat attentional effort as independent of incentives.

In section 5 we dig further into the differences between rational inattention and random utility. On the one hand, we show that random utility models can violate NIAS and NIAC. On the other hand, as again pointed out by MM, rational inattention can lead to violations of a monotonicity condition central to random utility models. Specifically, when a newly available act incentivizes additional attention, the resulting knowledge may induce choice of acts that were previously unchosen due to the lower level of attention. Another important distinction arises when acts have "state dependent" dominance properties, so that choice is simple with a known state of the world.
Unlike rational inattention theory, standard random utility theory does not allow attentional incentives to impact the quality of the signals on the basis of which decisions are made. As a result, stochastic choice is similarly non-responsive to incentives.

2 Following Sims [2003], much of the literature assumes that costs are based on the Shannon mutual information between prior and posterior information states. Woodford [2012] suggests an alternative information cost function, based on the concept of a Shannon capacity, which is consistent with certain psychological experiments. Gul et al. [2012] and Ellis [2012] take a different approach, assuming limits on the partitional structures of information available to the decision maker.

3 While little studied in economics, state dependent stochastic choice data has featured in psychometric experiments on perception dating back at least to Weber (see Murray [1993]).

4 This inverts the interpretation that stochasticity in choice arises from the opposite asymmetry: factors that are unobserved to the econometrician yet observed by the decision maker (Manski [1977]).

5 Other authors have used different data sets to capture the implications of rational inattention. Ellis [2012] uses state dependent deterministic choice, while Mihm and Ozbek [2012] use choice over menus.

6 See for example Luce and Suppes [1965], Falmagne [1978], McFadden and Richter [1990], Clark [1996], McFadden [2005] and Gul and Pesendorfer [2006].

7 Caplin and Dean [2013] characterize stochastic choice behavior for a broad class of entropy-based cost functions.

In order to test whether behavior can be modeled using rational inattention theory, we experimentally elicit state dependent stochastic choice data. We present subjects with a screen containing only red and blue dots, with the proportion of red dots determining the state of the world. They then choose from a set of available acts, the payoffs to which depend on the state.
By repeatedly presenting subjects with the same decision problem, we estimate their probability of choosing each act in each state. Unlike most psychology experiments (for example Navalpakkam and Perona [2009]), our experiment has no time limit, so that subjects can in principle perfectly determine the state of the world. The fact that they choose not to results precisely from their internal trade-off between additional input of cognitive effort and time on the one hand and monetary reward on the other.

We use our experimental data to conduct two primary sets of tests. First, we determine the validity of the NIAS and NIAC conditions. Aggregate data aligns well with both conditions, and losses due to violations are small at the individual level. Second, we look to discriminate between rational inattention theory and random utility theory. We identify qualitative violations of the latter models. Overall, we conclude that rational inattention theory is a reasonable starting point for modeling stochasticity in choice when attentional effort is chosen.

Section 2 introduces our formulation of the rational inattention model with an unrestricted cost function. Section 3 derives the testable implications of this general model for state dependent choice data. Section 4 shows that this characterization is unchanged by addition of natural restrictions on the cost function. Section 5 contrasts rational inattention theory with more familiar theories of stochastic choice. Section 6 details our experimental design and presents experimental results. Section 7 relates our work to the prior literature and to further ongoing work. Section 8 concludes.

2 A General Model of Rational Inattention

2.1 Model

We consider a decision making environment comprising a set of possible states of the world and a set of acts, the payoffs of which are state dependent.
A given decision problem consists of a prior distribution over states of the world and the subset of acts from which the decision maker (DM) must choose.8 The key assumption underlying models of rational inattention is that the true state of the world is knowable in practice, but obtaining (or processing) information is costly. Thus, prior to choosing what act to take, the DM chooses an attentional strategy. In line with the rational inattention literature, an attentional strategy is described in an abstract manner - as a stochastic mapping from states of the world to subjective posterior information states (equivalently as a set of signals, the probabilities of which depend on the state of the world). Subject only to Bayes' rule, the DM is free to choose any attentional strategy they wish, but will face a utility cost of doing so. Having selected an attentional strategy, the DM can condition choice of act only on the subjective state.9 Their payoff is then given by the expected utility of the chosen acts less the costs of attention.

8 Both of which are assumed known to the DM.

9 And possibly an independent randomizing device, by which we allow for mixed strategies.

Figure 1 illustrates an attentional strategy of a firm that is choosing which of two prices to charge (high or low). Profits depend on the price chosen and the underlying state of the economy, which can be good, medium or bad (G, M and B) with probabilities $\frac{1}{6}$, $\frac{1}{2}$ and $\frac{1}{3}$ respectively.

Figure 1: An Example of an Attentional Strategy

In this example, the firm has chosen an attentional strategy which produces two subjective states, or signals, R and S. If demand is good they receive signal R for sure, while if demand is bad they receive signal S for sure. If demand is medium, they receive each signal with probability $\frac{1}{2}$. Upon receiving signal R, the firm sets prices high. Conditional on receiving this signal, the probability of state G is $\frac{2}{5}$ and of state M is $\frac{3}{5}$.
The expected utility of action H is therefore $\frac{2}{5}$ times the utility that H gives in state G plus $\frac{3}{5}$ times the utility that H gives in state M. The firm sets price L upon receipt of signal S, where the conditional probabilities are $\frac{3}{7}$ for state M and $\frac{4}{7}$ for state B. The net benefit to the firm from this attentional strategy is the expected "gross benefit" of the acts chosen, net of attentional costs.

Formally, a decision making environment is identified by a finite set of states of the world, $\Omega = \{m \in \mathbb{N} \mid 1 \leq m \leq M\}$, and a finite set of acts $F$, with $\mathcal{F} = 2^F \setminus \emptyset$ comprising all non-empty subsets. We take as given a state dependent utility function $U : F \times \Omega \to \mathbb{R}$ and let $U^f_m$ denote the utility of act $f$ in state $m$. Define $\Gamma = \Delta(\Omega)$ as the set of probability distributions over states. Given $\mu \in \Gamma$, $\mu_m$ denotes the probability of state $m$. A decision problem consists of a prior belief $\mu \in \Gamma$ over states of the world and a non-empty set of acts $A \in \mathcal{F}$ from which the decision maker (DM) must choose. For any prior $\mu$, define $\Omega(\mu) = \{m \in \Omega \mid \mu_m > 0\}$ as its associated support.

An attentional strategy maps each state of the world to a probability distribution over subjective information states. Since we will be characterizing expected utility maximizers, we identify a subjective information state with its associated posterior beliefs $\gamma \in \Gamma$. For a particular prior, the set of feasible attentional strategies is the set of stochastic mappings from objective to subjective information states that satisfy Bayes' law.

Definition 1 Given $\mu \in \Gamma$, the set of feasible attentional strategies $\Pi(\mu)$ comprises all mappings $\pi : \Omega(\mu) \to \Delta(\Gamma)$ that have finite support $\Gamma(\pi) \subset \Gamma$ and that satisfy Bayes' law, so that for all $m \in \Omega(\mu)$ and $\gamma \in \Gamma(\pi)$,

$$\gamma_m = \frac{\mu_m \pi_m(\gamma)}{\sum_{j=1}^{M} \mu_j \pi_j(\gamma)},$$

where $\pi_m(\gamma) \equiv \pi(m)(\{\gamma\})$. Define $\Pi = \bigcup_{\mu \in \Gamma} \Pi(\mu)$.

Note that $\pi_m(\gamma)$ can be interpreted as the probability of information state $\gamma$ conditional on true state $m$.
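As a check on Bayes' law in the firm example of Figure 1, the following is a minimal Python sketch that recomputes the posteriors attached to signals R and S. The dictionary layout and the function name `posterior` are illustrative assumptions, not notation from the paper; exact fractions are used so that the results can be compared with the values quoted in the text.

```python
from fractions import Fraction as Fr

# Prior over states (G, M, B) from the Figure 1 example.
prior = {"G": Fr(1, 6), "M": Fr(1, 2), "B": Fr(1, 3)}

# Attentional strategy: probability of each signal conditional on the state.
# (Hypothetical layout chosen for this sketch.)
signal_given_state = {
    "G": {"R": Fr(1), "S": Fr(0)},
    "M": {"R": Fr(1, 2), "S": Fr(1, 2)},
    "B": {"R": Fr(0), "S": Fr(1)},
}

def posterior(signal):
    """Posterior over states after observing `signal`, via Bayes' law."""
    joint = {s: prior[s] * signal_given_state[s][signal] for s in prior}
    total = sum(joint.values())  # unconditional probability of the signal
    return {s: p / total for s, p in joint.items()}

post_R = posterior("R")
post_S = posterior("S")
print(post_R)
print(post_S)
```

Under these inputs the posterior after R puts probability 2/5 on G and 3/5 on M, and the posterior after S puts 3/7 on M and 4/7 on B.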
Note that one can also identify attentional strategies as compound lotteries or temporal lotteries, as analyzed in models that capture preferences over the timing of the resolution of uncertainty (Kreps and Porteus [1978]).

Rational inattention assumes that subjects choose attentional strategies to maximize gross payoffs net of information costs. The gross payoff associated with using an attentional strategy $\pi \in \Pi(\mu)$ in decision problem $(\mu, A)$ is calculated assuming that acts are chosen optimally at each posterior state. Let $G : \Gamma \times \mathcal{F} \times \Pi \to \mathbb{R}$ denote the gross payoff of using a particular information strategy in a particular decision problem:

$$G(\mu, A, \pi) = \sum_{\gamma \in \Gamma(\pi)} \left[ \sum_{m=1}^{M} \mu_m \pi_m(\gamma) \right] g(\gamma, A), \quad \text{where} \quad g(\gamma, A) = \max_{f \in A} \sum_{m=1}^{M} \gamma_m U^f_m.$$

An attentional cost function maps priors and attentional strategies to the corresponding level of disutility. We allow costs to be infinite for some strategies for two reasons. As a technical convenience, it enables us to ignore the Bayesian constraint by setting the cost of strategies inconsistent with the specified prior to be infinitely high.10 More importantly, it means that our model nests those that rule out certain types of information acquisition - for example by putting a hard limit on the mutual information between priors and posteriors (Sims [2003]), by allowing only partitional information structures (Ellis [2012]), or by allowing only normal signals (Mondria [2010]). These models can be accommodated in our framework by setting to infinity the cost of infeasible strategies. To avoid triviality we assume that feasible attentional strategies exist for all priors.

Definition 2 An attentional cost function is a mapping $K : \Gamma \times \Pi \to \bar{\mathbb{R}}$ with $K(\mu, \pi) = \infty$ for $\pi \notin \Pi(\mu)$ and $K(\mu, \pi) \in \mathbb{R}$ for some $\pi \in \Pi(\mu)$. We let $\mathcal{K}$ denote the class of such functions.

We make the standard assumption that attention costs are additively separable from the prize-based utility derived from the acts taken. We let $\hat{\Pi} : \mathcal{K} \times \Gamma \times \mathcal{F} \rightrightarrows \Pi$ map cost functions and decision problems into rationally inattentive strategies.
These are the strategies that maximize gross payoffs minus attention costs,

$$\hat{\Pi}(K, \mu, A) = \arg\sup_{\pi \in \Pi(\mu)} \{ G(\mu, A, \pi) - K(\mu, \pi) \}.$$

10 As in Mihm and Ozbek [2012].

2.2 State Dependent Stochastic Choice Data

The data that we use to test the model of rational inattention is state dependent stochastic choice data. For a given decision problem $(\mu, A)$ we observe the probability of choosing each act $f \in A$ in each state of the world $m \in \Omega(\mu)$. It is the fact that we observe choice probabilities in each state that differentiates this from standard stochastic choice data.

Formally, let $q : \Omega(\mu) \to \Delta(A)$ denote a state dependent stochastic choice function for a decision problem $(\mu, A)$, and let $Q(\mu)$ be the set of such mappings, with $Q = \bigcup_{\mu \in \Gamma} Q(\mu)$. $q_m(f)$ is the probability of the DM choosing act $f$ in state $m$. We denote as $F(q) \subset F$ the set of acts chosen with non-zero probability in some state of the world under stochastic choice function $q$,

$$F(q) = \{ f \in A \mid q_m(f) > 0 \text{ for some } m \in \Omega(\mu) \}.$$

We assume that data has been gathered on some finite set $D$ of decision problems:

Definition 3 A state dependent stochastic choice data set is a pair $(D, q)$ with $D \subset \Gamma \times \mathcal{F}$ finite and $q : D \to Q$, with $q^{(\mu,A)} \in Q(\mu)$ and $F^{(\mu,A)} \equiv F(q^{(\mu,A)}) \subset A$.

Data of this general form is standard in psychometric research and is gathered in the experiment detailed in section 6.

The main theoretical aim of the paper is to characterize the conditions under which state dependent stochastic choice data is consistent with rational inattention. By this we mean that we can find an attentional cost function and resulting optimal attentional strategy that would generate the pattern of stochastic choice that we in fact observe.

Definition 4 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$, an attentional strategy $\pi \in \Pi(\mu)$ is consistent with $q \in Q(\mu)$ if there exists a choice strategy $Y : \Gamma(\pi) \to \Delta(A)$, such that:

1. Final choices are optimal: Letting $Y^{\gamma}(f)$ denote the probability of choosing act $f$ at $\gamma \in \Gamma(\pi)$, if $Y^{\gamma}(f) > 0$ for some $\gamma \in \Gamma(\pi)$ then, at that posterior,

$$\sum_{m=1}^{M} \gamma_m U^f_m \geq \sum_{m=1}^{M} \gamma_m U^g_m, \quad \text{all } g \in A.$$

2.
The attention and choice functions match the data,

$$q_m(f) = \sum_{\gamma \in \Gamma(\pi)} \pi_m(\gamma) Y^{\gamma}(f).$$

Note that we do not restrict the DM to pure strategies at any posterior. However, any act they choose with strictly positive probability must be utility maximizing given beliefs.

Definition 5 Data set $(D, q)$ has a rational inattention representation $(\tilde{K}, \tilde{\pi})$ if there exists $\tilde{K} \in \mathcal{K}$ and $\tilde{\pi} : D \to \Pi$ such that $\tilde{\pi}(\mu, A) \in \hat{\Pi}(\tilde{K}, \mu, A)$ and is consistent with $q^{(\mu,A)}$ for all $(\mu, A) \in D$.

The existence of such a representation implies that the cost function is well-behaved, in the sense that an optimal attentional strategy exists.

Throughout this paper, we assume that the DM's expected utility function and prior beliefs over objective states are known to the researcher - only attention costs are not directly observable. This assumption is in line with the focus of the paper, but is not central to our approach. By enriching the data set, we could recover beliefs and preferences from the choice data of the DM, and use these as a starting point for our representation. In order to recover utility, we could replace the "Savage style" acts we use in this paper (which map deterministically from states of the world to prizes) with "Anscombe-Aumann" acts that map states of the world to probability distributions over the prize space. Assuming the DM does maximize expected utility, $U$ could then be recovered by observing choices over degenerate acts (i.e. acts whose payoffs are state independent).11 If we further add to our data set the choices of the DM over acts before the state of the world is determined (or at least in a situation in which they cannot exert any effort to determine that state) then we can also recover the DM's prior over objective states (again assuming expected utility maximization).

3 The Characterization

The main theorem of this section establishes two intuitive conditions as necessary and sufficient for $(D, q)$ to have a rational inattention representation.
We use simple examples to illustrate the role of these conditions before stating the main theorem, the proof of which is in appendix A. The first condition - which establishes optimality of final choice given an attentional strategy - applies to each decision problem separately. The second - which establishes optimality of the attentional strategy - applies to all decision problems that share a specific prior.

11 This is the approach taken by Ellis [2012]. Caplin and Martin [2011] present an alternative approach that allows for an unknown utility function.

3.1 Minimal Attentional Strategies

The key to our approach is the observation that, if a DM is rationally inattentive, then one can learn much about their attentional strategy from state dependent stochastic choice. To begin with, one can identify the average posterior beliefs that a subject must have had when choosing each act.

Definition 6 Given $\mu \in \Gamma$ and $q \in Q(\mu)$, define the revealed posteriors $r^{(\mu,q)} : F(q) \to \Gamma$ by,

$$r^{(\mu,q)}_m(f) = \frac{\mu_m q_m(f)}{\sum_{j=1}^{M} \mu_j q_j(f)}.$$

If the DM chooses each act in at most one subjective state then the revealed posteriors will in fact be the posteriors that define their attentional strategy. If they choose the same act in more than one subjective state then the revealed posterior will be equivalent to the weighted average of beliefs across all posteriors at which that act is chosen.

We can use the revealed posteriors to construct a possible attentional strategy for each decision problem. We do so by treating the revealed posteriors as identifying all possible posteriors the attentional strategy can produce.

Definition 7 Given $\mu \in \Gamma$ and $q \in Q(\mu)$, define the minimal attentional strategy $\bar{\pi}^{(\mu,q)} \in \Pi(\mu)$ to satisfy,

$$\bar{\pi}^{(\mu,q)}_m(\gamma) = \sum_{\{f \in F(q) \mid r^{(\mu,q)}(f) = \gamma\}} q_m(f),$$

for all $m \in \Omega(\mu)$.

If a rationally inattentive DM chose each act in at most one subjective state, and did not use mixed strategies at any posterior, then the minimal attentional strategy would be their true strategy.
More generally, any attentional strategy consistent with the data must be more informative than the minimal attentional strategy, in the sense of statistical sufficiency. Intuitively, this means that the minimal attentional strategy can be obtained by "adding noise" to the true attentional strategy.

Definition 8 Attentional strategy $\pi \in \Pi(\mu)$ is sufficient for attentional strategy $\rho \in \Pi(\mu)$ (equivalently $\rho$ is a garbling of $\pi$) if there exists a $|\Gamma(\pi)| \times |\Gamma(\rho)|$ stochastic matrix $B \geq 0$ with $\sum_{\gamma^j \in \Gamma(\rho)} b^{ij} = 1$ for all $i$ and such that, for all $\gamma^j \in \Gamma(\rho)$ and $m \in \Omega(\mu)$,

$$\rho_m(\gamma^j) = \sum_{\gamma^i \in \Gamma(\pi)} b^{ij} \pi_m(\gamma^i).$$

Lemma 1 establishes that any consistent attentional strategy must be sufficient for the minimal attentional strategy.

Lemma 1 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$ and $q \in Q(\mu)$, if $\pi \in \Pi(\mu)$ is consistent with these data, then $\pi$ is sufficient for $\bar{\pi}^{(\mu,q)}$.

Blackwell's theorem establishes the equivalence of the statistical notion of "more informative than" (sufficiency) and the economic notion "more valuable than". If attentional strategy $\pi$ is sufficient for strategy $\rho$, then it yields (weakly) higher gross payoffs in any decision problem. See Cremer [1982] for a statement and simple proof of Blackwell's theorem. This result plays a significant role in our characterization.

Remark 1 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$ and $\pi, \rho \in \Pi(\mu)$ with $\pi$ sufficient for $\rho$,

$$G(\mu, A, \pi) \geq G(\mu, A, \rho).$$

3.2 No Improving Action Switches

Our first condition ensures that the choice of act at any given posterior is optimal. Consider the trivial case in which only one decision problem $(\mu, A)$ is observed. Suppose that there are two states with $\mu_1 = \mu_2 = 0.5$, two available acts, $A = \{f, g\}$, and state dependent payoffs $(U^f_1, U^f_2) = (0, 10)$ and $(U^g_1, U^g_2) = (20, 0)$. Finally suppose that the behavioral data set specifies:

$$q_1(f) = q_1(g) = \frac{1}{2}; \qquad q_2(f) = \frac{2}{3}, \quad q_2(g) = \frac{1}{3}.$$

One can readily confirm that there is no attentional strategy consistent with this behavioral data.
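The claim can be checked numerically. The sketch below, in Python with exact fractions, computes the posterior revealed by each chosen act in this example and lists every pair (chosen act, alternative act) for which switching would raise expected utility at that posterior. The function names `revealed_posterior`, `expected_utility` and `improving_switches` are illustrative assumptions, not the authors' code.

```python
from fractions import Fraction as Fr

prior = [Fr(1, 2), Fr(1, 2)]            # two equiprobable states
utility = {"f": [Fr(0), Fr(10)],        # payoff of each act, by state
           "g": [Fr(20), Fr(0)]}
# q[m][act]: probability of choosing `act` in state m (the example data)
q = [{"f": Fr(1, 2), "g": Fr(1, 2)},
     {"f": Fr(2, 3), "g": Fr(1, 3)}]

def revealed_posterior(act):
    """Posterior over states conditional on `act` having been chosen."""
    joint = [prior[m] * q[m][act] for m in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]

def expected_utility(act, belief):
    return sum(b * u for b, u in zip(belief, utility[act]))

def improving_switches():
    """Pairs (chosen, alternative) where the alternative does strictly better."""
    found = []
    for chosen in utility:
        belief = revealed_posterior(chosen)
        for alt in utility:
            if expected_utility(alt, belief) > expected_utility(chosen, belief):
                found.append((chosen, alt))
    return found

print(revealed_posterior("f"))   # beliefs when f is chosen
print(improving_switches())      # non-empty list means the check fails
```

With the data above, the only improving switch is from f to g, at the posterior (3/7, 4/7) revealed by the choice of f.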
This is because, at the revealed posterior when $f$ is chosen, it would be optimal to choose $g$: The posterior probability that the true state is 1 when $f$ is chosen is $\frac{3}{7}$. Given these beliefs, the payoff of taking act $g$ is $\frac{3}{7} \cdot 20 + \frac{4}{7} \cdot 0 = \frac{60}{7}$, while the payoff of act $f$ is $\frac{3}{7} \cdot 0 + \frac{4}{7} \cdot 10 = \frac{40}{7}$.

If the minimal attentional strategy was in fact employed, the revealed posterior associated with each act would be the only posterior at which this act was chosen. In this case existence of such an improving switch would be a direct violation of the rationally inattentive model. With a more general attentional strategy, rational inattention implies that $f$ must be weakly preferred to $g$ at each subjective state in which $f$ is chosen. This means that $f$ must also be weakly preferred to $g$ at the weighted average of these posteriors which, by the requirement of consistency, forms the revealed posterior.12

The NIAS condition rules out cases in which there are improving switches of this form. It specifies that, when one identifies in the data the revealed posterior associated with any chosen act, this act must be optimal at that posterior.

Condition D1 (No Improving Action Switches) Data set $(D, q)$ satisfies NIAS if, for every $(\mu, A) \in D$ and $f \in F^{(\mu,A)}$,

$$\sum_{m=1}^{M} r^{(\mu,q)}_m(f) U^f_m \geq \sum_{m=1}^{M} r^{(\mu,q)}_m(f) U^g_m,$$

all $g \in A$, where $r^{(\mu,q)}_m(f)$ is the revealed posterior belief of state $m$ when $f$ is chosen in decision problem $(\mu, A) \in D$.

3.3 No Improving Attention Cycles

Our second condition restricts choice of attentional strategy across decision problems that share the same prior. Essentially, it cannot be the case that the total gross utility can be increased by reassigning attentional strategies across decision problems that share the same prior. The following example illustrates a violation of this condition.
Consider again the decision problem above with two equiprobable states and two available acts, $A = \{f, g\}$, and with the state dependent payoffs,

$$(U^f_1, U^f_2) = (0, 10); \qquad (U^g_1, U^g_2) = (20, 0).$$

12 See the proof of theorem 1 for details of this argument.

Suppose now that the observed choice behavior is as follows (using the choice set $A$ as the identifying superscript),

$$q^A_1(f) = \frac{2}{3} = 1 - q^A_1(g); \qquad q^A_2(f) = \frac{1}{3} = 1 - q^A_2(g).$$

Now consider a second decision problem differing only in that the act set is $B = \{f, h\}$, with $(U^h_1, U^h_2) = (10, 0)$, with the corresponding state dependent data set,

$$q^B_1(f) = \frac{3}{4} = 1 - q^B_1(h); \qquad q^B_2(f) = \frac{1}{4} = 1 - q^B_2(h).$$

The specified data looks problematic with respect to rational inattention. Act set $A$ provides greater reward for discriminating between states, yet the DM is more discerning under act set $B$. To crystallize the resulting problem, note that, for behavior to be consistent with rational inattention for some cost function $K$, it must be the case that,

$$G(\mu, A, \pi^A) - K(\mu, \pi^A) \geq G(\mu, A, \pi^B) - K(\mu, \pi^B);$$
$$G(\mu, B, \pi^B) - K(\mu, \pi^B) \geq G(\mu, B, \pi^A) - K(\mu, \pi^A).$$

While we do not observe attentional strategies directly, it is immediate that $G(\mu, i, \pi^i) = G(\mu, i, \bar{\pi}^i)$ for $i \in \{A, B\}$, where $\bar{\pi}^i$ is the minimal attentional strategy for act set $i$. Furthermore, as $\pi^i$ is sufficient for $\bar{\pi}^i$, Blackwell's theorem tells us that $G(\mu, i, \pi^j) \geq G(\mu, i, \bar{\pi}^j)$ for $i, j \in \{A, B\}$ (see Remark 1). Thus we can insert the minimal attentional strategies in the calculation of gross benefits and the above inequalities must still hold. Substituting and combining the two conditions therefore yields,

$$G(\mu, A, \bar{\pi}^A) + G(\mu, B, \bar{\pi}^B) \geq G(\mu, A, \bar{\pi}^B) + G(\mu, B, \bar{\pi}^A),$$

indicating that total gross benefit across the two decision problems must be maximized by the assignment of minimal attentional strategies observed in the data. Plugging in the minimal attentional strategies from our example data, we find that $G(\mu, A, \bar{\pi}^A) + G(\mu, B, \bar{\pi}^B) = 17\frac{1}{2}$, while $G(\mu, A, \bar{\pi}^B) + G(\mu, B, \bar{\pi}^A) = 17\frac{11}{12}$.
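These two totals can be reproduced with a short computation. The Python sketch below builds the minimal attentional strategy revealed by each data set and evaluates gross payoffs under both the observed and the swapped assignment. The data layout and the helper names `minimal_strategy` and `gross_payoff` are assumptions made for illustration; the sketch also does not merge acts that share a revealed posterior, which is harmless here because each act reveals a distinct posterior.

```python
from fractions import Fraction as Fr

prior = [Fr(1, 2), Fr(1, 2)]
U = {"f": [Fr(0), Fr(10)], "g": [Fr(20), Fr(0)], "h": [Fr(10), Fr(0)]}
acts = {"A": ["f", "g"], "B": ["f", "h"]}
# Probability of choosing f in each state; the other act gets the remainder.
q_f = {"A": [Fr(2, 3), Fr(1, 3)], "B": [Fr(3, 4), Fr(1, 4)]}

def minimal_strategy(problem):
    """List of (probability, posterior) pairs revealed by the choice data."""
    other = [a for a in acts[problem] if a != "f"][0]
    q = {"f": q_f[problem], other: [1 - p for p in q_f[problem]]}
    pairs = []
    for probs in q.values():
        joint = [prior[m] * probs[m] for m in range(len(prior))]
        total = sum(joint)               # unconditional probability of the act
        if total != 0:
            pairs.append((total, [j / total for j in joint]))
    return pairs

def gross_payoff(problem, strategy):
    """Expected payoff when the best act in `problem` is chosen at each posterior."""
    value = Fr(0)
    for prob, post in strategy:
        best = max(sum(b * u for b, u in zip(post, U[a])) for a in acts[problem])
        value += prob * best
    return value

pi_A, pi_B = minimal_strategy("A"), minimal_strategy("B")
observed = gross_payoff("A", pi_A) + gross_payoff("B", pi_B)
swapped = gross_payoff("A", pi_B) + gross_payoff("B", pi_A)
print(observed, swapped)
```

Under these inputs the observed assignment yields 35/2 and the swapped assignment yields 215/12, so swapping the two strategies raises total gross payoff.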
Thus, there is no cost function that can be used to rationalize this data.

The following general assumption ensures that there are no such cycles of gross utility improving strategy switches.

Condition D2 (No Improving Attention Cycles) Data set $(D, q)$ satisfies NIAC if, for any $\mu \in \Gamma$ and any set of decision problems $(\mu, A^1), (\mu, A^2), \ldots, (\mu, A^K) \in D$ with $A^K = A^1$,

$$\sum_{k=1}^{K-1} G(\mu, A^k, \bar{\pi}^k) \geq \sum_{k=1}^{K-1} G(\mu, A^k, \bar{\pi}^{k+1}),$$

where $\bar{\pi}^k = \bar{\pi}^{[\mu, q^{(\mu, A^k)}]}$.

3.4 Characterization

Our main result is that NIAC and NIAS together are necessary and sufficient for a data set to have a rational inattention representation.

Theorem 1 Data set $(D, q)$ has a rational inattention representation if and only if it satisfies NIAS and NIAC.

The key step in the proof that the NIAS and NIAC conditions are sufficient for $(D, q)$ to have a rational inattention representation is connecting the model with the linear allocation problem analyzed by Koopmans and Beckmann [1957]. The cost function that we introduce is based on the shadow prices that decentralize the optimal allocation in that model. Our result is also strongly related to rationalizability conditions for quasi-linear preferences in the mechanism design literature (see Rochet [1987]).

4 Monotonicity, Mixtures and Normalization

Theorem 1 states only that, if NIAS and NIAC hold, we can find some attentional cost function that rationalizes the data. No restrictions are placed on the form of the function. In this section we consider three natural restrictions on attentional cost functions: weak monotonicity with respect to sufficiency; feasibility of mixed strategies; and costless inattention. In principle these restrictions might place further conditions on stochastic choice data for it to be rationalizable, because they imply that costs of unchosen strategies may be constrained by those assigned to chosen strategies.
Theorem 2 establishes that this is not the case: if state dependent stochastic choice is rationalizable, then it is rationalizable by a cost function that satisfies these three conditions.

4.1 Weak Monotonicity

A partial ranking of the informativeness of attentional strategies is provided by the notion of statistical sufficiency (see definition 8). A natural condition for an attentional cost function is that more information is (weakly) more costly.

Condition K1 $K \in \mathcal{K}$ satisfies weak monotonicity in information if, for any $\mu \in \Gamma$ and $\pi, \rho \in \Pi(\mu)$ with $\pi$ sufficient for $\rho$,

$$K(\mu, \pi) \geq K(\mu, \rho).$$

Free disposal of information would imply this property, as would a ranking based on Shannon mutual information (see also Mihm and Ozbek [2012] and Ming [2013]).13

13 While in many ways intuitively attractive, this assumption may not be universally valid. In a world with discrete signals it may be very costly or even impossible to generate continuous changes in information. Moreover the DM may be restricted to some collection of partitions (Ellis [2012], Gul et al. [2011]), in which case less informative structures are essentially disallowed. It may not be possible to automatically and freely dispose of information once learned.

4.2 Mixture Feasibility

In addition to using pure attentional strategies, it may be feasible for the DM to mix these strategies using some randomizing device.

Definition 9 Given $\mu \in \Gamma$, attentional strategies $\pi, \rho \in \Pi(\mu)$, and $\alpha \in [0, 1]$, the mixture strategy $\alpha\pi + (1-\alpha)\rho \in \Pi(\mu)$ is defined by,

$$[\alpha\pi + (1-\alpha)\rho]_m(\gamma) = \alpha \pi_m(\gamma) + (1-\alpha) \rho_m(\gamma),$$

all $1 \leq m \leq M$ and $\gamma \in \Gamma(\pi) \cup \Gamma(\rho)$.

The definition implies that the mixing is not of the posteriors themselves, but of the odds of the given posteriors. To illustrate, consider again a case with two equiprobable states. Let attentional strategy $\pi$ be equally likely to produce posteriors $(0.3, 0.7)$ and $(0.7, 0.3)$, with $\rho$ equally likely to produce posteriors $(0.1, 0.9)$ and $(0.9, 0.1)$. Then the mixture strategy $0.5\pi + 0.5\rho$ is equally likely to produce all four posteriors.
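The two-state illustration can be verified directly. In the Python sketch below, each strategy is stored state by state as a dictionary from posterior to conditional probability; with equiprobable states and two equally likely posteriors, Bayes' law gives a conditional probability in state m equal to the m-th coordinate of the posterior. The representation and the names `mix` and `unconditional` are assumptions made for this sketch.

```python
from fractions import Fraction as Fr

prior = [Fr(1, 2), Fr(1, 2)]

def strategy_from_posteriors(posteriors):
    """Strategy that produces each listed posterior with equal probability.
    Entry m maps each posterior to its probability conditional on state m."""
    weight = Fr(1, len(posteriors))
    return [{g: g[m] * weight / prior[m] for g in posteriors}
            for m in range(len(prior))]

def mix(alpha, pi, rho):
    """State-by-state mixture alpha*pi + (1 - alpha)*rho."""
    mixed = []
    for pi_m, rho_m in zip(pi, rho):
        support = set(pi_m) | set(rho_m)
        mixed.append({g: alpha * pi_m.get(g, 0) + (1 - alpha) * rho_m.get(g, 0)
                      for g in support})
    return mixed

def unconditional(strategy):
    """Unconditional probability of each posterior under the prior."""
    return {g: sum(prior[m] * strategy[m][g] for m in range(len(prior)))
            for g in strategy[0]}

pi = strategy_from_posteriors([(Fr(3, 10), Fr(7, 10)), (Fr(7, 10), Fr(3, 10))])
rho = strategy_from_posteriors([(Fr(1, 10), Fr(9, 10)), (Fr(9, 10), Fr(1, 10))])
half_half = mix(Fr(1, 2), pi, rho)
print(unconditional(half_half))
```

Each of the four posteriors has unconditional probability 1/4 under the mixture, and applying Bayes' law to the mixed strategy returns the same four posteriors, so only their odds have been mixed.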
A natural assumption is that a DM can choose to mix between two attentional strategies in this way, and pay the expected cost of such a mixture: for example, they could construct a strategy which involved flipping a coin, then choosing strategy $\pi$ if the coin comes down heads and strategy $\rho$ if it comes down tails. In expectation the cost of this strategy would be half that of $\pi$ and half that of $\rho$. Allowing such mixtures puts an upper bound on the cost of the strategy $0.5\pi + 0.5\rho$. However, it does not pin down the cost precisely, because we do not rule out the possibility that there is a more efficient way of constructing the mixed attentional strategy.

Condition K2 Mixture Feasibility: for all $\alpha \in (0, 1)$ and for any two strategies $\pi, \rho \in \Pi(\mu)$, the cost of the mixture strategy $\alpha\pi + (1-\alpha)\rho \in \Pi(\mu)$ satisfies,

$$K(\mu, \alpha\pi + (1-\alpha)\rho) \leq \alpha K(\mu, \pi) + (1-\alpha) K(\mu, \rho).$$

4.3 Normalization

It is typical in the applied literature to allow inattention at no cost, and otherwise to have costs be non-negative. Given weak monotonicity, non-negativity of the entire function follows immediately if one ensures that inattention is costless.

Condition K3 Given $\mu \in \Gamma$, define $I^{\mu} \in \Pi(\mu)$ as the strategy in which $I^{\mu}_m(\mu) = 1$ for $1 \leq m \leq M$. Attentional cost function $K \in \mathcal{K}$ satisfies normalization if it is non-negative where real-valued, with $K(\mu, I^{\mu}) = 0$ for all $\mu \in \Gamma$.

4.4 Theorem 2

Theorem 2 states that, whenever a rational inattention representation exists, one also exists in which the cost function satisfies conditions K1 through K3. Whatever one thinks of the above assumptions on intuitive grounds, even if any one or all of them are in fact false, any data set that can be rationalized can equally be rationalized by a function that satisfies all of these conditions.

Theorem 2 Data set $(D, q)$ satisfies NIAS and NIAC if and only if it has a rational inattention representation with conditions K1 to K3 satisfied.
This result has the flavor of the Afriat characterization of rationality of choice from budget sets, which states that choices can be rationalized by some utility function if and only if they can be rationalized by a non-satiated, continuous, monotone, and concave utility function. Note that the representation need not be unique. Conditions for recoverability are left for future research.

4.5 No Strong Blackwell

Not all restrictions on the form of the cost function can be so readily absorbed as K1 through K3. For example, there are data sets satisfying NIAS and NIAC yet for which there exists no cost function that produces a rational inattention representation while respecting strict monotonicity, whereby, if $\pi$ is sufficient for $\pi'$ but $\pi'$ is not sufficient for $\pi$, then $K(\mu, \pi) > K(\mu, \pi')$. A simple example with data on only one decision problem in which there are two equally likely states illustrates that one cannot further strengthen the result in this dimension. Suppose that there are three available acts $A = \{f, g, h\}$ with corresponding utilities,

$$(U_1^f, U_2^f) = (10, 0); \quad (U_1^g, U_2^g) = (0, 10); \quad (U_1^h, U_2^h) = (7.5, 7.5).$$

Consider the following state dependent stochastic choice data in which the only two chosen acts are f and g,

$$q_1(f) = q_2(g) = \frac{3}{4} = 1 - q_1(g) = 1 - q_2(f).$$

Note that this data satisfies NIAS; given posterior beliefs when f is chosen, f is superior to g and indifferent to h, and when g is chosen it is superior to f and indifferent to h. It trivially satisfies NIAC since there is only one decision problem observed. We know from theorem 2 that it has a rational inattention representation with the cost of the minimal attention strategy $K(\mu, \bar{\pi}) \ge 0$ and that of the inattentive strategy being zero, $K(\mu, I^{\mu}) = 0$. Note that $\bar{\pi}$ is sufficient for $I^{\mu}$ but not vice versa, hence any strictly monotone cost function would have to satisfy $K(\mu, \bar{\pi}) > 0$. In fact it is not possible to find a representation with this property.
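The two facts this example relies on, that the data satisfy NIAS and that the revealed strategy earns a gross utility of 7.5, can be checked numerically. The sketch below is an illustration under the stated payoffs and choice probabilities; the function names are assumptions introduced here, not notation from the paper.

```python
from fractions import Fraction as F

PRIOR = (F(1, 2), F(1, 2))
# Utility of each act in state 1 and state 2.
UTILS = {"f": (10, 0), "g": (0, 10), "h": (F(15, 2), F(15, 2))}
# Q[m][act]: probability of choosing act in state m (h is never chosen).
Q = [{"f": F(3, 4), "g": F(1, 4)}, {"f": F(1, 4), "g": F(3, 4)}]


def revealed_posterior(act):
    """Posterior over states conditional on the act having been chosen."""
    joint = [PRIOR[m] * Q[m][act] for m in range(2)]
    total = sum(joint)
    return tuple(j / total for j in joint)


def expected_utility(act, belief):
    return sum(p * u for p, u in zip(belief, UTILS[act]))


def nias_holds():
    """Each chosen act must be optimal at its own revealed posterior."""
    for chosen in ("f", "g"):
        post = revealed_posterior(chosen)
        best = max(expected_utility(a, post) for a in UTILS)
        if expected_utility(chosen, post) < best:
            return False
    return True


def gross_utility_observed():
    """Expected utility of the observed state dependent choices."""
    return sum(PRIOR[m] * Q[m][a] * UTILS[a][m] for m in range(2) for a in Q[m])


# An inattentive DM picks the best act at the prior.
gross_inattentive = max(expected_utility(a, PRIOR) for a in UTILS)
print(nias_holds(), gross_utility_observed(), gross_inattentive)
```

Because h ties with the chosen act at each revealed posterior, NIAS holds only weakly here, which is what makes the example bite.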
To see this, note that both strategies have the same gross utility,

$$G(\mu, A, \bar{\pi}) = \frac{1}{2}\cdot\frac{3}{4}\cdot 10 + \frac{1}{2}\cdot\frac{3}{4}\cdot 10 = 1 \cdot 7.5 = G(\mu, A, I^{\mu}),$$

where we use the fact that the inattentive strategy involves picking act h for sure. In order to rationalize selection of $\bar{\pi}$ over the inattentive strategy, it must therefore be that $\bar{\pi}$ is no more expensive than $I^{\mu}$, contradicting strict monotonicity.

5 Rational Inattention and Random Utility

In this section we compare the rational inattention model with alternative models in which stochasticity in choice stems from randomness in the utility function (e.g. McFadden [1974], Loomes and Sugden [1995]).14 Such random utility models (RUMs) take as given a probability measure over some family of utility functions. Prior to making a choice, one utility function gets drawn from this set. The DM then chooses to maximize this utility function.15

There is a first order difference between standard RUMs and the rational inattention model. While the latter produces choice behavior that differs across states of the world, the former typically conditions out all observable states, as a result giving rise to state independent choice (e.g. Falmagne [1978], McFadden and Richter [1990], Clark [1996], McFadden [2005] and Gul and Pesendorfer [2006]). In this section we consider three distinct approaches to translating RUMs to settings in which there is an underlying state that is observable to the econometrician. We begin by considering an "uninformed" RUM in which the DM gathers no information other than their prior belief about the state of the world. We then consider the other extreme case of a "fully informed" RUM in which the DM knows the true state perfectly. Finally, we consider a "partially informed" RUM in which the DM receives an exogenous signal about the state of the world before maximizing their randomly selected utility function.
All three RUM variants are distinct from the rational inattention model, in the sense that there is behavior which is consistent with rational inattention but not the RUM and vice versa. As formalized below, the Uninformed and Perfectly Informed RUMs place fundamental restrictions on the data that are not implied by rational inattention. The case of the Partially Informed RUM is more subtle, but (along with the other two variants) it implies a monotonicity property which is not required by rational inattention. In the interests of parsimony, we relegate specific examples of RUMs that violate NIAC and NIAS to appendix B. In section 6.6 we discuss how our experimental data helps us to differentiate between rational inattention and RUMs.

5.1 Uninformed RUM

Consider a DM who chooses according to a RUM without learning anything beyond the prior about the true state of the world. To maintain generality, define the class of utility functions $U$ to be all functions $\upsilon: \Gamma \times F \to \mathbb{R}$, with $\upsilon(\mu, f)$ the utility assigned by $\upsilon \in U$ to act $f \in A$ at prior $\mu \in \Gamma$. Note that random expected utility (in the manner of Gul and Pesendorfer [2006]) is a special case in which the utility can be computed by weighting some underlying utilities on a prize space according to the prior probabilities. Letting $\rho$ denote the probability measure over $U$, a data set $(D, q)$ generated by a prior only RUM is defined by,

$$q_m^{(\mu, A)}(f) = \rho\{\upsilon \in U \mid \upsilon(\mu, f) > \upsilon(\mu, g)\ \forall\, g \in A \setminus \{f\}\},$$16

for every $(\mu, A) \in D$ and $\omega_m \in \Omega$. The key behavior that is allowed by rational inattention but not by the Uninformed RUM is state dependence: in this prior only formulation, the stochastic choice function is state invariant, because the beliefs of the agent are state invariant. Clearly rational inattention allows for state dependence in the choice function.

14 Random choice models have been used in psychology since the work of Thurstone [1927] and Zermelo [1929].
15 In the case of choice over lotteries, the family of utility functions can be over the lotteries themselves or, following Gul and Pesendorfer [2006], over the underlying prize space, with the utility of a lottery equal to its expectation according to the selected utility function.

16 In the case of utility functions in which the utility maximizing elements of A are not unique, a tie breaking rule is needed. See Gul and Pesendorfer [2010].

On the other hand, as we demonstrate in appendix B, the Uninformed RUM allows for violations of NIAS. This is obvious from the fact that such a DM has a single posterior belief, yet chooses many acts. The Uninformed RUM will, however, generate data that is consistent with NIAC: as the DM is always uninformed, their choice of attentional strategy can be rationalized by a cost function that puts arbitrarily high costs on any informative attention strategy.

5.2 Perfectly Informed RUM

An alternative and possibly more natural interpretation of the RUM is that the DM is fully informed about the state of the world, then chooses in each state according to a RUM. Letting $\delta_m \in \Gamma$ be the degenerate probability distribution on state m, a Perfectly Informed RUM would generate data of the following form:

$$q_m^{(\mu, A)}(f) = \rho\{\upsilon \in U \mid \upsilon(\delta_m, f) > \upsilon(\delta_m, g)\ \forall\, g \in A \setminus \{f\}\}.$$

MM identify an intriguing connection between the Perfectly Informed RUM and rational inattention. One heavily used variant of the RUM is the logit model, which assumes that $\upsilon$ is constructed using a fixed "base" function $\bar{\upsilon} \in U$ which is then perturbed by an error term distributed according to an extreme value type-I distribution.
Applying this assumption to the Perfectly Informed RUM leads to a state dependent stochastic choice function of the form,

$$q_m^{(\mu, A)}(f) = \frac{e^{\bar{\upsilon}(\delta_m, f)}}{\sum_{g \in A} e^{\bar{\upsilon}(\delta_m, g)}}.$$

MM show that one can provide an analogous characterization of the state dependent stochastic choice associated with rational inattention when attentional costs are linear in Shannon mutual information,

$$\tilde{q}_m^{(\mu, A)}(f) = \frac{P^f e^{U_m^f / k}}{\sum_{g \in A} P^g e^{U_m^g / k}},$$

where $P^f$ is the unconditional probability of choosing act f and $k > 0$ scales the disutility of attention relative to prizes. Thus, rational inattention with Shannon costs looks in certain respects like a state dependent logistic RUM.

Despite the apparent similarity, there are major qualitative differences between a Perfectly Informed RUM and the rational inattention model. As the above formula shows, the Perfectly Informed RUM implies that the probability of choosing an act in a particular state depends only on its payoff in that state, and is independent of the payoffs of any act in other states of the world and of prior beliefs. For rationally inattentive models, this is generally not the case as the DM will choose not to be perfectly informed about the state, which implies that choice probabilities will be impacted by the relative performance of all acts in all states as well as prior beliefs. In the MM formulation, this dependence is reflected in the unconditional probabilities $P^f$.

In appendix B we show how Perfectly Informed RUMs can violate both NIAS and NIAC. In the former case this is because the DM can still choose inferior options with positive probability under the RUM. In the latter, it is because the Perfectly Informed RUM involves the DM acquiring a large amount of information even when that information is not instrumental for choice.
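The difference between the two formulas in this subsection can be made concrete with a small numerical sketch. It is illustrative only: the function names are assumptions introduced here, and the unconditional probabilities are passed in as given inputs, whereas in the MM model they are determined endogenously by the decision problem and the prior.

```python
import math


def logit_probs(utils):
    """Perfectly Informed logit RUM.

    utils maps each act to its base utility in the realized state.
    """
    weights = {a: math.exp(u) for a, u in utils.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def shannon_probs(utils, uncond, k):
    """Weighted-logit form: P^f exp(U^f / k) / sum_g P^g exp(U^g / k).

    uncond maps each act to its unconditional choice probability
    (taken as given here); k scales the cost of attention.
    """
    weights = {a: uncond[a] * math.exp(utils[a] / k) for a in utils}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


state_utils = {"f": 1.0, "g": 0.0}  # hypothetical payoffs in one state
print(logit_probs(state_utils))
print(shannon_probs(state_utils, {"f": 0.5, "g": 0.5}, k=1.0))
print(shannon_probs(state_utils, {"f": 0.2, "g": 0.8}, k=1.0))
```

With equal unconditional probabilities and k = 1 the two forms coincide; skewing the unconditional probabilities changes the state-level choice probabilities even though payoffs in that state are unchanged, which is the dependence on other states and on the prior that the text describes.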
5.3 Partially Informed RUM, Monotonicity, and Stochastic Dominance

The third and final RUM we consider involves a DM who receives an exogenous and imperfect signal about the state of the world before a utility function is randomly drawn. In formal terms, starting with prior $\mu \in \Gamma$, the DM receives a fixed informative signal $\pi: \Omega \to \Delta(\Gamma)$ that determines the information at the point that the random utility function is drawn. In this case the generated data set for any $(\mu, A) \in D$ will be of the form,

$$q_m(f) = \sum_{\gamma \in \Gamma(\pi)} \pi_m(\gamma)\, \rho\{\upsilon \in U \mid \upsilon(\gamma, f) > \upsilon(\gamma, g)\ \forall\, g \in A \setminus \{f\}\},$$

where $\Gamma(\pi)$ is the set of possible posteriors associated with signal $\pi$.

The Partially Informed RUM does not exhibit either of the obvious restrictions on the data that we have so far discussed: choices can vary with the underlying state, and the payoffs of an act in one state can affect behavior in another. This makes differentiating between the Partially Informed RUM and rational inattention more subtle. We pursue two approaches in the experiment that follows, as now detailed.

Our first approach involves testing a simple monotonicity axiom that is an essentially universal feature of all RUMs, including those in which signals are received prior to making a decision. This axiom states that the addition of a new act to the set of available choices cannot increase the probability that one of the pre-existing options will be chosen (Gul and Pesendorfer [2006], see also Luce and Suppes [1965]).

Monotonicity Axiom Given a decision problem $(\mu, A)$, $h \in F \setminus A$ and $\omega_m \in \Omega$,

$$q_m^{(\mu, A)}(f) \ge q_m^{(\mu, A \cup h)}(f).$$

As MM show, the rational inattention model can lead to robust violations of monotonicity. Their canonical example involves two states of the world, with prior $\mu_1$ on state 1. There are three acts that may be available, with the following payoffs in expected utility units,

$$(U_1^f, U_2^f) = (0, 1); \quad (U_1^g, U_2^g) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right); \quad \text{and} \quad (U_1^h, U_2^h) = (Y, -Y);$$

where $Y > 0$.
Assuming information costs based on Shannon mutual information, MM show that it is optimal to pay no attention and choose the safe act g when only f and g are available, provided $\mu_1$ is high enough. It is simply too expensive to bother trying to overturn the prior. However, with h available also, it becomes more important to learn the true state - increasingly so the higher is Y. The rationally inattentive agent may therefore select a more informative attention strategy. If this learning suggests to the DM that state 2 is more likely, then it is optimal to choose act f, producing a violation of monotonicity. In section 6.6 we report on an experiment designed to capture the intuition of the MM example.

The second method for distinguishing the Partially Informed RUM from the rational inattention model involves identifying cases in which all utility functions have the same ranking. This can produce patterns whereby stochastic choice fails to respond to attentional incentives. To make this precise, note that in many refinements of the general RUM (such as the random expected utility model of Gul and Pesendorfer [2006]), all possible utility functions have the property that choice between stochastically ranked alternatives is deterministic. Moreover, many simple decision problems generate posteriors in which acts are always so ranked. Consider simple cases with two states and two acts $\{f, g\}$ with variation in the incentive to learn: $(U_1^f, U_2^f) = (Y, 0)$ and $(U_1^g, U_2^g) = (0, Y)$ for $Y > 0$. For any posterior belief $\gamma_1 > 0.5$, act f stochastically dominates g, while for $\gamma_1 < 0.5$, act g dominates f, regardless of the reward value, Y. Thus, if all utility functions obey stochastic dominance, any randomness in choice must be due to the signal which, by assumption, is invariant to Y. Data generated by a Partially Informed RUM should therefore not respond, for example, to doubling the value of Y.
In general we would expect a rationally inattentive agent to respond to such changes.

6 An Experimental Test of Rational Inattention

We introduce an experiment that produces state dependent stochastic choice data at the subject level. We use the resulting data to implement our axiomatic tests of the rational inattention model. One goal of the experimental design is to provide clear separation between the predictions of the rational inattention model and standard random utility models. The experiments indeed confirm that it is possible to discriminate between models.

6.1 Design Overview

In a typical question in the experiment, a subject is shown a screen on which there are displayed 100 balls, some of which are red and some of which are blue. The state of the world is determined by the number of red balls on the screen. Prior to seeing the screen, subjects are informed of the probability distribution over such states. Having seen the screen they choose from a number of different acts whose payoffs are state dependent. A decision problem is defined by this prior information and the set of available acts, as it is in section 2.1. A subject faces each decision problem 50 times, allowing us to approximate their state dependent stochastic choice function. In any given experiment, the subject faces 4 different problems. All occurrences of the same problem are grouped, but the order of the problems is block-randomized.

There are several things to note about our experimental design. First, there is no external limit (such as a time constraint) on any subject's ability to collect information about the state of the world. If they so wished, subjects could perfectly determine the state on each question - a very small number of subjects do just this. We are therefore not studying limits to any subject's perceptual ability to determine the state, as is traditional in many psychology experiments. At the same time, there is no extrinsic cost to the subject of gathering information.
Therefore the extent to which subjects fail to discern the true state of the world is due to their unwillingness to trade cognitive effort for monetary reward.

Second, in order to estimate the state dependent stochastic choice function we treat the 50 times that a subject faces the same decision making environment as 50 independent repetitions of the same event. To prevent subjects from learning to recognize patterns, we randomize the position of the balls. The implicit assumption is that the perceptual cost of determining the state is the same for each possible configuration of balls. We discuss evidence for order effects in our results in section 6.3.

Third, we assume that utility is a linear function of money. This may not be the case if subjects are risk averse over the amounts available in this experiment. One approach to this problem would be to measure risk aversion using the multiple price list method of Holt and Laury [2002], then to use the estimated curvature to assign utility numbers to monetary prizes. We make use of this approach in Caplin and Dean [2013] when estimating the elasticity of information acquisition with respect to rewards. In the current context, the rational inattention model does a good job of explaining the data even without accounting for possible risk aversion.

6.2 Description of Experiments

We run four different experiments. The first three are primarily designed to test NIAS and NIAC, while the fourth is designed to test for violations of monotonicity as described in section 5.3.

Experiments 1 and 2 consist of decision problems with two states (48 and 52 red balls), a prior belief of $\mu_1 = 0.5$ and two acts $\{f, g\}$. Act f is always superior in state 48 while g is superior in state 52. Experiment 1 examines the effect of asymmetric changes in reward - changing only the value of choosing the correct option in state 48, while experiment 2 examines the effect of symmetric changes in reward.
Tables 1 and 2 describe the decision problems in these experiments (payoffs are in US$).

Table 1: Experiment 1
Decision    Prior     Payoffs
Problem     mu_1      u^f_48   u^f_52   u^g_48   u^g_52
1           0.5       10       0        0        10
2           0.5       20       0        0        10
3           0.5       10       0        5        10
4           0.5       30       0        0        10

Table 2: Experiment 2
Decision    Prior     Payoffs
Problem     mu_1      u^f_48   u^f_52   u^g_48   u^g_52
5           0.5       10       0        0        10
6           0.5       2        0        0        2
7           0.5       20       0        0        20
8           0.5       30       0        0        30

Experiment 3 studies whether subjects can adjust the states between which they more finely differentiate based on the available rewards. All decision problems in this experiment involve four equally likely states (29, 31, 69 and 71 red balls). There are four decision problems with two possible acts (d and e). The decision problems differ according to whether it is important to differentiate between 29 and 31 or 69 and 71 red balls.

Table 3: Experiment 3
Decision    Payoffs
Problem     u^d_29   u^d_31   u^d_69   u^d_71   u^e_29   u^e_31   u^e_69   u^e_71
9           1        0        10       0        0        1        0        10
10          10       0        1        0        0        10       0        1
11          1        0        1        0        0        1        0        1
12          10       0        10       0        0        10       0        10

Experiment 4 performs a test of monotonicity inspired by the example of MM. Once again there are two equiprobable states (49 and 51 red balls). In decision problem 13 the DM must choose between a safe option a which pays out 23 in each state and an option b which pays out 25 if there are 51 red balls, but only 20 if there are 49. Decision problem 14 introduces a third alternative c which pays out 30 if there are 49 red balls and 10 if there are 51 red balls. Decision problems 15 and 16 increase the payoff of act c in state 49 but decrease it in state 51, increasing the value of attention.

Table 4: Experiment 4
Decision    Prior     Payoffs
Problem     mu_49     u^a_49   u^a_51   u^b_49   u^b_51   u^c_49   u^c_51
13          0.5       23       23       20       25       n/a      n/a
14          0.5       23       23       20       25       30       10
15          0.5       23       23       20       25       35       5
16          0.5       23       23       20       25       40       0

Each experiment was run on between 23 and 33 subjects.17 Each subject answers 200 questions as well as 1 practice question. At the end of the experiment, one question is selected at random for payment, in addition to a show up fee of $10.
6.3 Overview of Data

In this section, we provide an overview of the data for experiments 1 and 2. In these experiments there are only ever two acts available and two states, and the correct act to take in each state is clearly defined. This allows us to establish a number of key features of our data. First, subjects make a significant number of mistakes: they chose the wrong act on 35% of the trials overall. Second, subjects make significantly different choices in the two different states - that is, they do make use (at least partially) of the information available to them. Averaging across all individuals and decision problems, act f was chosen 68% of the time when the true state was 48 red balls and 38% of the time when there were 52 red balls (the hypothesis that choice behavior is the same in both states can be rejected at the 0.001% level).18

These patterns hold true at the individual level. Of the 62 subjects that took part in experiments 1 and 2, only 6% made mistakes in less than 10% of questions, while 84% had choice behavior that was significantly different between the two states at the 10% level. These results suggest that our subjects are absorbing some information about the state of the world, but are not fully informed when they make their choice.

We also use the overview data to test for order effects due to, for example, learning or fatigue. Averaging over decision problems, the percentage of correct responses in blocks 1-4 was 68%, 65%, 66%, and 65% respectively. Regressing a binary variable indicating whether the correct choice was made on decision problem dummies and a dummy indicating the order in which the treatment was seen by the subject suggests that these differences are not significant: a test of the linear restriction that the block dummies are simultaneously equal to zero fails to reject the null hypothesis at the 17% level. We ignore order effects in the remaining analysis.
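The aggregate frequencies reported in this section translate into revealed posterior beliefs by Bayes' rule, combining the state dependent choice probabilities with the stated prior of 0.5. The sketch below is illustrative: the function name is an assumption introduced here, and it pools experiments 1 and 2 exactly as the averages in the text do.

```python
def revealed_posterior(prior_48, p_act_48, p_act_52):
    """Posterior probability of state 48 given that an act was chosen.

    prior_48: prior probability of state 48
    p_act_48, p_act_52: probability of choosing the act in each state
    """
    joint_48 = prior_48 * p_act_48
    joint_52 = (1 - prior_48) * p_act_52
    return joint_48 / (joint_48 + joint_52)


# Act f was chosen 68% of the time in state 48 and 38% in state 52,
# so act g was chosen 32% and 62% of the time respectively.
post_f = revealed_posterior(0.5, 0.68, 0.38)
post_g = revealed_posterior(0.5, 1 - 0.68, 1 - 0.38)
print(round(post_f, 3), round(post_g, 3))
```

On these pooled figures the posterior on state 48 is roughly 0.64 when f is chosen and roughly 0.34 when g is chosen: informative, but well short of certainty.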
6.4 Testing NIAS

In the two state, two act set up of experiments 1 and 2, NIAS implies existence of a cutoff posterior probability of state 48 that determines the optimal act. For higher such posteriors, act f is chosen, while for lower posteriors, act g is chosen. Tables 5 and 6 summarize these cutoffs, and the extent to which posteriors that are revealed in the experiments are consistent with them.19

17 29 subjects took part in experiment 1, 33 in experiment 2, 24 in experiment 3 and 23 in experiment 4. Each subject took part only in one experiment.

18 Estimated using a linear probability model with individual-level fixed effects and standard errors clustered at the individual level. All statistical tests reported use this method.

Table 5: Experiment 1
Decision                Aggregate               % Subjects
Problem     Cutoff      gamma^f_48  gamma^g_48  Rational
1           50%         67%         31%         90
2           33%         58%         34%         59
3           67%         78%         46%         82
4           25%         61%         21%         76

Table 6: Experiment 2
Decision                Aggregate               % Subjects
Problem     Cutoff      gamma^f_48  gamma^g_48  Rational
5           50%         62%         34%         82
6           50%         63%         33%         85
7           50%         66%         30%         85
8           50%         68%         32%         88

These tables show that subjects in this experiment by and large satisfy the NIAS conditions. The aggregate data (treating all data as if it was generated by a single subject) satisfies NIAS in all but one case (the probability of state 1 is 1% too high when g is chosen in problem 2). While there are some violations at the individual level, the losses associated with these violations are low. Figure c1 in appendix C shows the distribution of costs of NIAS failures for each subject (i.e., for each subject it calculates the actual expected value of their choice minus the expected value of the optimal choices given their posterior beliefs) in experiments 1 through 4. As a benchmark, these losses are compared to those that would have been observed from a population of decision makers choosing at random.20 The use of random benchmarks has been discussed by, for example, Beatty and Crawford [2011].
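The cutoffs in Table 5 follow from the payoffs in Table 1 by equating the expected utility of the two acts. The sketch below recomputes them and checks the aggregate posteriors against them; it assumes the reading of the two tables in which the payoff columns are ordered f in state 48, f in state 52, g in state 48, g in state 52, and the function names are introduced here for illustration.

```python
from fractions import Fraction as F

# Payoffs (u_f48, u_f52, u_g48, u_g52) for decision problems 1-4 (Table 1).
EXPERIMENT_1 = {
    1: (10, 0, 0, 10),
    2: (20, 0, 0, 10),
    3: (10, 0, 5, 10),
    4: (30, 0, 0, 10),
}
# Aggregate posteriors on state 48, in percent (Table 5):
# (when f is chosen, when g is chosen).
AGGREGATE = {1: (67, 31), 2: (58, 34), 3: (78, 46), 4: (61, 21)}


def cutoff(u_f48, u_f52, u_g48, u_g52):
    """Posterior on state 48 at which f and g have equal expected utility."""
    gain_48 = u_f48 - u_g48  # advantage of f over g in state 48
    gain_52 = u_g52 - u_f52  # advantage of g over f in state 52
    return F(gain_52, gain_48 + gain_52)


def nias_ok(problem):
    """f must be chosen above the cutoff and g below it."""
    c = 100 * cutoff(*EXPERIMENT_1[problem])
    post_f, post_g = AGGREGATE[problem]
    return post_f >= c and post_g <= c


for p in EXPERIMENT_1:
    print(p, float(cutoff(*EXPERIMENT_1[p])), nias_ok(p))
```

The only aggregate failure this flags is problem 2, where the posterior when g is chosen (34%) sits just above the cutoff of one third. The check uses the rounded percentages printed in the table, so it says nothing about statistical significance.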
In each case, the observed distribution is significantly different from the simulated distribution at the 0.01% level.

19 In order to calculate posterior beliefs we combine the conditional probabilities of choosing each act from each state estimated from the data with prior beliefs about the likelihood of each state, rather than the empirical likelihood of each state.

20 The procedure to construct the random behavior is as follows: for each decision problem and for each state, a random number is drawn for each available act. The probability of choosing each act from that state is then calculated as the value of the random number associated with that act over the sum of all random numbers.

Experiment 3 provides a somewhat different test of NIAS. In this experiment, act d is always superior in states 29 and 69 and act e is superior in states 31 and 71. For it to be optimal to choose option d, it must be the case that,

$$u^d_{29}\gamma^d_{29} + u^d_{69}\gamma^d_{69} \ge u^e_{31}\gamma^d_{31} + u^e_{71}\gamma^d_{71}.$$

As, in experiment 3, $u^d_{29} = u^e_{31}$ and $u^d_{69} = u^e_{71}$, and assuming that when d is chosen $\gamma^d_{29} > \gamma^d_{31}$, it must be the case that, for d to be optimal,

$$\frac{\gamma^d_{71} - \gamma^d_{69}}{\gamma^d_{29} - \gamma^d_{31}} \le \frac{u^d_{29}}{u^d_{69}}.$$

In fact, this condition is sufficient to ensure that the choice of act e is also rational.21 Table 7 shows the values of this cutoff for each decision problem, and the extent to which aggregate and individual level data is consistent with NIAS.

21 This follows from the fact that

$$\frac{\gamma^d_{71} - \gamma^d_{69}}{\gamma^d_{29} - \gamma^d_{31}} = \frac{\frac{\mu_{71} q_{71}(d)}{P(d)} - \frac{\mu_{69} q_{69}(d)}{P(d)}}{\frac{\mu_{29} q_{29}(d)}{P(d)} - \frac{\mu_{31} q_{31}(d)}{P(d)}} = \frac{q_{71}(d) - q_{69}(d)}{q_{29}(d) - q_{31}(d)} = \frac{(1 - q_{71}(e)) - (1 - q_{69}(e))}{(1 - q_{29}(e)) - (1 - q_{31}(e))} = \frac{q_{71}(e) - q_{69}(e)}{q_{29}(e) - q_{31}(e)} = \frac{\gamma^e_{71} - \gamma^e_{69}}{\gamma^e_{29} - \gamma^e_{31}}.$$

Thus the condition described implies that

$$\frac{\gamma^e_{71} - \gamma^e_{69}}{\gamma^e_{29} - \gamma^e_{31}} \le \frac{u^d_{29}}{u^d_{69}}.$$

By assumption $\gamma^e_{29} - \gamma^e_{31}$ is negative, so rearranging tells us,

$$u^d_{29}\gamma^e_{29} + u^d_{69}\gamma^e_{69} \le u^e_{31}\gamma^e_{31} + u^e_{71}\gamma^e_{71}.$$

Table 7: Experiment 3
Decision                Aggregate                                       % subjects
Problem     Cutoff      (gamma^d_71 - gamma^d_69) / (gamma^d_29 - gamma^d_31)   rational
9           1/10        -1.3                                            88
10          10          -0.6                                            96
11          1           -0.9                                            80
12          1           -0.8                                            92

Table 8: Experiment 4
            Range of gamma_1                                    Aggregate
DP          b            a                c                     gamma^b_1   gamma^a_1   gamma^c_1
13          [0, 40%]     [40%, 100%]      n/a                   49          50          n/a
14          [0, 40%]     [40%, 65%]       [65%, 100%]           44          50          63
15          [0, 40%]     [40%, 60%]       [60%, 100%]           43          50          64
16          [0, 40%]     [40%, 57.5%]     [57.5%, 100%]         41          51          65

Experiment 4 provides a sterner test of NIAS because generally there are more acts available to the subject. For example, in decision problem 14, there are 3 available acts, and 3 regions of the posterior probability space in which each of the different acts is optimal. Despite this, in the aggregate data act a is always the optimal choice at its revealed posterior. Act c is optimal in all but problem 14, in which the posterior belief is too low (by 2%). Posterior probabilities of state 1 are generally slightly too high when b is chosen - by a maximum of 8% in decision problem 13.

6.5 Testing NIAC

We next use our data to test whether subjects' choices of information strategy are rationalizable by some cost function - in other words, whether they satisfy NIAC. For experiments 1 and 2, the fact that total surplus cannot be increased by switching information strategies between any two decision problems implies the following condition (assuming that both acts are chosen with positive probability in each case),

$$\Delta q_{48}(f)\,\Delta(f_{48} - g_{48}) + \Delta q_{52}(g)\,\Delta(g_{52} - f_{52}) \ge 0, \qquad (1)$$

where $\Delta(x)$ indicates the change in x between the two decision problems. This expression has a natural interpretation. The first term is the change in the probability of choosing the right act in state 48 multiplied by the change in the benefit of choosing the right act - i.e. the difference between the payoff of act f and g in that state. The second term is the change in the probability of choosing the right act in state 52 multiplied by the change in the benefit of so doing.

Experiment 1 is designed so that $\Delta(g_{52} - f_{52}) = 0$ for every pair of decision problems. Thus this condition implies a ranking of $q_{48}(f^i)$ (or the probability of correct choice in state 48) across the different decision problems,

$$q_{48}(f^4) \ge q_{48}(f^2) \ge q_{48}(f^1) \ge q_{48}(f^3).$$

Figure 4 shows that this ranking holds true in the aggregate.

In experiment 2, for every pair of decision problems we have that $\Delta(f_{48} - g_{48}) = \Delta(g_{52} - f_{52})$. Thus this condition implies a ranking on $q_{48}(f) + q_{52}(g)$, or the total probability of choosing the right action across both states. The implied ranking is,

$$q_{48}(f^8) + q_{52}(g^8) \ge q_{48}(f^7) + q_{52}(g^7) \ge q_{48}(f^5) + q_{52}(g^5) \ge q_{48}(f^6) + q_{52}(g^6).$$

Figure 4 shows that these implications broadly hold in the aggregate data. The total proportion of correct responses in decision problem 8 is higher than in decision problem 7 which is in turn higher than in decision problem 5. The total proportion of errors in decision problem 6 is slightly higher than that in problem 5 (by 0.5%) but the difference is not statistically significant. While the ordering is in line with the theory, it is clear that the elasticity in the response of information gathering to monetary incentives is quite low: the probability of making a correct choice rises from 65% in the $2 treatment to 70% in the $30 treatment (significant at the 10% level). We consider this issue in the context of Shannon Mutual Information cost functions in ?.
Figure 4: % of correct responses in state 48 (experiment 1) and both states (experiment 2).22 [Panels: Experiment 1, Experiment 2.]

22 Error bars shown taking into account clustering at the subject level.

The equivalent of condition 1 for experiment 3 is

$$\Delta q_{29}(d)\,\Delta(d_{29} - e_{29}) + \Delta q_{69}(d)\,\Delta(d_{69} - e_{69}) + \Delta q_{31}(e)\,\Delta(e_{31} - d_{31}) + \Delta q_{71}(e)\,\Delta(e_{71} - d_{71}) \ge 0.$$

Thus we find that, comparing decision problems 9 and 10, we have,

$$\left[q_{29}(d^{10}) + q_{31}(e^{10})\right] - \left[q_{31}(e^{9}) + q_{29}(d^{9})\right] \ge \left[q_{69}(d^{10}) + q_{71}(e^{10})\right] - \left[q_{71}(e^{9}) + q_{69}(d^{9})\right], \qquad (C1)$$

meaning that the increase in the proportion of correct choices in states 29 and 31 must be bigger than the increase in proportion of correct choices in states 69 and 71 when shifting from decision problem 9 to 10. This makes sense, as relative to problem 9, problem 10 has higher rewards for correct decisions in the former two states than the latter two states.

Comparing problems 9 and 10 to 11 implies,

$$q_{69}(d^{9}) + q_{71}(e^{9}) \ge q_{69}(d^{11}) + q_{71}(e^{11}) \qquad (C2)$$
$$q_{29}(d^{10}) + q_{31}(e^{10}) \ge q_{29}(d^{11}) + q_{31}(e^{11}) \qquad (C3)$$

Relative to problem 11, there should be a higher proportion of correct choices in states $\omega_{69}$ and $\omega_{71}$ in problem 9 and a higher proportion of correct choices in $\omega_{29}$ and $\omega_{31}$ in problem 10. Comparing problems 9 and 10 to 12 gives,

$$q_{29}(d^{12}) + q_{31}(e^{12}) \ge q_{29}(d^{9}) + q_{31}(e^{9}) \qquad (C4)$$
$$q_{69}(d^{12}) + q_{71}(e^{12}) \ge q_{69}(d^{10}) + q_{71}(e^{10}) \qquad (C5)$$

Comparing problems 11 and 12 tells us that the total number of correct choices should be higher in the latter than the former,

$$q_{29}(d^{12}) + q_{69}(d^{12}) + q_{31}(e^{12}) + q_{71}(e^{12}) \ge q_{29}(d^{11}) + q_{69}(d^{11}) + q_{31}(e^{11}) + q_{71}(e^{11}) \qquad (C6)$$

Table 9 shows the extent to which the aggregate data satisfies these conditions.

Table 9
Condition    Left Hand Side    Right Hand Side    P
C1           27.5              -3.7               0.01
C2           132.7             127.4              0.36
C3           155.7             131.5              0.05
C4           151.1             128.2              0.03
C5           138.3             132.7              0.34
C6           289.4             258.9              0.17

In all cases, the left hand side values are higher than the right hand side, as required. These differences are significant at the 5% level except for conditions C3, C5 and C6.
Applying bilateral NIAC to experiment 4 is slightly more complex. Comparing decision problems 14-16 the relevant condition is

$$\Delta\left(q_{49}(c) - q_{51}(c)\right)\Delta u^c_{49} \ge 0.$$

This implies the following ranking on $q_{49}(c) - q_{51}(c)$,

$$q_{49}(c^{16}) - q_{51}(c^{16}) \ge q_{49}(c^{15}) - q_{51}(c^{15}) \ge q_{49}(c^{14}) - q_{51}(c^{14}).$$23

In the aggregate data this ordering holds, though the differences are small. The values of $q_{49}(c) - q_{51}(c)$ are 8.2, 7.4 and 5.5 for decision problems 16, 15 and 14 respectively.

23 If it were the case that posterior beliefs when a is chosen in decision problem 13 are such that it would be preferable to choose $c^{14}$ (if available) we additionally have the restriction,

$$q_{49}(c^{14}) - q_{51}(c^{14}) \ge q_{49}(c^{13}) - q_{51}(c^{13}).$$

However this is not the case in our aggregate data.

The tests described consider only bilateral comparisons of attention strategies. The NIAC condition requires more than this: it must be impossible to increase total surplus through any reassignment of attention strategies between problems. This condition holds on the aggregate data for experiments 3 and 4. In experiment 2 it is violated due to the slightly higher accuracy in decision problem 5 than in decision problem 4. In experiment 1 it holds conditional on actual choice at each posterior, but not given optimal choice at each posterior. Given that NIAS is violated in decision problem 2, it would in fact be optimal for act f to be chosen at both posteriors in this problem. However, if the strategies of decision problems 1 and 2 were swapped, it would be optimal to make different choices at different posteriors in both decision problems, which would in turn improve gross surplus.

In order to test the full NIAC condition at the individual level, figure c2 plots the distribution of actual surplus minus the maximal surplus possible by reassigning attention strategies to decision problems. The NIAC condition demands this number to be zero. As a comparator, we show the distribution obtained from random choice.
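The full NIAC test described above can be sketched as a brute-force search over reassignments. This is an illustration rather than the procedure used for figure c2: the function names and the example matrices are hypothetical, and the input is assumed to be a matrix whose entry in row i and column j is the gross utility obtained in decision problem i when using the attention strategy revealed in problem j (with optimal choice at each posterior). Attention costs do not enter because every reassignment uses each strategy exactly once.

```python
from itertools import permutations


def reassignment_totals(gross):
    """Total gross utility under every reassignment of strategies to problems."""
    n = len(gross)
    return [sum(gross[i][perm[i]] for i in range(n))
            for perm in permutations(range(n))]


def surplus_gap(gross):
    """Actual surplus minus the best surplus from any reassignment.

    The value is never positive and equals zero exactly when NIAC holds.
    """
    actual = sum(gross[i][i] for i in range(len(gross)))
    return actual - max(reassignment_totals(gross))


def niac_holds(gross, tol=1e-12):
    return surplus_gap(gross) >= -tol


# Hypothetical two-problem examples (not taken from the experimental data).
consistent = [[7.5, 7.0],
              [8.0, 9.0]]
inconsistent = [[5.0, 6.0],
                [9.0, 8.0]]
print(niac_holds(consistent), surplus_gap(consistent))
print(niac_holds(inconsistent), surplus_gap(inconsistent))
```

With only four decision problems per experiment there are 24 reassignments to check, so exhaustive enumeration is cheap; for many problems an assignment-problem solver would be the natural replacement.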
6.6 Comparison to Random Utility

In section 5 we discussed three variants of RUM: Uninformed, Perfectly Informed and Partially Informed. All of these models have serious problems in explaining our data.

Uninformed RUMs imply that choice probabilities should be the same in each state. As illustrated in section 6.3 this is clearly not the case in our data.

Perfectly Informed RUMs imply that choice probabilities in any state should be a function only of payoffs in that state. This implies that, in experiment 1, choice probabilities in state 52 should be identical across decision problems 1-4. The probability of choosing act f in these four decision problems is 46%, 69%, 15% and 74% respectively, significantly different at the 0.01% level.

We illustrated above two important distinctions between the Partially Informed RUM and rational inattention. First, if subjects do not violate stochastic dominance in their choice, then in decision problems 1-3 choice should in fact be deterministic conditional on the signal received. We tested 24 subjects in experiment 1 for violations of stochastic dominance. We asked subjects a set of 10 questions in which they were asked to choose between act f which paid $10 if there were 49 red balls and act g which paid $10 if there were 51 red balls. The prior probability of 49 red balls varied from 10% to 90%. However, for these questions, subjects did not get to see the screen with the balls, and had to make their decisions based purely on the prior. Thus, subjects obeyed stochastic dominance if and only if they chose act f when the prior on state 49 was greater than or equal to 50%. We found that indeed 83% of subjects obeyed stochastic dominance, suggesting that for the majority of subjects all randomness we observe would have to be due to the exogenous signal. This cannot be squared with the increase in attentiveness as rewards increase in experiment 2: by assumption, the signal in the Partially Informed RUM is exogenous.
Second, we argued that all RUMs, including the hybrid, must obey monotonicity. Experiment 4 shows that monotonicity can be violated, as suggested by MM. Table 10 shows the probability of choosing option b in the four decision problems in this experiment.

Table 10

Decision problem    Prob. of b, state 49    Prob. of b, state 51
13                  25%                     26%
14                  27%                     34%
15                  29%                     39%
16                  26%                     38%

The introduction of act c increases the probability of choosing act b in state 51 from 26% to an average of 37% across decision problems 14-16. While small, this increase is significant at the 10% level.

7 Existing Literature

Much of the work on rational inattention in economics can be traced back to Sims [1998] and Sims [2003], which characterized the behavioral impact of constraints on information processing in linear quadratic control problems.24 The rate of information flow is measured using the Shannon mutual information.25 Sims [2003] shows that such a constraint generates behavior similar to that of assuming that an agent observes the state of the world only noisily. However, the type of noise is determined endogenously, based on the incentives in the environment. Following this paper, mutual information constraints have been incorporated in an increasing number of economic settings, including consumption-savings problems (Sims [2006], Tutino [2008], and Mackowiak and Wiederholt [2010]), pricing problems (Mackowiak and Wiederholt [2009], Matejka [2010], Martin [2013]), monetary policy (Paciello and Wiederholt [2011]), and portfolio choice (Mondria [2010]).

In part, the focus on mutual information as the measure of information is justified by its central position in the information theory literature. The Shannon mutual information of two random variables is related to the expected length in bits of the optimally encoded signal needed to generate one from the other.
It also has an axiomatic characterization which shows that information costs must be of this form if they are to obey certain intuitive properties (see for example Csiszár [2008]). Shannon mutual information costs also have interesting properties from an economic standpoint. As discussed in the text, Matejka and McKay [2011] demonstrate a strong relationship between mutual information based rational inattention and logit-style random choice. Cabrales et al. [2011] demonstrate a further interesting link between mutual information and economic behavior. They consider a ranking of information structures according to a “ruin averse” investor facing a class of no-arbitrage investment problems and show that the ranking of information structures based on willingness to pay is equivalent to that provided by mutual information.

24 Although the study of costly information acquisition in economics goes back much further - for example Stigler [1961], Marschak [1971], Milgrom [1981].

While much of the rational inattention literature has focussed on mutual information costs, a variety of other cost functions and constraints have been studied. Woodford [2012] points out that mutual information does not imply that less attention will be paid to rare events (as such attention is cheap in expectation), in violation of experimental results by Shaw and Shaw [1977]. He therefore proposes an alternative measure in which the cost of an information structure is evaluated according to the related concept of the Shannon capacity. Gul et al. [2012] consider the behavior of households who are restricted to having “crude” consumption plans, i.e. plans that are restricted to having at most n realizations.
Nieuwerburgh and Veldkamp [2009] consider a more general information cost function, based on the distance between prior and posterior variance. Saint-Paul [2011] considers the case in which decision makers face Shannon cost, but can only choose discrete policy functions (essentially combining the approaches of Sims [2003] and Gul et al. [2012]). Reis [2006] considers the case of a binary information choice: in any given period either attention is paid, in which case the state is fully revealed, or it is not, in which case no information is gathered. Even many of the articles that ostensibly use mutual information costs effectively restrict the decision maker to choose Gaussian signals, implying additional constraints (see Sims [2006] for a discussion).

A key strength of our approach is that our model nests all of the above cost functions. The costs of allowable attentional strategies can be captured by K, while the cost of inadmissible strategies can be set to infinity. The NIAS and NIAC conditions therefore provide a test of the entire class of rational inattention models currently in use.26

A recent wave of decision theoretic literature has attempted to capture the observable implications of inattention, both rational and otherwise. Closest in spirit to our work is Ellis [2012], who works with a data set similar to ours - state dependent choice functions. Ellis [2012] initially asks under what conditions such choice data can be rationalized by a model in which the DM’s information is a partition on the underlying state space, with choices that are optimal given this partition. The characterization is based on the identification of cells in the partition by all objective states in which the same choice is made - an approach similar to that taken in our paper. This allows for the identification of the preferences revealed by choices. The basic condition that Ellis [2012] applies is that revealed preference information has to be consistent with independence and dynamic consistency.
Under these conditions (along with continuity and monotonicity) the coarsest partition that can rationalize the data is uniquely identified. Ellis [2012] then goes on to make inattention “rational” by requiring that the partition in use for a particular set of acts is optimal in the set of partitions available to the DM. The additional observable implications of this data are (i) the irrelevance of independent acts - a version of the independence of irrelevant alternatives; and (ii) a further application of the independence axiom, which states that, if more information is used when choosing from choice set A than from set B, then independence must hold.

There are two key differences between the theoretical section of our paper and Ellis [2012]. On the one hand, Ellis [2012] places weaker requirements on the data: unlike our approach, the DM’s utility function and prior beliefs are derived from behavior rather than directly observable. On the other, Ellis [2012] considers a more restrictive class of information restrictions: the DMs in the Ellis [2012] model effectively face a cost function which is zero for allowable partitions and infinity for all other information structures. This restriction rules out any stochasticity in choice, as well as many commonly used information cost functions (such as those based on Shannon mutual information).

A second decision theoretic approach to identifying rational attention is to examine choice over menus. Ergin and Sarver [2010] consider a model in which a decision maker makes choices over choice sets by optimally selecting a partition on (subjective and unobservable) states of the world, then choosing the best action conditional on that partition. They characterize the implications of such a model for choices between choice sets.
Costly contemplation is characterized by an aversion to contingent planning: an agent would prefer to find out which set they are choosing from and then choose from that set, rather than have to make contingent plans. Mihm and Ozbek [2012] extend this approach to the case in which there are observable states of the world, resulting in a representation similar to that considered in this paper.

26 Note that we consider only the instrumental value of information, not any intrinsic value that information might have as in Grant et al. [1997].

Our work is related to an ongoing project in which we aim to characterize choice behavior when the internal information state of the agent is not directly observable. Van Zandt [1994] provides an early negative result in this regard, showing that any choice behavior is rationalizable in a model that allows for hidden costly information acquisition if the state of the world is not observable. Caplin and Dean [2011] and Caplin et al. [2011] consider the case of sequential information search, using an extended data set to derive behavioral restrictions of search of this kind as well as of satisficing behavior. Caplin and Martin [2011] introduce the NIAS condition to characterize subjective rationality in a single decision problem. Masatlioglu et al. [2012] characterize “revealed attention”, using the identifying assumption that removing an unattended item from the choice set does not affect attention. Dillenberger et al. [2012] consider a dynamic problem in which the DM receives information in each period, characterizing the resulting preference over menus.

In the psychology literature, theories to which we are close in spirit are signal detection theory (Green and Swets [1966]) and categorization theory. Unlike our approach, these models tend to assume that the attention strategy is fixed (commonly it is assumed that the DM gets a normal signal). A common feature is that the DM must choose the optimal action at each posterior.
Based in part on these theories, there is an enormous experimental literature on signal detection and categorization in psychology (much of which uses state dependent stochastic choice data).27 Despite the psychological precedents, there is little experimental work on state dependent stochastic choice data within economics, and no work in either field that tests NIAC and NIAS directly. One related paper is Cheremukhin et al. [2011], which uses a formulation similar to Matejka and McKay [2011] to estimate a rationally inattentive model on subjects’ choices over lotteries. They do not analyze the state dependence in the resulting stochastic choice data.

8 Conclusions

As economists increasingly focus on attentional constraints, so the importance of rational inattention theory has grown. We characterize a general model of rational inattention which encompasses all models currently in the literature. The necessary and sufficient conditions are simple and readily testable. We find the model to do a qualitatively good job of explaining subject behavior in a simple experimental implementation. In contrast, traditional random utility models fail to capture important data features. In addition to further investigating the comparison with random utility models, we are currently exploring the behavioral content of more structured models of attention costs, in particular the Shannon model.28 We are also continuing to explore the implications of the model for behavior in important economic domains.

27 We do not attempt to summarize the literature here - see Verghese [2003] for a review.
28 Caplin and Dean [2013] makes a start in this direction.

References

Timothy K. M. Beatty and Ian A. Crawford. How demanding is the revealed preference approach to demand? American Economic Review, 101(6):2782–95, October 2011.
Antonio Cabrales, Olivier Gossner, and Roberto Serrano. Entropy and the value of information for investors.
Economics Working Papers we1104, Universidad Carlos III, Departamento de Economía, March 2011.
Andrew Caplin and Mark Dean. Search, choice, and revealed preference. Theoretical Economics, 6(1), January 2011.
Andrew Caplin and Mark Dean. Rational inattention, entropy, and choice: The posterior-based approach. Mimeo, Center for Experimental Social Science, New York University, 2013.
Andrew Caplin and Daniel Martin. A testable theory of imperfect perception. NBER Working Papers 17163, National Bureau of Economic Research, Inc, June 2011.
Andrew Caplin, Mark Dean, and Daniel Martin. Search and satisficing. American Economic Review, 101(7):2899–2922, December 2011.
Anton Cheremukhin, Anna Popova, and Antonella Tutino. Experimental evidence on rational inattention. Technical report, 2011.
Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation: Theory and evidence. American Economic Review, 99(4):1145–77, September 2009.
Stephen A. Clark. The random utility model with an infinite choice space. Economic Theory, 7:179–189, 1996.
Jacques Cremer. A simple proof of Blackwell’s "comparison of experiments" theorem. Journal of Economic Theory, 27(2):439–443, August 1982.
Imre Csiszár. Axiomatic Characterizations of Information Measures. Entropy, 10:261–273, 2008.
David Dillenberger, Juan Sebastian Lleras, Philipp Sadowski, and Norio Takeoka. A theory of subjective learning. Technical report, 2012.
Andrew Ellis. Foundations for optimal attention. Mimeo, Boston University, 2012.
Haluk Ergin and Todd Sarver. A Unique Costly Contemplation Representation. Econometrica, 78(4):1285–1339, 2010.
J. C. Falmagne. A Representation Theorem for Random Finite Scale Systems. Journal of Mathematical Psychology, 18:52–72, 1978.
S. Grant, A. Kajii, and B. Polak. Intrinsic preference for information. Technical report, 1997.
D. M. Green and J. A. Swets. Signal detection theory and psychophysics. Wiley, New York, 1966.
Faruk Gul and Wolfgang Pesendorfer. Random Expected Utility.
Econometrica, 74(1):121–146, 2006.
Faruk Gul, Wolfgang Pesendorfer, and Tomasz Strzalecki. Behavioral competitive equilibrium and extreme prices. Mimeo, Princeton University, 2012.
C. A. Holt and S. K. Laury. Risk aversion and incentive effects. American Economic Review, 92(5):1644–1655, 2002.
Tjalling C. Koopmans and Martin Beckmann. Assignment problems and the location of economic activities. Econometrica, 25(1):53–76, 1957.
Nicola Lacetera, Devin G. Pope, and Justin R. Sydnor. Heuristic thinking and limited attention in the car market. NBER Working Papers 17030, National Bureau of Economic Research, Inc, May 2011.
Graham Loomes and Robert Sugden. Incorporating a stochastic element into decision theories. European Economic Review, 39(3-4):641–648, April 1995.
Bartosz Mackowiak and Mirko Wiederholt. Optimal sticky prices under rational inattention. American Economic Review, 99(3):769–803, June 2009.
Bartosz Adam Mackowiak and Mirko Wiederholt. Business cycle dynamics under rational inattention. CEPR Discussion Papers 7691, C.E.P.R. Discussion Papers, February 2010.
Charles F. Manski. The structure of random utility models. Theory and Decision, 8:229–254, 1977.
Jacob Marschak. Economics of information systems. Journal of the American Statistical Association, 66(333):192–219, March 1971.
Daniel Martin. Strategic pricing and rational inattention to quality. Mimeo, New York University, 2013.
Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y. Ozbay. Revealed attention. American Economic Review, 102(5):2183–2205, August 2012.
Filip Matejka and Alisdair McKay. Rational inattention to discrete choices: A new foundation for the multinomial logit model. CERGE-EI Working Papers wp442, The Center for Economic Research and Graduate Education - Economic Institute, Prague, June 2011.
Filip Matejka. Rationally inattentive seller: Sales and discrete pricing.
CERGE-EI Working Papers wp408, The Center for Economic Research and Graduate Education - Economic Institute, Prague, March 2010.
Daniel McFadden. Revealed stochastic preference: a synthesis. Economic Theory, 26(2):245–264, 08 2005.
Maximilian Mihm and M. Kemal Ozbek. Decision making with rational inattention. Working paper, Social Science Research Network, 2012.
Paul R. Milgrom. Rational expectations, information acquisition, and competitive bidding. Econometrica, 49(4):921–43, June 1981.
Jordi Mondria. Portfolio choice, attention allocation, and price comovement. Journal of Economic Theory, 145(5):1837–1864, September 2010.
David J. Murray. A perspective for viewing the history of psychophysics. Behavioral and Brain Sciences, 16:115–137, 2 1993.
V. Navalpakkam, C. Koch, and P. Perona. Homo economicus in visual search. Journal of Vision, 9(1):1–16, January 2009.
Stijn Van Nieuwerburgh and Laura Veldkamp. Information immobility and the home bias puzzle. Journal of Finance, 64(3):1187–1215, 06 2009.
Luigi Paciello and Mirko Wiederholt. Exogenous information, endogenous information and optimal monetary policy. Technical report, 2011.
Ricardo Reis. Inattentive Consumers. Journal of Monetary Economics, 53(8):1761–1800, 2006.
Jean-Charles Rochet. A necessary and sufficient condition for rationalizability in a quasi-linear context. Journal of Mathematical Economics, 16(2):191–200, April 1987.
Gilles Saint-Paul. A "quantized" approach to rational inattention. TSE Working Papers 10-144, Toulouse School of Economics (TSE), 2011.
Babur De Los Santos, Ali Hortacsu, and Matthijs R. Wildenbeest. Testing models of consumer search using data on web browsing and purchasing behavior. American Economic Review, 102(6):2955–80, October 2012.
M. L. Shaw and P. Shaw. Optimal allocation of cognitive resources to spatial locations. J Exp Psychol Hum Percept Perform, 3(2):201–211, May 1977.
Christopher A. Sims. Stickiness.
Carnegie-Rochester Conference Series on Public Policy, 49(1):317–356, December 1998.
Chris Sims. Implications of Rational Inattention. Journal of Monetary Economics, 50(3):665–690, 2003.
Chris Sims. Rational Inattention: A Research Agenda. 2006.
George J. Stigler. The economics of information. Journal of Political Economy, 69:213, 1961.
Antonella Tutino. The rigidity of choice: Lifecycle savings with information-processing limits. Technical report, 2008.
Timothy Van Zandt. Hidden information acquisition and static choice. CORE Discussion Papers 1994017, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 1994.
Preeti Verghese. Visual search and attention: A signal detection theory approach. Neuron, 31(4):523–535, 2003.
Michael Woodford. Information-constrained state-dependent pricing. NBER Working Papers 14620, National Bureau of Economic Research, Inc, December 2008.
Michael Woodford. Inattentive valuation and reference-dependent choice. Mimeo, Columbia University, 2012.

9 Appendix A: Proofs

9.1 Lemma 1

Lemma 1 Given decision problem $(\mu, A) \in \Gamma \times \mathcal{F}$ and $q \in Q$, if $\pi \in \Pi$ is consistent with these data, then $\pi$ is sufficient for $\bar{\pi}^{(\mu,q)}$.

Proof. Let $\pi \in \Pi$ be an attention strategy that is consistent with $q \in Q$ in decision problem $(\mu, A)$. First, we list in order all distinct posteriors $\gamma^i \in \Gamma(\pi)$ for $1 \leq i \leq I = |\Gamma(\pi)|$. Given that $\pi$ is consistent with $q$, there exists a corresponding optimal choice strategy $Y : \{1, \ldots, I\} \to \Delta(A)$, with $Y^i(f)$ denoting the probability of choosing act $f \in F(q)$ with posterior $\gamma^i$, such that the attention and choice functions match the data,

$$q_m(f) = \sum_{i=1}^{I} \pi_m(\gamma^i) Y^i(f).$$

We also list in order all possible posteriors $\bar{\gamma}^j \in \Gamma(\bar{\pi}^{(\mu,q)})$, $1 \leq j \leq J = |\Gamma(\bar{\pi}^{(\mu,q)})|$, and identify as $F^j$ all chosen acts that are associated with posterior $\bar{\gamma}^j$,

$$F^j \equiv \{f \in F(q) \mid r^{(\mu,q)}(f) = \bar{\gamma}^j\}.$$

The garbling matrix $b^{ij}$ sets the probability of $\bar{\gamma}^j \in \Gamma(\bar{\pi}^{(\mu,q)})$ given $\gamma^i \in \Gamma(\pi)$ as the probability of all choices associated with acts $f \in F^j$,

$$b^{ij} = \sum_{f \in F^j} Y^i(f).$$

Note that this is indeed an $I \times J$ stochastic matrix $B \geq 0$ with $\sum_{j=1}^{J} b^{ij} = 1$ all $i$. Given $\bar{\gamma}^j \in \Gamma(\bar{\pi}^{(\mu,q)})$ and state $m$, note that,

$$\sum_{i=1}^{I} b^{ij} \pi_m(\gamma^i) = \sum_{i=1}^{I} \pi_m(\gamma^i) \sum_{f \in F^j} Y^i(f) = \sum_{f \in F^j} q_m(f),$$

by the data matching property. It is definitional that $\bar{\pi}^{(\mu,q)}_m(\bar{\gamma}^j)$ is precisely equal to this, as the observed probability of all acts associated with posterior $\bar{\gamma}^j$. Hence,

$$\bar{\pi}^{(\mu,q)}_m(\bar{\gamma}^j) = \sum_{i=1}^{I} b^{ij} \pi_m(\gamma^i),$$

as required for sufficiency.

9.2 Theorem 1

Theorem 1 Data set $(D, q)$ has a rational inattention representation if and only if it satisfies NIAS and NIAC.

Proof of Necessity. To confirm that existence of a rational inattention representation $(K, \pi)$ of $(D, q)$ implies that the NIAS condition is satisfied, consider $(\mu, A) \in D$ and a corresponding choice strategy $Y : \Gamma(\pi^{(\mu,A)}) \to \Delta(A)$ such that final choices are optimal and that matches the data,

$$q_m(f) = \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \pi^{(\mu,A)}_m(\gamma) Y^{\gamma}(f).$$

Given $f \in F(q^{(\mu,A)})$, consider all posteriors $\gamma \in \Gamma(f)$ for which $Y^{\gamma}(f) > 0$,

$$\Gamma(f) = \{\gamma \in \Gamma(\pi^{(\mu,A)}) \mid Y^{\gamma}(f) > 0\}.$$

Note that it must be the case that

$$\sum_{m=1}^{M} \gamma_m U^f_m \geq \sum_{m=1}^{M} \gamma_m U^g_m$$

all $g \in A$ for all posteriors $\gamma \in \Gamma(f)$. Note also that we can rewrite the revealed posterior beliefs as a weighted average of the true posterior beliefs,

$$r^{(\mu,q)}_m(f) = \frac{\mu_m q_m(f)}{\sum_{j=1}^{M} \mu_j q_j(f)} = \frac{\mu_m \left[\sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \pi^{(\mu,A)}_m(\gamma) Y^{\gamma}(f)\right]}{\sum_{j=1}^{M} \mu_j q_j(f)}$$

$$= \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \left[\frac{\mu_m \pi^{(\mu,A)}_m(\gamma) Y^{\gamma}(f)}{\sum_{j=1}^{M} \mu_j \pi^{(\mu,A)}_j(\gamma) Y^{\gamma}(f)}\right] \left[\frac{\sum_{j=1}^{M} \mu_j \pi^{(\mu,A)}_j(\gamma) Y^{\gamma}(f)}{\sum_{j=1}^{M} \mu_j q_j(f)}\right] = \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} \gamma_m P(\gamma \mid f),$$

where the second line is obtained by dividing and multiplying each term in the sum by $\sum_{j=1}^{M} \mu_j \pi^{(\mu,A)}_j(\gamma) Y^{\gamma}(f)$, the joint probability of posterior $\gamma$ and act $f$, and $P(\gamma \mid f)$ indicates the probability of posterior $\gamma$ given that $f$ was chosen. The value of choosing act $f$ at its revealed posterior is therefore

$$\sum_{m=1}^{M} r^{(\mu,q)}_m(f) U^f_m = \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} P(\gamma \mid f) \sum_{m=1}^{M} \gamma_m U^f_m \geq \sum_{\gamma \in \Gamma(\pi^{(\mu,A)})} P(\gamma \mid f) \sum_{m=1}^{M} \gamma_m U^g_m = \sum_{m=1}^{M} r^{(\mu,q)}_m(f) U^g_m \quad \forall g \in A,$$

where the middle inequality stems from the fact that $\sum_{m=1}^{M} \gamma_m U^f_m \geq \sum_{m=1}^{M} \gamma_m U^g_m$ all $g \in A$ for every $\gamma \in \Gamma(\pi^{(\mu,A)})$ at which $f$ is chosen with positive probability.

Next we show that, if there exists a rational inattention representation $(K, \pi)$ of $(D, q)$, then NIAC must hold. For any sequence $(\mu, A^1), (\mu, A^2), \ldots, (\mu, A^J) \in D$ with $A^J = A^1$, it must be the case that

$$\sum_{j=1}^{J-1} \left[G(\mu, A^j, \pi^{(\mu,A^j)}) - K(\mu, \pi^{(\mu,A^j)})\right] \geq \sum_{j=1}^{J-1} \left[G(\mu, A^j, \pi^{(\mu,A^{j+1})}) - K(\mu, \pi^{(\mu,A^{j+1})})\right],$$

so that

$$\sum_{j=1}^{J-1} \left[G(\mu, A^j, \pi^{(\mu,A^j)}) - G(\mu, A^j, \pi^{(\mu,A^{j+1})})\right] \geq \sum_{j=1}^{J-1} \left[K(\mu, \pi^{(\mu,A^j)}) - K(\mu, \pi^{(\mu,A^{j+1})})\right] = 0,$$

where the last equality stems from the fact that $K(\mu, \pi^{(\mu,A^1)}) = K(\mu, \pi^{(\mu,A^J)})$. This implies that

$$\sum_{j=1}^{J-1} G(\mu, A^j, \pi^{(\mu,A^j)}) \geq \sum_{j=1}^{J-1} G(\mu, A^j, \pi^{(\mu,A^{j+1})}).$$

It is immediate that $G(\mu, A^j, \bar{\pi}^{(\mu,A^j)}) = G(\mu, A^j, \pi^{(\mu,A^j)})$ $\forall j$, as the minimal attention strategy implies the same pattern of stochastic choice as does the original attention strategy. Moreover, by lemma 1, we know that $\pi^{(\mu,A^j)}$ is sufficient for $\bar{\pi}^{(\mu,A^j)}$ $\forall j$, and so, by Blackwell’s theorem, $G(\mu, A^j, \bar{\pi}^{(\mu,A^{j+1})}) \leq G(\mu, A^j, \pi^{(\mu,A^{j+1})})$ $\forall j$ (see remark 1). Thus the true attention strategies $\pi^{(\mu,A^j)}$ can be replaced by the minimal attention strategies $\bar{\pi}^{(\mu,A^j)}$ in the above expression and the inequality will still hold, implying NIAC.

Proof of Sufficiency. There are three steps in the proof that the NIAS and NIAC conditions are sufficient for $(D, q)$ to have a rational inattention representation. The first step is to establish that the NIAC conditions ensure that there is no global reassignment of the minimal attention strategies observed in the data to decision problems $(\mu, A) \in D$ that raises total gross surplus. The second step is to use this observation to define a candidate cost function on attentional strategies, $K : \Gamma \times \Pi \to \mathbb{R} \cup \{\infty\}$. The key is to note that, as the solution to the classical allocation problem of Koopmans and Beckmann [1957], this assignment is supported by “prices” set in expected utility units. It is these prices that define the proposed cost function. The final step is to apply the NIAS conditions to show that $(K, \bar{\pi})$ is a rational inattention representation of $(D, q)$, where $\bar{\pi}$ comprises minimal attention strategies.
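The NIAS condition invoked in the last step is straightforward to check on data. The sketch below is illustrative only and is not taken from the paper: it assumes a prior stored as a list `prior[m]`, state dependent stochastic choice data `q[m][f]` (probability of act f in state m), and utilities `utility[f][m]`. It computes the revealed posterior $\mu_m q_m(f) / \sum_j \mu_j q_j(f)$ for each chosen act and verifies that no other act does strictly better there. All names and numbers are hypothetical.

```python
def revealed_posterior(prior, q, f):
    """Posterior over states revealed by the choice of act f (Bayes' rule)."""
    joint = [prior[m] * q[m][f] for m in range(len(prior))]
    total = sum(joint)
    return [x / total for x in joint]


def satisfies_nias(prior, q, utility, tol=1e-9):
    """True if every chosen act is optimal at its own revealed posterior."""
    states = range(len(prior))
    acts = range(len(utility))
    for f in acts:
        # Acts that are never chosen have no revealed posterior; skip them.
        if sum(prior[m] * q[m][f] for m in states) <= 0:
            continue
        r = revealed_posterior(prior, q, f)
        values = [sum(r[m] * utility[g][m] for m in states) for g in acts]
        if max(values) > values[f] + tol:
            return False
    return True


# Hypothetical two-state example: act 0 pays 10 in state 0, act 1 pays 10 in state 1.
prior = [0.5, 0.5]
utility = [[10.0, 0.0], [0.0, 10.0]]
q_good = [[0.75, 0.25], [0.25, 0.75]]  # each act chosen mostly in "its" state
q_bad = [[0.25, 0.75], [0.75, 0.25]]   # each act chosen mostly in the wrong state

print(revealed_posterior(prior, q_good, 0))   # [0.75, 0.25]
print(satisfies_nias(prior, q_good, utility))  # True
print(satisfies_nias(prior, q_bad, utility))   # False
```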
Consider any prior $\mu \in \Gamma$ such that there exist two or more sets $A$ such that $(\mu, A) \in D$. Enumerate these sets as $A^l$ for $1 \leq l \leq L$. Define the corresponding minimal attention strategies $\bar{\pi}^l$ for $1 \leq l \leq L$ as each is revealed in the corresponding data $q^{(\mu,A^l)}$. Note that the minimal attention strategies may not all be distinct. In cases in which the same strategy appears more than once, one retains all copies so that the cardinality of the set of strategies precisely matches that of the set of underlying decision problems at $L$. With this, one can consider the set $\mathcal{M}$ of all matchings of minimal attention strategies as identified by their index $1 \leq l \leq L$ with correspondingly indexed decision problems. Formally, each such matching is a 1-1 function $m : \{1, \ldots, L\} \to \{1, \ldots, L\}$ in which strategy $\bar{\pi}^{m(l)}$ is applied with decision problem $(\mu, A^l)$. Given each such matching, one can define the corresponding sum of gross utilities as,

$$S^G(m) = \sum_{l=1}^{L} G(\mu, A^l, \bar{\pi}^{m(l)}).$$

The claim is that the NIAC condition implies that the identity map $m^I(l) = l$ maximizes this sum over all matching functions $m \in \mathcal{M}$. Suppose to the contrary that there exists some alternative matching function that achieves a strictly higher sum, and denote this match $m^* \in \mathcal{M}$. In this case construct a first sub-cycle as follows: start with the lowest index $l_1$ such that $m^*(l_1) \neq l_1$. Define $m^*(l_1) = l_2$ and now find $m^*(l_2)$, noting by construction that $m^*(l_2) \neq l_2$. Given that the domain is finite, this process will terminate after some $J \leq L$ steps with $m^*(l_J) = l_1$. If it is the case that $m^*(l) = l$ outside of the set $\cup_{j=1}^{J} l_j$, then we know the increase in the value of the sum is associated only with this cycle, hence,

$$\sum_{j=1}^{J-1} G(\mu, A^{l_j}, \bar{\pi}^{l_j}) < \sum_{j=1}^{J-1} G(\mu, A^{l_j}, \bar{\pi}^{l_{j+1}}),$$

directly in contradiction to NIAC. If this inequality does not hold strictly, then we know that there exists some $l'$ outside of the set $\cup_{j=1}^{J} l_j$ such that $m^*(l') \neq l'$.
We can therefore iterate the process, knowing that the above strict inequality must be true for at least one such cycle to explain the strict increase in overall gross utility. Hence the identity map $m^I(l) = l$ indeed maximizes $S^G(m)$ amongst all matching functions $m \in \mathcal{M}$.

The fact that $m^I(l) = l$ maximizes $S^G(m)$ enables us to apply the results of Koopmans and Beckmann [1957], who used linear programming techniques to solve allocation problems of this form. Their results directly imply that any solution to this problem of optimally matching production functions (the attentional strategies) with locations (the decision problems) can be decentralized by charging prices (not necessarily positive) for either resource, and leaving the owners of the other resource to maximize their profits. To apply this to our problem, consider any $\mu \in \Gamma$ such that there exist sets $(\mu, A^l) \in D$ for $1 \leq l \leq L$ with $L \geq 2$, and define $\bar{\Pi}^{\mu}$ as the set of all minimal attention strategies,

$$\bar{\Pi}^{\mu} \equiv \cup_{(\mu,A) \in D}\, \bar{\pi}^{(\mu,A)}.$$

The result of Koopmans and Beckmann directly implies existence of a real-valued function $K^{\mu} : \bar{\Pi}^{\mu} \to \mathbb{R}$ that decentralizes the problem from the viewpoint of the owner of the decision problems, seeking to identify surplus maximizing attentional strategies to match to their particular problems. The defining characteristic of these costs is optimality of the revealed minimal attention strategies,

$$G(\mu, A^l, \bar{\pi}^l) - K^{\mu}(\bar{\pi}^l) \geq G(\mu, A^l, \bar{\pi}) - K^{\mu}(\bar{\pi}),$$

all $\bar{\pi} \in \bar{\Pi}^{\mu}$.

To complete the definition of the cost function, consider all $\mu \in \Gamma$ such that there exists a unique decision problem $(\mu, A) \in D$, and set the cost of the corresponding minimal attentional strategy to zero, $K^{\mu}(\bar{\pi}^{(\mu,A)}) = 0$. Now complete the function across all $\mu \in \Gamma$ such that there exist one or more $(\mu, A) \in D$ by setting $K(\mu, \pi) = \infty$ for $\pi \neq \bar{\pi}^{(\mu,A)}$. Finally, for all $\mu \in \Gamma$ such that there is no $(\mu, A) \in D$, set $K^{\mu}(\pi) = 0$ all $\pi \in \Pi^{\mu}$ and $K(\mu, \pi) = \infty$ for $\pi \notin \Pi^{\mu}$. Note that we have now completed construction of a qualifying cost function $K : \Gamma \times \Pi \to \mathbb{R} \cup \{\infty\}$ that satisfies $K(\mu, \pi) = \infty$ for $\pi \notin \Pi^{\mu}$ and $K(\mu, \pi) \in \mathbb{R}$ for some $\pi \in \Pi^{\mu}$.
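The existence of supporting costs can be illustrated numerically. The sketch below is not the linear programming method of Koopmans and Beckmann [1957]; as a swapped-in technique it treats the optimality inequalities as a system of difference constraints and solves them by Bellman-Ford relaxation. It assumes a hypothetical matrix `gross[l][k]` giving the gross surplus in decision problem l under the minimal attention strategy revealed in problem k. A solution exists exactly when no cycle of reassignments raises total surplus, so the function returns `None` when NIAC fails.

```python
def supporting_costs(gross, tol=1e-9):
    """Find costs K with gross[l][l] - K[l] >= gross[l][k] - K[k] for all l, k.

    Each inequality is the difference constraint
        K[l] <= K[k] + (gross[l][l] - gross[l][k]),
    i.e. an edge k -> l with that weight. Returns None if a negative cycle
    exists, which corresponds to a violation of NIAC.
    """
    n = len(gross)
    edges = [
        (k, l, gross[l][l] - gross[l][k])
        for l in range(n)
        for k in range(n)
        if k != l
    ]
    # Starting every node at 0 is equivalent to a virtual source linked
    # to all nodes by zero-weight edges.
    cost = [0.0] * n
    for _ in range(n):
        changed = False
        for k, l, weight in edges:
            if cost[k] + weight < cost[l] - tol:
                cost[l] = cost[k] + weight
                changed = True
        if not changed:
            return cost
    return None  # still relaxing after n passes: negative cycle


# Hypothetical surplus matrices.
print(supporting_costs([[5.0, 6.0], [2.0, 8.0]]))  # [-1.0, 0.0]
print(supporting_costs([[5.0, 6.0], [8.0, 6.0]]))  # None
```

In the first example the supporting cost of the first strategy is negative, in line with the remark that the decentralizing prices need not be positive.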
The entire construction was aimed at ensuring that the observed attentional strategy choices were always maximal, $\bar{\pi}^{(\mu,A)} \in \hat{\Pi}(K, \mu, A)$ for all $(\mu, A) \in D$. It remains to prove that $\bar{\pi}^{(\mu,A)}$ is consistent with $q^{(\mu,A)}$ for all $(\mu, A) \in D$. This requires us to show that, for all $(\mu, A) \in D$, the choice rule that associates with each $\gamma \in \Gamma(\bar{\pi}^{(\mu,A)})$ the certainty of choosing the associated act $f \in F(\mu, A)$ as observed in the data is both optimal and matches the data. That it is optimal is the precise content of the NIAS constraint,

$$\sum_{m=1}^{M} r^{(\mu,A)}_m(f) U^f_m \geq \sum_{m=1}^{M} r^{(\mu,A)}_m(f) U^g_m,$$

for all $g \in A$. That this choice rule and the corresponding minimal attention function match the data holds by construction.

9.3 Theorem 2

Theorem 2 Data set $(D, q)$ satisfies NIAS and NIAC if and only if it has a rational inattention representation with conditions K1 to K3 satisfied.

Proof. The proof of necessity is immediate from theorem 1. The proof of sufficiency proceeds in four steps, starting with a rational inattention representation $(K, \bar{\pi})$ of $(D, q)$ of the form produced in theorem 1 based on satisfaction of the NIAS and NIAC conditions. A key feature of this function is that, given $\mu \in \Gamma$ such that $(\mu, A) \in D$ some $A \in \mathcal{F}$, the function $K^{\mu}$ is real-valued only on the minimal information strategies $\{\bar{\pi}^{(\mu,A)} \mid (\mu, A) \in D\}$ associated with all corresponding decision problems, otherwise being infinite. The first step in the proof is to construct for each such $\mu \in \Gamma$ a larger domain $\tilde{\Pi}^{\mu}$ on which the cost function is real-valued, satisfying four properties: to include all minimal attention strategies, $\bar{\Pi}^{\mu} \subset \tilde{\Pi}^{\mu}$; to include the inattentive strategy, $I^{\mu} \in \tilde{\Pi}^{\mu}$; to be closed under mixtures, so that $\pi, \eta \in \tilde{\Pi}^{\mu}$ and $\lambda \in (0, 1)$ implies $\lambda \pi + (1 - \lambda)\eta \in \tilde{\Pi}^{\mu}$; and to be “closed under garbling,” so that if $\pi \in \tilde{\Pi}^{\mu}$ is sufficient for attentional strategy $\eta \in \Pi^{\mu}$, then $\eta \in \tilde{\Pi}^{\mu}$. The second step is, for each $\mu \in \Gamma$, to define a new function $\tilde{K}^{\mu}$ that preserves the essential elements of $K^{\mu}$ while being real-valued on the larger domain $\tilde{\Pi}^{\mu}$, and thereby to construct the full candidate cost function $\tilde{K} : \Gamma \times \Pi \to \mathbb{R} \cup \{\infty\}$.
The third step is to confirm that $\tilde{K} \in \mathcal{K}$ and that $\tilde{K}$ satisfies the required conditions K1 through K3. The final step is to confirm that $(\tilde{K}, \bar{\pi})$ forms a rational inattention representation of $(D, q)$.

Given $\mu \in \Gamma$ such that $(\mu, A) \in D$ some $A \in \mathcal{F}$, we construct the domain $\tilde{\Pi}^{\mu}$ in two stages. First, we define all attention strategies for which some minimal attentional strategy $\bar{\pi} \in \bar{\Pi}^{\mu}$ is sufficient,

$$\Pi^{\mu}_S = \{\pi \in \Pi^{\mu} \mid \exists\, \bar{\pi} \in \bar{\Pi}^{\mu} \text{ sufficient for } \pi\}.$$

Note that this is a superset of $\bar{\Pi}^{\mu}$ and that it contains $I^{\mu}$. The second step is to identify $\tilde{\Pi}^{\mu}$ as the smallest mixture set containing $\Pi^{\mu}_S$: this is itself a mixture set since the arbitrary intersection of mixture sets is itself a mixture set. By construction, $\tilde{\Pi}^{\mu}$ has three of the four desired properties: it is closed under mixing, it contains $\bar{\Pi}^{\mu}$, and it contains the inattentive strategy. The only condition that needs to be checked is that it retains the property of being closed under sufficiency:

$$\pi \in \tilde{\Pi}^{\mu} \text{ sufficient for } \eta \in \Pi^{\mu} \implies \eta \in \tilde{\Pi}^{\mu}.$$

To establish this, it is useful first to establish certain properties of $\Pi^{\mu}_S$ and of $\tilde{\Pi}^{\mu}$. The first is that $\Pi^{\mu}_S$ is closed under garbling:

$$\pi \in \Pi^{\mu}_S \text{ sufficient for } \eta \in \Pi^{\mu} \implies \eta \in \Pi^{\mu}_S.$$

Intuitively, this is because the garbling of a garbling is a garbling. In technical terms, the product of the corresponding garbling matrices is itself a garbling matrix. The second is that one can explicitly express $\tilde{\Pi}^{\mu}$ as the set of all finite mixtures of elements of $\Pi^{\mu}_S$,

$$\tilde{\Pi}^{\mu} = \left\{\pi = \sum_{j=1}^{J} \alpha_j \pi^j \;\Big|\; J \in \mathbb{N},\ (\alpha_1, \ldots, \alpha_J) \in S^{J-1},\ \pi^j \in \Pi^{\mu}_S\right\},$$

where $S^{J-1}$ is the unit simplex in $\mathbb{R}^J$. To make this identification, note that the set as defined on the RHS certainly contains $\Pi^{\mu}_S$ and is a mixture set, hence is a superset of $\tilde{\Pi}^{\mu}$. Note moreover that all elements in the RHS set are necessarily contained in any mixture set containing $\Pi^{\mu}_S$ by a process of iteration, making it also a subset of $\tilde{\Pi}^{\mu}$, hence finally one and the same set.

We now establish that if $\eta \in \Pi^{\mu}$ is a garbling of some $\pi \in \tilde{\Pi}^{\mu}$, then indeed $\eta \in \tilde{\Pi}^{\mu}$.
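The first step is to express $\pi \in \tilde{\Pi}^{\mu}$ as an appropriate convex combination of elements of $\Pi^{\mu}_S$ as we now know we can,

$$\pi = \sum_{j=1}^{J} \alpha_j \pi^j,$$

with all weights strictly positive, $\alpha_j > 0$ all $j$. Note that, given any such expression, one can generate another set of elements $\tilde{\pi}^j \in \Pi^{\mu}_S$ with the additional property that they all have precisely the same support as does $\pi$, $\Gamma(\tilde{\pi}^j) = \Gamma(\pi)$. To see this, note that we can define each $\tilde{\pi}^j$ as a mixture of $\pi^j$ and $\pi$ itself,

$$\tilde{\pi}^j = \epsilon \pi^j + (1 - \epsilon)\pi,$$

with weight $\epsilon \in (0, 1)$ that is independent of $j$. With this, the mixture property $\pi = \sum_{j=1}^{J} \alpha_j \tilde{\pi}^j$ is preserved, while the possible posteriors associated with each $\tilde{\pi}^j$ become the common set $\Gamma(\pi)$. As a mixture of elements from the set $\Pi^{\mu}_S$, note that $\tilde{\pi}^j \in \Pi^{\mu}_S$. To prevent notation from proliferating, we assume that the initial set $\pi^j$ in the above expression all have support $\Gamma(\pi^j) = \Gamma(\pi)$. Lemma 2 below establishes that in this case there exist garblings $\eta^j$ of $\pi^j \in \Pi^{\mu}_S$ such that,

$$\eta = \sum_{j=1}^{J} \alpha_j \eta^j,$$

establishing that indeed $\eta \in \tilde{\Pi}^{\mu}$ since, with $\Pi^{\mu}_S$ closed under garbling, $\pi^j \in \Pi^{\mu}_S$ and $\eta^j$ a garbling of $\pi^j$ implies $\eta^j \in \Pi^{\mu}_S$.

Given $\mu \in \Gamma$ such that $(\mu, A) \in D$ some $A \in \mathcal{F}$, we define the function $\tilde{K}^{\mu}$ on $\tilde{\Pi}^{\mu}$ in three stages. First we define the function $K^{\mu}_S$ on the domain $\Pi^{\mu}_S$ by identifying for any $\pi \in \Pi^{\mu}_S$ the corresponding set of minimal attention strategies $\bar{\pi} \in \bar{\Pi}^{\mu}$ of which $\pi$ is a garbling, and assigning to it the lowest such cost. Formally, given $\pi \in \Pi^{\mu}_S$,

$$K^{\mu}_S(\pi) \equiv \min_{\{\bar{\pi} \in \bar{\Pi}^{\mu} \mid \bar{\pi} \text{ sufficient for } \pi\}} K^{\mu}(\bar{\pi}).$$

Note that $K^{\mu}_S(\bar{\pi}) = K^{\mu}(\bar{\pi})$ all $\bar{\pi} \in \bar{\Pi}^{\mu}$. To see this, consider $(\mu, A), (\mu, A') \in D$ with $\bar{\pi}^{(\mu,A')}$ sufficient for $\bar{\pi}^{(\mu,A)}$. By the Blackwell property, expected utility is at least as high using $\bar{\pi}^{(\mu,A')}$ as using $\bar{\pi}^{(\mu,A)}$ for which it is sufficient,

$$G(\mu, A, \bar{\pi}^{(\mu,A')}) \geq G(\mu, A, \bar{\pi}^{(\mu,A)}).$$

At the same time, since $(K, \bar{\pi})$ is a rational inattention representation of $(D, q)$, we know that $\bar{\pi}^{(\mu,A)} \in \hat{\Pi}(K, \mu, A)$, so that,

$$G(\mu, A, \bar{\pi}^{(\mu,A)}) - K^{\mu}(\bar{\pi}^{(\mu,A)}) \geq G(\mu, A, \bar{\pi}^{(\mu,A')}) - K^{\mu}(\bar{\pi}^{(\mu,A')}).$$

Together these imply that $K^{\mu}(\bar{\pi}^{(\mu,A)}) \leq K^{\mu}(\bar{\pi}^{(\mu,A')})$, which in turn implies that $K^{\mu}_S(\bar{\pi}) = K^{\mu}(\bar{\pi})$ all $\bar{\pi} \in \bar{\Pi}^{\mu}$.

Note that $K^{\mu}_S$ also satisfies weak monotonicity on this domain, since if we are given $\pi, \eta \in \Pi^{\mu}_S$ with $\pi$ sufficient for $\eta$, then we know that any strategy $\bar{\pi} \in \bar{\Pi}^{\mu}$ that is sufficient for $\pi$ is also sufficient for $\eta$, so that the minimum defining $K^{\mu}_S(\pi)$ can be no lower than that defining $K^{\mu}_S(\eta)$.

The second stage in the construction is to extend the domain of the cost function from $\Pi^{\mu}_S$ to $\tilde{\Pi}^{\mu}$. As noted above, this set comprises all finite mixtures of elements of $\Pi^{\mu}_S$,

$$\tilde{\Pi}^{\mu} = \left\{\pi = \sum_{j=1}^{J} \alpha_j \pi^j \;\Big|\; J \in \mathbb{N},\ (\alpha_1, \ldots, \alpha_J) \in S^{J-1},\ \text{and } \pi^j \in \Pi^{\mu}_S\right\}.$$

Given $\pi \in \tilde{\Pi}^{\mu}$, we take the set of all such mixtures that generate it and define $\tilde{K}^{\mu}(\pi)$ to be the corresponding infimum,

$$\tilde{K}^{\mu}(\pi) = \inf_{\left\{J \in \mathbb{N},\ \alpha \in S^{J-1},\ \{\pi^j\}_{j=1}^{J} \in \Pi^{\mu}_S \;\big|\; \pi = \sum_{j=1}^{J} \alpha_j \pi^j\right\}} \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j).$$

Note that this function is well defined since $K^{\mu}_S$ is bounded below by the cost of inattentive strategies and the feasible set is non-empty by definition of $\tilde{\Pi}^{\mu}$. We establish in Lemma 3 that the infimum is achieved. Hence, given $\pi \in \tilde{\Pi}^{\mu}$, there exists $J \in \mathbb{N}$, $\alpha \in S^{J-1}$, and elements $\pi^j \in \Pi^{\mu}_S$ with $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$ such that,

$$\tilde{K}^{\mu}(\pi) = \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j).$$

We show now that $\tilde{K}^{\mu}$ satisfies K2, mixture feasibility. Consider distinct strategies $\pi \neq \eta \in \tilde{\Pi}^{\mu}$. We know by Lemma 3 that we can find $J^{\pi}, J^{\eta} \in \mathbb{N}$; corresponding probability weights $\alpha^{\pi} \in S^{J^{\pi}-1}$, $\alpha^{\eta} \in S^{J^{\eta}-1}$; and elements $\pi^j, \eta^j \in \Pi^{\mu}_S$ with $\pi = \sum_{j=1}^{J^{\pi}} \alpha^{\pi}_j \pi^j$, $\eta = \sum_{j=1}^{J^{\eta}} \alpha^{\eta}_j \eta^j$, and such that,

$$\tilde{K}^{\mu}(\pi) = \sum_{j=1}^{J^{\pi}} \alpha^{\pi}_j K^{\mu}_S(\pi^j), \qquad \tilde{K}^{\mu}(\eta) = \sum_{j=1}^{J^{\eta}} \alpha^{\eta}_j K^{\mu}_S(\eta^j).$$

Given $\lambda \in (0, 1)$, consider now the mixture strategy defined by taking each strategy $\pi^j$ with probability $\lambda \alpha^{\pi}_j$ and each strategy $\eta^j$ with probability $(1 - \lambda)\alpha^{\eta}_j$. By construction, this mixture strategy generates $[\lambda \pi + (1 - \lambda)\eta] \in \tilde{\Pi}^{\mu}$ and hence we know by the infimum feature of $\tilde{K}^{\mu}$ that,

$$\tilde{K}^{\mu}(\lambda \pi + (1 - \lambda)\eta) \leq \sum_{j=1}^{J^{\pi}} \lambda \alpha^{\pi}_j K^{\mu}_S(\pi^j) + \sum_{j=1}^{J^{\eta}} (1 - \lambda)\alpha^{\eta}_j K^{\mu}_S(\eta^j) = \lambda \tilde{K}^{\mu}(\pi) + (1 - \lambda)\tilde{K}^{\mu}(\eta),$$

confirming mixture feasibility.

We show also that $\tilde{K}^{\mu}$ satisfies K3, weak monotonicity in information. Consider $\pi, \eta \in \tilde{\Pi}^{\mu}$ with $\pi$ sufficient for $\eta$.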
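We know by Lemma 3 that we can find $J \in \mathbb{N}$, $\alpha \in S^{J-1}$, and corresponding elements $\{\pi^j\}_{j=1}^{J} \subset \Pi^{\mu}_S$ with fixed range $\Gamma(\pi^j) = \Gamma(\pi)$ such that $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$ and such that,

$$\tilde{K}^{\mu}(\pi) = \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j).$$

We know also from Lemma 2 that we can construct $\{\rho^j\}_{j=1}^{J} \subset \Pi^{\mu}_S$ such that $\rho = \sum_{j=1}^{J} \alpha_j \rho^j$ and such that each $\rho^j$ is a garbling of the corresponding $\pi^j$. Given that $K^{\mu}_S$ satisfies weak monotonicity on its domain $\Pi^{\mu}_S$, we conclude that,

$$K^{\mu}_S(\rho^j) \leq K^{\mu}_S(\pi^j).$$

By the infimum feature of $\tilde{K}^{\mu}(\rho)$ we therefore know that,

$$\tilde{K}^{\mu}(\rho) \leq \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\rho^j) \leq \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j) = \tilde{K}^{\mu}(\pi),$$

confirming weak monotonicity.

We show now that we have retained the properties that made $(K, \bar{\pi})$ a rational inattention representation of $(D, q)$ for prior $\mu \in \Gamma$. It is immediate that $\bar{\pi}^{(\mu,A)}$ and the choice function that involves picking act $f \in F^{(\mu,A)}$ for sure in revealed posterior $r^{(\mu,A)}(f)$ are consistent with the data, since this was part of the initial definition. What needs to be confirmed is only that the revealed minimal attentional strategies are optimal. Suppose to the contrary that there exists $(\mu, A) \in D$ such that,

$$G(\mu, A, \pi) - \tilde{K}^{\mu}(\pi) > G(\mu, A, \bar{\pi}^{(\mu,A)}) - \tilde{K}^{\mu}(\bar{\pi}^{(\mu,A)}),$$

for some $\pi \in \tilde{\Pi}^{\mu}$. By Lemma 3 we can find $J \in \mathbb{N}$, a strictly positive vector $\alpha \in S^{J-1}$, and corresponding elements $\{\pi^j\}_{j=1}^{J} \subset \Pi^{\mu}_S$, such that $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$ and such that,

$$\tilde{K}^{\mu}(\pi) = \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j).$$

By the fact that $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$ and by construction of the mixture strategy,

$$G(\mu, A, \pi) = \sum_{j=1}^{J} \alpha_j G(\mu, A, \pi^j),$$

so that,

$$\sum_{j=1}^{J} \alpha_j \left[ G(\mu, A, \pi^j) - K^{\mu}_S(\pi^j) \right] > G(\mu, A, \bar{\pi}^{(\mu,A)}) - \tilde{K}^{\mu}(\bar{\pi}^{(\mu,A)}).$$

We conclude that there exists $j$ such that,

$$G(\mu, A, \pi^j) - K^{\mu}_S(\pi^j) > G(\mu, A, \bar{\pi}^{(\mu,A)}) - \tilde{K}^{\mu}(\bar{\pi}^{(\mu,A)}).$$

Note that each $\pi^j \in \Pi^{\mu}_S$ inherits its cost $K^{\mu}_S(\pi^j)$ from an element $\bar{\pi}^j \in \bar{\Pi}^{\mu}$ that is the lowest cost minimal attentional strategy according to $K^{\mu}$ on the set $\bar{\Pi}^{\mu}$ that is sufficient for $\pi^j$,

$$K^{\mu}_S(\pi^j) = K^{\mu}(\bar{\pi}^j),$$

where the last equality stems from the fact (established above) that $K^{\mu}_S(\bar{\pi}) = K^{\mu}(\bar{\pi})$ on $\bar{\pi} \in \bar{\Pi}^{\mu}$.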
Note by the Blackwell property that each strategy $\bar{\pi}^j \in \bar{\Pi}^{\mu}$ offers at least as high gross value as the strategy $\pi^j$ for which it is sufficient, so that overall,

$$G(\mu, A, \bar{\pi}^j) - K^{\mu}(\bar{\pi}^j) \geq G(\mu, A, \pi^j) - K^{\mu}_S(\pi^j) > G(\mu, A, \bar{\pi}^{(\mu,A)}) - \tilde{K}^{\mu}(\bar{\pi}^{(\mu,A)}).$$

To complete the proof it is sufficient to show that,

$$\tilde{K}^{\mu}(\bar{\pi}) = K^{\mu}(\bar{\pi}) \text{ on } \bar{\pi} \in \bar{\Pi}^{\mu}.$$

With this we derive that,

$$G(\mu, A, \bar{\pi}^j) - K^{\mu}(\bar{\pi}^j) > G(\mu, A, \bar{\pi}^{(\mu,A)}) - K^{\mu}(\bar{\pi}^{(\mu,A)}),$$

in contradiction to the assumption that $(K, \bar{\pi})$ formed a rational inattention representation of $(D, q)$.

To establish that $\tilde{K}^{\mu}(\bar{\pi}) = K^{\mu}(\bar{\pi})$ on $\bar{\pi} \in \bar{\Pi}^{\mu}$, note that we know already that $K^{\mu}_S(\bar{\pi}) = K^{\mu}(\bar{\pi})$ on $\bar{\pi} \in \bar{\Pi}^{\mu}$. If this did not extend to $\tilde{K}^{\mu}(\bar{\pi})$, then we would be able to identify a mixture strategy $\pi \in \tilde{\Pi}^{\mu}$ sufficient for $\bar{\pi}^{(\mu,A)}$ with strictly lower expected costs, $\tilde{K}^{\mu}(\pi) < K^{\mu}(\bar{\pi}^{(\mu,A)})$. To see that this is not possible, note first from Lemma ?? that all strategies that are consistent with $(\mu, A)$ and $q^{(\mu,A)}$ are sufficient for $\bar{\pi}^{(\mu,A)}$. Weak monotonicity of $\tilde{K}^{\mu}$ on $\tilde{\Pi}^{\mu}$ then implies that the cost $\tilde{K}^{\mu}(\pi)$ of any mixture strategy sufficient for $\bar{\pi}^{(\mu,A)}$ satisfies $\tilde{K}^{\mu}(\pi) \geq K^{\mu}(\bar{\pi}^{(\mu,A)})$, as required.

The final and most trivial stage of the proof is to ensure that normalization holds. We first normalize each function $\tilde{K}^{\mu}(\pi)$ for those $\mu \in \Gamma$ for which $(\mu, A) \in D$ for some $A \in \mathcal{F}$. In such cases we note that $I^{\mu} \in \Pi^{\mu}_S$, so that $K^{\mu}_S(I^{\mu}) \in \mathbb{R}$ according to the rule immediately above. If we renormalize this function by subtracting $K^{\mu}_S(I^{\mu})$ from the cost function for all attention strategies associated with this prior, then we impact no margin of choice and do not interfere with mixture feasibility, weak monotonicity, or whether or not we have a rational inattention representation. Hence we can avoid pointless complication by assuming that $K^{\mu}(I^{\mu}) = 0$ from the outset so that this normalization is vacuous.

We have now fully specified a cost function $\tilde{K}^{\mu} : \tilde{\Pi}^{\mu} \to \mathbb{R}$ for all $\mu \in \Gamma$ such that $(\mu, A) \in D$ some $A \in \mathcal{F}$.
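The normalization step rests on the observation that subtracting a constant from every cost leaves the ranking of net payoffs unchanged. A minimal sketch of that observation follows; the strategy labels and the numbers in `gross` and `cost` are purely hypothetical and are not taken from the experiments.

```python
def best_strategy(gross, cost):
    """Return the strategy label with the highest gross value net of cost."""
    return max(gross, key=lambda s: gross[s] - cost[s])

# Hypothetical gross values G and costs K for three attention strategies.
gross = {"inattentive": 10, "coarse": 14, "fine": 15}
cost = {"inattentive": 2, "coarse": 4, "fine": 8}

# Renormalize by subtracting the cost of the inattentive strategy from every cost.
shift = cost["inattentive"]
normalized_cost = {s: k - shift for s, k in cost.items()}

print(best_strategy(gross, cost))
print(best_strategy(gross, normalized_cost))
print(normalized_cost["inattentive"])
```

Both calls select the same strategy, and the inattentive strategy has zero cost after the shift.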
All that remains to complete our definition of the candidate cost function $\tilde{K}$ is to expand the domain to include all inattentive strategies for the irrelevant priors $\mu \in \Gamma$ for which there is no corresponding $(\mu, A) \in D$. We define $\tilde{\Pi}^{\mu} = \{I^{\mu}\}$ in such cases and set $\tilde{K}^{\mu}(I^{\mu}) = 0$. Note that this single element domain is trivially closed under garbling and under mixtures. With this, we define the candidate cost function $\tilde{K} : \Gamma \times \Pi \to \mathbb{R} \cup \{\infty\}$ by patching together the set of prior-based cost functions as defined above,

$$\tilde{K}(\mu, \pi) = \begin{cases} \tilde{K}^{\mu}(\pi) & \text{if } \pi \in \tilde{\Pi}^{\mu}, \\ \infty & \text{if } \pi \notin \tilde{\Pi}^{\mu}. \end{cases}$$

Note that weak monotonicity implies that the function is non-negative on its entire domain.

It is immediate that $\tilde{K} \in \mathcal{K}$, since $\tilde{K}(\mu, \pi) = \infty$ for $\pi \notin \tilde{\Pi}^{\mu}$ and all domains $\tilde{\Pi}^{\mu}$ for $\mu \in \Gamma$ contain the corresponding inattentive strategy $I^{\mu}$ on which $\tilde{K}(\mu, \pi)$ is real-valued. It is also immediate that $\tilde{K}$ satisfies K3, since $\tilde{K}^{\mu}(I^{\mu}) = 0$ by construction. It also satisfies K1 and K2, and forms a rational inattention representation, completing the proof.

Lemma 2 If $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$ with $J \in \mathbb{N}$, $\alpha \in S^{J-1}$ with $\alpha_j > 0$ all $j$, and $\{\pi^j\}_{j=1}^{J} \subset \Pi^{\mu}$ with $\Gamma(\pi^j) = \Gamma(\pi)$ all $j$, then for any garbling $\rho$ of $\pi$, there exist garblings $\rho^j$ of $\pi^j$ such that,

$$\rho = \sum_{j=1}^{J} \alpha_j \rho^j.$$

Proof. By assumption, there exists a $|\Gamma(\pi)| \times |\Gamma(\rho)|$ matrix $B$ with $\sum_{k} b^{ik} = 1$ all $i$ and such that, for all $\gamma^k \in \Gamma(\rho)$,

$$\rho_m(\gamma^k) = \sum_{\gamma^i \in \Gamma(\pi)} b^{ik} \pi_m(\gamma^i).$$

Given that $\Gamma(\pi^j) = \Gamma(\pi)$ all $j$, this same matrix can be applied to all vectors $\pi^j$ to generate garblings $\rho^j$ of each $\pi^j$,

$$\rho^j_m(\gamma^k) = \sum_{\gamma^i \in \Gamma(\pi)} b^{ik} \pi^j_m(\gamma^i).$$

It is clear that this satisfies the required condition $\rho = \sum_{j=1}^{J} \alpha_j \rho^j$ since,

$$\rho_m(\gamma^k) = \sum_{\gamma^i \in \Gamma(\pi)} b^{ik} \pi_m(\gamma^i) = \sum_{\gamma^i \in \Gamma(\pi)} b^{ik} \sum_{j=1}^{J} \alpha_j \pi^j_m(\gamma^i) = \sum_{j=1}^{J} \alpha_j \sum_{\gamma^i \in \Gamma(\pi)} b^{ik} \pi^j_m(\gamma^i) = \sum_{j=1}^{J} \alpha_j \rho^j_m(\gamma^k).$$

Lemma 3 Given $\pi \in \tilde{\Pi}^{\mu}$, there exist $J \in \mathbb{N}$, $\alpha \in S^{J-1}$, and elements $\pi^j \in \Pi^{\mu}_S$ with $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$ such that,

$$\tilde{K}^{\mu}(\pi) = \sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j).$$

Proof. By definition $\tilde{K}^{\mu}(\pi)$ is the infimum of $\sum_{j=1}^{J} \alpha_j K^{\mu}_S(\pi^j)$ over all lists $\{\pi^j\}_{j=1}^{J} \subset \Pi^{\mu}_S$ such that $\pi = \sum_{j=1}^{J} \alpha_j \pi^j$. We now consider a sequence of such lists, indicating the order in this sequence in parentheses, $\{\pi^j(n)\}_{j=1}^{J(n)}$, such that in all cases there are corresponding weights $\alpha(n) \in S^{J(n)-1}$ with $\pi = \sum_{j=1}^{J(n)} \alpha_j(n)\pi^j(n)$ and that achieve a value that is heading in the limit to the infimum,

$$\lim_{n \to \infty} \sum_{j=1}^{J(n)} \alpha_j(n) K^{\mu}_S(\pi^j(n)) = \tilde{K}^{\mu}(\pi).$$

A first issue that we wish to avoid is limitless growth in the cardinality $J(n)$. The first key observation is that, by Carathéodory's theorem, we can reduce the number of strictly positive weights in a convex combination $\pi = \sum_{j=1}^{J(n)} \alpha_j(n)\pi^j(n)$ to have cardinality $J(n) \leq M + 1$. We confirm now that we can do this without raising the corresponding costs, $\sum_{j=1}^{J(n)} \alpha_j(n) K^{\mu}_S(\pi^j(n))$. Suppose that there is some integer $n$ such that the original set of attentional strategies has strictly higher cardinality $J(n) > M + 1$. Suppose further that the first selection of $J^1(n) \leq M + 1$ such posteriors for which there exist strictly positive probability weights $\alpha^1_j(n)$ such that $\pi = \sum_{j=1}^{J^1(n)} \alpha^1_j(n)\pi^j(n)$ has higher such costs (note WLOG that we are treating these as the first $J^1(n)$ attention strategies in the original list). It is convenient to define $\alpha^1_j(n) = 0$ for $J^1(n) + 1 \leq j \leq J(n)$ so that we can express this inequality in the simplest terms,

$$\sum_{j=1}^{J(n)} \alpha^1_j(n) K^{\mu}_S(\pi^j(n)) > \sum_{j=1}^{J(n)} \alpha_j(n) K^{\mu}_S(\pi^j(n)).$$

This inequality sets up an iteration. We first take the smallest scalar $\theta^1 \in (0, 1)$ such that, for some $j$,

$$\theta^1 \alpha^1_j(n) = \alpha_j(n).$$

That such a scalar exists follows from the fact that $\sum_{j=1}^{J^1(n)} \alpha^1_j(n) = \sum_{j=1}^{J(n)} \alpha_j(n) = 1$, with all components in both sums strictly positive and with $J(n) > J^1(n)$. We now define a second set of probability weights $\alpha^2_j(n)$,

$$\alpha^2_j(n) = \frac{\alpha_j(n) - \theta^1 \alpha^1_j(n)}{1 - \theta^1},$$

for $1 \leq j \leq J(n)$.
Note that these weights have the property that $\pi = \sum_{j=1}^{J(n)} \alpha^2_j(n)\pi^j(n)$ and that,

$$\sum_{j=1}^{J(n)} \alpha^2_j(n) K^{\mu}_S(\pi^j(n)) = \sum_{j=1}^{J(n)} \left[ \frac{\alpha_j(n) - \theta^1 \alpha^1_j(n)}{1 - \theta^1} \right] K^{\mu}_S(\pi^j(n)) < \sum_{j=1}^{J(n)} \alpha_j(n) K^{\mu}_S(\pi^j(n)).$$

By construction, note that we have reduced the number of strictly positive weights $\alpha^2_j(n)$ by at least one to $J(n) - 1$ or less. Iterating the process establishes that indeed there exists a set of no more than $M + 1$ posteriors such that a mixture produces that first strategy and in which this mixture has no higher weighted average costs than the original strategy. Given this, there is no loss of generality in assuming that $J(n) \leq M + 1$ in our original sequence.

With this bound on cardinality, we know that we can find a subsequence of attentional strategies which all have precisely the same cardinality $J(n) = J \leq M + 1$ all $n$. Going further, we can impose properties on all of the $J$ corresponding sequences $\{\pi^j(n)\}_{n=1}^{\infty}$. First, we can select subsequences in which the ranges of all corresponding attention functions have the same cardinality independent of $n$, $|\Gamma(\pi^j(n))| = K^j$ for $1 \leq j \leq J$. With this, we can index the possible posteriors $\gamma^{jk}(n) \in \Gamma(\pi^j(n))$ in order, $1 \leq k \leq K^j$, and then select further subsequences in which these posteriors themselves converge to limit posteriors,

$$\gamma^{jk}(L) = \lim_{n \to \infty} \gamma^{jk}(n) \in \Gamma.$$

We ensure also that both the associated state dependent probabilities themselves and the weights $\alpha_j(n)$ in the expression $\pi = \sum_{j=1}^{J} \alpha_j(n)\pi^j(n)$ converge,

$$\lim_{n \to \infty} \pi^{jk}_m(n) = \pi^{jk}_m(L); \qquad \lim_{n \to \infty} \alpha_j(n) = \alpha_j(L).$$

The final selection of a subsequence ensures that, given $1 \leq j \leq J$, each $\pi^j(n) \in \Pi^{\mu}_S$ has its value defined by precisely the same minimal attentional strategy $\bar{\pi}^j \in \bar{\Pi}^{\mu}$ as the least expensive among those that were sufficient for it and hence whose cost it was assigned in function $K^{\mu}_S$. Technically, for each $1 \leq j \leq J$,

$$K^{\mu}_S(\pi^j(n)) = K^{\mu}(\bar{\pi}^j) \text{ for all } n \geq 1;$$

this is possible because the data set and hence the number of minimal attention strategies is finite.
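One iteration of the weight-reduction argument used earlier in this proof can be traced on a small numerical example. This is only an illustrative sketch: scalar points `x` stand in for attention strategies, and the names `reduce_weights`, `theta`, `alpha`, `alpha1`, `alpha2` and all numbers are hypothetical choices made for the example.

```python
from fractions import Fraction as F

def reduce_weights(alpha, alpha1):
    """One iteration: theta is the smallest ratio alpha_j / alpha1_j over the
    support of alpha1, and alpha2 = (alpha - theta * alpha1) / (1 - theta)."""
    theta = min(a / a1 for a, a1 in zip(alpha, alpha1) if a1 > 0)
    alpha2 = [(a - theta * a1) / (1 - theta) for a, a1 in zip(alpha, alpha1)]
    return theta, alpha2

def weighted(weights, values):
    """Weighted sum of values."""
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical data: three "strategies" (scalars) and their costs.
x = [F(0), F(2), F(2, 3)]
cost = [F(1), F(1), F(0)]

alpha = [F(1, 6), F(1, 3), F(1, 2)]   # original weights, all strictly positive
alpha1 = [F(1, 2), F(1, 2), F(0)]     # same mixture from the first two points, higher cost

theta, alpha2 = reduce_weights(alpha, alpha1)

print(theta, alpha2)
print(weighted(alpha, x), weighted(alpha1, x), weighted(alpha2, x))
print(weighted(alpha1, cost), weighted(alpha, cost), weighted(alpha2, cost))
```

In this example the new weights generate the same mixture, place zero weight on the first point, and have strictly lower weighted cost than the original weights.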
We first use these limit properties to construct a list of limit attention strategies $\pi^j(L) \in \Pi^{\mu}_S$ for $1 \leq j \leq J$. Strategy $\pi^j(L)$ has range,

$$\Gamma(\pi^j(L)) = \bigcup_{k=1}^{K^j} \gamma^{jk}(L),$$

with state dependent probabilities,

$$\pi^j_m(L)(\gamma^{jk}(L)) = \pi^{jk}_m(L).$$

Note that the construction ensures that $\pi = \sum_{j=1}^{J} \alpha_j(L)\pi^j(L)$. To complete the proof we must establish only that,

$$\tilde{K}^{\mu}(\pi) = \sum_{j=1}^{J} \alpha_j(L) K^{\mu}_S(\pi^j(L)).$$

We know from the construction that, for each $n$,

$$\sum_{j=1}^{J} \alpha_j(n) K^{\mu}_S(\pi^j(n)) = \sum_{j=1}^{J} \alpha_j(n) K^{\mu}(\bar{\pi}^j).$$

Hence the result is established provided only,

$$K^{\mu}_S(\pi^j(L)) \leq K^{\mu}(\bar{\pi}^j),$$

since then the limit mixture costs no more than the limit of the right-hand side, which is the infimum $\tilde{K}^{\mu}(\pi)$, while it can cost no less by definition of the infimum. The inequality is true provided $\bar{\pi}^j$ being sufficient for all $\pi^j(n)$ implies that $\bar{\pi}^j$ is sufficient for the corresponding limit vector $\pi^j(L)$. That this is so follows by defining $B^j(L) = [b^{ik}(L)]^j$ to be the limit of any subsequence of the $|\Gamma(\bar{\pi}^j)| \times K^j$ stochastic matrices $B^j(n) = [b^{ik}(n)]^j$ which have the defining property of sufficiency,

$$\pi^j_m(n)(\gamma^{jk}(n)) = \sum_{\gamma^i \in \Gamma(\bar{\pi}^j)} [b^{ik}(n)]^j \, \bar{\pi}^j_m(\gamma^i),$$

for all $\gamma^{jk}(n) \in \Gamma(\pi^j(n))$ and $1 \leq m \leq M$. It is immediate that the equality holds up in the limit, establishing that indeed $\bar{\pi}^j$ is sufficient for each corresponding limit vector $\pi^j(L)$, confirming finally that $K^{\mu}_S(\pi^j(L)) \leq K^{\mu}(\bar{\pi}^j)$ and with it establishing the Lemma.

10 Appendix B: Examples of NIAS and NIAC Violations from RUMs

To be completed

11 Appendix C: Individual Costs of NIAS and NIAC Violations

Figure C1: $ losses due to NIAS violations - actual and simulated subjects (panels: Experiment 1, Experiment 2, Experiment 3, Experiment 4)

Figure C2: $ losses due to NIAC violations - actual and simulated subjects (panels: Experiment 1, Experiment 2, Experiment 3, Experiment 4)