Empirical Welfare Analysis for Discrete Choice: Some New Results

Debopam Bhattacharya
University of Cambridge
Spring, 2016.

Abstract

This paper explores the problem of empirical welfare-analysis in multinomial choice settings, allowing for completely general consumer-heterogeneity and income-effects. We present results on welfare-analysis under three practically important scenarios, viz., (i) simultaneous price-change of multiple options, (ii) introduction/elimination of a choice-alternative, and (iii) choice among non-exclusive goods. Our results do not follow from Bhattacharya (Econometrica, 2015). In program-evaluation contexts, these results would enable estimation of compensated program-effects, i.e., how much the subjects themselves value the program, and the resulting deadweight loss. Welfare-analysis under income-endogeneity is also briefly discussed. The methods are illustrated with hypothetical policy experiments using survey data from India on teenagers who make the choice between school, work and home-stay.

I am grateful to Phil Haile, Jerry Hausman, Whitney Newey and seminar participants at the University of Wisconsin for feedback and suggestions. All errors are mine.

1 Introduction

Welfare analysis, based on the compensating and equivalent variation (CV and EV, henceforth), lies at the heart of economic policy evaluations. Although theoretically well-understood, these measures are rarely calculated or reported as part of econometric program evaluation. In fact, empirical studies, both reduced-form and structural, typically adopt a paternalistic view and evaluate a policy intervention in terms of its uncompensated effects on individual outcomes. But they typically ignore the heterogeneity of policy-impact on individual welfare, which requires computation of compensated effects. For example, in an educational context, researchers typically evaluate the impact of a tuition subsidy program via its net effect on college-enrolment (cf.
Ichimura and Taber, 2002, Kane, 2003). But they stop short of evaluating how much the subsidy is valued by the potential beneficiaries themselves, viz., the lump-sum income transfer that would result in the same individual utilities as the subsidy, and any resulting deadweight loss thereof.[1] The present paper develops methods for simple calculation of such welfare effects as part of program evaluation studies in the common setting of discrete choice, without requiring the researcher to make any restrictive assumptions on the nature of preference heterogeneity or income effects.

In practical settings, unobservable heterogeneity in consumer preferences makes empirical welfare analysis a challenging problem. Nonetheless, some advances based on Roy's identity have recently been made for the case of a continuous good, such as gasoline demand, under general preference heterogeneity (cf. Hausman and Newey, 2016 and Lewbel and Pendakur, 2015). However, many real-life individual decisions involve discrete choice, such as college-attendance, choice of commuting method, school-choice, retirement decisions, and so forth. The methods based on Roy's identity that are used for continuous choice cannot be applied in these settings, owing to the non-smoothness of individual demand in price and income. Until very recently, available methods for welfare analysis in discrete choice settings were based on restrictive and arbitrary assumptions on preference heterogeneity, e.g., quasi-linear preferences implying absence of income effects (cf. Domencich and McFadden, 1975, Small and Rosen, 1981), or parametrically specified utility functions and heterogeneity distributions (Herriges and Kling, 1999, Dagsvik and Karlstrom, 2005, Goolsbee and Klenow, 2006). Two key concerns with these parametric approaches are (a) model mis-specification leading to erroneous substantive conclusions, and (b) identification of welfare distributions solely from functional form assumptions.
Recently, Bhattacharya, 2015 (DB15, henceforth) has shown that for heterogeneous consumers facing choice between mutually exclusive discrete alternatives, the marginal distributions of equivalent/compensating variation (EV/CV) resulting from a ceteris paribus price-change of an alternative can be expressed as closed-form transformations of conditional choice probabilities. These results hold under fully unrestricted heterogeneity and income-effects across consumers. The present paper complements DB15, and makes three contributions.

First, in Section 2, it is shown that in a discrete choice setting, welfare effects of simultaneous price changes of several alternatives continue to remain well-defined whereas the analog of Marshallian Consumer Surplus becomes path-dependent. Next, EV/CV distributions are shown to be expressible as closed-form transformations of estimable choice-probabilities (Theorem 1). These results cover situations where some price changes are negative, some zero and some positive. The key issue here is that although welfare effects of simultaneous change in multiple prices are well-defined, their distributions cannot be obtained by iterating the single price-change result of DB15. This is because the income at which welfare distributions are to be evaluated varies in an unobservable way across individuals from the second iteration onwards, and thus cannot be conditioned on. Consequently, new results are required for welfare analysis in these situations.

[1] Stiglitz (2000, page 276) notes that empirical researchers typically ignore income effects owing to the perceived difficulty of calculating them, and Goolsbee (1999, page 10) points out that whereas the economic theory of welfare largely relates to compensated elasticities, common program evaluation studies in public finance typically report uncompensated effects. Hendren (2013) discusses this point and some related issues in more detail.
Multiple price-changes are the likely "general equilibrium" consequences of a single initial price change of a product with substitutes. For example, a school-tuition subsidy in an area with child labor is likely to raise children's wages in response. The impact of such simultaneous price-changes on consumer welfare is usually the key consideration for regulatory agencies with regard to their decision on whether to implement the proposed change (cf. Willig et al, 1991). It has been common practice in applications to use the so-called log-sum formula (cf. Small and Rosen, 1981, Train, 2009) for welfare comparisons. This formula, though convenient, is based on strong, unsubstantiated assumptions like absence of income effects and extreme valued heterogeneity, and potentially leads to erroneous substantive conclusions (see our empirical example below for a concrete illustration of such errors).[2] In the present paper, we show that under completely general preference heterogeneity and income effects, one can express welfare-distributions resulting from multiple simultaneous price-changes in terms of choice probability functions, thereby reducing welfare-analysis to the problem of estimating choice probabilities. Crucially, our welfare-expressions (a) hold when income effects are non-negligible, as is likely for bigger purchases like children's education and consumer durables, and (b) apply to arbitrary patterns of price changes across alternatives, thereby making the results useful across a large range of empirical situations.

Section 3.1 of the present paper concerns welfare effects resulting from elimination of an existing alternative, e.g., banning teenage wage-labour in a poor country, or, equivalently, "retrospective" calculation of welfare gain from introduction of a choice alternative. Previous empirical studies on this problem (e.g.
Hausman's celebrated study on breakfast-cereal brands) either ignored consumer heterogeneity and implicitly assumed a "representative consumer" model (Willig et al, 1991, Hausman, 1996, Hausman and Leonard, 2002, Hausman, 2003),[3] and/or worked under restrictive parametric assumptions (cf. Petrin, 2002).

[2] Some of these restrictive parametric assumptions have subsequently been replaced with less stringent parametric assumptions, cf. Dagsvik and Karlstrom, 2005, McFadden and Train, 2000, Herriges and Kling, 1999, Goolsbee and Klenow, 2006. All of these still require specification of the dimension and distribution of unobserved heterogeneity and functional form of utility functions about which no a priori information is available.

In contrast, our results here allow for (a) unrestricted consumer heterogeneity, and (b) arbitrary effects of introducing/eliminating the alternative on the price of existing alternatives, thereby making them applicable to a wide variety of practical situations.[4] Importantly, these results are robust to failure of parametric assumptions, and clarify which features of welfare distributions can and which ones cannot be identified from demand data without functional form assumptions. Indeed, our key result, outlined in theorem 2 below, shows that Hicksian welfare distributions in this case can again be expressed as closed-form transformations of choice-probabilities without any assumption on heterogeneity. Nonetheless, identification of the tails of these distributions requires observation of demand at prices where aggregate demand at an adjusted income level reaches zero. At high income levels, such prices may not be observed since producers have no incentive to set a price where aggregate demand for high income people is zero.
In that case, a lower bound on the welfare distributions can be obtained using the value of aggregate demand at the highest observed price.[5] A corollary of our main result is that the heuristic empirical practice of calculating welfare effects of new goods by integrating choice probabilities from the current price to infinity yields the average CV, if prices of substitutes remain unchanged; however, calculation of the average EV and/or allowing for prices of substitutes to change entail different expressions. These results can be used mutatis mutandis for analyzing welfare effects of eliminating an existing alternative simply by reversing the labels of EV and CV.

[3] See Lewbel (2001) for an illuminating discussion of demand and welfare analysis with a representative consumer vis-a-vis allowing for preference heterogeneity.

[4] Welfare calculations in the two scenarios described above correspond to situations where the price vectors before and after are given. The process through which the final price vector following the relevant change is calculated requires modelling the supply side. See Hausman and Leonard, 2002, pages 256-8, for an illustration of the methodology, which is now well-known. See also Berry and Pakes, 1993 for an overview of this topic.

[5] For common parametric models, like mixed logit, these issues are assumed away, and welfare distributions are point-identified simply via functional form assumptions.

Section 3.2 of this paper considers multinomial choice with non-exclusive alternatives. For example, if a cable-TV company offering a sports package and a movie package raises product-prices, then the resulting welfare calculations require new results because the packages are not exclusive alternatives for a potential consumer. We show that for multiple non-exclusive discrete goods, the welfare distribution resulting from a single price-change can be directly expressed as a closed-form functional of choice-probabilities but that for multiple price-changes cannot, unlike the case of multinomial choice among exclusive options. Accordingly, we also show that one can construct non-trivial bounds on these distributions.

Taken together, these three results provide new insights into welfare analysis in discrete choice situations, as well as providing practitioners with useful empirical tools for evaluating policy changes in real-life settings. In section 5, we discuss practical implementation of these results, and state a practically useful result, viz. that under income endogeneity and corresponding to a price increase, the EV but not the CV can be used for legitimate welfare analysis even in the absence of instruments or control functions.

A concrete empirical illustration of our methods is provided in Section 6. Using household survey data from India on teenage children's choice between school, wage-work and staying at home, we calculate welfare impacts for two hypothetical policy changes, viz., (i) a tuition subsidy accompanied by a rise in wage, and (ii) elimination of the work option via child-labor regulations. In this application, income effects are shown to be large, and their effect on welfare calculations important both in magnitude and in respect of their policy implications. Proofs of all theoretical results and details of numerical calculations for the empirical illustration are provided in the appendix.

2 Multiple Price-Changes

Set-up and notation: Consider a multinomial choice situation where alternatives are indexed by $j = 1, \ldots, J$; individual income is denoted by $Y$, and the price of alternative $j$ by $P_j$. Individual utility from choosing alternative $j$ is $U_j(Y - P_j, \eta)$, $j = 1, \ldots, J$, where $\eta$ denotes individual heterogeneity of unknown dimension, distributed in the population with unknown marginal CDF $F(\cdot)$.
Define the structural choice probability for alternative $j$ evaluated at price vector $p$ and income $y$, denoted $\{q_j(p, y)\}$, $j = 1, \ldots, J$, as

$$q_j(p, y) = \int 1\left\{ U_j(y - p_j, r) > \max_{k \neq j} \left\{ U_k(y - p_k, r) \right\} \right\} dF(r). \tag{1}$$

In words, if we randomly sample individuals from the population, and offer the price vector $p$ and income $y$ to each sampled individual, then a fraction $q_j(p, y)$ will choose alternative $j$, in expectation.

Assumption 1 Assume that for each $\eta$ and for each $j = 1, \ldots, J$, the utility function $U_j(\cdot, \eta)$ is strictly increasing.

Assumption 1 simply says that corresponding to making any choice, every consumer is strictly better off if they have more numeraire left in their pocket. Note that this assumption leaves the dimension of heterogeneity completely unspecified, and says nothing about how utility changes with unobserved heterogeneity.

Consider arbitrary price changes from $p_0 \equiv (p_{10}, p_{20}, \ldots, p_{J0})$ to $p_1 \equiv (p_{11}, p_{21}, \ldots, p_{J1})$. Assume without loss of generality that

$$p_{J1} - p_{J0} \ge p_{J-1,1} - p_{J-1,0} \ge \cdots \ge p_{11} - p_{10}. \tag{2}$$

That is, label the alternative with the smallest price change (the smallest could be a negative number, representing a fall in price) alternative 1, the next smallest as alternative 2 and so on. Define EV as the solution $S$ to the equation

$$\max\left\{ U_1(y - p_{11}, \eta), U_2(y - p_{21}, \eta), \ldots, U_J(y - p_{J1}, \eta) \right\} = \max\left\{ U_1(y - p_{10} - S, \eta), U_2(y - p_{20} - S, \eta), \ldots, U_J(y - p_{J0} - S, \eta) \right\}. \tag{3}$$

Similarly, define CV as the solution $S$ to the equation

$$\max\left\{ U_1(y + S - p_{11}, \eta), U_2(y + S - p_{21}, \eta), \ldots, U_J(y + S - p_{J1}, \eta) \right\} = \max\left\{ U_1(y - p_{10}, \eta), U_2(y - p_{20}, \eta), \ldots, U_J(y - p_{J0}, \eta) \right\}. \tag{4}$$

Corresponding to the multiple price-changes, one can attempt to define the change in average consumer surplus via the line-integral

$$CS(L) = -\int_L \sum_{j=1}^{J} q_j(p, y) \, dp_j, \tag{5}$$

where $L$ denotes a path from $p_0$ to $p_1$. The negative sign stems from the fact that a rise in price leads to a loss in consumer surplus.
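The definitions in (1), (3) and (4) can be made concrete with a small simulation. The sketch below is purely illustrative: the utility function $U_j(c, \eta) = \eta_j + \beta \sinh^{-1}(c)$, the distribution of heterogeneity, and all numerical values are assumptions chosen for the example and do not come from the paper. Choice probabilities are computed as population shares, and individual CV and EV are obtained by bisection, which is valid because Assumption 1 makes each side of (3) and (4) strictly monotone in $S$.

```python
import math
import random

def utility(j, c, eta):
    """Hypothetical U_j(c, eta): taste shock for alternative j plus beta * asinh(c).

    asinh is strictly increasing on the whole real line, so Assumption 1 holds.
    """
    shocks, beta = eta
    return shocks[j] + beta * math.asinh(c)

def draw_population(n, J, seed=0):
    """Draw n heterogeneity vectors eta = (taste shocks, income sensitivity beta)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(J)], rng.uniform(0.5, 2.0))
            for _ in range(n)]

def max_utility(prices, income, eta):
    """Utility of the best alternative at the given prices and income."""
    return max(utility(j, income - p, eta) for j, p in enumerate(prices))

def choice_probs(prices, income, etas):
    """Monte Carlo analogue of equation (1): population share choosing each j."""
    counts = [0] * len(prices)
    for eta in etas:
        u = [utility(j, income - p, eta) for j, p in enumerate(prices)]
        counts[u.index(max(u))] += 1
    return [c / len(etas) for c in counts]

def solve_increasing(f, lo, hi, tol=1e-10):
    """Bisection root of a non-decreasing f with f(lo) <= 0 <= f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cv(p0, p1, y, eta):
    """Equation (4): income S added at new prices p1 that restores utility at p0."""
    target = max_utility(p0, y, eta)
    deltas = [b - a for a, b in zip(p0, p1)]
    # The solution always lies between the smallest and largest price change.
    return solve_increasing(lambda s: max_utility(p1, y + s, eta) - target,
                            min(deltas), max(deltas))

def ev(p0, p1, y, eta):
    """Equation (3): income S removed at old prices p0 that yields utility at p1."""
    target = max_utility(p1, y, eta)
    deltas = [b - a for a, b in zip(p0, p1)]
    return solve_increasing(lambda s: target - max_utility(p0, y - s, eta),
                            min(deltas), max(deltas))

# One price falls, one is unchanged, one rises; alternatives are ordered as in (2).
p0, p1, y = [1.0, 2.0, 3.0], [0.5, 2.0, 4.0], 6.0
etas = draw_population(2000, J=3)
q_before = choice_probs(p0, y, etas)
cvs = [cv(p0, p1, y, eta) for eta in etas]
evs = [ev(p0, p1, y, eta) for eta in etas]
print(q_before, sum(cvs) / len(cvs), sum(evs) / len(evs))
```

Because the assumed utility is nonlinear in the numeraire, income effects are present and the simulated CV and EV generally differ across individuals. In real data $\eta$ is unobserved, so individual CV and EV cannot be computed this way; only the choice probabilities are estimable.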
In the set-up described above, we first show that for arbitrary price changes, the EV and CV are well-defined under assumption 1, but the Marshallian Consumer Surplus is not. We then establish the first key result of the paper, viz., that the marginal distributions of individual-level EV and CV are point-identified, and closed-form expressions for their CDFs can be obtained as functionals of the structural choice probabilities.

It is well known that for continuous choice with multiple prices changing simultaneously, the Marshallian Consumer Surplus is generically undefined in the sense that the corresponding line-integral is path-dependent, but the Hicksian CV and EV continue to remain well-defined (cf. Tirole, 1988, page 11). Our first result establishes that the same conclusion is also true for discrete choice.

Proposition 1 Consider the multinomial choice set-up with $J$ alternatives. Consider a price change from $p_0 \equiv (p_{10}, p_{20}, \ldots, p_{J0})$ to $p_1 \equiv (p_{11}, p_{21}, \ldots, p_{J1})$. Under assumption 1, the individual compensating and equivalent variations are uniquely defined. (Proof in Appendix)

Marshallian Consumer Surplus: It can be shown that for discrete choice with multiple price changes, the integral (5) is path-dependent. In the appendix, we demonstrate this for the case with 3 alternatives ($J = 3$). Thus it is no longer true that the average CS is identical to the average EV, as found in DB15 for the case of a ceteris paribus price increase of a single alternative.

2.1 Welfare Distributions for Multiple Price-Changes

DB15 showed that in the above setting, when the price of a single alternative changes ceteris paribus, the resulting CV and EV distributions can be expressed as closed-form functionals of choice-probabilities. We begin by noting that one cannot iterate this single price-change result to obtain the welfare distribution for simultaneous changes in the prices of multiple alternatives.
To see this, consider a choice among three alternatives ($J = 3$), and suppose that the price of alternative 1 changes from $p_{10}$ to $p_{11}$, that of 2 from $p_{20}$ to $p_{21}$, and the price of 3 is unchanged at $p_3$. Suppose we try to calculate the overall CV, starting with, say, the price change of alternative 1, followed by 2 (the order in which we do this does not matter, by path independence) and applying theorem 2 of DB15 at each stage. Suppose the CVs corresponding to the two price changes are denoted by $S_1$ and $S_2$ respectively. Then, by definition,

$$\begin{aligned}
& \max\left\{ U_1(y - p_{10}, \eta), U_2(y - p_{20}, \eta), U_3(y - p_3, \eta) \right\} \\
={} & \max\left\{ U_1(y + S_1 - p_{11}, \eta), U_2(y + S_1 - p_{20}, \eta), U_3(y + S_1 - p_3, \eta) \right\} \\
={} & \max\Big\{ U_1\big(y + \underbrace{S_1 + S_2}_{= S,\ \text{overall CV}} - p_{11}, \eta\big), U_2\big(y + \underbrace{S_1 + S_2}_{S} - p_{21}, \eta\big), U_3\big(y + \underbrace{S_1 + S_2}_{S} - p_3, \eta\big) \Big\}.
\end{aligned}$$

Then using theorem 2 of DB15, we can get the marginal distribution of $S_1$, but we cannot get the marginal distribution of $S_1 + S_2$, because the prices of both alternatives 1 and 2 have changed between lines 1 and 3 of the previous display and so theorem 2 of DB15 does not apply. Secondly, because we cannot calculate the value of the CV $S_1$ for an individual (we can only calculate its distribution across all individuals), we cannot apply theorem 2 of DB15 to calculate the marginal distribution of $S_2$, since the income at which we could potentially apply the theorem depends on $S_1$, which is unknown, unlike $y$, which is a fixed known constant. Thus, a new result is required for welfare analysis corresponding to simultaneous changes in multiple prices, and it is given by the following theorem.

Theorem 1 Consider the multinomial choice set-up with $J$ exclusive alternatives. Consider a price change from $p_0 \equiv (p_{10}, p_{20}, \ldots, p_{J0})$ to $p_1 \equiv (p_{11}, p_{21}, \ldots, p_{J1})$. Assume without loss of generality that $p_{J1} - p_{J0} \ge p_{J-1,1} - p_{J-1,0} \ge \cdots \ge p_{11} - p_{10}$.
Under assumption 1, the marginal distribution of the individual EV evaluated at income $y$ is given by

$$\Pr(EV \le a) = \begin{cases}
0 & \text{if } a < p_{11} - p_{10}, \\
\sum_{k=1}^{j} q_k\left(p_{11}, \ldots, p_{j1}, p_{j+1,0} + a, \ldots, p_{J0} + a, y\right) & \text{if } p_{j1} - p_{j0} \le a < p_{j+1,1} - p_{j+1,0},\ a \ge 0,\ 1 \le j \le J - 1, \\
\sum_{k=1}^{j} q_k\left(p_{11} - a, \ldots, p_{j1} - a, p_{j+1,0}, \ldots, p_{J0}, y - a\right) & \text{if } p_{j1} - p_{j0} \le a < p_{j+1,1} - p_{j+1,0},\ a < 0,\ 1 \le j \le J - 1, \\
1 & \text{if } a \ge p_{J1} - p_{J0},
\end{cases} \tag{6}$$

while that of the individual CV evaluated at income $y$ is given by

$$\Pr(CV \le a) = \begin{cases}
0 & \text{if } a < p_{11} - p_{10}, \\
\sum_{k=1}^{j} q_k\left(p_{11}, \ldots, p_{j1}, p_{j+1,0} + a, \ldots, p_{J0} + a, y + a\right) & \text{if } p_{j1} - p_{j0} \le a < p_{j+1,1} - p_{j+1,0},\ a \ge 0,\ 1 \le j \le J - 1, \\
\sum_{k=1}^{j} q_k\left(p_{11} - a, \ldots, p_{j1} - a, p_{j+1,0}, \ldots, p_{J0}, y\right) & \text{if } p_{j1} - p_{j0} \le a < p_{j+1,1} - p_{j+1,0},\ a < 0,\ 1 \le j \le J - 1, \\
1 & \text{if } a \ge p_{J1} - p_{J0},
\end{cases} \tag{7}$$

where the $q_k$s are defined above in equation (1). (The separate entries for $a \ge 0$ and $a < 0$ in (6) and (7) arise from accommodating rise and fall of prices, respectively.)

The distributional results cover positive, zero and negative price-changes. For example, in the 3-alternative case, suppose alternative 2 is the outside option with price $p_{21} = p_{20} = 0$, and $p_{11} - p_{10} < 0 < p_{31} - p_{30}$; then (7) becomes

$$\Pr(CV \le a) = \begin{cases}
0 & \text{if } a < p_{11} - p_{10}, \\
q_1(p_{11} - a, 0, p_{30}, y) & \text{if } p_{11} - p_{10} \le a < 0, \\
q_1(p_{11}, 0, p_{30} + a, y + a) + q_2(p_{11}, 0, p_{30} + a, y + a) & \text{if } 0 \le a < p_{31} - p_{30}, \\
1 & \text{if } a \ge p_{31} - p_{30}.
\end{cases}$$

Finally, note that in the above results, the CDFs are non-decreasing because of assumption 1. For instance, for the CV, we require

$$q_1\left(p_{11}, p_{20} + a', p_{30} + a', y + a'\right) \ge q_1\left(p_{11}, p_{20} + a, p_{30} + a, y + a\right)$$

whenever $p_{11} - p_{10} \le a < a'$.
But this is true because the right-hand side of that inequality is the probability of the first event in the chain

$$\begin{aligned}
& U_1(y + a - p_{11}, \eta) \ge \max\left\{ U_2(y - p_{20}, \eta), U_3(y - p_{30}, \eta) \right\} \\
\Longrightarrow\ & U_1(y + a' - p_{11}, \eta) \ge \max\left\{ U_2(y - p_{20}, \eta), U_3(y - p_{30}, \eta) \right\}, \text{ since } a' > a, \\
\Longleftrightarrow\ & U_1(y + a' - p_{11}, \eta) \ge \max\left\{ U_2(y + a' - (p_{20} + a'), \eta), U_3(y + a' - (p_{30} + a'), \eta) \right\},
\end{aligned} \tag{8}$$

while the probability of the last event is the left-hand side. Indeed, inequality (8) can be interpreted as a Slutsky/Revealed Preference inequality for discrete choice.

Remark 1 Note that the above results hold no matter whether price and income are endogenous or not. Endogeneity affects how the choice probabilities will be estimated, not the relationship between welfare distributions and the structural choice probabilities, which is what the above results establish. See Section 5 below for a more detailed discussion of welfare estimation under income-endogeneity.[6]

[6] It is also implicit throughout (as in DB15) that the price changes do not alter the population of interest. For example, a tuition subsidy in a district might attract outsiders with a strong preference for education to migrate in, altering the distribution of preferences relative to the status quo. The price changes considered here are assumed to be modest enough to have no such impact on the distribution of preference heterogeneity.

3 Two Extensions

3.1 Elimination of an Alternative

Consider a setting of multinomial choice among exclusive alternatives $\{1, \ldots, J + 1\}$. Suppose the alternative $J + 1$ is eliminated subsequently, which can potentially affect consumer welfare both by restricting the choice set and by affecting the prices of other alternatives. Assume that we have data on a cross-section of individual choices in the pre-elimination situation. We wish to calculate the distribution of Hicksian welfare effects that would result from eliminating the $(J + 1)$th alternative.
Previous analyses of the problem, notably Hausman, 2003, had provided convenient, off-the-shelf formulae for estimation of welfare effects for a "representative" consumer, ignoring the impact of unobserved heterogeneity in preferences. Incorporation of preference heterogeneity and developing the associated welfare analysis reveals interesting differences between CV and EV based formulae and identifiability of their distribution, as will be shown below.

Toward that end, consider an individual with initial income $y$ and unobserved heterogeneity $\eta$ whose utility from consuming alternative $j$ at price $p_j$ is given by $U_j(y - p_j, \eta)$. The problem is to find the distribution of welfare arising from elimination of alternative $J + 1$ as $\eta$ varies across individuals. Suppose from an initial price vector $(p_{11}, \ldots, p_{J1}, p_{J+1})$, the eventual price vector becomes $(p_{10}, \ldots, p_{J0})$. Define the CV as the income compensation in the final situation necessary to attain the initial utility. Formally, CV is the solution $S$ to the equation

$$\max\left\{ U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta), U_{J+1}(y - p_{J+1}, \eta) \right\} = \max\left\{ U_1(y + S - p_{10}, \eta), \ldots, U_J(y + S - p_{J0}, \eta) \right\}.$$

Analogously, the EV is defined as the income reduction in the initial situation necessary to attain the eventual utility, i.e.,

$$\max\left\{ U_1(y - S - p_{11}, \eta), \ldots, U_J(y - S - p_{J1}, \eta), U_{J+1}(y - S - p_{J+1}, \eta) \right\} = \max\left\{ U_1(y - p_{10}, \eta), \ldots, U_J(y - p_{J0}, \eta) \right\}.$$

Define the initial (i.e., pre-elimination) choice probabilities, for alternatives $k = 1, \ldots, J + 1$, as

$$q_k(p_1, \ldots, p_{J+1}, y) = \Pr\left[ U_k(y - p_k, \eta) \ge \max_{j \in \{1, \ldots, J+1\} \setminus \{k\}} U_j(y - p_j, \eta) \right]. \tag{9}$$

As before, assume WLOG that $p_{J0} - p_{J1} \ge \cdots \ge p_{10} - p_{11}$. We now state the main result describing the marginal distribution of EV and CV defined above. The proof appears in the appendix.

Theorem 2 Assume that for each $j = 1, \ldots, J + 1$, $U_j(\cdot, \eta)$ is strictly increasing. Let $q_k(\cdot, \ldots, \cdot, y)$ be as defined in (9).
Then

$$\Pr(CV \le a) = \begin{cases}
0 & \text{if } a < p_{10} - p_{11}, \\
\sum_{k=1}^{j} q_k\left(p_{10}, \ldots, p_{j0}, p_{j+1,1} + a, \ldots, p_{J1} + a, p_{J+1} + a, y + a\right) & \text{if } p_{j0} - p_{j1} \le a < p_{j+1,0} - p_{j+1,1},\ 1 \le j \le J - 1,\ a \ge 0, \\
\sum_{k=1}^{j} q_k\left(p_{10} - a, \ldots, p_{j0} - a, p_{j+1,1}, \ldots, p_{J1}, p_{J+1}, y\right) & \text{if } p_{j0} - p_{j1} \le a < p_{j+1,0} - p_{j+1,1},\ 1 \le j \le J - 1,\ a < 0, \\
1 - q_{J+1}\left(p_{10}, \ldots, p_{J0}, p_{J+1} + a, y + a\right) & \text{if } p_{J0} - p_{J1} \le a,\ a \ge 0, \\
1 - q_{J+1}\left(p_{10} - a, \ldots, p_{J0} - a, p_{J+1}, y\right) & \text{if } p_{J0} - p_{J1} \le a < 0.
\end{cases}$$

On the other hand,

$$\Pr(EV \le a) = \begin{cases}
0 & \text{if } a < p_{10} - p_{11}, \\
\sum_{k=1}^{j} q_k\left(p_{10}, \ldots, p_{j0}, p_{j+1,1} + a, \ldots, p_{J1} + a, p_{J+1} + a, y\right) & \text{if } p_{j0} - p_{j1} \le a < p_{j+1,0} - p_{j+1,1},\ 1 \le j \le J - 1,\ a \ge 0, \\
\sum_{k=1}^{j} q_k\left(p_{10} - a, \ldots, p_{j0} - a, p_{j+1,1}, \ldots, p_{J1}, p_{J+1}, y - a\right) & \text{if } p_{j0} - p_{j1} \le a < p_{j+1,0} - p_{j+1,1},\ 1 \le j \le J - 1,\ a < 0, \\
1 - q_{J+1}\left(p_{10}, \ldots, p_{J0}, p_{J+1} + a, y\right) & \text{if } p_{J0} - p_{J1} \le a,\ a \ge 0, \\
1 - q_{J+1}\left(p_{10} - a, \ldots, p_{J0} - a, p_{J+1}, y - a\right) & \text{if } p_{J0} - p_{J1} \le a < 0.
\end{cases}$$

As in the previous theorem, the pairs of results for $a \ge 0$ and $a < 0$ correspond to existing alternatives that have become more and less expensive, respectively, following the elimination of the $(J + 1)$th alternative. Also, note that in order to calculate the probabilities appearing in theorem 2, we need to observe adequate cross-sectional variation in the price of all $J + 1$ alternatives in the pre-elimination period.

Corollary 1 If elimination of the alternative has no effect on prices of the other alternatives, then $p_{j1} = p_{j0}$ for all $j = 1, \ldots, J$, and the above results simplify to

$$\Pr(CV \le a) = \begin{cases}
0 & \text{if } a < 0, \\
1 - q_{J+1}\left(p_{10}, \ldots, p_{J0}, p_{J+1} + a, y + a\right) & \text{if } 0 \le a,
\end{cases} \tag{10}$$

$$\Pr(EV \le a) = \begin{cases}
0 & \text{if } a < 0, \\
1 - q_{J+1}\left(p_{10}, \ldots, p_{J0}, p_{J+1} + a, y\right) & \text{if } 0 \le a.
\end{cases} \tag{11}$$

These results have a clear intuitive interpretation. For example, consider the result (10). Recall that CV is defined as the solution $S$ to

$$\max\left\{ U_1(y - p_{10}, \eta), \ldots, U_J(y - p_{J0}, \eta), U_{J+1}(y - p_{J+1}, \eta) \right\} = \max\left\{ U_1(y + S - p_{10}, \eta), \ldots, U_J(y + S - p_{J0}, \eta) \right\}.$$
Observe that any individual who does not consume the $(J + 1)$th product in the pre-change situation suffers no welfare change relative to the final situation when the product becomes unavailable. Therefore, we only need to consider those individuals who consume the product in the pre-change situation. The CV is positive and less than $a$ for those individuals who, with the additional income $a$, would enjoy a higher utility than what they are currently getting from consuming the $(J + 1)$th alternative. Since buying this alternative at price $p_{J+1}$ with income $y$ yields the same utility as buying it at price $p_{J+1} + a$ with income $y + a$, the probability of CV being less than $a$ equals the probability of not buying the alternative at price $p_{J+1} + a$ when income is $y + a$.

Corollary 2 If eliminating the $(J + 1)$th alternative has no effect on prices of the other alternatives, then the average values of individual welfare change are given by

$$E(CV) = \int_{p_{J+1}}^{\infty} q_{J+1}\left(p_{10}, \ldots, p_{J0}, r, y + r - p_{J+1}\right) dr, \tag{12}$$

$$E(EV) = \int_{p_{J+1}}^{\infty} q_{J+1}\left(p_{10}, \ldots, p_{J0}, r, y\right) dr, \tag{13}$$

using the change of variable $r = p_{J+1} + a$.

The expression for expected EV is commonly used as a "heuristic" measure of the welfare effect of introducing a new product. Thus it follows from the above discussion that if elimination of the alternative entails no price change for the other alternatives, then the commonly used expression is identical to the mean EV, but ceases to be so if one is interested in the mean CV, or if prices of substitutes also change.

In order to calculate the above expressions nonparametrically, a researcher needs to observe demand up to the price where choice probability becomes zero. Typically, in a dataset, one is unlikely to observe such prices, since producers have no incentive to raise prices to where revenue is zero. But then one can obtain a lower bound for $E(CV)$ by integrating up to the highest price observed in the dataset where demand from consumers with income $y$ is non-zero.
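Equations (11) and (13) can be checked on simulated data. The following sketch is illustrative only: the utility function, the heterogeneity distribution, the prices and the income level are all hypothetical choices, not taken from the paper. It computes each simulated individual's EV from eliminating the last of three alternatives by bisection, and compares the resulting CDF and mean with the expressions based on the choice probability $q_{J+1}$, holding the prices of the remaining alternatives fixed.

```python
import math
import random

def utility(k, c, eta):
    """Hypothetical U_k(c, eta) = shock_k + beta * asinh(c); strictly increasing in c."""
    shocks, beta = eta
    return shocks[k] + beta * math.asinh(c)

def chooses_last(prices, income, eta):
    """True if the last alternative (the one to be eliminated) is the strict favourite."""
    u = [utility(k, income - p, eta) for k, p in enumerate(prices)]
    return u[-1] > max(u[:-1])

def ev_of_elimination(prices, income, eta, tol=1e-10):
    """EV of eliminating the last alternative, with other prices unchanged:
    the income reduction S in the initial situation that yields final utility."""
    target = max(utility(k, income - p, eta) for k, p in enumerate(prices[:-1]))
    gap = lambda s: max(utility(k, income - s - p, eta)
                        for k, p in enumerate(prices)) - target
    lo, hi = 0.0, 1.0
    while gap(hi) > 0.0:      # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:      # gap is strictly decreasing in s
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo                 # exactly 0.0 for individuals who do not buy the last good

rng = random.Random(1)
etas = [([rng.gauss(0.0, 0.3) for _ in range(3)], rng.uniform(0.5, 2.0))
        for _ in range(1000)]
prices, y = [1.0, 1.5, 2.0], 10.0
evs = [ev_of_elimination(prices, y, eta) for eta in etas]

def q_last(r):
    """Share choosing the last alternative when its price is r (other prices, income fixed)."""
    return sum(chooses_last(prices[:-1] + [r], y, eta) for eta in etas) / len(etas)

# Equation (11): Pr(EV <= a) = 1 - q_{J+1}(p_10, ..., p_J0, p_{J+1} + a, y).
a = 1.0
cdf_direct = sum(s <= a for s in evs) / len(evs)
cdf_formula = 1.0 - q_last(prices[-1] + a)

# Equation (13): E(EV) as the integral of q_{J+1} over its own price (trapezoid rule).
h, width = 0.05, 15.0         # the upper limit is assumed to lie beyond the choke price
grid = [prices[-1] + i * h for i in range(int(round(width / h)) + 1)]
vals = [q_last(r) for r in grid]
mean_ev_formula = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
mean_ev_direct = sum(evs) / len(evs)
print(cdf_direct, cdf_formula, mean_ev_direct, mean_ev_formula)
```

In this sketch the integration grid extends well beyond the price at which simulated demand vanishes. Truncating the grid at a lower "highest observed price" would shrink the integral, which is the sense in which a truncated integral delivers only a lower bound.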
This need to observe demand up to a choke price is in contrast to the case of price changes for multiple alternatives, reported in equation (6) above, where welfare distributions are nonparametrically identified as long as the hypothetical price-changes are within the range of the observed price data. Of course, for parametric choice probabilities, e.g. random coefficient logit, expressions like (13) are identified directly from functional form.

3.2 Non-exclusive Discrete Choice

We now change the set-up described above to allow for non-exclusive choice. Accordingly, assume that there are two binary choices which are non-exclusive among themselves. For example, suppose choice 1 for a household is whether to subscribe to a sports package offered by a cable TV network which costs $P_1$ and choice 2 is whether to subscribe to a movie package which costs $P_2$. A household then has four exclusive options, $\{1\}$, $\{2\}$, $\{1, 2\}$, $\{0\}$ (where $\{0\}$ denotes choosing neither of the two packages), with respective utilities $U_1(Y - P_1, \eta)$, $U_2(Y - P_2, \eta)$, $U_{12}(Y - P_1 - P_2, \eta)$ and $U_0(Y, \eta)$. Consider the CV corresponding to a rise in the price of the sports package from $p_{10}$ to $p_{11}$ with the price of the movie package fixed at $p_2$. The CV evaluated at income $Y = y$ is the solution $S$ to the equation

$$\max\left\{ U_0(y + S, \eta), U_1(y + S - p_{11}, \eta), U_2(y + S - p_2, \eta), U_{12}(y + S - p_{11} - p_2, \eta) \right\} = \max\left\{ U_0(y, \eta), U_1(y - p_{10}, \eta), U_2(y - p_2, \eta), U_{12}(y - p_{10} - p_2, \eta) \right\}. \tag{14}$$

The marginal distribution of CV corresponding to this single price change is point-identified in this case. The intuition for this result is as follows. Suppose the price $P_1$ changes from $p_{10}$ to $p_{11}$ with the price $P_2$ fixed at $p_2$. Now, group options $\{1\}$ and $\{1, 2\}$ together (call it group A) and options $\{0\}$ and $\{2\}$ together (call it group B). Define

$$\varepsilon \overset{\text{def}}{=} (p_2, \eta),$$
$$V_A(y - p_1, \varepsilon) \overset{\text{def}}{=} \max\left\{ U_1(y - p_1, \eta), U_{12}(y - p_1 - p_2, \eta) \right\},$$
$$V_B(y, \varepsilon) \overset{\text{def}}{=} \max\left\{ U_0(y, \eta), U_2(y - p_2, \eta) \right\}.$$
Correspondingly, (14) becomes

\[
\max\{V_A(y+S-p_{11},\varepsilon),\; V_B(y+S,\varepsilon)\} = \max\{V_A(y-p_{10},\varepsilon),\; V_B(y,\varepsilon)\}. \tag{15}
\]

If the $U$ functions are strictly increasing and continuous in the first argument for each $\eta$, then so are $V_A(\cdot,\varepsilon)$ and $V_B(\cdot,\varepsilon)$ for each $\varepsilon$. Now we can apply theorem 1 of DB15 for binary choice to this problem and get the marginal distribution of the compensating variation. For example, for $0\le a<p_{11}-p_{10}$, theorem 1 of DB15 gives

\[
\begin{aligned}
\Pr(CV\le a) &= \Pr\left[V_B(y+a,\varepsilon)\ge V_A(y+a-(p_{10}+a),\varepsilon)\right] \\
&= \Pr\left[\max\{U_0(y+a,\eta),\,U_2(y+a-p_2,\eta)\}\ge \max\left\{\begin{array}{l} U_1(y+a-(p_{10}+a),\eta), \\ U_{12}(y+a-(p_{10}+a)-p_2,\eta)\end{array}\right\}\right] \\
&= q_0(p_{10}+a,p_2,y+a)+q_2(p_{10}+a,p_2,y+a).
\end{aligned}
\]

Thus

\[
\Pr(CV\le a)=\begin{cases} 0 & \text{if } a<0, \\ q_0(p_{10}+a,p_2,y+a)+q_2(p_{10}+a,p_2,y+a) & \text{if } 0\le a<p_{11}-p_{10}, \\ 1 & \text{if } a\ge p_{11}-p_{10}. \end{cases}
\]

The key fact enabling us to write (14) as the binary choice CV (15) is that $P_2$ is being held fixed at $p_2$; if $P_2$ also varied across individuals, then the distribution of $\varepsilon$ would vary beyond the variation of $\eta$ and the binary formulation would no longer be applicable. Indeed, for multiple price changes in the non-exclusive alternatives case, welfare distributions can no longer be written in terms of choice probabilities. To see this, consider a simultaneous rise in $P_1$ and $P_2$ from $(p_{10},p_{20})$ to $(p_{11},p_{21})$. Assume that $0<p_{11}-p_{10}<p_{21}-p_{20}$. Then the CV is defined via

\[
\max\left\{\begin{array}{l} U_0(y+S,\eta),\; U_1(y+S-p_{11},\eta), \\ U_2(y+S-p_{21},\eta),\; U_{12}(y+S-p_{11}-p_{21},\eta)\end{array}\right\}
= \max\{U_0(y,\eta),\,U_1(y-p_{10},\eta),\,U_2(y-p_{20},\eta),\,U_{12}(y-p_{10}-p_{20},\eta)\}. \tag{16}
\]

Now,

\[
\begin{aligned}
\Pr[S=p_{11}-p_{10}]
&= \Pr\left[U_1(y-p_{10},\eta)\ge \max\left\{\begin{array}{l} U_0(y,\eta),\; U_2(y-p_{20},\eta),\; U_{12}(y-p_{10}-p_{20},\eta), \\ U_0(y+p_{11}-p_{10},\eta),\; U_2(y+p_{11}-p_{10}-p_{21},\eta), \\ U_{12}(y-p_{10}-p_{21},\eta)\end{array}\right\}\right] \\
&= \Pr\left[U_1(y-p_{10},\eta)\ge \max\left\{\begin{array}{l} U_2(y-p_{20},\eta),\; U_{12}(y-p_{10}-p_{20},\eta), \\ U_0(y+p_{11}-p_{10},\eta)\end{array}\right\}\right]. 
\end{aligned} \tag{17}
\]

This equals the probability of an individual picking the bundle $(1,y-p_{10})$ when facing choice among the bundles $(1,y-p_{10})$, $(2,y-p_{20})$, $(12,y-p_{10}-p_{20})$ and $(0,y+p_{11}-p_{10})$. In order that we can identify this probability, there has to be a mass of individuals who face choice over these four bundles. This requires the four bundles to lie in the same budget set; in other words, we need a combination of alternative-specific prices $\tilde p_1,\tilde p_2$ and income $\tilde y$ such that the four bundles are respectively identical to $(1,\tilde y-\tilde p_1)$, $(2,\tilde y-\tilde p_2)$, $(\{1,2\},\tilde y-\tilde p_1-\tilde p_2)$ and $(0,\tilde y)$, i.e.,

\[
\begin{aligned}
\tilde y &= y+p_{11}-p_{10}, \\
\tilde y-\tilde p_1 &= y-p_{10}, \\
\tilde y-\tilde p_2 &= y-p_{20}, \\
\tilde y-\tilde p_1-\tilde p_2 &= y-p_{10}-p_{20}.
\end{aligned}
\]

But adding the first and fourth equations in the previous display we get

\[
2\tilde y-\tilde p_1-\tilde p_2 = 2y+p_{11}-2p_{10}-p_{20}, \tag{18}
\]

whereas adding the second and third equations gives

\[
2\tilde y-\tilde p_1-\tilde p_2 = 2y-p_{10}-p_{20}. \tag{19}
\]

Now, equations (18) and (19) can hold simultaneously if and only if $p_{11}=p_{10}$, which reduces to the single price change case. Therefore, if both price changes are positive, we cannot observe anyone in the data who might face the choice between bundles $(1,y-p_{10})$, $(2,y-p_{20})$, $(12,y-p_{10}-p_{20})$ and $(0,y+p_{11}-p_{10})$; therefore, (17) cannot be expressed as a choice-probability. But for any generic probability distribution on $\eta$, the probability (17) will be positive.

One can nonetheless bound the probability (17) using revealed preference arguments. In particular, let

\[
\begin{aligned}
A &= \left\{(\tilde p_1,\tilde p_2,\tilde y):\; \tilde y-\tilde p_1\ge y-p_{10},\; \tilde y-\tilde p_2\le y-p_{20},\; \tilde y-\tilde p_1-\tilde p_2\le y-p_{10}-p_{20},\; \tilde y\le y+p_{11}-p_{10}\right\}, \\
B &= \left\{(\tilde p_1,\tilde p_2,\tilde y):\; \tilde y-\tilde p_1\le y-p_{10},\; \tilde y-\tilde p_2\ge y-p_{20},\; \tilde y-\tilde p_1-\tilde p_2\ge y-p_{10}-p_{20},\; \tilde y\ge y+p_{11}-p_{10}\right\}.
\end{aligned}
\]

Then an upper bound can be obtained as

\[
\begin{aligned}
&\Pr\left[U_1(y-p_{10},\eta)\ge\max\left\{\begin{array}{l} U_2(y-p_{20},\eta),\; U_{12}(y-p_{10}-p_{20},\eta), \\ U_0(y+p_{11}-p_{10},\eta)\end{array}\right\}\right] \\
&\quad\le \inf_{(\tilde p_1,\tilde p_2,\tilde y)\in A}\Pr\left[U_1(\tilde y-\tilde p_1,\eta)\ge\max\left\{\begin{array}{l} U_2(\tilde y-\tilde p_2,\eta),\; U_{12}(\tilde y-\tilde p_1-\tilde p_2,\eta), \\ U_0(\tilde y,\eta)\end{array}\right\}\right] \\
&\quad= \inf_{(\tilde p_1,\tilde p_2,\tilde y)\in A} q_1(\tilde p_1,\tilde p_2,\tilde y).
\end{aligned}
\]

These bounds arise from the fact that if, e.g., $\tilde y-\tilde p_1\ge y-p_{10}$, then $U_1(\tilde y-\tilde p_1,\eta)\ge U_1(y-p_{10},\eta)$ by assumption 1, so that the probability of $U_1(y-p_{10},\eta)$ exceeding a specific number is smaller than that of $U_1(\tilde y-\tilde p_1,\eta)$ exceeding that same number. Similarly, $\tilde y-\tilde p_2\le y-p_{20}$ implies that $U_2(y-p_{20},\eta)\ge U_2(\tilde y-\tilde p_2,\eta)$, so that the probability of $U_2(y-p_{20},\eta)$ being exceeded by a number is smaller than that of $U_2(\tilde y-\tilde p_2,\eta)$ being exceeded by it, etc. By a similar logic, a lower bound on (17) is given by

\[
\begin{aligned}
&\Pr\left[U_1(y-p_{10},\eta)\ge\max\left\{\begin{array}{l} U_2(y-p_{20},\eta),\; U_{12}(y-p_{10}-p_{20},\eta), \\ U_0(y+p_{11}-p_{10},\eta)\end{array}\right\}\right] \\
&\quad\ge \sup_{(\tilde p_1,\tilde p_2,\tilde y)\in B}\Pr\left[U_1(\tilde y-\tilde p_1,\eta)\ge\max\left\{\begin{array}{l} U_2(\tilde y-\tilde p_2,\eta),\; U_{12}(\tilde y-\tilde p_1-\tilde p_2,\eta), \\ U_0(\tilde y,\eta)\end{array}\right\}\right] \\
&\quad= \sup_{(\tilde p_1,\tilde p_2,\tilde y)\in B} q_1(\tilde p_1,\tilde p_2,\tilde y).
\end{aligned}
\]

These upper and lower bounds cannot coincide, implying that (17) cannot be point-identified unless $p_{11}=p_{10}$. To see this, note that by assumption 1, the only way for the lower and upper bounds to coincide would be that the inf and sup are attained at the point of intersection of $A$ and $B$ where each weak inequality holds as an equality (otherwise the distribution of $\eta$ can be chosen such that the inf in the interior of $A$ is strictly larger than the sup in the interior of $B$). But we saw above that this is impossible unless $p_{11}=p_{10}$.

The conclusion from the above discussion is that in a discrete choice setting when alternatives are non-exclusive, the distribution of individual welfare change for a single ceteris paribus price rise can be expressed directly as a choice probability, but for a simultaneous increase in several alternatives' prices, it can only be bounded. This is in contrast to the multinomial case with exclusive alternatives, where welfare distributions are identified for both single (shown in DB15) and multiple (shown above) price changes.
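The piecewise CDF for a single price rise with non-exclusive alternatives can be sketched in code as follows. This is only an illustration under stated assumptions: the function name `cv_cdf_single_price_rise` and the logit-type probabilities in `toy_q` are hypothetical stand-ins, not part of the paper; in an application, `q` would be an estimated choice-probability function for the four composite options.

```python
import math


def toy_q(option, p1, p2, y):
    """Hypothetical choice probabilities over the composite options 0, 1, 2, 12.

    Assumption for illustration only: logit-type shares with utility
    taste + log(residual income); requires y > p1 + p2.
    """
    cost = {0: 0.0, 1: p1, 2: p2, 12: p1 + p2}
    taste = {0: 0.0, 1: 0.5, 2: 0.3, 12: 0.9}
    weights = {k: math.exp(taste[k]) * (y - cost[k]) for k in cost}
    return weights[option] / sum(weights.values())


def cv_cdf_single_price_rise(a, p10, p11, p2, y, q):
    """Pr(CV <= a) when P1 rises from p10 to p11 and P2 stays at p2.

    Follows the piecewise expression in the text: zero below 0, one at or
    above the full price rise, and q0 + q2 evaluated at price p10 + a and
    income y + a in between.
    """
    if p11 <= p10:
        raise ValueError("this sketch covers a price rise only (p11 > p10)")
    if a < 0:
        return 0.0
    if a >= p11 - p10:
        return 1.0
    return q(0, p10 + a, p2, y + a) + q(2, p10 + a, p2, y + a)


if __name__ == "__main__":
    for a in (0.0, 1.0, 2.5, 4.0):
        print(a, cv_cdf_single_price_rise(a, 10.0, 15.0, 8.0, 100.0, toy_q))
```

The same function cannot be extended to a simultaneous change in both prices, since in that case the text shows that the relevant probability is not a choice probability and can only be bounded.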
An intuition for the result is that in the non-exclusive case, there is a systematic relationship between the prices of the different (composite) options, so that in a dataset we can never observe independent price variations across the various (composite) options, no matter how many price combinations are observed across the individual alternatives.

Remark 2 Note that this distinction between the exclusive and non-exclusive options would not be apparent if one started with a parametric model of choice; indeed, welfare distributions would appear to be identifiable in both cases for arbitrary price-changes. This identification would, of course, arise solely from functional form assumptions.

4 Discussions on Implementation

The identification results reported in theorems 1 and 2, and the associated corollaries, are fully nonparametric in that no functional form assumptions are required to derive them. When implementing these results in practical applications, one can therefore estimate conditional choice probabilities nonparametrically, e.g., using kernel or series regressions, and use these estimates to calculate welfare distributions. For instance, recall the 3-alternative case, leading to (??), where we have, say, $p_{11}-p_{10}<0=p_{21}-p_{20}<p_{31}-p_{30}$. For any random variable $X$ with CDF $F(\cdot)$ and finite mean, the expectation satisfies

\[
E(X) = -\int_{-\infty}^{0}F(x)\,dx+\int_{0}^{\infty}\left(1-F(x)\right)dx;
\]

thus the mean CV from (??) is given by

\[
-\int_{p_{11}-p_{10}}^{0} q_1(p_{11}-a,p_{20},p_{30},y)\,da+\int_{0}^{p_{31}-p_{30}} q_3(p_{11},p_{20},p_{30}+a,y+a)\,da, \tag{20}
\]

where $q_j(p_1,p_2,p_3,y)$ denotes the choice probability of alternative $j$ when the prices of the three alternatives are $(p_1,p_2,p_3)$ and income is $y$. If the dataset is of modest size, so that kernel regressions are imprecise, then one can alternatively use a parametric approximation to the choice probabilities.
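Expression (20) is a pair of one-dimensional integrals of choice probabilities, so it can be evaluated with any quadrature rule once the choice probabilities have been estimated. The following is a minimal sketch, assuming a user-supplied function `q(j, p1, p2, p3, y)` that returns the estimated probability of alternative j; the helper names `trapezoid` and `mean_cv_three_alternatives` are hypothetical. Note that, as implied by the expectation formula for a variable with negative support, the integral over the negative range enters with a minus sign.

```python
def trapezoid(f, lo, hi, n=400):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    if hi == lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def mean_cv_three_alternatives(q, p0, p1, y, n=400):
    """Mean CV for p11 - p10 < 0 = p21 - p20 < p31 - p30, following (20).

    q(j, p1, p2, p3, y) is assumed to return the choice probability of
    alternative j at the given prices and income (an estimate in practice).
    """
    (p10, p20, p30), (p11, p21, p31) = p0, p1
    if not (p11 - p10 < 0 and p21 == p20 and p31 - p30 > 0):
        raise ValueError("price changes do not match the case covered by (20)")
    # Negative range of the CV: integral of the CDF, with a minus sign.
    negative_part = trapezoid(
        lambda a: q(1, p11 - a, p20, p30, y), p11 - p10, 0.0, n
    )
    # Positive range of the CV: integral of one minus the CDF.
    positive_part = trapezoid(
        lambda a: q(3, p11, p20, p30 + a, y + a), 0.0, p31 - p30, n
    )
    return -negative_part + positive_part
```

A quick plausibility check is that the result must lie between the largest price fall and the largest price rise, since the CV distribution is supported on that interval.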
Numerical integration routines are now available in popular software packages like STATA and MATLAB and can be used to calculate the integrals of choice probabilities, in the same way that consumer surplus was traditionally calculated by earlier researchers. A specific numerical illustration based on household survey data from India is provided below in Section 5.

4.1 Endogeneity

In some applications, observed income may be endogenous with respect to individual choice. A natural example is when omitted variables, such as unrecorded education level, can both determine individual choice and be correlated with income. Under endogeneity, one can define welfare distributions either unconditionally, or conditionally on income, analogous to the average treatment effect and the average effect of treatment on the treated, respectively, in the program evaluation literature. In either case, the observed choice probabilities would potentially differ from the structural choice probabilities which appear in our welfare distribution results, and control function type restrictions would typically be required for estimating them.7 In this context, an important and useful insight, not previously noted, is that for a price-rise, the distribution of the income-conditioned EV is not affected by income endogeneity, whereas that of the CV is (for a fall in price, the conclusion is reversed for CV and EV respectively). To see why, recall the three alternative case discussed above, and define the conditional-on-income structural choice probabilities as

\[
q_j^c(p_1,p_2,p_3,y',y)=\int 1\left\{U_j(y'-p_j,\eta)\ge\max_{k\ne j}U_k(y'-p_k,\eta)\right\}dF(\eta|y),
\]

where $F(\cdot|y)$ denotes the distribution of the unobserved heterogeneity $\eta$ for individuals whose current income is $y$. Now, for a real number $a$ satisfying $p_{11}-p_{10}\le a<p_{21}-p_{20}$, it is easy to see that, similar to equations (??) and (??), the distribution of the EV at $a$, evaluated at income $y$, conditional on current income being $y$, is given by

\[
\Pr(EV\le a\,|\,Inc=y)=q_1^c(p_{11},p_{20}+a,p_{30}+a,y,y),
\]

while for CV it is given by

\[
\Pr(CV\le a\,|\,Inc=y)=q_1^c(p_{11},p_{20}+a,p_{30}+a,y+a,y).
\]

Now, $q_1^c(p_{11},p_{20}+a,p_{30}+a,y,y)$, by definition, is the fraction of individuals currently at income $y$ who would choose alternative 1 at prices $(p_{11},p_{20}+a,p_{30}+a)$, had their income been $y$. But this is directly observable in the data since the realized and hypothetical incomes are the same, and therefore, no corrections are required owing to endogeneity. However, $q_1^c(p_{11},p_{20}+a,p_{30}+a,y+a,y)$ is the fraction of individuals currently at income $y$ who would choose alternative 1 at prices $(p_{11},p_{20}+a,p_{30}+a)$, had their income been $y+a$. This fraction is counterfactual and not directly estimable because the distribution of $\eta$ is likely to be different across people with income $y+a$ relative to those with income $y$, due to endogeneity. To summarize, if the objective of welfare analysis is to calculate the EV distribution resulting from a price rise for individuals whose realized income equals the hypothetical income, then endogeneity of income is irrelevant to the analysis. This suggests that if exogeneity of income is suspect and no obvious instrument or control function is available, then a researcher can still perform meaningful welfare analysis based on the EV distribution at current income.

7 When individual choice data are available, endogeneity of price is typically of lesser concern, because an individual's choice or her omitted characteristics are less likely to affect the market price she faces. However, if there are unobserved choice attributes, then price endogeneity may be an important empirical concern. Blundell and Matzkin (2015) and Berry and Haile (2015) have made important recent advances for handling price endogeneity in demand analysis with general heterogeneity.
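Because the EV distribution at current income depends only on choice probabilities that are observed in the data, it can be estimated with a standard nonparametric regression of the choice indicator on prices and income. Below is a minimal sketch using a Nadaraya-Watson estimator with a Gaussian product kernel; this particular estimator, the function names and the data layout are assumptions made for illustration, and the bandwidths `h` would have to be chosen by the researcher.

```python
import math


def nw_choice_prob(data, j, x, h):
    """Nadaraya-Watson estimate of Pr(choice = j | covariates = x).

    data: iterable of (choice, covariates), where covariates is a tuple
          such as (p1, p2, p3, income).
    x:    evaluation point, same layout as covariates.
    h:    tuple of positive bandwidths, one per covariate.
    """
    num = 0.0
    den = 0.0
    for choice, z in data:
        weight = 1.0
        for zk, xk, hk in zip(z, x, h):
            u = (zk - xk) / hk
            weight *= math.exp(-0.5 * u * u)  # Gaussian kernel, up to a constant
        den += weight
        if choice == j:
            num += weight
    if den == 0.0:
        raise ValueError("no observations receive positive weight at x")
    return num / den


def ev_cdf_at_current_income(data, a, p11, p20, p30, y, h):
    """Pr(EV <= a | Inc = y) as the observed probability of alternative 1.

    Assumes a lies in the range stated in the text for this expression;
    the range is not checked here.
    """
    return nw_choice_prob(data, 1, (p11, p20 + a, p30 + a, y), h)
```

The analogous CV expression evaluates the probability at hypothetical income y + a for people whose current income is y, which this estimator does not deliver under endogeneity without further restrictions.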
5 An Illustration

We now provide a simple empirical illustration of theorems 2 and 3 by estimating average welfare effects resulting from two hypothetical policy changes in the context of children's school attendance in India. This exercise is in the spirit of Hausman (1981)'s illustration of welfare calculations for wives' labor supply in the United States. The data come from a large-scale household survey, conducted by the Indian National Sample Survey Organization in 2004-5. Our subsample of interest consists of approximately 25000 households from across India, which have a single child of secondary-school age (15-18 years) and belong to one of the two dominant religious groups, Hindu and Muslim. Each child faces three alternatives: stay at home (option 1), work for a wage (option 2), or attend school (option 3). We drop the two states of Kerala and Himachal Pradesh, which typically score unusually high relative to the rest of India on human development indicators including children's education. For income y, we use the monthly per capita household expenditure (mpce), as is standard in empirical studies on developing countries; we use the term "income" throughout the rest of this section to mean mpce. For the potential price faced by a household related to school-attendance, we use the median annual educational expense (divided by 12) per school-going child across all households in the same income stratum in the village or urban block where the household resides.8 The reason for using the median expense is that some households might spend excessively large amounts on educational resources, which does not represent the true "price" envisaged by a typical household making the choice between sending and not sending their child to school. The price of working is taken to be the negative of the average wage for children aged 15-18 in the village or urban block where the household resides.
The price of staying at home is normalized to zero. All money amounts are expressed as Indian rupees per month (Rs), with 1 Indian rupee=0.02 US dollars in 2004. Summary statistics are presented in Table 1.

8 An advantage of the Indian NSS is that it divides each village and urban block into two income strata (labelled "second stage stratum" in the survey documentation) and samples households from both strata. In forming our measure of potential price faced by a household, we use median school expenses across households in the same income stratum and the same village as the reference household. In cases where no household from a village and a specific income stratum sends any child to school, we use the median educational expenditure in the entire village as the potential price.

In Table 2, we show semi-elasticities for price and income from a multinomial logit regression of school-attendance and working conditional on a set of household characteristics, using home-stay as the base outcome. Results are shown separately by gender, for a "typical" child, defined as Hindu, at age 15 and coming from a household with 5 members, adult literacy rate 60% and at the lowest decile of monthly expenditure, viz., y = Rs 1800. These semi-elasticities represent the increase in probability of choosing the respective outcome corresponding to a doubling of prices or income. Income effects are seen to be highly significant for each activity and for both genders. This suggests that including income effects can potentially make a significant impact on welfare calculations.

To illustrate the calculation of welfare effects, we consider two hypothetical policy changes. The first is a simultaneous price change for schooling and working induced by a tuition subsidy and a potential general equilibrium increase in wages. The second is elimination of the work option with no induced change in school fees, as would result from enforcement of child labor legislation.
Details of all numerical calculations are shown in the Appendix.

Policy Change 1: For the first policy simulation exercise, we consider a change from an initial price of schooling of Rs 790 and initial wage of Rs 25 to a final school-price of Rs 45 and wage of Rs 256, with the price of staying at home (i.e., the outside option) constant at 0. These changes in prices correspond to a 2 standard deviation fall from the ninth decile of school-fees and a 1.5 s.d. rise from the lowest decile for child wage. We calculate the average welfare gain at various hypothetical income values for these price changes for the typical household, defined above. Note that it is not a priori obvious which one among the tuition subsidy effect, which benefits the comparatively richer whose children are more likely to attend schools, or the wage rise effect, which benefits the comparatively poorer whose children work, will dominate, and so how welfare effects will vary with income is an empirical question. To use our formulae for welfare change in this context, we need to estimate the choice probabilities for schooling and working at the specified covariate values. We model these choice probabilities via cubic splines in prices and income (see Appendix for details). This formulation should be interpreted as an approximation to the underlying nonparametric choice probabilities. The alternative to the above calculations is the Small and Rosen (1981) approach, which ignores income-effects, assumes Type 1 extreme valued errors and leads to average welfare calculations via the popular "log-sum" formula, c.f., Train, 2009, page 52.

Results: In Figure 1, we plot the estimated mean welfare change at various values of income for female children. Two interesting features of the CV and EV graphs are evident from the figure, viz., (a) welfare effects rise with income, because at relatively higher levels of income, a larger proportion of children attend schools and gain from the fall in tuition (i.e. the tuition effect dominates), with the mean EV/CV reaching the entire tuition subsidy of Rs 745 at the highest income levels, and (b) the log-sum based estimates of welfare gains, which do not vary with income due to assumed lack of income effects, also do not lie in between the EV and CV estimates for most of the income range, suggesting potentially serious under/over estimation of welfare effects due to neglected income-effects. To focus on mean welfare at a specific income, in Table 3, we present results on mean welfare effects of the price changes at the lowest percentile of income, together with corresponding 95% confidence intervals, by gender. We see that welfare gains for male teenagers are slightly higher than for females owing to higher incidence of school-attendance. Furthermore, for both male and female children, the log-sum based estimates are outside the 95% confidence intervals for the average EV and CV. The last two columns in Table 3 present mean deadweight loss calculations, obtained by subtracting the mean welfare gains from the sum of the subsidy payments and the additional wage-payments of employers of child labourers. The entries show that deadweight losses are estimated to be much smaller (up to nearly 100 times) using the log-sum method than the semiparametric estimates using our formulas above. This can also be inferred from Fig. 2, where the estimated mean welfare effects using the log-sum formulae are much higher at low income levels.

The reason why ignoring income effects leads to over-estimation of mean welfare at low incomes and under-estimation at high incomes is intuitive. When income effects are ignored, the estimated choice probability at a specific price vector is effectively the true choice probability as a function of price and income averaged over the distribution of income.
Since the good in question is normal, the income-averaged choice probability is higher than the true income-conditioned choice probability at lower incomes and lower than it at higher incomes. Given that subsidy-eligible individuals are typically poorer, over-estimating private welfare gains by ignoring income-effects should therefore be a bigger concern in contexts like the one above.

Policy Change 2: The second policy change we consider is a hypothetical child-labor regulation, mandating removal of the "work" option. For simplicity, we assume that there is no change in school tuition as a result of the legislation since the fraction of working teenagers is relatively small. The corresponding welfare analysis can be performed using corollary 1 above. It is important to stress that here welfare loss refers to the loss in private utility from reduced choice, ignoring any external social benefit of eliminating child labor. Accordingly, consider a child with demographic characteristics given above, family income y = Rs 1800 (lowest decile) and facing school-fees of Rs 146 per month, who initially had the option to work at wage Rs 111 per month (Rs 146 and Rs 111 being the respective medians in the data), and then the work option is eliminated. The question is what would be the average CV/EV resulting from this restriction on choice. Using corollary 1, these are given by integrating the adjusted probability of working:

\[
\begin{aligned}
E(EV) &= \int_0^{\infty} q_2(-111+a,\,146,\,1800+a)\,da, \\
E(CV) &= \int_0^{\infty} q_2(-111+a,\,146,\,1800)\,da.
\end{aligned}
\]

Ignoring income effects would lead to the Marshallian Consumer Surplus, which is given by

\[
E(CS)=\int_0^{\infty}\hat q_2(-111+a,\,146)\,da,
\]

where $\hat q_2(p_2,p_3)$ is the estimated probability of working at wage $p_2$ and school fee $p_3$.

Results: In Figure 2, we plot the mean welfare loss by gender against income, as measured via average EV, average CV and the average consumer surplus which ignores income effects.
Both the CV and EV indicate that the welfare effect diminishes to zero rapidly as income rises; at higher income levels very few children work, and so eliminating the option to work has little impact on their welfare. In contrast, the average Marshallian consumer surplus does not vary with income due to assumed lack of income effects; more importantly, its magnitude does not lie between the EV and CV estimates for most of the income range, suggesting potentially serious under/over estimation of welfare effects. It is also apparent that male children, who engage in wage-work much more than females, suffer a larger welfare loss from reduced choice. In Table 4, we record the mean EV/CV at the first decile of income separately by gender; it is apparent that removal of the work option leads to a welfare loss for males equal to about a sixth of the average teenage wage, whereas that for females is about half that amount.

6 Summary and Conclusion

In this paper, we have shown how to conduct empirical welfare analysis in multinomial choice settings, allowing for completely general consumer heterogeneity and income-effects. The paper considers three scenarios: (a) simultaneous change in prices of multiple alternatives; (b) the introduction of a new alternative or elimination of an existing alternative, possibly accompanied by price-changes of other alternatives; and (c) situations where choice-alternatives are non-exclusive. The key contributions are to show that (i) Hicksian welfare changes are well-defined under a mild monotonicity assumption on utilities in all these cases, (ii) in cases (a) and (b) the marginal distributions of CV and EV can be expressed as simple closed form functions of choice probabilities without making any assumption on the functional forms of utilities, preference heterogeneity or income effects, and (iii) this last conclusion fails when alternatives are non-exclusive.
Our results are empirically illustrated through a hypothetical policy simulation using school/work choice data for Indian teenagers. The illustration shows that in real empirical examples, income effects can be substantial, and that mean welfare effects vary systematically by income, a highly policy-relevant point that would be missed if one uses conventional "log-sum" type formulae popular in applied work. In particular, corresponding to elimination of the "work-for-wage" option, private welfare losses will be systematically over-estimated at relatively higher incomes and under-estimated at lower incomes if one erroneously assumes away income-effects. These shortcomings can be easily corrected by the methods developed in the present paper. More broadly, our methods can be used in program evaluation studies to calculate the program's value to the subjects themselves, i.e., compensated treatment effects, and the resulting deadweight loss, without any assumption on the nature of preference heterogeneity and income effects in the population.

Table 1: Summary Statistics, N=24,793

Variable            Mean     Std. Dev.   Min   Max
At home             0.19     0.39        0     1
Working             0.20     0.40        0     1
In school           0.61     0.49        0     1
School-fees         168      383.41      0     12000
Local wage          123      170         0     1800
Monthly Expense     4063     6030        1     80000
Age                 16.13    1.70        15    18
Hhd size            5.84     2.53        1     36
Female              0.45     0.49        0     1
Fraction literate   0.34     0.23        0     1
Muslim              0.15     0.35        0     1
SCST                0.30     0.46        0     1

Notes: Summary statistics for children aged 15-18 in India, in 2004. Data are from the Indian National Sample Survey, 61st round, Schedule 10 and include households from 31 states with a single child between 15-18 yrs.
Table 2: Semi-Elasticities (t-stat) for MN Logit; Base outcome=Home-stay

                Male, N=13,675                       Female, N=11,118
Covariate       Work              Attend School      Work             Attend School
School-fees     0.015 (8.50)      -0.02 (-11.41)     0.006 (7.66)     -0.04 (-11.7)
Local wage      0.004 (2.43)      -0.006 (-3.08)     0.002 (2.23)     -0.01 (-2.7)
Income          -0.071 (-10.7)    0.11 (14.09)       -0.03 (-9.57)    0.19 (14.6)

Notes: Increase in probability of choice due to doubling of prices or income, controlling for age, religion, household size and literacy, for a "typical" child, defined as Hindu, aged 15, coming from a household of size 5, literacy rate 60% and at lowest income decile, viz., Rs 1800. For example, the highlighted coefficient -0.04 says that a doubling of school fees from the average faced by a typical household reduces the probability of school attendance (vis-à-vis staying at home) of a girl by 4 percentage points.

[Figure 1: Mean Welfare Gain in Rupees per Month for Girls from Tuition Subsidy and Simultaneous Wage Rise, by Income. Horizontal axis: Income. Series: Mean_CV, Logsum, Mean_EV.]

Figure 1: Mean welfare gain in monthly rupees per capita based on the Logsum approximation, and semiparametric CV and EV resulting from tuition subsidy of Rs 538 and simultaneous wage rise of Rs 235, for Hindu teenage girls at age 15, living in households with household size 5 and adult literacy rate 60%, where 1 Rupee=0.02 US $.
Table 3: Mean Welfare Gain in Rupees per Month from Tuition Subsidy and Wage Rise, by Gender, Evaluated at First Percentile of Income

                        Mean Welfare Gain   95% CI        Mean Deadweight Loss   95% CI
                        (Rs/month)                        (Rs/month)
Male
  Log-Sum               731                 [729, 732]    0.19                   [0.12, 0.40]
  Semiparametric CV     681                 [673, 689]    20.08                  [17.36, 23.46]
  Semiparametric EV     688                 [679, 696]    13.39                  [9.98, 18.89]
Female
  Log-Sum               715                 [711, 719]    0.47                   [0.03, 1.17]
  Semiparametric CV     638                 [616, 660]    34.45                  [9.37, 33.08]
  Semiparametric EV     661                 [643, 687]    26.62                  [14.19, 36.33]

Notes: Mean welfare gain in monthly rupees per capita evaluated at the first income percentile of Rs 1000, resulting from a tuition subsidy of Rs 745 and simultaneous wage rise of Rs 221, evaluated at age 15, for Hindu households of size 5 and adult literacy rate 60%, where 1 Rupee=0.02 US $.

[Figure 2: Mean Private Welfare Loss in Rupees per Month from Elimination of Work Option, by Income. Panels: Male, Female. Horizontal axis: Income. Series: Mean_CV, Mean_EV, Mean_CS.]

Note: Mean welfare loss in Rupees per month per person from removal of wage-work option. Graph for Hindu teenagers at age 15, living in households of size 5 and adult literacy rate 60%. Mean welfare is evaluated at median monthly tuition of Rs 200, where 1 Rupee=0.02 US $.

Table 4: Mean Welfare Loss at Lowest Income Decile, by Gender, due to Elimination of Work Option for Teenagers

            Mean EV (Rs/month)   Mean CV (Rs/month)
Male        22.19                28.64
  Std Err   4.39                 5.55
Female      12.09                15.09
  Std Err   2.64                 4.35

Notes: Mean private welfare loss in monthly rupees per capita resulting from eliminating the work-for-wage option, evaluated at monthly per capita expenditure level of Rs 1800 (lowest decile) for Hindu children at age 15, living in households of size 5 and adult literacy rate 60%. Mean income for the population is about Rs 4000, and mean wage Rs 124, where 1 Rupee=0.02 US $.

References

Berry, S. and P. Haile (2014). Identification in Differentiated Products Markets Using Market Level Data, Econometrica, 82 (5), pp. 1749-1798.

Bhattacharya, D. (2009). Inferring Optimal Peer Assignment from Experimental Data. Journal of the American Statistical Association 104, pp. 486-500.

Bhattacharya, D. (2015). Nonparametric Welfare Analysis for Discrete Choice, Econometrica, 83(2), 617-649.

Blundell, R. and J. Powell (2003). Endogeneity in Nonparametric and Semiparametric Regression Models, in Advances in Economics and Econometrics, Cambridge University Press, Cambridge, U.K.

Dagsvik, J. and A. Karlstrom (2005). Compensating Variation and Hicksian Choice Probabilities in Random Utility Models that are Nonlinear in Income, Review of Economic Studies 72 (1), 57-76.

Domencich, T. and D. McFadden (1975). Urban Travel Demand - A Behavioral Analysis, North-Holland, Oxford.

Goolsbee, A. (1999). Evidence on the High-Income Laffer Curve from Six Decades of Tax Reform. Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, vol. 30(2), pages 1-64.

Goolsbee, A. and P. Klenow (2006). Valuing Consumer Products by the Time Spent Using Them: An Application to the Internet. American Economic Review, vol. 96, 108-113.

Hausman, J. (1981). Exact Consumer's Surplus and Deadweight Loss, The American Economic Review, Vol. 71, 4, 662-676.

Hausman, J. (1996). Valuation of new goods under perfect and imperfect competition, The Economics of New Goods. University of Chicago Press, 207-248.

Hausman, J. and T. Leonard (2002). The Competitive Effects of a New Product Introduction: A Case Study, The Journal of Industrial Economics, Vol. 50 (3), pages 237-263.

Hausman, J. (2003). Sources of Bias and Solutions to Bias in the Consumer Price Index, The Journal of Economic Perspectives, Vol. 17, No. 1, 23-44.

Hausman, J. and Newey, W. (2015). Individual Heterogeneity and Average Welfare, CEMMAP working paper, 42/14, forthcoming, Econometrica.

Hendren, N. (2013). The Policy Elasticity. NBER Working Paper #19177.

Herriges, J. and C. Kling (1999). Nonlinear Income Effects in Random Utility Models, The Review of Economics and Statistics, Vol. 81, No. 1, 62-72.

Hicks, J. (1946). Value and Capital, Clarendon Press, Oxford.

Ichimura, H., and C. Taber (2002). Semiparametric reduced-form estimation of tuition subsidies. American Economic Review, 286-292.

Kaldor, N. (1939). Welfare propositions of economics and interpersonal comparisons of utility. The Economic Journal, pp. 549-552.

Kane, T. J. (2003). A quasi-experimental estimate of the impact of financial aid on college-going. No. w9703. National Bureau of Economic Research.

Kitamura, Y. and J. Stoye (2013). Nonparametric Analysis of Random Utility Models. Testing, mimeo. Cornell and Yale University.

Lewbel, A. (2001). Demand Systems with and without Errors, The American Economic Review, Vol. 91, No. 3.

Lewbel, A. and K. Pendakur (2015). Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients, mimeo. Boston College.

McFadden, D. and K. Train (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics 15 (5), 447-470.

Newey, W. (1994). Kernel Estimation of Partial Means and a General Variance Estimator. Econometric Theory, 10, 233-253.

Petrin, A. (2002). Quantifying the Benefits of New Products: The Case of the Minivan, Journal of Political Economy, 110, 705-729.

Small, K. and Rosen, H. (1981). Applied Welfare Economics with Discrete Choice Models. Econometrica, Vol. 49, No. 1, 105-130.

Stiglitz, J. (2000). Economics of the public sector. W. W. Norton.

Train, K. (2009). Discrete Choice Methods with Simulation, Cambridge University Press, Cambridge, UK.

Willig, R., S. Salop, and F. Scherer (1991).
Merger analysis, industrial organization theory, and merger guidelines. Brookings Papers on Economic Activity. Microeconomics 281-332.

Appendix

Path-dependence of Line Integral Defining Marshallian Consumer Surplus for Discrete Choice

Consider a setting with three mutually exclusive alternatives with initial prices $p_0\equiv(p_{10},p_{20},p_{30})$ and final prices $p_1\equiv(p_{11},p_{21},p_{31})$. Let $y$ denote income and $q_j(p,y)$ denote the choice probability of alternative $j$ when the price vector is $p$ and income is $y$. Then the change in average consumer surplus arising from the price change from $p_0$ to $p_1$ can be defined via the line integral

\[
CS(L)=\int_L q_1(p,y)\,dp_1+q_2(p,y)\,dp_2+q_3(p,y)\,dp_3,
\]

where $L$ denotes a path $L(t)$ from $t=0$ to $t=1$ such that $L(0)=p_0\equiv(p_{10},p_{20},p_{30})$ and $L(1)=p_1\equiv(p_{11},p_{21},p_{31})$. Consider two different such paths

\[
\begin{aligned}
L_1(t)&=\left(p_{10}+t(p_{11}-p_{10}),\; p_{20}+t(p_{21}-p_{20}),\; p_{30}+t(p_{31}-p_{30})\right), \\
L_2(t)&=\left(p_{10}+t^2(p_{11}-p_{10}),\; p_{20}+t(p_{21}-p_{20}),\; p_{30}+t(p_{31}-p_{30})\right).
\end{aligned}
\]

Then

\[
CS(L_1)=\int_0^1\left[\begin{array}{l}(p_{11}-p_{10})\,q_1(p_0+t(p_1-p_0),y) \\ +(p_{21}-p_{20})\,q_2(p_0+t(p_1-p_0),y) \\ +(p_{31}-p_{30})\,q_3(p_0+t(p_1-p_0),y)\end{array}\right]dt.
\]

But, writing $\delta(t)\equiv(t^2-t)(p_{11}-p_{10},0,0)$,

\[
CS(L_2)=\int_0^1\left[\begin{array}{l}2t(p_{11}-p_{10})\,q_1(p_0+t(p_1-p_0)+\delta(t),y) \\ +(p_{21}-p_{20})\,q_2(p_0+t(p_1-p_0)+\delta(t),y) \\ +(p_{31}-p_{30})\,q_3(p_0+t(p_1-p_0)+\delta(t),y)\end{array}\right]dt,
\]

which would in general differ from $CS(L_1)$. Thus the CS is not well-defined for simultaneous change in multiple prices. Note that if only $p_1$ changes, then

\[
\begin{aligned}
CS(L_1)&=(p_{11}-p_{10})\int_0^1 q_1(p_{10}+t(p_{11}-p_{10}),y)\,dt, \\
CS(L_2)&=2(p_{11}-p_{10})\int_0^1 q_1(p_{10}+t^2(p_{11}-p_{10}),y)\,t\,dt \\
&=(p_{11}-p_{10})\int_0^1 q_1(p_{10}+r(p_{11}-p_{10}),y)\,dr, \quad\text{substituting } t^2=r, \\
&=CS(L_1),
\end{aligned}
\]

and we get back path independence. Thus the loss of path-independence arises only for multiple simultaneous price-changes.

Proof of Proposition 1

Proof. Let $p_0=(p_{10},p_{20},\dots,p_{J0})$ denote the initial price vector and $p_1=(p_{11},p_{21},\dots,p_{J1})$ denote the final price-vector.
Suppose that two numbers $S$ and $T$ with $S \neq T$ solve the equation for the EV. Then, by definition,

$$\max\{U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta)\} = \max\{U_1(y - p_{10} - S, \eta), \ldots, U_J(y - p_{J0} - S, \eta)\}, \tag{21}$$

and

$$\max\{U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta)\} = \max\{U_1(y - p_{10} - T, \eta), \ldots, U_J(y - p_{J0} - T, \eta)\}, \tag{22}$$

so that

$$\max\{U_1(y - p_{10} - T, \eta), \ldots, U_J(y - p_{J0} - T, \eta)\} = \max\{U_1(y - p_{10} - S, \eta), \ldots, U_J(y - p_{J0} - S, \eta)\}. \tag{23}$$

Since each $U_j(\cdot, \eta)$, $j = 1, \ldots, J$, is strictly increasing by assumption 2, if $S > T$, each term within $\{\}$ on the RHS of (23) will be strictly smaller than the corresponding term on the LHS. Therefore, each term within $\{\}$ on the RHS will be strictly smaller than the maximum value on the LHS. Since there are finitely many terms on the RHS, the maximum will also be strictly smaller than the maximum on the LHS, a contradiction. Similarly, if $S < T$, then the RHS of (23) will be strictly larger than the LHS. Therefore, in order for (23) to hold, we must have $S = T$. An exactly analogous argument works for the CV, where the analogous equalities are

$$\max\{U_1(y - p_{11} + S, \eta), \ldots, U_J(y - p_{J1} + S, \eta)\} = \max\{U_1(y - p_{10}, \eta), \ldots, U_J(y - p_{J0}, \eta)\} = \max\{U_1(y - p_{11} + T, \eta), \ldots, U_J(y - p_{J1} + T, \eta)\}.$$

Again, by assumption 1, this implies $S = T$.

Proof of Theorem 1. First, consider EV. Note that the EV is defined by

$$\max\{U_1(y - p_{11}, \eta), U_2(y - p_{21}, \eta), \ldots, U_J(y - p_{J1}, \eta)\} = \max\{U_1(y - S - p_{10}, \eta), U_2(y - S - p_{20}, \eta), \ldots, U_J(y - S - p_{J0}, \eta)\}. \tag{24}$$

The first step is to establish that

$$EV \le a \iff \Big\{ \max\{U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta)\} \ge \max\{U_1(y - a - p_{10}, \eta), \ldots, U_J(y - a - p_{J0}, \eta)\} \Big\}. \tag{25}$$

Indeed, it is obvious from (24) that $EV \le a$ will imply the RHS inequality inside the $\{\}$ in (25). To see the converse, assume that

$$\max\{U_1(y - p_{11}, \eta), U_2(y - p_{21}, \eta), \ldots, U_J(y - p_{J1}, \eta)\} \ge \max\{U_1(y - a - p_{10}, \eta), U_2(y - a - p_{20}, \eta), \ldots, U_J(y - a - p_{J0}, \eta)\}. \tag{26}$$

Now (26) and equation (24) imply that

$$\max\{U_1(y - S - p_{10}, \eta), \ldots, U_J(y - S - p_{J0}, \eta)\} \ge \max\{U_1(y - a - p_{10}, \eta), \ldots, U_J(y - a - p_{J0}, \eta)\},$$

i.e., for all $j = 1, \ldots, J$, we have that

$$\max\{U_1(y - S - p_{10}, \eta), \ldots, U_J(y - S - p_{J0}, \eta)\} \ge U_j(y - a - p_{j0}, \eta).$$

If the maximum on the LHS of the previous display is the $k$th term, i.e., $U_k(y - S - p_{k0}, \eta)$, then choosing $j = k$ on the RHS, we have that

$$U_k(y - S - p_{k0}, \eta) \ge U_k(y - a - p_{k0}, \eta),$$

whence, applying assumption 1, it follows that $S \le a$. This establishes (25).

Now we work from (25) to derive the CDF of the EV. To do this, first note that if $p_{l1} - p_{l0} \le a < p_{l+1,1} - p_{l+1,0}$, $l = 1, \ldots, J - 1$, then the inequality (26) can hold if and only if the LHS max is one of $U_1(y - p_{11}, \eta), U_2(y - p_{21}, \eta), \ldots, U_l(y - p_{l1}, \eta)$ but not $U_{l+1}(y - p_{l+1,1}, \eta), \ldots, U_J(y - p_{J1}, \eta)$. To see this, suppose to the contrary that the max on the LHS of (26) is obtained for some $k$ satisfying $J \ge k > l$. Then (26) implies

$$U_k(y - p_{k1}, \eta) \ge \max_{j = 1, \ldots, J}\{U_j(y - a - p_{j0}, \eta)\} \ge U_k(y - a - p_{k0}, \eta).$$

Given assumption 1, i.e., monotonicity of $U_k(\cdot, \eta)$, it follows that $-p_{k1} \ge -a - p_{k0}$, i.e., $a \ge p_{k1} - p_{k0}$, a contradiction since $k > l$.

Therefore, if $a$ satisfies $p_{l1} - p_{l0} \le a < p_{l+1,1} - p_{l+1,0}$, then

$$\Pr(EV \le a) = \sum_{k=1}^{l} \Pr\Big[ U_k(y - p_{k1}, \eta) \ge \max\Big\{ \max_{j \ne k} U_j(y - p_{j1}, \eta),\; \max_{j = 1, \ldots, J} U_j(y - a - p_{j0}, \eta) \Big\} \Big]$$

$$\overset{(1)}{=} \sum_{k=1}^{l} \Pr\Big[ U_k(y - p_{k1}, \eta) \ge \max\Big\{ \max_{j \le l,\, j \ne k} U_j(y - p_{j1}, \eta),\; \max_{j > l} U_j(y - a - p_{j0}, \eta) \Big\} \Big],$$

where the equality marked (1) uses the fact that $y - p_{j1} \ge y - a - p_{j0}$ for all $j = 1, \ldots, l$, since $p_{l1} - p_{l0} \le a$, and that $y - p_{j1} < y - a - p_{j0}$ for all $j > l$, since $a < p_{l+1,1} - p_{l+1,0}$. The above probability equals

$$q_1(p_{11}, p_{21}, \ldots, p_{l1}, p_{l+1,0} + a, \ldots, p_{J0} + a, y) + q_2(p_{11}, p_{21}, \ldots, p_{l1}, p_{l+1,0} + a, \ldots, p_{J0} + a, y) + \ldots + q_l(p_{11}, p_{21}, \ldots, p_{l1}, p_{l+1,0} + a, \ldots, p_{J0} + a, y),$$

if $a \ge 0$, and equals

$$q_1(p_{11} - a, p_{21} - a, \ldots, p_{l1} - a, p_{l+1,0}, \ldots, p_{J0}, y - a) + q_2(p_{11} - a, p_{21} - a, \ldots, p_{l1} - a, p_{l+1,0}, \ldots, p_{J0}, y - a) + \ldots + q_l(p_{11} - a, p_{21} - a, \ldots, p_{l1} - a, p_{l+1,0}, \ldots, p_{J0}, y - a),$$

if $a < 0$. This is precisely expression (6).

Next consider CV, defined as the solution $S$ to the equation

$$\max\{U_1(y + S - p_{11}, \eta), U_2(y + S - p_{21}, \eta), \ldots, U_J(y + S - p_{J1}, \eta)\} = \max\{U_1(y - p_{10}, \eta), U_2(y - p_{20}, \eta), \ldots, U_J(y - p_{J0}, \eta)\}. \tag{27}$$

The first step is to see that $CV \le a$ is equivalent to

$$\max\{U_1(y - p_{10}, \eta), U_2(y - p_{20}, \eta), \ldots, U_J(y - p_{J0}, \eta)\} \le \max\{U_1(y + a - p_{11}, \eta), U_2(y + a - p_{21}, \eta), \ldots, U_J(y + a - p_{J1}, \eta)\}. \tag{28}$$

The necessity is obvious by assumption 1. Sufficiency follows because (27) and (28) imply that

$$\max\{U_1(y + S - p_{11}, \eta), \ldots, U_J(y + S - p_{J1}, \eta)\} \le \max\{U_1(y + a - p_{11}, \eta), \ldots, U_J(y + a - p_{J1}, \eta)\}. \tag{29}$$

Then there must be at least one $j$ such that the RHS max is $U_j(y + a - p_{j1}, \eta)$, which, by (29), must be at least as large as $U_j(y + S - p_{j1}, \eta)$, implying by assumption 1 that $S \le a$.

Next, suppose $a$ satisfies $p_{j1} - p_{j0} \le a < p_{j+1,1} - p_{j+1,0}$. Then $y + a - p_{j+1,1} < y - p_{j+1,0}$, and therefore, by assumption 1, $U_{j+1}(y + a - p_{j+1,1}, \eta) < U_{j+1}(y - p_{j+1,0}, \eta)$.
Therefore, the RHS max in (28) must be one of $U_1(y + a - p_{11}, \eta), \ldots, U_j(y + a - p_{j1}, \eta)$. Accordingly, we have that

$$\Pr(CV \le a) = \sum_{k=1}^{j} \Pr\Big[ U_k(y + a - p_{k1}, \eta) \ge \max\Big\{ \max_{m = 1, \ldots, J} U_m(y - p_{m0}, \eta),\; \max_{m = 1, \ldots, J} U_m(y + a - p_{m1}, \eta) \Big\} \Big]$$

$$= \sum_{k=1}^{j} \Pr\Big[ U_k(y + a - p_{k1}, \eta) \ge \max\Big\{ \max_{m \ne k} U_m(y - p_{m0}, \eta),\; \max_{m = 1, \ldots, J} U_m(y + a - p_{m1}, \eta) \Big\} \Big]$$

$$= \sum_{k=1}^{j} \Pr\Big[ U_k(y + a - p_{k1}, \eta) \ge \max\Big\{ U_1(y + a - p_{11}, \eta), \ldots, U_j(y + a - p_{j1}, \eta),\; U_{j+1}(y - p_{j+1,0}, \eta), \ldots, U_J(y - p_{J0}, \eta) \Big\} \Big].$$

The second equality follows from the fact that $k \le j$ and $p_{j1} - p_{j0} \le a$, and so we have by assumption 1 that $U_k(y + a - p_{k1}, \eta) \ge U_k(y - p_{k0}, \eta)$. The third equality follows from $p_{j1} - p_{j0} \le a < p_{j+1,1} - p_{j+1,0}$, so by assumption 1, $U_k(y - p_{k0}, \eta) \le U_k(y + a - p_{k1}, \eta)$ for all $k \le j$ and $U_k(y - p_{k0}, \eta) > U_k(y + a - p_{k1}, \eta)$ for all $k > j$. Finally,

$$\sum_{k=1}^{j} \Pr\Big[ U_k(y + a - p_{k1}, \eta) \ge \max\Big\{ U_1(y + a - p_{11}, \eta), \ldots, U_j(y + a - p_{j1}, \eta),\; U_{j+1}(y - p_{j+1,0}, \eta), \ldots, U_J(y - p_{J0}, \eta) \Big\} \Big]$$

$$= \begin{cases} \sum_{k=1}^{j} q_k(p_{11}, \ldots, p_{j1}, p_{j+1,0} + a, \ldots, p_{J0} + a, y + a) & \text{if } a \ge 0, \\ \sum_{k=1}^{j} q_k(p_{11} - a, \ldots, p_{j1} - a, p_{j+1,0}, \ldots, p_{J0}, y) & \text{if } a < 0. \end{cases}$$

Proof of Theorem 2

Proof. The EV solves

$$\max\{U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta), U_{J+1}(y - p_{J+1}, \eta)\} = \max\{U_1(y + S - p_{10}, \eta), \ldots, U_J(y + S - p_{J0}, \eta)\}.$$
Now, by the same logic as the one leading to equation (25) in the previous proof,

$$\Pr[S \le a] = \Pr\Big[ \max\{U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta), U_{J+1}(y - p_{J+1}, \eta)\} \le \max\{U_1(y + a - p_{10}, \eta), \ldots, U_J(y + a - p_{J0}, \eta)\} \Big]. \tag{30}$$

Recall the assumption that (WLOG) $p_{J0} - p_{J1} \ge p_{J-1,0} - p_{J-1,1} \ge \ldots \ge p_{10} - p_{11}$. Therefore, if $a < p_{10} - p_{11}$, then for each $j = 1, \ldots, J$, the $U_j(y - p_{j1}, \eta)$ on the LHS of (30) will be strictly larger than the corresponding $U_j(y + a - p_{j0}, \eta)$, contradicting that the RHS max exceeds the LHS max. Thus there can be no probability mass below $p_{10} - p_{11}$. If for some $j \in \{1, \ldots, J - 1\}$, $p_{j0} - p_{j1} \le a < p_{j+1,0} - p_{j+1,1}$, then each of the first $j$ terms on the RHS of (30) is at least as large as the corresponding term on the LHS, while the $(j+1)$th term onwards on the RHS are smaller than the corresponding terms on the LHS. This means that one of these first $j$ terms on the RHS must be the maximum and it must also exceed the $(j+1)$th term onwards on the LHS. Thus, for $p_{j0} - p_{j1} \le a < p_{j+1,0} - p_{j+1,1}$, the probability that $S \le a$ equals

$$\sum_{k=1}^{j} \Pr\Big[ U_k(y + a - p_{k0}, \eta) \ge \max\Big\{ \max_{k' \in \{1, \ldots, j\} \setminus \{k\}} U_{k'}(y + a - p_{k'0}, \eta),\; \max_{l \ge j+1} U_l(y - p_{l1}, \eta),\; U_{J+1}(y - p_{J+1}, \eta) \Big\} \Big]$$

$$= \sum_{k=1}^{j} q_k(p_{10}, \ldots, p_{j0}, p_{j+1,1} + a, \ldots, p_{J1} + a, p_{J+1} + a, y + a).$$

When $a \ge p_{J0} - p_{J1}$, then (30) reduces to

$$\Pr\Big[ \max\{U_1(y - p_{11}, \eta), \ldots, U_J(y - p_{J1}, \eta), U_{J+1}(y - p_{J+1}, \eta)\} \le \max\{U_1(y + a - p_{10}, \eta), \ldots, U_J(y + a - p_{J0}, \eta)\} \Big]$$

$$= \Pr\Big[ \max\{U_J(y - p_{J1}, \eta), U_{J+1}(y - p_{J+1}, \eta)\} \le \max\{U_1(y + a - p_{10}, \eta), \ldots, U_J(y + a - p_{J0}, \eta)\} \Big]$$

$$= 1 - \Pr\Big[ \max\{U_J(y - p_{J1}, \eta), U_{J+1}(y - p_{J+1}, \eta)\} > \max\{U_1(y + a - p_{10}, \eta), \ldots, U_J(y + a - p_{J0}, \eta)\} \Big]$$

$$\overset{(1)}{=} 1 - \Pr\Big[ U_{J+1}(y - p_{J+1}, \eta) > \max\{U_1(y + a - p_{10}, \eta), \ldots, U_J(y + a - p_{J0}, \eta)\} \Big]$$

$$= \begin{cases} 1 - q_{J+1}(p_{10}, \ldots, p_{J0}, p_{J+1} + a, y + a) & \text{if } p_{J0} - p_{J1} \le a, \\ 1, & \text{otherwise.} \end{cases}$$

Equality $\overset{(1)}{=}$ follows from the fact that if $a \ge p_{J0} - p_{J1}$, then $U_J(y - p_{J1}, \eta)$ must be no more than $U_J(y + a - p_{J0}, \eta)$. The proof for the CV is very similar and is omitted for the sake of brevity.
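As a numerical sanity check on the EV part of Theorem 1, the following minimal sketch simulates heterogeneous consumers, solves the defining equation for the EV consumer by consumer, and compares the resulting empirical CDF with the sum of choice probabilities at the modified prices. Everything specific here is an assumption made for illustration only: the utility form (Gumbel taste shocks plus a random coefficient on log consumption), the prices, the income, and the helper names `utils`, `ev_bisect` and `cdf_formula`. It is not the estimation procedure used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 20000, 3
y = 10.0
p0 = np.array([1.0, 2.0, 3.0])   # hypothetical initial prices
p1 = np.array([1.5, 3.0, 5.0])   # hypothetical final prices; p1 - p0 is increasing in j

# Hypothetical heterogeneity (eta): additive taste shocks and a random income-effect slope.
taste = rng.gumbel(size=(N, J))
slope = rng.uniform(0.5, 2.0, size=(N, 1))

def utils(income, prices):
    """U_j(income - p_j, eta) for every consumer; strictly increasing in income - p_j."""
    return taste + slope * np.log(income - prices)

def ev_bisect(lo=0.0, hi=3.0, iters=60):
    """Solve max_j U_j(y - p_j1) = max_j U_j(y - S - p_j0) for S by bisection."""
    target = utils(y, p1).max(axis=1)
    lo = np.full(N, lo)
    hi = np.full(N, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # Utility at initial prices with income y - mid is decreasing in mid.
        too_high = utils(y - mid[:, None], p0).max(axis=1) > target
        lo = np.where(too_high, mid, lo)
        hi = np.where(too_high, hi, mid)
    return 0.5 * (lo + hi)

def cdf_formula(a):
    """Pr(EV <= a) as a sum of choice probabilities, evaluated on the same draws."""
    d = p1 - p0
    if a < d[0]:
        return 0.0
    if a >= d[-1]:
        return 1.0
    l = int(np.searchsorted(d, a, side="right"))    # number of j with d_j <= a
    prices = np.concatenate([p1[:l], p0[l:] + a])   # (p_11..p_l1, p_{l+1,0}+a, ..., p_J0+a)
    choice = utils(y, prices).argmax(axis=1)
    return float(np.mean(choice < l))               # q_1 + ... + q_l at those prices

ev = ev_bisect()
grid = [0.7, 1.3, 1.7]   # kept away from the mass points of the EV at p_j1 - p_j0
max_gap = max(abs(float(np.mean(ev <= a)) - cdf_formula(a)) for a in grid)
print("largest gap between simulated and formula CDF:", max_gap)
```

Because the equivalence in (25) holds consumer by consumer, the two CDFs are computed on the same draws and should agree up to the bisection tolerance. The grid avoids the points $p_{j1} - p_{j0}$, where consumers who pick alternative $j$ in both situations put positive mass on the EV.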
Numerical Calculations for Empirical Illustration

Policy change 1: Let alternatives 1, 2, 3 denote staying at home, working and going to school, respectively. We wish to compute welfare distributions at income 1800 resulting from a simultaneous tuition subsidy of Rs 745 and wage rise of 20, viz., from initial school-price of 790 and initial wage 25 to a final school-price of 45 and wage 256. To do this, first consider the reverse price change $p_0 = (0, -256, 45)$, $p_1 = (0, -25, 790)$, and compute EV and CV distributions for this change using formulae given by equations (??) and (??).

$$\Pr(EV \le a) = \begin{cases} 0, & \text{if } a < 0, \\ q_1(-256 + a, 45 + a, 1800) & \text{if } 0 \le a < 231, \\ 1 - q_3(-25, 45 + a, 1800) & \text{if } 231 \le a < 745, \\ 1 & \text{if } a \ge 745, \end{cases}$$

$$E(EV) = \int_0^{231} \{1 - q_1(-256 + a, 45 + a, 1800)\}\, da + \int_{231}^{745} q_3(-25, 45 + a, 1800)\, da, \tag{31}$$

$$\Pr(CV \le a) = \begin{cases} 0 & \text{if } a < 0, \\ q_1(-256 + a, 45 + a, 1800 + a) & \text{if } 0 \le a < 231, \\ 1 - q_3(-25, 45 + a, 1800 + a) & \text{if } 231 \le a < 745, \\ 1 & \text{if } a \ge 745, \end{cases}$$

$$E(CV) = \int_0^{231} \{1 - q_1(-256 + a, 45 + a, 1800 + a)\}\, da + \int_{231}^{745} q_3(-25, 45 + a, 1800 + a)\, da. \tag{32}$$

We incorporate observed covariates into the choice probabilities as a linear index and use a sieve approximation to the price and income terms for $j = 2, 3$, viz.,

$$q_j(x, p_2, p_3, y) = \Lambda\big(\gamma_0 + \gamma_{cj}' x + \gamma_j' m(p_2, p_3, y)\big), \tag{33}$$

where $m(\cdot, \cdot, \cdot)$ represent cubic splines, and the standard logit CDF $\Lambda(\cdot)$ is used as a link-function to keep the choice probability between 0 and 1. The results reported here correspond to 5 equi-spaced knots for income and three equi-spaced knots for each of the two prices. This specification minimized the cross-validation criterion of leave-one-out in-sample prediction of the actual choice out of all specifications using between 3 and 5 knots for each variable. Going beyond 5 knots posed difficulty with estimation owing to the very large dimensional optimization involved.
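The mean in (31) is the integral of one minus the CDF over the support of the EV, split at the two price changes 231 and 745. A minimal sketch of that computation with the trapezoidal rule is given below. The function `choice_probs` is a hypothetical stand-in for the estimated choice probabilities (a plain logit with made-up coefficients, not the spline-logit in (33)), and `trapezoid` is a hand-written helper, so the resulting number is illustrative only.

```python
import numpy as np

def choice_probs(p2, p3, y):
    """Hypothetical (q1, q2, q3): home, work, school. Coefficients are made up."""
    v = np.array([0.0,
                  0.3 - 0.004 * p2 + 0.0002 * y,
                  0.8 - 0.004 * p3 + 0.0004 * y])
    e = np.exp(v - v.max())          # subtract the max for numerical stability
    return e / e.sum()

def trapezoid(f, lo, hi, n=2001):
    """Composite trapezoidal rule for a scalar function f on [lo, hi]."""
    x = np.linspace(lo, hi, n)
    fx = np.array([f(a) for a in x])
    return float(np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0)

y = 1800.0

# One minus the CDF of the EV on [0, 231) and on [231, 745), as in the two integrals of (31).
surv_low = lambda a: 1.0 - choice_probs(-256.0 + a, 45.0 + a, y)[0]
surv_high = lambda a: choice_probs(-25.0, 45.0 + a, y)[2]

mean_ev = trapezoid(surv_low, 0.0, 231.0) + trapezoid(surv_high, 231.0, 745.0)
print("illustrative E(EV):", mean_ev)
```

The analogue for (32) replaces the income argument `y` by `y + a` inside both integrands.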
The integrals in (31) and (32) are calculated numerically using the trapezoidal rule via the STATA command "integ". The coefficients $\gamma$ are estimated via maximum likelihood. The average welfare gain measured in terms of CV for the actual change is the average EV welfare loss calculated from (31), and the average welfare gain in terms of EV for the actual change is the average CV welfare loss calculated from (32). These numbers are reported in Table 3, top row. The log-sum calculations, on the other hand, yield

$$E(EV) = E(CV) = \frac{1}{|\alpha|} \ln\left[ \frac{\exp\big(\beta_{02} - \beta_{01} + (\beta_{x2} - \beta_{x1})' x + \alpha p_{20}\big) + \exp\big(\beta_{03} - \beta_{01} + (\beta_{x3} - \beta_{x1})' x + \alpha p_{30}\big) + 1}{\exp\big(\beta_{02} - \beta_{01} + (\beta_{x2} - \beta_{x1})' x + \alpha p_{21}\big) + \exp\big(\beta_{03} - \beta_{01} + (\beta_{x3} - \beta_{x1})' x + \alpha p_{31}\big) + 1} \right]$$

$$= \frac{1}{|\alpha|} \ln\left[ \frac{\exp\big(b_{02} + b_{x2}' x - 256\alpha\big) + \exp\big(b_{03} + b_{x3}' x + 45\alpha\big) + 1}{\exp\big(b_{02} + b_{x2}' x - 25\alpha\big) + \exp\big(b_{03} + b_{x3}' x + 790\alpha\big) + 1} \right], \tag{34}$$

where the $b$s represent the normalized alternative-specific intercepts and slopes and $|\alpha|$ is the constant marginal utility of income. The $b$'s and $\alpha$ are estimated via the constrained "mlogit" command in STATA, which imposes constant marginal utility of income by requiring price coefficients to be identical across alternatives. The EV and CV for the actual policy change, viz., rise in prices, are the CV and EV respectively from the price rise derived above. The deadweight loss is obtained by subtracting the mean welfare gains from the sum of the subsidy payments and the additional wage-payments of employers of child labourers to individuals at income 1800 rupees, i.e.,

$$DWL = (p_{31} - p_{30})\, q_3(p_{20}, p_{30}, 1800) - p_{20}\, q_2(p_{20}, p_{30}, 1800) + p_{21}\, q_2(p_{21}, p_{31}, 1800) - E(\text{Welfare})$$

$$= 745\, q_3(-256, 45, 1800) + 256\, q_2(-256, 45, 1800) - 25\, q_2(-25, 790, 1800) - E(\text{Welfare}),$$

where $E(\text{Welfare})$ is either the log-sum given by (34), or the EV given by (31), or the CV given by (32).

Policy Change 2: Recall the specification

$$q_j(x, p_2, p_3, y) = \Lambda\big(\gamma_0 + \gamma_{cj}' x + \gamma_j' m(p_2, p_3, y)\big),$$

which yields

$$E(EV) = \int_0^{\infty} q_2(-111 + a, 146, 1800 + a)\, da = \int_0^{\infty} \Lambda\big(\gamma_0 + \gamma_{c2}' x + \gamma_2' m(-111 + a, 146, 1800 + a)\big)\, da,$$

$$E(CV) = \int_0^{\infty} q_2(-111 + a, 146, 1800)\, da = \int_0^{\infty} \Lambda\big(\gamma_0 + \gamma_{c2}' x + \gamma_2' m(-111 + a, 146, 1800)\big)\, da,$$

where the $\gamma$ coefficients and the $m(\cdot)$ function are the same as in (33). The integral up to infinity is numerically computed simply by taking a large upper limit of integration such that taking any higher limit does not change the answer by more than 0.1%.
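The truncation rule for the integral up to infinity can be sketched as follows: integrate up to a finite limit, double the limit, and stop once the answer moves by no more than 0.1%. The doubling schedule, the starting limit, the helper names, and the integrand `q2_tail` (a logistic function that decays in $a$) are assumptions chosen for illustration; the estimated $q_2$ from (33) would take the integrand's place.

```python
import numpy as np

def trapezoid(f, lo, hi, n=4001):
    """Composite trapezoidal rule for a vectorised function f on [lo, hi]."""
    x = np.linspace(lo, hi, n)
    fx = f(x)
    return float(np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0)

def integral_to_infinity(f, upper=200.0, rel_tol=1e-3, max_doublings=30):
    """Approximate the integral of f over [0, infinity) by raising the upper limit
    until a further doubling changes the result by at most rel_tol (0.1% here)."""
    prev = trapezoid(f, 0.0, upper)
    for _ in range(max_doublings):
        upper *= 2.0
        cur = trapezoid(f, 0.0, upper)
        if abs(cur - prev) <= rel_tol * abs(cur):
            return cur, upper
        prev = cur
    raise RuntimeError("integral did not stabilise within max_doublings")

# Hypothetical choice probability of alternative 2 as the compensation a grows.
q2_tail = lambda a: 1.0 / (1.0 + np.exp(-(2.0 - 0.01 * a)))

value, upper = integral_to_infinity(q2_tail)
print("approximate integral:", value, "upper limit used:", upper)
```

For this particular integrand the exact value is available in closed form, $100 \ln(1 + e^{2})$, which gives a direct check on the stopping rule.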