Inequality, Poverty and their evolution in the US: Consumption and Income information in the Consumer Expenditure Survey.

Orazio Attanasio, Erich Battistin and Andrew Leicester.

London 26.10.2004

Notes prepared for the National Poverty Center's ASPE-Initiated Workshop on Consumption among Low-Income Families. Washington DC November 5th 2004.

I. Introduction.

In these notes we discuss three main issues. We start with the quality of household level data on consumption in the US and their use for the measurement of inequality, poverty and their evolution. Consumption has some important advantages over income as a measure of well-being: it is more likely to reflect ‘permanent income’ and, more generally, the effect of the different mechanisms through which households smooth shocks over time, and it is more likely than income to be directly related to ‘utility’ and ‘welfare’. It is therefore worrying and puzzling that the basic facts that come out of the main available surveys are not consistent. We will argue that the consumption survey has some important problems and that they have got worse in recent years. These need, in our opinion, to be resolved, as these data can be extremely important for a variety of policy issues.

These problems notwithstanding, we proceed to show some evidence on the evolution of poverty as measured by consumption and income in the CEX. In particular, we focus on: (i) the number of ‘poor’ according to the different definitions; (ii) how much the ‘poor’ consume as a fraction of their income; (iii) the extent to which some observables (human capital, race) are linked to the probability of being poor and how that has changed over time.

The level of consumption and its relation to income is only one aspect of the behaviour of poor households that is of interest. A relatively under-researched agenda is, in our opinion, the modelling of consumption components by poor individuals.
From a policy perspective, it is important to consider which components of consumption are most important for poor households and how they change with the level of total consumption, income, relative prices and so on. This analysis is important because it can be very informative both on the nature of consumption among poor households and on the mechanisms through which poor households cope with shocks to their income. In the last part of the paper we present some preliminary evidence on modelling food shares in total consumption for the whole population and for sub-groups. This is just an example of the type of analysis that can and should be pursued. In the conclusions we mention possible extensions to this analysis and some future research projects.

II. Consumption and Income: available data and their quality.

We will not reiterate the importance of having exhaustive and high quality consumption data in household level surveys. This issue has been discussed at length elsewhere and we will take it for granted.

In the US there are mainly two household level data bases that contain information on consumption. The first is the PSID. The big advantage of the PSID is that it is a longitudinal data base that has been collected since 1967 and has proven to be of very high quality in a variety of dimensions. The main problem with the PSID, in terms of the consumption data, is that the information it contains is far from exhaustive. The PSID contains information on food consumption (at home and outside the home) and on a few additional items. In addition to providing very synthetic and incomplete information, the questions asked to gather this information are somewhat ambiguous, especially in terms of the time horizon to which they refer.

The other large data set that contains consumption information is, of course, the Consumer Expenditure Survey (CEX). The CEX has a long history, going back to the beginning of the 20th century.
However, only in 1980 did the survey become a continuous and consistent survey. The CEX is made of two independent samples. The first, and largest, sample is referred to as the Interview sample and is a rotating panel. Around 5,000 households are interviewed 4 times per year, and in each of these interviews they report their expenditure in the previous three months on a long list of commodities, which should almost exhaust the items that make up Personal Consumption Expenditure in the NIPA accounts (see footnote 1). While the information on most commodities is very detailed, the information on food is not. There is basically a catch-all question on ‘total food in the home’. Food is particularly problematic in the Interview Survey because the food question underwent two important changes in the methodology used to collect the information, one in 1982 and one in 1987. In 1999 the sample size of this survey was increased by about 30%.

The second component of the CEX is the so-called Diary Survey. This is a somewhat smaller independent sample of 5,000 households. Households in this sample are observed for two weeks, and during this time they keep detailed diaries of all their expenditures. From 1980 to 1985 the Diary Survey only collected information on ‘frequently purchased items’. These include very detailed information on food and on a number of other items. Since 1986, the Diary Survey includes all commodities, including retrospective information on durables. Interestingly, in addition to the detailed diary information, for food at home the Diary Survey also contains the same retrospective questions asked in the Interview Survey.

Both surveys, but especially the Interview Survey, contain a large set of additional variables ranging from income and labour supply variables to detailed demographic information.
The Interview Survey also contains a number of extremely detailed modules on a variety of themes, ranging from vehicles and vehicle loans to education, housing and so on.

The main purpose of these data sets is the computation of the weights for the CPI. The BLS typically uses information from the Diary Survey for frequently purchased items and from the Interview Survey for the remaining items. This procedure implicitly recognizes the plausible fact that the information on frequently purchased items is more accurate in the Diary Survey than in the Interview Survey, while the information on other commodities is more reliable in the Interview Survey.

Partly because the CEX was only started in 1980, until relatively recently the survey has not been used extensively, especially in comparison to other data sources, such as the PSID, the CPS, and the NLSs. An important question, therefore, is the reliability of the data. In this respect, one can check whether the information one extracts from the CEX is consistent with the information from other data sources. As is often the case, there is good news and bad news.

The good news is that information on wages and on pre-tax income is remarkably consistent with the information from other data sources such as the CPS. If one studies the pattern of inequality using hourly wage measures from the CEX, one gets, with a little additional noise (explained by the smaller sample size), the same general picture that one gets out of the CPS. This is particularly comforting as it signals that the CEX should not be affected by particularly nasty composition problems and the like.

The bad news is that when one looks at consumption, the picture is not very comforting. As the CEX is pretty much the only data source containing consumption information, the only possible comparison is to National Income and Product Account (NIPA) data. In Figure 1, we plot ‘grossed up’ CEX data for consumption along with PCE data.
First, the CEX aggregates represent only a relatively small fraction of PCE from NIPA. What is worse is that the ratio of CEX consumption to NIPA PCE has been deteriorating over time and is now, for total consumption, roughly 60% (down from 65% until the early 1990s). We are not aware of any convincing explanation for this deterioration of the data (in relation to the aggregate counterpart) during the last decade.

Footnote 1: The two main exceptions are imputed rent on owner occupied housing and personal care items.

In addition to the comparability with aggregate data, there are additional problems of consistency between the two components of the Survey. Given the different methodology used in the Interview and Diary surveys, it is not particularly surprising that one gets different levels of aggregate expenditures from the two surveys. In Figure 2, we plot the aggregated Diary and Interview data for (log) non durable consumption. The comforting feature of these data is that, over time, the two series are roughly parallel (see footnote 2). However, if one moves to consider distributional aspects, which are the focus of this note, the picture changes. The story about the evolution of inequality that emerges from the two surveys is considerably different. Figure 3, which is taken from Attanasio, Battistin and Ichimura (2004) (ABI), shows the evolution of the standard deviation of log non durable consumption in the Diary and the Interview survey. According to the latter survey, inequality in consumption, especially after the early 1990s, is substantially flat. This contrasts strongly with the evolution of income and wage inequality over the same period and would have strong implications in terms of the smoothing mechanisms and insurance markets available to US households. On the other hand, if one looks at the picture that emerges from the Diary data, one gets a substantial increase of inequality throughout the 1990s.
ABI rule out a number of simple explanations for this puzzling evidence (such as differences in sample compositions, changes in the frequency of purchases and so on), and propose a methodology that uses the assumption that the Interview Survey (IS) provides good measurements for some commodities while the Diary Survey (DS) provides good measurements for others. The problem, of course, is that to compute the variance of total non durable consumption one needs the covariance between the expenditure in the two sets of commodities, and neither survey measures both sets well for the same households. ABI use some assumptions that allow them to identify the path of inequality from 1982 to 2000 combining the information from the two data sources. We reproduce their evidence in Figure 4. Not surprisingly, the path of inequality is somewhere in between the one implied by the DS and the one implied by the IS.

When looking at the distribution of consumption it becomes much harder to combine the information from the two data sets: for means it can be done without any problems; for the variance, we can do it by making some (strong) assumptions. For the other features of the distribution, it is nearly impossible without even stronger assumptions.

The main lesson that comes out of this discussion is that the CEX data are affected by serious quality problems, which are particularly difficult to ignore if one wants to analyze the dynamics of inequality or other distributional issues. In what follows we will proceed and use the CEX data despite these problems. It is clear, however, that they should be kept in mind when interpreting our results.

Before moving to the analysis of poverty in the CEX, it is however worth asking the question: are the data problems discussed in this section unavoidable? Is consumption in rich developed countries intrinsically impossible to measure without substantial measurement error? To answer this question the experience of other countries can be instructive.
The UK has had a continuous consumption survey, the Family Expenditure Survey (FES), since the late 1960s, and consistent consumption definitions can be constructed since 1974. A number of studies have compared the evolution of the FES data to the NIPA data. The contrast with the US situation is stunning: Tanner (1998) shows that when grossing up the FES data, one gets nearly 95% of national account PCE, and the difference is easily accounted for by definitional differences (mainly owner occupied housing, which is not included in the FES, and the exclusion of institutionalized individuals). In Figure 5, we reproduce a picture from Attanasio, Blow, Hamilton and Leicester (2004) that compares rates of growth of consumption in the FES and NIPA: while there are some blips (notably the 1991-1992 data), the correspondence between the two series is remarkable.

Footnote 2: Another comforting feature is that if one looks at the retrospective question asked in the Diary on food in, one gets figures that are remarkably close to those of the Interview Survey.

The methodology used in the FES is much more similar to the one used in the Diary Survey than to the one used in the Interview Survey, in that it is based on two-week diaries. However, it also has a considerable retrospective component and a number of procedures that capture accurately several important items (such as utilities). Finally, it is larger than the CEX diary (at about 7,000 households per year) for a population that is considerably smaller (and presumably more homogeneous) than the one in the US.

III. Consumption and Income Poor Households in the US.

In this section, we use the CEX to identify poor households according to income and consumption. In doing so we have to select a sample, define consumption and income and so on. Many of these choices are arbitrary and a more complete analysis would explore several alternatives and develop several loose ends.
However, the material we present is meant to be indicative of the type of analysis that can be conducted with these data.

The sample is made of households headed by an individual aged 25 to 60, excluding single mothers. This implies that we exclude important groups that might be of particular interest in the analysis of poverty, such as elderly individuals and single mothers. All figures are deflated using the CPI. As income we consider before-tax total family income. The main motivation, at this point, is the fact that taxes are not particularly well measured in the CEX. Some progress, however, can be made in this direction. As consumption we take total non durable consumption expenditure. Both income and consumption are ‘equivalized’ using the OECD adult equivalent scales. A household is defined as ‘poor’ in terms of consumption (income) if it has consumption (income) below 60% of median consumption (income). Most of the analysis will be executed both with data from the IS and the DS.

In Figures 6 and 7 we plot the evolution of poverty rates over time computed in terms of income and consumption, first with the Interview and then with the Diary survey. Several interesting features emerge from these pictures. First, in both the Diary and the Interview Surveys, income poverty rates are higher than consumption poverty rates, and they are roughly comparable across the two surveys, at between 25% and 30%. In both surveys, the income poverty rates increase during the 1990s. The stories from the two surveys diverge when we consider consumption. First, the level of consumption poverty rates is considerably larger when we look at the DS. It should be remembered that the concept we are considering is a relative, rather than absolute, concept. Therefore, this evidence points to a substantially different (and more unequal) distribution in the DS than in the IS. For the IS the poverty rate is between 16% and 19%.
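The relative poverty definition used in this section can be sketched in a few lines. This is only an illustration, not the code behind Figures 6 and 7: the household records are invented, the function names are ours, and the equivalence weights (1.0 for the first adult, 0.7 for each additional adult, 0.5 for each child) are an assumption, since the notes do not say which version of the OECD scale is used.

```python
from statistics import median

# Assumed equivalence weights (one common version of the OECD scale).
# The notes do not state which version is actually used.
FIRST_ADULT, OTHER_ADULT, CHILD = 1.0, 0.7, 0.5


def equivalence_scale(adults, children):
    """Adult equivalents in a household with at least one adult."""
    return FIRST_ADULT + OTHER_ADULT * (adults - 1) + CHILD * children


def poverty_rate(values, threshold_share=0.6):
    """Share of households below threshold_share times the median value."""
    line = threshold_share * median(values)
    return sum(v < line for v in values) / len(values)


# Hypothetical households:
# (adults, children, monthly consumption, monthly income)
households = [
    (2, 2, 1800.0, 2500.0),
    (1, 0, 450.0, 300.0),
    (2, 0, 1700.0, 3400.0),
    (1, 2, 700.0, 500.0),
    (2, 1, 1500.0, 2200.0),
]

# Equivalize consumption and income before computing the poverty lines.
eq_consumption = [c / equivalence_scale(a, k) for a, k, c, _ in households]
eq_income = [y / equivalence_scale(a, k) for a, k, _, y in households]

consumption_poverty = poverty_rate(eq_consumption)
income_poverty = poverty_rate(eq_income)
print(consumption_poverty, income_poverty)
```

Because the poverty line is tied to the median of each distribution, the two rates are relative measures: in this made-up example the income poverty rate exceeds the consumption poverty rate only because equivalized income is more dispersed at the bottom than equivalized consumption.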
Consistent with the puzzle discussed in the previous section, poverty rates are substantially flat over time in the IS, while those from the DS increase considerably. When looking at the whole distribution of consumption, it is difficult to integrate the two data bases, each of which contains reliable information on a different set of commodities.

A fact that has received some attention (Meyer and Sullivan, 2003; Sabelhaus 2000) is that households in the lowest income quantiles consume substantially more than their income. In Figures 8 and 9 we plot, for selected representative years, consumption and income (smoothed) against income percentiles for the bottom of the income distribution. Notice that in all the pictures, the consumption graph is remarkably flat and, up to almost the 10th percentile, it is considerably larger than income.

We next move to consider how the probability of being poor (in terms of income and consumption) is related to the membership of particular groups and how that has changed between the 1980s and 1990s. We do this by running a simple probit for the ‘poor’ indicator on a constant, a dummy for group membership, a dummy for the decade and their interaction. The groups we consider (in turn) are high school dropouts and blacks. Of course, the first group declines considerably in size in the 1990s. We report the results of this exercise in Table 1 for high school dropouts and in Table 2 for blacks. The coefficients should be interpreted as the change in the probability of being poor (according to the relevant definition).
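The specification just described can be written out explicitly. The notation below is ours and is only a sketch: $P_h$ is the poverty indicator for household $h$, $G_h$ the group membership dummy, $D_h$ a dummy for the 1990s, and $\Phi$ the standard normal distribution function. The second and third lines show one way of turning the coefficients into changes in probability, assuming these are computed as discrete differences for the dummies; the notes do not say how the reported effects are obtained.

```latex
\Pr(P_h = 1 \mid G_h, D_h)
  = \Phi\left(\beta_0 + \beta_1 G_h + \beta_2 D_h + \beta_3\, G_h D_h\right)

% Assumed discrete-difference effect of group membership in the 1980s (D_h = 0)
\Delta^{80s}_G = \Phi(\beta_0 + \beta_1) - \Phi(\beta_0)

% Assumed discrete-difference effect of group membership in the 1990s (D_h = 1)
\Delta^{90s}_G = \Phi(\beta_0 + \beta_1 + \beta_2 + \beta_3) - \Phi(\beta_0 + \beta_2)
```

Under this reading, a positive interaction term means that the gap in poverty probability between the group and the rest of the sample widened in the 1990s, and a negative one that it narrowed.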
Table 1: Effect of high school dropout status on the probability of being poor

                              Interview survey            Diary survey
                           Consumption    Income      Consumption    Income
                             poverty      poverty       poverty      poverty
High school dropouts          0.227        0.314         0.164        0.332
                             (0.005)      (0.005)       (0.015)      (0.016)
1990s                         0.003        0.015         0.014        0.012
                             (0.002)      (0.002)       (0.006)      (0.006)
High school drop*1990s        0.015        0.066         0.034        0.040
                             (0.005)      (0.006)       (0.016)      (0.017)
Fraction of poor              0.174        0.269         0.251        0.267

Table 2: Effect of race on the probability of being poor

                              Interview survey            Diary survey
                           Consumption    Income      Consumption    Income
                             poverty      poverty       poverty      poverty
Black                         0.149        0.162         0.178        0.147
                             (0.006)      (0.007)       (0.019)      (0.021)
1990s                         0.006        0.023         0.020        0.018
                             (0.002)      (0.002)       (0.005)      (0.006)
Black*1990s                  -0.022       -0.012        -0.021       -0.020
                             (0.005)      (0.007)       (0.017)      (0.020)
Fraction of poor              0.174        0.269         0.251        0.267

In the Interview Survey, being a high school dropout increases the probability of being income poor by 0.31 (from a base of 0.27) and increases the probability of being consumption poor by 0.227 (from a base of 0.17). In the 1990s there is a slight (and, in the case of consumption, insignificant) increase in the probability of being poor. However, high school dropouts fare marginally worse, especially in terms of income. If we look at the Diary figures, we find that, in terms of consumption, high school dropouts have fared considerably worse in the 1990s than indicated by the IS. More generally, the 1990s seem a worse decade than indicated by the IS.

The evidence on the effect of race is consistent in the two surveys. In both cases and for both definitions of poverty, blacks are between 15 and 18 percentage points more likely to be poor. In both surveys and for both definitions the 1990s see a slight (albeit not significant) improvement. As before, we see that the 1990s register an overall increase in poverty rates in three out of the four columns, the exception being the consumption definition in the Interview Survey.
IV. Food shares.

The study of expenditure shares and, more generally, demand systems has, of course, a long tradition. However, we are not aware of many studies that have looked at Engel curves for the US and, in particular, for poor households. In this section we present some preliminary and simple analysis of food shares in the CEX sample we have been using and for some sub-samples.

We start by looking at the shares of food in and food out in total non durable consumption. In Table 3 we report the median share for the whole sample, the sample of consumption poor and the sample of households headed by a high school dropout. Moreover, we report the median in each of the two decades in our sample. These data are from the Interview Survey. As is to be expected, the share of food at home in total non durable consumption for the consumption poor is considerably higher than in the overall sample. Moreover, while for the overall sample the share increases slightly in the 1990s relative to the 1980s, the increase is much larger for the consumption poor. For the high school dropouts, the share of food in is in the middle, but it goes up at about the same pace as for the consumption poor.

Table 3: Median share of food in non durable expenditure, Interview survey

                    Total             Consumption poor     High school dropouts
              Food in  Food out      Food in  Food out      Food in  Food out
1980-1989      0.242    0.060         0.363    0.039         0.306    0.039
1990-2001      0.256    0.056         0.386    0.035         0.324    0.034

Next we estimate some simple Engel curves. In particular, we estimate the following relationship for total food:

$$ w_{ij} = \theta X_i + \nu_1 \ln(q_i) + \nu_2 \left(\ln(q_i)\right)^2 + \varepsilon_{ij} \qquad (1) $$

where $w_{ij}$ is the share of consumption of food in total non durable consumption by household $i$, $X_i$ is a vector of control variables, including year dummies, family composition and age variables as well as socio-economic indicators, $q_i$ is total non-durable consumption and $\varepsilon_{ij}$ is a residual term.
An equation like (1) can be derived from a (rank-two) demand system and describes how the expenditure share for food (or other commodities) changes with total expenditure. Year dummies control for changes in relative prices. Notice that the consumption share is allowed to be a quadratic function of the log of total consumption, in line with Banks, Blundell and Lewbel (1996).

The estimation of equation (1) presents a number of econometric problems, chief among them the possibility that total non durable expenditure is correlated with the residual term, either because of unobservable taste shocks not completely captured by the observables X or because of the presence of measurement error. The latter can be particularly problematic if one wants to estimate (1) using individual level instruments (see Lewbel, 1996). In what follows we simply report the OLS estimates of equation (1) for the whole sample and two sub-samples: the ‘consumption poor’ and the high school dropouts.

Table 4: Engel curves for food

                            ν1                 ν2                 R^2      (n. obs.)
Whole sample            -0.105 (0.007)     0.00006 (0.00053)    0.2034    (192639)
Consumption poor         0.421 (0.051)    -0.045   (0.005)      0.0351    (32882)
High school dropouts     0.044 (0.020)    -0.012   (0.002)      0.1778    (26706)

In Table 4, to save space, we report only the estimates of ν1 and ν2 for the Interview Survey. The estimates from the Diary survey were qualitatively similar. The Engel curve for the whole population is unremarkable: the quadratic term is very small and not significantly different from zero. The linear term is negative, indicating that food is a necessity, consistent with the evidence in Table 3. However, when we focus on the consumption poor or on the high school dropouts, things become a bit more interesting. In both cases the quadratic term is strongly significant.
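The shape implied by the point estimates for the two sub-samples can be checked with a short calculation. The sketch below simply differentiates equation (1) with respect to ln(q), holding the controls fixed; the function names are ours, the coefficients are copied from Table 4, and the values of ln(q) at which the slope is evaluated are purely illustrative because the notes do not state the units of q.

```python
# Point estimates (nu1, nu2) copied from Table 4, Interview Survey,
# for the two sub-samples with a significant quadratic term.
TABLE_4 = {
    "Consumption poor": (0.421, -0.045),
    "High school dropouts": (0.044, -0.012),
}


def engel_slope(nu1, nu2, log_q):
    """Derivative of the food share in equation (1) with respect to ln(q)."""
    return nu1 + 2.0 * nu2 * log_q


def turning_point(nu1, nu2):
    """Value of ln(q) at which the slope is zero (requires nu2 != 0)."""
    return -nu1 / (2.0 * nu2)


for group, (nu1, nu2) in TABLE_4.items():
    peak = turning_point(nu1, nu2)
    # Illustrative evaluation points one log unit either side of the peak.
    below = engel_slope(nu1, nu2, peak - 1.0)
    above = engel_slope(nu1, nu2, peak + 1.0)
    print(group, round(peak, 2), round(below, 3), round(above, 3))
```

With these coefficients the slope changes sign at ln(q) of roughly 4.68 for the consumption poor and roughly 1.83 for the high school dropouts: the food share rises with total consumption below that level and falls above it. These are point estimates only and ignore sampling error.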
The linear term is now positive and the quadratic negative, indicating an inverse U shaped relationship, for the poorest segment of the population, between the share of food and total non durable consumption expenditure. Taken at face value, this evidence means that, at very low levels of total consumption, food is a luxury, in that its share increases with total consumption. This type of pattern can be found in data from several developing countries. While the peak of the curve is such that very few households are on the increasing portion of the Engel curve, it is nonetheless interesting to notice that such a group exists and that, for a much larger group, the share of food declines slowly with total consumption. Clearly these results need to be investigated in depth before any strong statement can be made.

V. Conclusions and thoughts for future research.

In this note we have discussed three issues: (i) consumption data quality and reliability; (ii) the dynamics of poverty and inequality in the last two decades; and (iii) the composition of consumption among the poor and in particular the share of food. Each of these issues constitutes an important research topic in its own right, so that the discussion here was necessarily superficial. However, the discussion above gives an idea of an entire research agenda that can be opened. We conclude these notes with a list of potential topics that develop what is discussed above.

(i) What more can be learned about the evolution of consumption distributions by combining data sets?

(ii) To what extent can one use structural models, economic theory and other pieces of evidence to fill in the missing bits?

(iii) Many other groups are worth looking at: single mothers, the elderly and children are particularly important.

(iv) Much work can be done in exploring the composition of consumption. The evidence in the last section suggests that food is not necessarily a necessity for the poorest.
There is much detailed information in the Diary survey about the components of food. These could be analyzed to find out how and what the poor (according to different definitions) eat. Exploring the composition of consumption in other dimensions and over time can also be useful to establish the importance of different mechanisms the poor use to cope with shocks. In this respect it could be important to exploit time and geographic variation.

References

Attanasio, Battistin and Ichimura (2004): “What really happened to consumption inequality in the US”
Attanasio, Blow, Hamilton and Leicester (2004): “Booms and Busts in the UK”
Banks, Blundell and Lewbel (1996)
Lewbel (1996)
Meyer and Sullivan (2003)

Figure 1: CEX and PCE. Series: CEX, PCE.

Figure 2: CEX Diary and Interview Survey. Mean log expenditure on non-durables. Series: Interview data, Diary data.

Figure 3: Inequality in the Diary and Interview data. Standard deviation of log expenditure on non-durables. Series: Interview data, Diary data.

Figure 4: Combining information from Diary and Interview Survey. What happens to overall consumption inequality before 1986? Series: Combined, Interview data.

Figure 5: Real per capita expenditure growth rates (per cent), UK FES and ONS (National Accounts), 1975-2001. Series: FES, ONS. Source: National Accounts from ONS; Authors’ calculations from FES/EFS.

Figure 6: Poverty Rates (threshold is 60% of median in each year): Interview Survey. Series: income poverty (povinco), consumption poverty (povcons), by year.

Figure 7: Poverty Rates: Diary Survey (from 1986 only). Series: income poverty (povinco), consumption poverty (povcons), by year.

Figure 8: Mean expenditure and income by income percentile, 1985, 1990, 1995, 2000 (Interview survey), bottom 35% of income distribution. Series: mean monthly expenditure, mean monthly income; one panel per year.

Figure 9: Mean expenditure and income by income percentile, 1990, 1995, 2000 (Diary Survey), bottom 35% of income distribution. Series: mean monthly expenditure, mean monthly income; one panel per year.

Figure 10: Engel Curve for total food (food in + food out), Interview Survey, whole population.

Figure 11: Engel Curve for total food (food in + food out), Interview Survey, consumption poor only.

Figure 12: Engel Curve for total food (food in + food out), Interview Survey, High school dropouts.