A New Dataset on Global Income Distribution

Charles Ackah, Maurizio Bussolo,
Rafael De Hoyos, and Denis Medvedev
Development Prospects Group
The World Bank
Do not quote
Feb 29, 2008
1. Introduction
For many years, particularly following the influential ideas of Simon Kuznets (1955),
discussions about the relationship between income inequality and a country’s aggregate
income level have taken centre stage in development economics research. Since Kuznets
made his famous proposition about the existence of an inverted-U shaped relationship
between income inequality and economic development, many researchers have felt
compelled to try to validate this hypothesis by often looking at past trends in search of
any evidence that development truly hurts the poor. Empirically, some studies have found
that a country’s rate of economic growth is negatively correlated with its initial level of
inequality (see Ahluwalia, 1976; Deininger and Squire, 1998) while others have failed to
demonstrate any systematic association between economic growth and the distribution of
income (see Bourguignon and Morrisson, 1990; Benabou, 1997; Anand and
Kanbur, 1993; Li, Squire and Zou, 1998). A major difficulty in the empirical literature
lies with the choice of an appropriate definition of global inequality. Another problem
with the previous literature is the issue of the reliability of the underlying data used for
distributional analysis.
While the literature has made important strides in addressing what constitutes an
appropriate measure of inequality, analyses of global income distribution are still plagued
by serious data problems, including the limitations of traditional databases and the poor
comparability of data, despite some obvious improvements in the availability of income
inequality data spawned mainly by the pioneering work of Deininger and Squire (1996).
It is fair to say that the dataset compiled and made freely available by Deininger and
Squire has led to a remarkable improvement in the availability of secondary data,
facilitating analysis of world income inequality that was hitherto difficult to
contemplate. However, using the existing data for distributional analysis is
not without costs, and applied researchers using such data face non-trivial
limitations in their ability to study the effects of global income inequality. The
importance of reliable and comparable individual (or household) level survey data,
long acknowledged as appropriate for comparative analysis of levels and
trends in global income distribution, cannot be overemphasized (see Milanovic 2002).
Moreover, given the considerable recent interest in and concern about the
distributional effects of increasing globalization, there is an even more pressing need for
reliable datasets that permit meaningful comparison of inequality not only within
countries but across regions and nations. Indeed, being able to accurately measure the
relative positions of every country (or individual) within the global income distribution is
a necessary condition for evaluating whether any policy initiative (such as the removal of
agricultural distortions, for example, or globalization, more generally) would increase or
reduce world inequality. This is particularly important if one is interested in knowing the
global distributional impacts of any such phenomenon.
We set the stage by presenting the first ever household survey-based global distributional
data. We should stress that we are not here attempting to critique the strengths and
weaknesses of the existing traditional databases since none is actually comparable to this
new data on global income distribution. Atkinson and Brandolini (2001) provide a
detailed critique of the use of secondary datasets, including Deininger and Squire’s
celebrated dataset, and we do not want to repeat that in this paper. In fact, there are a
number of important limitations to our present data as well. For example, the data have no
time dimension (i.e. there is only one observation per country), so they are not suitable for
examining changes over time in the global income distribution. We are largely
focusing on snapshots (or levels) of the distribution – such as the global distribution of
income in 2000.1 The data, nonetheless, are an interesting departure from the existing
databases. In particular, this database is not a mere compilation of secondary
cross-country inequality indices. Instead, it is an actual presentation of a truly global income
distribution based entirely on household survey data. Our ultimate goal is to assemble
existing representative household data for all countries on the globe, standardize them so
that they are internationally comparable, and make the minimal distribution data available
to researchers in a ready-to-use format for the analysis of global income distribution. This
goal is motivated by the lack of publicly available data on global income distribution,
which limits the ability of researchers to identify the position of each country in the global
income distribution. It is important to note that our global data also easily permit
interested researchers to construct Deininger and Squire-like within-country Gini
coefficients. We hope this work will contribute to a more informed policy debate about
the dynamics in the global income distribution and the emergence of the new global
middle class.
The main objective of this manuscript is to introduce our new global dataset on income
distribution and to discuss the procedures followed in assembling the data as well as
acknowledging any remaining limitations. The next section briefly surveys previous
attempts at improving the availability of cross-country income distribution data. Section 3
introduces the dataset and discusses the data sources. This section also discusses the
limitations of the data and how we dealt with the issues of comparability. We then
present some descriptive evidence on global income inequality in Section 4. In Section 5,
we combine our data with the Global Income Distribution Dynamics (GIDD) tool to
provide an illustrative example of some of the benefits of such novel data. Section 6
concludes the paper with some suggestions for future work.
1
Although concentrating on a snapshot of the distribution, the underlying data allows for a forward-looking
analysis of changes in the distribution of income arising from anticipated policy changes, and we provide
an illustration in Section 5 of this paper.
2. Previous Work
This section charts some advances in the literature, including the recent
datasets that have been made available by some authors, highlighting what we believe to
be the value added of our work. Ours is not the first attempt to compile distributional data
based only on representative household surveys. The Deininger and Squire dataset and, in
particular, the World Income Inequality Database (WIID1 and WIID2) compiled by the
World Institute for Development Economics Research (WIDER) (2000) are recent
examples of commendable attempts to gather information from household surveys to
facilitate income distribution analysis. While there had been prior efforts to create
secondary datasets for the study of income inequality, including those by Kuznets (1963),
Paukert (1973), Jain (1975), Ahluwalia (1976), the United Nations (1981 and 1985) and
Fields (1989), these data often fell short of the needs of researchers and policymakers
who were interested in world inequality issues (see Atkinson and Brandolini, 2001).2 In
the words of Deininger and Squire (1996:567):
“although a large number of earlier studies on inequality have amassed
substantial data on inequality, the information included is often of dubious
quality”.
Responding to the call by many researchers for better quality data on income distribution,
Deininger and Squire compiled the first comprehensive cross-country dataset on
inequality measures based mainly on representative household surveys, rather than
estimates drawn from national account statistics. Most studies have used the subset of these
data labelled “accept” and declared as satisfying the authors’ criteria for being
“high-quality”.3 This dataset, which covers the period 1960-1996, has been widely cited in the
empirical literature investigating the link between inequality and economic growth (see
Deininger and Squire, 1998; Benabou, 1996; Forbes, 2000; Barro, 2000; Li and Zou,
2
The data previously available to earlier researchers were almost entirely based on national statistics or
even when derived from household surveys the secondary data were grouped in nature, with typically just a
few observations per country-year. However, the increased recognition of population heterogeneity
continues to cast doubt on the reliability and relevance of such average data.
3
To be classified as high-quality, the data must come from representative (national) household surveys, and
the survey must capture all sources of income or expenditure, including own-consumption.
1998; Li et al., 1998). Despite the progress achieved, the Deininger and Squire dataset
should be approached with circumspection, as Atkinson and Brandolini (2001) have
shown (see also Bourguignon and Morrisson, 1998). Among its
problems are two persistent issues: (1) the inequality measures are not derived on a
consistent basis from comparable micro data; most of the figures are sourced from “third
parties” and may differ from the original source because different studies do not
necessarily employ similar conceptual definitions and methodologies; and (2) income
inequality observations based on the household unit are mixed with statistics based on
income recipients, indices derived from grouped data are mixed with those based on
micro data, estimates include some which relate to gross income and others to net
(disposable) income, and the inequality indices include a mixture of ones obtained from
parameterized and actual distributions.4
The Deininger and Squire dataset has since been updated to generate the World Income
Inequality Databases (WIID1 and WIID2) by WIDER (2000, 2005). WIID1, which covers
the period 1950-1998, has recently been updated, resulting in a new version referred to as
WIID2 (WIDER 2005). It is reasonable to view this database as a mere extension of the
Deininger and Squire dataset; it contains all of the Deininger and Squire “high-quality”
data and adds some ‘new’ data rejected by Deininger and Squire for not satisfying their
inclusion criteria. Hence most of the limitations raised by Atkinson and Brandolini about
the Deininger and Squire dataset may apply even more forcefully to the WIDER datasets.
Two very rich and important databases that bear some resemblance to ours are the
World Income Distribution (WYD) dataset and the All the Ginis database compiled by
the World Bank (Milanovic 2005). Unlike the earlier datasets, these two are original (not
secondary) databases created directly by the authors from household survey data. The
WYD dataset, covering the period 1985-2000, contains grouped (decile or quintile) income
distribution indices for a large number of countries for three benchmark years (1988,
1993 and 1998). If a country does not have a household survey for a given benchmark
4
For a number of cases when income shares rather than Gini coefficients were available, Deininger and
Squire resorted to Chen, Datt and Ravallion’s (1995) statistical program, POVCAL, to compute Gini
coefficients.
year, then the available year closest to the benchmark was selected, so that observations are clustered
around the years 1988, 1993, and 1998 (Milanovic, 2006). This database has been used
for most of Milanovic’s work on world inequality (see for example, Milanovic 2002 &
2005). In the summer of 2004, Branko Milanovic created the All the Ginis database that
simply combines the Deininger and Squire (DS), WIID2 and WYD datasets.5 The idea was to create a more
comprehensive and superior database than those available at the time. In the end what
Milanovic did was to show a revealed preference for WIID2, so that WYD was only used
to ‘fill’ the gaps in WIID2. The rationale for doing this is not clear to us, but such an
arbitrary presumption introduces severe inconsistencies in the All the Ginis database.
Apart from the limitation of having only grouped data, the database is a mixture of both
secondary and primary sources of data, which makes the indices not comparable. Take any
country, for example, and you will be comparing a Gini coefficient à la WIID2 in one
benchmark year against a Gini coefficient from WYD in another benchmark year.
Moreover, since All the Ginis is a synthesis of all the above-mentioned datasets, it
inherently suffers from the limitations that plague all three datasets.
3. The GIDD Database
Compared to the existing secondary databases on cross-country inequality indices, the
current dataset is a ‘true’ global income distribution. In fact, there is no precursor to this
work. Milanovic (2002) is the first study we are aware of that derives ‘true’ global
income distribution based solely on household survey data. Milanovic estimates the
world income or expenditure inequality for individuals for the years 1988 and 1993. The
world distribution is basically derived in the same manner as one would derive a
country’s income distribution from regional distributions. Household surveys from 91
different countries adjusted for differences in Purchasing Power Parity (PPP) are used.
For about three-quarters of the reported observations Milanovic had access to individual
record data while for the rest, mean income or expenditure per deciles (or any other
population group share) are used from grouped data. To maintain internal consistency,
5
Note however that the WIID2 dataset already incorporates the DS datasets so that in effect Milanovic is
combining WIID2 and WYD.
Milanovic converted the unit record data into decile data so that in the end he had only
ten data-points per country.6
Other than the fact that the data used by Milanovic is not publicly accessible, the main
difference in the approach used here consists of working directly with the household data
rather than with grouped data. In particular, our data relate to the individual/household so
that one is able to explore much more heterogeneity within the distribution. Although
working with grouped data has become popular and it has apparent advantages in terms
of simplicity, it is not particularly attractive if one wants to perform scenario simulations.
This is because using grouped data, or even parameterized Lorenz curves, may not permit
the full heterogeneity in the distribution to be explored; in part because rural and urban or
agricultural and non-agricultural households are mixed together. Working directly with
household data – and thus being able to exploit the full heterogeneity – is fairly powerful
in such cases. In addition, our data cover more countries and more recent surveys.7 Moreover,
by aligning all surveys in time to a common base year (2000 in this case) and by applying
a common processing procedure, our data are more suitable for purposes of international
comparison of income distribution [this paragraph may not be completely true if we are
also working with grouped data].
3.1. Conceptual and Methodological Issues
Any attempt to put together a secondary cross-country dataset drawing on different
household surveys raises several conceptual and methodological issues. A few reflections
on a number of these problems are appropriate before we describe our data and present
some descriptive results. Many of these issues are well discussed in the literature and so
we will only give a cursory treatment here touching on the points that bear directly upon
our work. There is at present little conceptual agreement regarding what constitutes a
6
More detailed information on the data used by Milanovic (2002) can be found at
http://www.worldbank.org/research/inequality/data.htm.
7
Milanovic was able to obtain household survey data for a common sample of 91 countries both in 1988
and 1993, covering about 84 per cent of the world population. By contrast, our sample covers 116 countries
representing 91 percent of the world population. We had access to individual records for 1.2 million
households in 84 developing countries. These micro data are complemented with more aggregate
information for countries where we do not have direct access to surveys. Household information from
developed countries comes from the Luxembourg Income Study dataset.
good quality database for the purposes of studying global income distribution. That is not
to suggest that there is no ideal standard, and Atkinson and Brandolini (2001) and most
recently Milanovic (2002) have already forcefully made the case for this. National
accounts data are in principle comparable across countries and they have been widely
used for the analysis of world income distribution. However, most researchers including
Milanovic are uncomfortable about their rampant utilization for the analysis of world
income distribution. We agree with Milanovic that for the analysis of ‘true’ global
income distribution household survey data are really indispensable. Granted that survey
data are the most preferred, the issue is how to maintain consistency and to ensure that
what we measure is truly global income distribution. We begin with one of the issues
which has been most discussed in the literature: the concept of economic welfare.
The Welfare Concept
The first challenge in this exercise is the choice between consumption and income as the
preferred overall measure of living standards. There is no clear guidance as to the most
preferred welfare concept to use in studying distributional issues. While the advantages of
preferring consumption over household income in welfare analysis are well known
(consumption is less variable and more accurately gathered), some people still hold the
view that the lack of consensus on the treatment of durables makes the use of
consumption problematic (see Deaton and Zaidi, 2002; and Atkinson and Bourguignon,
2000). Opposing views aside, many statistical offices only collect information on either
consumption or income and so, in practice, one has little choice anyway. For
the purposes of this exercise, we define global income as the sum (over all household
members) of the reported as well as estimated and imputed personal monthly
consumption expenditures (or income) of all countries for which we have access to unit
record data. Whenever both consumption and income are available the former is always
preferred. This was the case in most African and Asian surveys where detailed
consumption data are collected. By contrast, most industrialized countries and much of
Latin America collect much more detailed income data. For these countries we follow
common practice by using the information available, which in most cases is income.
Adjustment for Household Size and Composition
The reference unit may be the household or the individual income earner. For income, we
have information for each income earner, while consumption is usually reported at the
household level. For every country where we have access to individual record data, our
statistical unit of analysis is taken to be total household consumption expenditure (or income)
divided by household size. Indeed, not adjusting income implies
that the welfare achievable in a household with a certain income is independent of the
number of its occupants. We note that though not the most preferable, in practice, per
capita consumption or incomes are used as they are the most commonly available. We
recognize that the use of per capita income as the unit of observation amounts to an
assumption that no economies of scale arise from sharing of economic resources and that
children and adults do not differ in their needs. However, the choice is mainly driven by
practical matters. Very few countries report equivalence scales, and indeed these scales
are difficult to compare across countries. We could, in fact, use the square root of
household size as a crude measure of equivalence scales (see Gottschalk and Smeeding,
1997).
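To make the two adjustments concrete, the following minimal Python sketch contrasts the per capita measure with the square-root equivalence scale. The function names and the example household are hypothetical and purely illustrative; they are not part of the processing actually applied to the surveys.

```python
import math


def per_capita(household_total, household_size):
    """Household total divided by the number of members (no economies of scale)."""
    if household_size <= 0:
        raise ValueError("household_size must be positive")
    return household_total / household_size


def sqrt_equivalised(household_total, household_size):
    """Household total divided by the square root of household size."""
    if household_size <= 0:
        raise ValueError("household_size must be positive")
    return household_total / math.sqrt(household_size)


# Hypothetical household: monthly consumption of 400 shared by 4 members.
example_total = 400.0
example_size = 4
pc_value = per_capita(example_total, example_size)
eq_value = sqrt_equivalised(example_total, example_size)
print(pc_value, eq_value)
```

For a four-person household, the per capita measure assigns each member one quarter of the household total, whereas the square-root scale assigns one half, reflecting the economies of scale that the scale assumes.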
The Application of Weights
Observations may be person-weighted or household-weighted. As we are
concerned with the welfare of the individual, all observations are person-weighted so that
per capita income is counted as many times as there are persons in the household.
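The difference between the two weighting schemes can be illustrated with a short sketch. The household records, field names and sampling weights below are hypothetical; the sketch only assumes that a person weight is the household's sampling weight multiplied by its size.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Two hypothetical households: a single person and a family of four.
households = [
    {"pc_income": 100.0, "size": 1, "sampling_weight": 1.0},
    {"pc_income": 50.0, "size": 4, "sampling_weight": 1.0},
]

pc_incomes = [h["pc_income"] for h in households]
household_weights = [h["sampling_weight"] for h in households]
# Person weights count each household once per member.
person_weights = [h["sampling_weight"] * h["size"] for h in households]

household_weighted = weighted_mean(pc_incomes, household_weights)
person_weighted = weighted_mean(pc_incomes, person_weights)
print(household_weighted, person_weighted)
```

Because the larger household is also the poorer one in this example, the person-weighted mean lies below the household-weighted mean.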
Converting Local Currencies to International Dollars
On the issue of making international comparisons of monetary variables across countries,
there is wide consensus among economists that PPP income data are the most
appropriate, even though there are ongoing discussions about some remaining
imperfections. The direct use of the official exchange rates has been discounted as
inappropriate for international comparison of living standards for several reasons, not
least because in many countries the official exchange rates are distorted and volatile. In
order to preserve comparability of the monetary variables across countries, we follow
common practice by converting all local currencies into international dollars using PPP
conversion factors.
3.2. Coverage and Data Processing
This subsection is concerned with the processes involved in arriving at the final dataset
and the sample coverage. We also discuss how the included surveys were selected and
the sources of the data used.
Data Processing
When one talks about global income distribution what one really wants to do is to be able
to calculate income differences between all the citizens of the world regardless of
nationality (see Milanovic 2002). For example, we want to be able to compare where an
individual in Ghana stands in the world income distribution vis à vis his counterpart in
Mexico. We are here talking about each person in the world having his or her own real
income adjusted for differences in purchasing power parity. Note that the size of the
country matters as well as the within country income distribution. One would, ideally,
require data on both within- and between-national income distributions for all the
countries in the world. The data requirement for such an exercise is massive and perfect
comparability of the data across countries is not achievable. Nonetheless, it is important
for data compilers to strive to minimize data and methodological differences across
nations (Gottschalk and Smeeding, 2000).
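As an illustration of how within- and between-country differences combine, the sketch below pools grouped data from two hypothetical countries and computes a population-weighted Gini coefficient from the discrete Lorenz curve. The country figures and the function name are invented for illustration, and the trapezoidal formula is one common choice rather than necessarily the estimator used in this paper.

```python
def weighted_gini(incomes, weights):
    """Gini coefficient from weighted observations (trapezoidal Lorenz curve)."""
    pairs = sorted(zip(incomes, weights))
    total_weight = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    if total_weight <= 0 or total_income <= 0:
        raise ValueError("weights and incomes must have positive totals")
    cumulative_income = 0.0
    area = 0.0
    for y, w in pairs:
        previous = cumulative_income
        cumulative_income += y * w
        area += w * (previous + cumulative_income)
    return 1.0 - area / (total_weight * total_income)


# Hypothetical group means in PPP dollars; each group holds an equal
# share of its country's population.
country_a = {"population": 100.0, "group_means": [1.0, 2.0, 3.0, 4.0]}
country_b = {"population": 10.0, "group_means": [10.0, 20.0, 30.0, 40.0]}

pooled_incomes = []
pooled_weights = []
for country in (country_a, country_b):
    share = country["population"] / len(country["group_means"])
    for mean in country["group_means"]:
        pooled_incomes.append(mean)
        pooled_weights.append(share)

within_a = weighted_gini(country_a["group_means"], [1.0] * 4)
within_b = weighted_gini(country_b["group_means"], [1.0] * 4)
global_gini = weighted_gini(pooled_incomes, pooled_weights)
print(within_a, within_b, global_gini)
```

In this toy example both countries have the same within-country inequality, yet the pooled coefficient is higher because the gap in mean incomes between the two countries enters the global distribution.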
In this regard, in cases where we had access to individual data, we made all possible
efforts to make data as comparable as possible across countries by using similar
definitions of variables for each country and by applying consistent methods of
processing the data. We pool all available surveys per country drawing on survey data
from different World Bank sources. Our main source of data for developing countries
came from data files used for the World Development Reports 2006 and 2007. For
countries where we had access to multiple surveys, we retain only the survey that is most nationally
representative, most recent and as close to the year 2000 as possible. For the developed countries
we had to rely on predefined grouped data from the already standardized LIS dataset.
Similarly, for China we only had access to predefined aggregate data already divided into
urban and rural parts. For internal consistency, we also convert all the developing-country unit
record data into vintiles, ranking all individuals by their household per capita consumption
or income. Each vintile contains 5% of individuals in a given country. To this dataset, we
added two new variables: PPP conversion factors obtained from the Penn World Tables
and local CPI indexes obtained from the World Development Indicators (WDI). We
should mention that all household surveys are “placed” in the year 2000. Basically, if we
use a survey for 2002, the CPI is used to have all income/consumption figures in
domestic values of 2000; also a correction factor is applied to the population weights so
as to get to the (census) population of 2000. The real figures were finally converted into
international dollars in year 2000 using the PPP conversion rates.
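The steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual processing code: the function names are hypothetical, the PPP factor is assumed to be expressed in local currency units per international dollar, and each record is assigned wholly to one vintile rather than being split at group boundaries.

```python
def to_ppp_2000(nominal, cpi_survey_year, cpi_2000, ppp_2000):
    """Deflate a local-currency amount to 2000 prices, then convert it to
    international dollars (ppp_2000 = local currency per international dollar)."""
    real_2000 = nominal * cpi_2000 / cpi_survey_year
    return real_2000 / ppp_2000


def rescale_weights(weights, population_2000):
    """Scale survey weights so that they sum to the year-2000 population."""
    factor = population_2000 / sum(weights)
    return [w * factor for w in weights]


def vintile_means(incomes, weights, groups=20):
    """Weighted mean income of each of `groups` equal-population groups.

    Records are ranked by income and assigned to the group containing the
    midpoint of their cumulative weight share. Empty groups return None.
    """
    pairs = sorted(zip(incomes, weights))
    total_weight = sum(w for _, w in pairs)
    income_sums = [0.0] * groups
    weight_sums = [0.0] * groups
    cumulative = 0.0
    for y, w in pairs:
        midpoint_share = (cumulative + w / 2.0) / total_weight
        g = min(groups - 1, int(midpoint_share * groups))
        income_sums[g] += y * w
        weight_sums[g] += w
        cumulative += w
    return [s / w if w > 0 else None for s, w in zip(income_sums, weight_sums)]


# Hypothetical 2002 survey: CPI of 110 in 2002 against 100 in 2000,
# and 2 local currency units per international dollar.
value_2000_ppp = to_ppp_2000(110.0, 110.0, 100.0, 2.0)
weights_2000 = rescale_weights([1.0, 1.0, 2.0], 8.0)
means = vintile_means([float(i) for i in range(1, 41)], [1.0] * 40)
print(value_2000_ppp, weights_2000, means[0], means[-1])
```

With forty equally weighted records, each vintile contains two of them, so the first group mean is the average of the two lowest incomes and the last is the average of the two highest.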
Coverage
Our sample is primarily determined by the availability of representative household survey
data with as comprehensive information as possible on consumption or income. The only
inclusion restrictions were the quality of the survey, whether it is nationally representative
and the availability of consumption or income aggregates. Essentially, we began with as
many surveys as possible for all countries for which unit record data on consumption
or income were available in the WDR. However, many of these countries were
eliminated from the sample due to a lack of data deemed important for our purposes. As
with Chen and Ravallion (2007), surveys are excluded if essential data are missing (PPP
exchange rates and local CPIs, for example) or if there are serious comparability
problems with the rest of the data. The main source of data for the developing countries is
the World Bank WDR database. We mainly use the underlying data used for the WDR
2006 and 2007 which are drawn largely from the LSMS and the Africa ISP-Poverty
monitoring group. The data for Eastern Europe are drawn from the ECA databank and
different World Bank sources. The Luxembourg Income Study (LIS) database is our
source of data for most of the developed countries. Table A.1 in the Appendix presents
the main characteristics of each household survey [This Table will come later]. The table
shows the names of the surveys, the sample size (in number of individuals) and the
welfare measure used (consumption or income).
The final sample consists of 116 countries representing 91 percent of the world
population. We had access to individual records for 1.2 million households in 84
developing countries. These micro data are complemented with more aggregate
information for countries where we do not have direct access to surveys. The countries
covered and their respective shares of the total sample and world population are
presented in Table A.2 in the Appendix. The final sample covers all regions in the world:
Eastern Europe and Central Asia (100%), Latin America (98%), South Asia (98%), East
Asia and Pacific (96%), High Income Countries (79%), Sub-Saharan Africa (74%) and
Middle East and North Africa (70%).
3.3. Limitations
As already mentioned, the data requirements and the quality restrictions required to
maintain international comparability are enormous. While we endeavored to make the data
consistent and cross-nationally comparable, the usual ‘caution’ applies. Since there exists
no global ‘household survey’ but instead different countries use different questionnaires
and have different ways of minimizing potential measurement errors, perfect
comparability is not assured. Perhaps one of the most obvious difficulties is that, as we
do not have household income for all countries, income inequality statistics are mixed
with consumption inequality measures, confounding international comparisons, as income
tends to be more unequally distributed than expenditure. Moreover, differences in survey
design, comprehensiveness of income sources and quality all have the potential to affect
cross-country comparisons of income inequality. These quality issues and others, which
plague most cross-country distributional analysis, have been sufficiently discussed in
Gottschalk and Smeeding (1997, 1998), Székely and Hilgert (1999), Atkinson and
Brandolini (2001) and Milanovic (2002); and we do not want to belabor them here other
than warn that users should bear them in mind while using our data.
Appendix 1:
Table A1: Household Surveys Included in the GIDD
(population in thousands)

Region                             Covered population  Actual population  Covered (%)
World                                       5,498,162          6,076,509        90.48
East Asia and Pacific                       1,733,358          1,817,232        95.38
Eastern Europe and Central Asia               460,385            471,549        97.63
High Income Countries                         764,285            974,612        78.42
Latin America                                 500,199            515,069        97.11
Middle East and North Africa                  190,397            276,447        68.87
South Asia                                  1,332,800          1,358,294        98.12
Sub-Saharan Africa                            516,737            663,305        77.90
Economy
Covered population
Covered population
Actual population
East Asia and Pacific
1,733,358
1,805,691
China
Indonesia
Vietnam
Philippines
Thailand
Malaysia
1,260,000
212,000
80,400
71,600
61,700
23,300
1,260,000
212,000
80,400
71,600
61,700
23,300
13
Data used
grouped
individual
individual
individual
individual
grouped
Cambodia
Lao PDR
Papua New Guinea
Mongolia
Myanmar
Korea, Dem. Rep.
Fiji
Timor-Leste
Solomon Islands
Vanuatu
Samoa
Micronesia, Fed. Sts.
Tonga
Kiribati
Marshall Islands
Eastern Europe and Central Asia
11,900
4,927
5,133
2,398
460,385
11,900
4,927
5,133
2,398
47,700
21,900
811
784
419
191
177
107
100
91
53
471,549
individual
individual
grouped
grouped
individual
individual
individual
individual
individual
individual
individual
grouped
grouped
individual
individual
individual
individual
individual
grouped
individual
individual
grouped
grouped
individual
individual
individual
individual
grouped
individual
individual
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
Russian Federation
Turkey
Ukraine
Poland
Uzbekistan
Romania
Kazakhstan
Serbia and Montenegro
Czech Republic
Hungary
Belarus
Azerbaijan
Bulgaria
Tajikistan
Slovak Republic
Georgia
Kyrgyz Republic
Turkmenistan
Croatia
Moldova
Lithuania
Armenia
Albania
Latvia
Estonia
Macedonia, FYR
Bosnia and Herzegovina
High Income Countries
136,000
69,600
47,600
38,300
25,100
21,800
15,000
10,600
10,300
9,876
9,994
8,199
7,906
6,376
5,393
4,514
5,008
4,644
4,446
4,259
3,477
3,065
3,139
2,383
1,363
2,044
764,285
146,000
67,400
49,200
38,500
24,700
22,400
14,900
8,137
10,300
10,200
10,000
8,049
8,060
6,159
5,389
4,720
4,915
4,502
4,503
4,275
3,500
3,082
3,062
2,372
1,370
2,010
3,847
974,612
United States
Germany
France
United Kingdom
Italy
Korea, Rep.
Spain
Canada
Netherlands
Greece
282,000
82,200
58,900
58,800
57,700
47,000
40,500
30,800
15,900
10,900
282,000
82,200
58,900
59,700
56,900
47,000
40,300
30,800
15,900
10,900
14
Belgium
Portugal
Sweden
Austria
Hong Kong, China
Israel
Denmark
Finland
Norway
Singapore
New Zealand
Ireland
Slovenia
Luxembourg
Netherlands Antilles
Japan
Taiwan, China
Saudi Arabia
Australia
Switzerland
Puerto Rico
United Arab Emirates
Kuwait
Cyprus
Bahrain
Qatar
Macao, China
Malta
Brunei Darussalam
Bahamas, The
Iceland
French Polynesia
New Caledonia
Guam
Channel Islands
Virgin Islands (U.S.)
Antigua and Barbuda
Isle of Man
Bermuda
Greenland
Latin America
Brazil
Mexico
Colombia
Argentina
Peru
Venezuela, RB
Chile
Ecuador
Guatemala
Bolivia
Dominican Republic
Haiti
Honduras
10,300
10,100
8,875
8,011
6,669
6,282
5,338
5,177
4,492
4,020
3,864
3,815
1,986
441
215
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
grouped
500,199
10,300
10,200
8,869
8,012
6,665
6,289
5,337
5,176
4,491
4,018
3,858
3,805
1,989
438
176
127,000
22,200
20,700
19,200
7,184
3,816
3,247
2,190
694
672
606
444
390
333
301
281
236
213
155
147
109
76
76
62
56
515,069
172,000
98,000
41,600
37,300
26,800
24,300
15,200
12,000
11,800
8,514
7,950
8,146
6,281
174,000
98,000
42,100
36,900
26,000
24,300
15,400
12,300
11,200
8,317
8,265
7,939
6,424
individual
individual
individual
individual
individual
individual
individual
individual
individual
individual
individual
individual
individual
15
El Salvador
Paraguay
Nicaragua
Costa Rica
Uruguay
Panama
Jamaica
Guyana
Cuba
Trinidad and Tobago
Suriname
Barbados
Belize
St. Lucia
St. Vincent and the Grenadines
Grenada
Dominica
St. Kitts and Nevis
Middle East and North Africa
Egypt, Arab Rep.
Iran, Islamic Rep.
Morocco
Yemen, Rep.
Tunisia
Jordan
Algeria
Iraq
Syrian Arab Republic
Libya
Lebanon
West Bank and Gaza
Oman
Djibouti
South Asia
India
Pakistan
Bangladesh
Nepal
Sri Lanka
Afghanistan
Bhutan
Maldives
Sub-Saharan Africa
Nigeria
Ethiopia
South Africa
Tanzania
Kenya
Uganda
Ghana
Côte d'Ivoire
Madagascar
Cameroon
Zimbabwe
6,409
5,386
5,186
3,805
3,332
2,849
2,607
733
6,280
5,346
4,920
3,929
3,342
2,950
2,589
744
11,100
1,285
434
266
250
156
116
101
71
44
276,447
individual
individual
individual
individual
individual
individual
individual
individual
67,300
63,700
27,800
17,900
9,564
4,857
30,500
23,200
16,800
5,306
3,398
2,966
2,442
715
1,358,294
grouped
grouped
individual
individual
grouped
individual
individual
individual
individual
individual
individual
516,737
1,020,000
138,000
129,000
24,400
19,400
26,600
604
290
663,305
137,000
64,300
43,900
34,500
28,100
24,600
19,300
16,500
16,000
15,500
12,600
118,000
64,300
44,000
34,800
30,700
24,300
19,900
16,700
16,200
14,900
12,600
individual
individual
individual
individual
individual
individual
individual
individual
individual
individual
grouped
190,397
67,300
63,700
27,800
16,500
9,565
5,532
1,332,800
1,020,000
142,000
131,000
20,800
19,000
Zambia
Niger
Mali
Burkina Faso
Malawi
Rwanda
Guinea
Senegal
Benin
Burundi
Sierra Leone
Mauritania
Lesotho
Gambia, The
Comoros
Congo, Dem. Rep.
Sudan
Mozambique
Angola
Chad
Somalia
Togo
Central African Republic
Eritrea
Congo, Rep.
Liberia
Namibia
Botswana
Guinea-Bissau
Gabon
Mauritius
Swaziland
Cape Verde
Equatorial Guinea
São Tomé and Principe
Seychelles
12,600
11,800
11,100
10,800
10,300
8,024
7,929
7,914
6,718
6,563
4,509
2,668
1,743
1,217
554
10,700
11,800
11,600
11,300
11,500
8,025
8,434
10,300
7,197
6,486
4,509
2,645
1,788
1,316
540
50,100
32,900
17,900
13,800
8,216
7,012
5,364
3,777
3,557
3,438
3,065
1,894
1,754
1,366
1,272
1,187
1,045
451
449
140
81
grouped
grouped
individual
individual
grouped
grouped
individual
individual
individual
individual
grouped
individual
grouped
individual
grouped
References [needs updating]:
Atkinson, A. B., and A. Brandolini (2001), “Promises and Pitfalls in the Use of
Secondary Data-Sets: Income Inequality in OECD Countries as a Case Study”. Journal of
Economic Literature 39, 771-800.
Bourguignon, F. and C. Morrisson (1998), “Inequality and Development: The Role of
Dualism,” Journal of Development Economics, 57(2), December, 233–58.
Deininger, K. and L. Squire (1996), “A New Data Set Measuring Income Inequality,”
World Bank Economic Review, 10(3), 565–91.
Deininger, K. and L. Squire (1998), “New Ways of Looking at Old Issues: Inequality and
Growth,” Journal of Development Economics, 57(2), December.
Fields, G. (1989), “A Compendium of Data on Inequality and Poverty for the Developing
World”, Cornell University (mimeograph).
Gottschalk, P., and T. M. Smeeding (1997), “Cross-National Comparisons of Earnings
and Income Inequality”, Journal of Economic Literature 35, 633–687.
Székely, M. and M. Hilgert (1999), “What’s Behind the Inequality We Measure: An
Investigation Using Latin American Data”, Working Paper no. 409, Inter-American
Development Bank.
Put the following somewhere in the main text >>>>>>>>>>>>>>>>>>
A Note on Imputing Sector of Employment Data using
Household Surveys
The Linkage Global Computable General Equilibrium model and the micro data of the
Global Income Distribution Dynamics (GIDD) model are linked through several
aggregate variables. More specifically, the wage rate and the employment levels of the
agriculture and non-agriculture segments of the economy are among the crucial link
variables. A first step in assembling the dataset for the Linkage-GIDD modeling
framework thus consists of identifying the variable “sector of employment/occupation” in
the original survey data. Given that this variable is recorded only for a subset of the
surveys used in the GIDD framework – out of the 73 household surveys included in the
GIDD dataset, only 30 of them report this information – an imputation methodology had
to be devised. This note explains how this missing variable has been estimated for the
cases where it was not available.
The basic logic behind the data imputation is that observable characteristics, at both the
household and the individual level, are correlated with the probability of being employed in
a given sector. The methods described in this note use this correlation to assign,
to each household head, a probability of being employed in a particular sector.
In the GIDD surveys where sector of employment is reported, fewer than 3
percent of household heads in urban areas work in the agricultural sector as their main
activity, whereas 65 percent of heads in rural areas derive their incomes
from farming activities (see Table 1). Given this high correlation, and the fact that the
household’s stratum (rural/urban) is available in all the surveys, we assume that no
household head located in an urban area is engaged in agricultural activities.
Table 1: Distribution of Agricultural Employment in Rural vs. Urban Areas (%)

                              Stratum
Sector of Employment      Urban     Rural
Agriculture                 2.7        65
Non-Agriculture            97.3        35
Total                       100       100

Note: With data from the GIDD
Define $\Pr(A_i = 1)$ as the probability of observing individual $i$ employed in the
agricultural sector, let $X_i$ denote a vector of observable (personal and household)
characteristics of individual $i$ affecting the probability of being part of the agricultural
sector and, finally, let $\varepsilon_i$ be a zero-mean, normally distributed random
component. For all households located in rural areas in surveys with sector of
employment information, the following model was estimated:

$$\Pr(A_i = 1) = X_i \beta + \varepsilon_i \qquad (1)$$

where $\beta$ is a vector of parameters relating the characteristics $X_i$ to the probability of
being part of the agricultural sector. In countries without information on sector of
employment, we nevertheless observe the vector of personal and household
characteristics, $X_i$. Therefore, under certain assumptions, we can use $\hat{\beta}$ to impute, to each
household head, a probability of being part of the agricultural sector. The crucial
assumption is that the parameters estimated for countries with employment information
($\hat{\beta}$) are valid for countries without employment information. In other words, we have to
assume that countries with employment information are a random sample of our universe
of 73 countries. If this assumption is satisfied, we can define the expected value of the
probability of being part of the agricultural sector in out-of-sample households as:

$$E\left[\Pr(A_i = 1) \mid X_i\right] = X_i \hat{\beta} \qquad (2)$$
The variables included in the vector $X_i$ are the age, gender and education level of the head, plus
household size and per-capita household income. The results of the estimation of
equation (1) are presented in Table 2. Although the pseudo-$R^2$ is quite low, all
independent variables have a statistically significant effect (at the 5 percent level) on the
probability of being employed in the agricultural sector.
Table 2: Probability of Being Employed in the Agricultural Sector

                         Coefficient   Robust Standard Error   z-statistic   p-value
Age                            0.002                   0.000           5.6     0.000
Gender (1=Female)             -0.421                   0.014         -30.2     0.000
Education Level               -0.307                   0.008         -40.8     0.000
Household Size                 0.005                   0.002           2.4     0.017
Per-capita HH Income          -0.003                   0.000         -26.6     0.000
Constant                       1.448                   0.029          50.5     0.000

Number of Observations       312,045
Pseudo R2                      0.068
Wald χ2 (5)                     5514
Prob. > χ2                     0.000

Note: Author’s own estimations using data from the GIDD
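As a rough illustration of equation (2), the sketch below computes the index $X_i \hat{\beta}$ from the Table 2 coefficients for two made-up rural household heads. It assumes a probit link, which the normally distributed error term and the pseudo-R2 suggest but the text does not state outright. The function names, the coding of the education variable and the units of per-capita income are hypothetical and not taken from the GIDD.

```python
import math

# Coefficients copied from Table 2 (rural household heads).
BETA = {
    "constant": 1.448,
    "age": 0.002,
    "female": -0.421,         # Gender (1 = Female)
    "education": -0.307,      # coding of education levels is an assumption
    "household_size": 0.005,
    "income_pc": -0.003,      # per-capita household income; units are an assumption
}


def linear_index(head):
    """Return X_i * beta_hat for one household head (dict of characteristics)."""
    slopes = (name for name in BETA if name != "constant")
    return BETA["constant"] + sum(BETA[name] * head[name] for name in slopes)


def probit_probability(index):
    """Standard normal CDF of the index. ASSUMES a probit link."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


# Two hypothetical rural heads, for illustration only.
young_educated = {"age": 30, "female": 0, "education": 3,
                  "household_size": 4, "income_pc": 200}
older_less_educated = {"age": 55, "female": 0, "education": 1,
                       "household_size": 7, "income_pc": 50}

for label, head in [("young, educated", young_educated),
                    ("older, less educated", older_less_educated)]:
    idx = linear_index(head)
    print(f"{label}: index = {idx:.3f}, probability = {probit_probability(idx):.3f}")
```

Because the normal CDF is strictly increasing, ranking households by the index or by the implied probability gives the same ordering, so the choice of link would not change who is ranked as most likely to be a farmer.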
The coefficients shown in Table 2 are used to assign a propensity score ($X_i \hat{\beta}$), that is, the
probability of being a farmer, to each rural household in countries without agricultural
employment information. The last piece of information we need is the proportion of
households whose head is part of the agricultural sector at the national level. This specific
information is not available; however, the World Development Indicators (WDI)
report the proportion of total employment that is in the agricultural sector. We
assume that this proportion is close enough to the proportion of heads in farming
activities and therefore use it as the parameter determining the number of households that
will be assigned to the agricultural sector.
For each country, rural households are ranked according to their probability of being part
of the farming sector. Households are then assigned to the agricultural sector (according
to their propensity score) until the proportion of households in the agricultural sector
matches the proportion of employment in agriculture (using data from WDI).
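The ranking-and-assignment step can be sketched as follows. This is a minimal, unweighted illustration: it ignores survey weights, breaks ties by input order, and treats the WDI employment share as a share of all households in the sample. The function name and the dictionary keys are hypothetical.

```python
def assign_agriculture(households, agri_share):
    """Flag household heads as agricultural until the target share is reached.

    households: list of dicts with keys "stratum" ("rural" or "urban") and,
                for rural households, "score" (the propensity score).
    agri_share: target proportion of ALL households in agriculture
                (for example, the WDI share of employment in agriculture).
    Returns a list of booleans aligned with `households`.
    """
    n_target = round(agri_share * len(households))

    # Urban heads are never assigned to agriculture, so only rural ones are ranked.
    rural = [i for i, h in enumerate(households) if h["stratum"] == "rural"]
    rural.sort(key=lambda i: households[i]["score"], reverse=True)

    # Take the highest-scoring rural heads; the slice caps the count at the
    # number of rural households if the target exceeds it.
    chosen = set(rural[:n_target])
    return [i in chosen for i in range(len(households))]


# Hypothetical five-household sample with a 40 percent agricultural share.
sample = [
    {"stratum": "rural", "score": 0.2},
    {"stratum": "urban"},
    {"stratum": "rural", "score": 0.9},
    {"stratum": "rural", "score": 0.6},
    {"stratum": "urban"},
]
flags = assign_agriculture(sample, 0.4)
print(flags)
```

In this sample the two rural heads with the highest scores are flagged as agricultural, while the remaining rural head and both urban heads are not.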
RAFA: A sentence (and perhaps a quantitative measure) on validation: didn’t we try to
test this method on a few countries and were quite satisfied…