Report - InGRID

Activity report of visit to InGRID
research infrastructures
Name and last name
Vladimir Hlasny
Project title
Top Income Biases and the Measurement of Inequality: A Worldwide Study
Abstract (max 300-500 words)
The questions of income inequality and its measurement have received significant attention in
recent years in academic as well as popular media. A debate continues regarding the true degree of
inequality within and across nations as well as worldwide, the impact of various statistical problems
on inequality measurement, and the appropriate methods for dealing with them. Prescriptions for how
to deal with the statistical problems, and programming routines for such corrections in standard
databases, are missing. Through this research, I intend to clarify the performance of alternative
correction methods for issues in the top end (top income biases) of income distributions of
different statistical properties, under different degrees of statistical problems encountered. Top
income biases include those due to item and unit nonresponse, measurement errors, and top-coding
and trimming practices. A worldwide sample of household surveys will be used. Sensitivity analysis
of technical specifications used in the methods will be conducted, and best practices will be
identified.
The study started with a review of methodologies for dealing with top income biases.
Distribution of incomes and measures of income inequality will next be corrected using alternative
methods. Alternative correction methods for top income biases will be evaluated one-by-one and
jointly in income distributions of different statistical properties. Sensitivity analysis of the alternative
corrections will be performed with respect to technical specifications of the methods, and
characteristics and structure of the data at hand. Best practices will be identified.
Cross-country comparisons of adjustments will be made to derive regularities and laws emerging
from data corrections. Besides corrected measures of inequality at the national level, inequality
measures at the level of world regions and worldwide will be generated. A methodological and
statistical “toolbox” will be made available to researchers and statistical agencies to improve and
standardize the measurement of inequality across the world.
Introduction and motivation of visit
The visit was integral to the first exploratory phase of the project, to clarify any technical
constraints and the potential for performing the analysis on a large number of national surveys.
During the visit, survey data for forty countries and multiple years were reviewed directly on LIS
secure servers, their properties were assessed through various resources available at LIS, and the
considered methodological tools were evaluated on those surveys to gauge their performance in the
following stages of the project.
Following the visit, cross-country comparisons of corrections will be performed to derive
regularities and laws emerging from the various corrections. Besides corrected measures of inequality
at the national level, inequality measures at the level of world regions and worldwide will be
estimated, an important contribution on its own given the lack of consensus on worldwide
inequality. The final contribution is to offer a methodological and statistical “toolbox” to
researchers and statistical agencies to improve and standardize the measurement of inequality across
the world.
Scientific objectives of visit
The main purpose of the visit was to use the official documentation, reports from LIS staff and
contact information for individual statistical agencies to understand the sampling and editing
practices performed on individual datasets by the statistical agencies, and harmonization of the
datasets performed by LIS. Help was sought from LIS staff regarding access to the datasets, to
documentation and to contacts for national statistical agencies; regarding the impacts of
harmonization and other statistical practices on the measurement of top incomes and inequality;
and regarding best practices for merging additional variables and appending additional national
surveys to the LIS database. Preliminary analysis of top-income biases on these surveys was performed
on LIS secure servers, to ensure success via future remote access.
Reasons for choosing research infrastructure and datasets/surveys/...
LIS is a unique infrastructure that collects a large panel of income surveys from across the world
and over a number of years. LIS also collects extensive documentation on individual surveys,
contacts for national administrators, and information on how raw files from individual national
statistical agencies get converted into public-access harmonized LIS datasets. LIS staff are
knowledgeable about the quality of each dataset, and about the best approach for collecting
additional information for each dataset. This body of information and advice was invaluable to my
project.
Activities during your visit (research, training, events, ...)
One of the key pieces of information required by the project was survey response rates by
households’ primary sampling unit or by detailed region. My main task was to collect response rates
from survey documentation, directly from statistical agencies, or with the help of LIS staff from
raw survey data, and match them to LIS data. With the collected regional response rates for
several country surveys, I performed a preliminary analysis of biases due to systematic nonresponse by top-income households, and due to nonrepresentative distribution of their incomes, on
secure computers at LIS headquarters.
Also, I held many useful discussions with LIS staff about data quality, availability of data-quality
flags, and definitions of all survey variables. I assisted with obtaining data-quality clarifications from
one national statistical agency. Lastly, I presented a seminar about the statistical methods I used in
the project, and I joined LIS staff at another seminar at a related institute in Luxembourg.
Method and set-up of research
Top-income measurement issues addressed in this study specifically include unit nonresponse and
imprecise measurement of incomes due to item nonresponse, data-entry errors and top-coding. I
use recently developed statistical methods for assessing these problems and correcting for them.
Specifically, I evaluate the existence of a systematic relationship between household demographics
and the probability of households’ responding to a survey, using the information on regional
response rates. I correct for the resulting bias by re-weighting household incomes according to the
households’ estimated response probability (Korinek et al. 2007). This is performed at various
levels of regional disaggregation, in datasets that permit it (Hlasny and Verme 2013). Secondly, to
gauge the extent of measurement problems due to item nonresponse, data-entry errors and top-coding, I evaluate how representative the top ends of income distributions are across country
surveys relative to what would be expected under smooth theoretical distributions. I correct for poor
representativeness of a small number of top-end incomes by replacing them with predicted values
(Cowell and Victoria-Feser 2007) or random draws from these smooth distributions (Jenkins et al.
2011).
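The reweighting step described above can be illustrated with a minimal Python sketch. It assumes a logistic response probability in log income and picks its parameters so that, in each region, the sum of inverse response probabilities over responding households matches the number of households sampled there, which is the general idea behind Korinek et al. (2007). The function names, the toy data and the coarse grid search are all hypothetical stand-ins chosen for illustration; they are not the routines used in the project.

```python
import math

def response_probability(income, a, b):
    """Logistic probability that a household with this income responds.
    a and b are hypothetical parameters, not estimates from any survey."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(income))))

def moment_gaps(respondents, sampled_counts, a, b):
    """Per region: sum of 1/P over respondents minus households sampled."""
    implied = {region: 0.0 for region in sampled_counts}
    for region, income in respondents:
        implied[region] += 1.0 / response_probability(income, a, b)
    return {r: implied[r] - sampled_counts[r] for r in sampled_counts}

def fit_by_grid(respondents, sampled_counts, a_grid, b_grid):
    """Pick (a, b) minimising the squared gaps, scaled by region size.
    A coarse grid search stands in for a proper numerical optimiser."""
    best = None
    for a in a_grid:
        for b in b_grid:
            gaps = moment_gaps(respondents, sampled_counts, a, b)
            loss = sum(g * g / sampled_counts[r] for r, g in gaps.items())
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]

def corrected_weights(respondents, a, b):
    """Inverse-probability weights, one per responding household."""
    return [1.0 / response_probability(y, a, b) for _, y in respondents]

# Toy data (invented): (region, household income) for respondents,
# and the number of households originally sampled in each region.
respondents = [("north", 10.0), ("north", 20.0), ("north", 30.0),
               ("south", 15.0), ("south", 200.0), ("south", 900.0)]
sampled = {"north": 3.2, "south": 3.6}
a_hat, b_hat = fit_by_grid(respondents, sampled,
                           [2.0, 3.0, 4.0, 5.0], [-1.0, -0.5, 0.0])
weights = corrected_weights(respondents, a_hat, b_hat)
```

When the fitted slope `b_hat` is negative, higher-income respondents receive larger weights, so each of them stands in for more of the households that did not respond.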
Project achievements during visit (and possible difficulties encountered)
My visit was successful on various fronts, thanks to substantial help from LIS staff: I collected
regional response rates for nearly 30 country surveys, and was able to perform my analysis of top-income
biases on these surveys. I received advice on how to collect regional response rates for
additional surveys. I reviewed the structure of country surveys in the LIS database. I learnt of various
potential pitfalls with the data (limitations of harmonization, variable quality, heterogeneous types
of household non-response etc.), and received guidance on how to use the database and the remote-execution
data access system (Lissy). I received helpful technical feedback on my work to date
on the project. Notable difficulties include poor availability of additional information about surveys
from statistical agencies, and limitations on the comparison of top-income bias corrections across
countries due to heterogeneous definitions of variables and response rates across countries.
Preliminary project results and conclusions
Preliminary results indicate that the Gini index of inequality in most country surveys is sensitive to
the systematic non-response by top-income households. Correcting for such unit non-response
using information on regional response rates consistently raises the estimated Gini coefficients
across countries, by several percentage points. Changing the geographic level of analysis has an
important impact on the unit-nonresponse correction, implying that understanding of the income
distribution, demographics and behavioral similarities in the population within and across regions is
important. Greater degrees of geographic disaggregation yield systematically lower estimates of the
nonresponse bias, but the bias remains significant economically as well as statistically.
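To make the mechanics of such a comparison concrete, the following Python sketch computes a weighted Gini index from the Lorenz curve and evaluates it before and after an adjustment of the weights. The incomes and weights are invented for illustration, and the adjusted weights are simply assumed rather than estimated from response rates.

```python
def weighted_gini(incomes, weights):
    """Gini index as one minus twice the area under the Lorenz curve,
    with the area computed by trapezoids over weighted households."""
    pairs = sorted(zip(incomes, weights))
    total_w = sum(w for _, w in pairs)
    total_y = sum(y * w for y, w in pairs)
    cum_y = 0.0
    area = 0.0
    for y, w in pairs:
        prev = cum_y
        cum_y += y * w
        # Width is the population share; height is the mean of the
        # cumulative income shares at the two ends of the step.
        area += (w / total_w) * (prev + cum_y) / (2.0 * total_y)
    return 1.0 - 2.0 * area

# Invented sample: ten households, the richest one underrepresented.
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 300]
raw_weights = [1.0] * 10
adjusted_weights = [1.0] * 9 + [2.0]  # assumed nonresponse correction

gini_raw = weighted_gini(incomes, raw_weights)
gini_adjusted = weighted_gini(incomes, adjusted_weights)
print(gini_raw, gini_adjusted)
```

In this toy sample the adjusted Gini is higher than the raw one. That outcome depends on the data: when the top group already holds most of the income, giving it more weight can lower the index instead.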
Correcting for non-representative distributions of top income observations using fitted values or
random draws from smooth distributions helps to refine the estimated Ginis, but only by a small
amount, and in either direction across country surveys. Assumptions regarding the true distribution of
top incomes have a small effect on the correction.
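A replacement of top incomes by fitted values can be sketched as follows. This Python example assumes a Pareto upper tail, estimates its shape by plain maximum likelihood (a simplification: Cowell and Victoria-Feser 2007 use a robust estimator), and replaces the k largest incomes with evenly spaced quantiles of the fitted distribution. The threshold rule, the value of k and the sample are hypothetical choices made only for illustration.

```python
import math

def gini(incomes):
    """Unweighted Gini index from the rank-weighted sum of incomes."""
    xs = sorted(incomes)
    n = len(xs)
    ranked_sum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked_sum / (n * sum(xs)) - (n + 1.0) / n

def pareto_alpha(tail, threshold):
    """Maximum-likelihood Pareto shape for incomes above the threshold."""
    return len(tail) / sum(math.log(y / threshold) for y in tail)

def replace_top_incomes(incomes, k):
    """Replace the k largest incomes with quantiles of a fitted Pareto.
    The threshold is taken to be the largest income below the top k."""
    xs = sorted(incomes)
    body, tail = xs[:-k], xs[-k:]
    threshold = body[-1]
    alpha = pareto_alpha(tail, threshold)
    # Inverse Pareto CDF at the mid-point probabilities (i + 0.5) / k.
    fitted = [threshold * (1.0 - (i + 0.5) / k) ** (-1.0 / alpha)
              for i in range(k)]
    return body + fitted, alpha

# Invented sample in which the three highest incomes are top-coded at 500.
survey = [20, 35, 50, 80, 120, 200, 500, 500, 500]
smoothed, alpha_hat = replace_top_incomes(survey, k=3)
print(gini(survey), gini(smoothed))
```

Drawing the replacement values at random from the fitted distribution, and repeating the exercise over many draws, would move this sketch closer to the multiple-imputation approach of Jenkins et al. (2011).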
Outcomes and future studies
Methods and results of the research undertaken at LIS will be summarized in a technical paper
made available through the LIS portal. As an extension, results for the set of LIS country surveys will
be combined with results from other available surveys, to provide a fuller picture of the variation in
the top-income biases across surveys and countries of various levels of development, and to shed
light on the determinants of their size. The following external sources will be used: several national
surveys in the World Bank repositories; the Economic Research Forum database for North Africa and
the Middle East countries; the EU Statistics on Income and Living Conditions (EU-SILC) surveys;
and surveys from other statistical agencies, including the US Bureau of the Census, Egypt’s Central
Agency for Public Mobilization and Statistics and others. The identified regularities and laws will be
applied to a worldwide measure of inequality, in order to correct it for the plausible bias due to
mismeasurement of top incomes worldwide. Finally, a statistical “toolbox” will be offered to other
researchers to improve and standardize the measurement of inequality across the world.
References
Cowell, F.A. and Victoria-Feser, M.-P. (2007) Robust Lorenz curves: a semiparametric approach,
Journal of Economic Inequality 5:21-35.
Hlasny, V., and Verme, P. (2013) Top incomes and the measurement of inequality in Egypt, World
Bank Policy Research working paper series #6557.
Jenkins, S.P., Burkhauser, R.V., Feng, S., and Larrimore, J. (2011) Measuring inequality using
censored data: a multiple-imputation approach to estimation and inference, Journal of the Royal
Statistical Society: Series A 174(1):63-81.
Korinek, A., Mistiaen, J.A. and Ravallion, M. (2007) An econometric method of correcting for unit
nonresponse bias in surveys, Journal of Econometrics 136:213-235.