Key methods in constructing (spatial) composite indicators

INQUIMUS workshop
Salzburg, 15-17 September 2014
Stergios Athanasoglou
stergios.athanasoglou@jrc.ec.europa.eu
(with input from Dorota Weziak-Bialowolska and Michaela Saisana)
European Commission
Joint Research Centre
Econometrics and Applied Statistics Unit
JRC Structure:
• 8 Institutes in 5 Member States
• ~ 2750 staff
• ~ 330 M€/year budget

Econometrics and Applied Statistics Unit (formally in Brussels, physically in Ispra by Lake Maggiore)
4 lines of activity of the JRC-COIN group
Construction: Regional Europe 2020 index (included in the WEF's 2014 report and in the EC 6th Cohesion Report), Regional HDI, Regional Multidimensional Poverty Index, …
Validation: Global Innovation Index 2014, WJP Rule of Law 2014, Environmental Performance Index 2014, Corruption Perceptions Index 2012, … (over 100 requests)
Methodology: in-house developed quality control framework, sensitivity analysis, multicriteria decision analysis, statistics and policy
Training: 04/2013 WEF, Geneva; 07/2013 Istanbul; 22-26/09/2014 Ispra (12th year)

https://composite-indicators.jrc.ec.europa.eu

Collaborations with OECD, WEF, INSEAD, WIPO, UN-IFAD, FAO, Transparency International, World Justice Project, Harvard U., Yale U., Columbia U., Cornell U., …
OECD/JRC Handbook on composite indicators
Over 100 audit requests, including:
• 2014 Global Innovation Index (INSEAD, WIPO)
• 2014/2012/2010/2008/2006 Environmental Performance Index (Yale & Columbia University)
• 2014/2012/2010 Rule of Law Index (World Justice Project)
• 2012 Corruption Perceptions Index (Transparency International)
• 2010 Multidimensional Poverty Assessment Tool (UN IFAD)
• 2009 Index of African Governance (Harvard Kennedy School)
• 2008 European Lifelong Learning Index (Bertelsmann Foundation, CCL)
• …
Composite Indicators
Overview
0. What is a composite indicator?
1. Theoretical framework
2. Indicator Selection
3. Imputation and Outlier Treatment
4. Normalization
5. Statistical Coherence
6. Weights and Aggregation
7. Uncertainty and Sensitivity Analysis
Composite Indicators
Definition: A composite indicator is formed when individual indicators are
compiled into a single index, on the basis of an underlying model of the
multi-dimensional concept that is being measured.
(OECD Glossary of statistics)
Composite Indicators: Pros and Cons
Pros
• Summarize complex, multidimensional information into a single number.
• Easy to interpret.
• Able to incorporate large amounts of data.
• Facilitate communication with the press and the public, potentially enhancing accountability.
• Allow for the computation of rankings to assess comparative standing.
Cons
• Potential lack of clarity regarding the underlying concept being measured (e.g., the Newsweek “best country” index).
• Arbitrariness and lack of theoretical underpinnings in their construction.
• Possibly questionable trade-offs between components.
• Lack of robustness to changes in subjective choices regarding indicator selection, weights, aggregation, normalization, etc.
• Badly constructed composite indicators can form the basis of bad, misinformed policy.
Developing a theoretical framework
• Definition of the multidimensional phenomenon to be studied.
• Determination of its sub-dimensions.
• Identification of selection criteria for underlying indicators.
Experts and stakeholders need to be consulted at this pivotal initial stage.
Be wary of logical inconsistencies: it is generally not good practice to mix inputs and outputs. For instance, if you want to measure environmental conditions (an output), it may not be a good idea to include, say, environmental subsidies (an input) in the index.
Selecting indicators

• Importance of an indicator: assess via literature review, expert opinion, and statistical analysis.
• Proper unit of analysis: country, region, household, individual (not all indicators are available at the chosen unit of analysis).
• Sign of an indicator: e.g., for social cohesion the OECD lists 35 indicators, including the number of marriages and divorces and the number of foreigners. Q: Does a large foreign-born population decrease social cohesion? (yes/no → sign)
• Sound variables (merge different sources of information, survey data, different years, proxy variables, …): rely on sound data providers and descriptive statistics on the indicators.
• Survey data: sampling size, representativeness, translation!

Suggestion: choose roughly 5-10 indicators per dimension.
Imputation of missing data: Methods
• Hot-deck imputation: use actual data of other “similar” entities
• Cold-deck imputation: use data from external source
• Unconditional sample mean/median/mode
• Regression imputation: missing data are substituted by values predicted by
regression (look for strong correlations)
• Expectation maximization (EM): uses concepts from maximum likelihood
estimation
• Markov Chain Monte Carlo (MCMC) multiple imputation
Suggestions:
1. Not more than 25% missing data per indicator, country or dimension
Prefer variations of the hot-deck method, or imputation based on correlations, to the unconditional sample mean (a minimal sketch follows).
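A minimal sketch of two of the approaches listed above, assuming a pandas DataFrame of indicators with missing values (all variable names here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical indicator data with missing values in "ind_b".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["ind_a", "ind_b", "ind_c"])
X["ind_b"] = 0.8 * X["ind_a"] + 0.2 * X["ind_b"]                 # make ind_a a strong correlate of ind_b
X.loc[rng.choice(50, size=8, replace=False), "ind_b"] = np.nan   # inject missing values

# 1) Unconditional median imputation (simple, but ignores correlations).
X_median = X.fillna(X.median(numeric_only=True))

# 2) Regression imputation: predict the missing indicator from its strongest correlate.
target, predictor = "ind_b", "ind_a"
obs = X[target].notna()
slope, intercept = np.polyfit(X.loc[obs, predictor], X.loc[obs, target], deg=1)
X_reg = X.copy()
X_reg.loc[~obs, target] = intercept + slope * X.loc[~obs, predictor]
```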
Outlier detection and treatment
• Important: detect outliers before it is too late → they may distort the correlation structure and become unintended benchmarks during normalization.

JRC rule of thumb: |skewness| > 2 AND kurtosis > 3.5 flag problematic indicators that need to be treated before the final index construction (see the sketch below).
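A minimal sketch of this screening rule, assuming a pandas DataFrame X of raw indicators (the kurtosis convention, Pearson rather than excess, is an assumption here):

```python
import pandas as pd
from scipy import stats

def flag_problematic_indicators(X: pd.DataFrame, skew_thr: float = 2.0, kurt_thr: float = 3.5) -> pd.Series:
    """Flag indicators with |skewness| > skew_thr AND kurtosis > kurt_thr."""
    skew = X.apply(lambda s: stats.skew(s.dropna()))
    kurt = X.apply(lambda s: stats.kurtosis(s.dropna(), fisher=False))  # Pearson kurtosis (normal = 3)
    return (skew.abs() > skew_thr) & (kurt > kurt_thr)

# flagged = flag_problematic_indicators(X)
# Treat flagged columns before constructing the index, e.g., by winsorization or a log transform.
```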
Normalization
Why normalize? To adjust for:
1. Different nature of indicators (positive vs. negative orientation towards the index).
2. Asymmetric distributions, different variances, and outliers.
3. Different units of measurement.
4. Different ranges of variation.
Different methods of normalization
1. Linear scale (see the sketch after this list)
   a) Min-max rescaling
   b) Standardization (z-scores)
2. Ordinal scale
   a) Ranks
   b) Categorical scales
3. Ratio scale
   a) Distance from a benchmark peer (e.g., country)
   b) Distance from a benchmark year
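A minimal sketch of the two linear-scale options above, assuming a pandas DataFrame X whose columns are indicators already oriented so that higher values are better:

```python
import pandas as pd

def min_max_rescale(X: pd.DataFrame) -> pd.DataFrame:
    """Rescale each indicator to [0, 1]: (x - min) / (max - min)."""
    return (X - X.min()) / (X.max() - X.min())

def z_score(X: pd.DataFrame) -> pd.DataFrame:
    """Standardize each indicator to mean 0 and standard deviation 1."""
    return (X - X.mean()) / X.std(ddof=0)
```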
Which normalization method to use?
• Each method has pros and cons.
• The choice will depend on the context.
• However, it is important to note that composite indicators are often criticized on account of the potential arbitrariness of normalization, especially since different methods may lead to different rankings.
• That said, min-max rescaling can be justified axiomatically (see Theorem 1 in Chakravarty, 2003).
Statistical Coherence
Statistical soundness of the composite indicator is studied in two basic ways:

1. Correlation analysis
• Detect excessive collinearity (suggestive of double-counting) between indicators.
• Detect indicators that behave as noise.
• Detect indicators with problematic correlations (e.g., negative correlations where they are supposed to be positive).

2. Principal component analysis
• Verify the internal consistency of the CI within each pillar and sub-pillar.
Correlation Analysis: Suggestions
• Ensure index pillars are strongly and, as much as possible, evenly positively correlated with the composite index.
• Moreover, ensure that they are positively and, as much as possible, evenly correlated with each other, though not excessively so (to avoid double-counting).
• Ensure the above at all consecutive levels of aggregation of the index (i.e., pillar-sub-pillar, sub-pillar-indicator, etc.).
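A minimal sketch of these checks, assuming a DataFrame of indicators plus hypothetical pillar and index score series; the 0.3 noise threshold is an illustrative choice:

```python
import pandas as pd

def correlation_check(indicators: pd.DataFrame, pillar: pd.Series, index: pd.Series,
                      noise_thr: float = 0.3) -> pd.DataFrame:
    """Correlate each indicator with its pillar and with the overall index, and flag problems."""
    out = pd.DataFrame({
        "r_pillar": indicators.corrwith(pillar),
        "r_index": indicators.corrwith(index),
    })
    out["wrong_direction"] = out["r_index"] < 0        # negative where positive is expected
    out["noise"] = out["r_index"].abs() < noise_thr    # barely related to the index
    return out
```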
Statistical Coherence: Suggestions
• Ensure that within each level of aggregation (index, pillar, sub-pillar) a single principal component explains most of the variance. This is especially important at the first level of aggregation.
• This confirms the conceptual framework of the index, suggesting that the components are all measuring the same concept; thus, aggregating them makes sense.
• It is desirable that indicator loadings within the principal factor are also roughly even.
• This supports the adoption of equal weights within the pillar, sub-pillar, etc.
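A minimal sketch of this check with scikit-learn, assuming a hypothetical DataFrame pillar_data holding the indicators of one pillar:

```python
import pandas as pd
from sklearn.decomposition import PCA

def first_component_summary(pillar_data: pd.DataFrame) -> dict:
    """Variance explained by, and loadings of, the first principal component."""
    Z = (pillar_data - pillar_data.mean()) / pillar_data.std(ddof=0)   # standardize indicators
    pca = PCA().fit(Z)
    loadings = pd.Series(pca.components_[0], index=pillar_data.columns)
    return {
        "variance_explained_pc1": float(pca.explained_variance_ratio_[0]),
        "loadings_pc1": loadings,   # roughly even loadings support equal weights within the pillar
    }
```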
Statistical Coherence: EPI Example based on JRC Audit
• The EPI comprises two overarching policy objectives: Environmental Health and Ecosystem Vitality. These two policy objectives are made up of three and six issue areas, respectively. (Each issue area consists of 1 to 4 indicators.)
• Structure and weighting scheme below (linear aggregation).
Statistical Coherence: Example Correlations
[Table: correlation of each EPI indicator with its issue area, its policy objective, and the overall EPI, plus issue area-objective and objective-EPI correlations. Issue areas: Health Impacts, Air Quality, Water & Sanitation (Environmental Health); Water Resources, Agriculture, Forests, Fisheries, Biodiversity & Habitat, Climate & Energy (Ecosystem Vitality). Legend: red = wrong direction (e.g., Agricultural Subsidies correlates negatively with its objective and the EPI); purple = noise (e.g., the PM2.5 indicators and Coastal Shelf Fishing Pressure correlate non-significantly with the objective and the EPI); orange = uneven (with overlaps).]
Source: Athanasoglou S, Weziak-Bialowolska D, Saisana M. Environmental Performance Index 2014 JRC Analysis and Recommendations, EUR 26623
Statistical Coherence: Example PCA
• Principal component analysis:
1. The EH objective can be considered one-dimensional, with one latent variable explaining 71.1% of the variance in the three issue areas.
2. There are two latent dimensions among the six issue areas of the EV objective. The first captures 26% of the variance and is described by Water Resources, Agriculture, Climate & Energy, and (in part) Fisheries. The second captures 24% of the variance and is described by Forests, Biodiversity & Habitat, and (in part) Fisheries.
• It is unclear whether aggregating them into one policy objective is supported by the data.
Weighting & Aggregation
• The choice of aggregation function and weights is among the most
contentious in the construction of a composite indicator.
• Often these choices are not grounded in economic theory or a coherent
normative framework, sparking backlash (Ravallion, 2012).
• In the field of environmental composite indices, the concept of
meaningfulness is particularly important: that is, we would like the index
recommendations to be invariant to plausible transformations of the
underlying variables (switching scales for instance).
Aggregation
• Most of the time a simple linear aggregation is used.
• Parameterized by β ∈ ℝ, the generalized weighted mean of a vector x given weights w is

  y_β(x, w) = ( Σ_i w_i x_i^β )^(1/β).

• When β = 1 (β → 0), this is the weighted arithmetic (geometric) mean.
• β can be interpreted in terms of the elasticity of substitution between the different dimensions of the index, e = 1/(1 − β). The smaller the value of β, the lower the substitutability between the different dimensions of performance.
• For values of β < 1, generalized weighted means reflect a preference for balanced performance across the different dimensions of the index.
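A minimal sketch of the generalized weighted mean defined above, including the β → 0 (geometric) limit; the example values are illustrative only:

```python
import numpy as np

def generalized_weighted_mean(x, w, beta: float) -> float:
    """y_beta(x, w) = (sum_i w_i * x_i**beta)**(1/beta), with the geometric mean as beta -> 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float) / np.sum(w)       # normalize weights to sum to 1
    if np.isclose(beta, 0.0):
        return float(np.exp(np.sum(w * np.log(x))))  # geometric-mean limit
    return float(np.sum(w * x ** beta) ** (1.0 / beta))

# Lower beta rewards balanced performance across dimensions:
x, w = [0.9, 0.2], [0.5, 0.5]
for beta in (1.0, 0.0, -1.0):
    print(beta, round(generalized_weighted_mean(x, w, beta), 3))   # 0.55, 0.424, 0.327
```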
Weighting
• Generalized weighted means can be justified axiomatically (Blackorby and Donaldson, 1982).
• However, this still leaves open the issue of choosing weights. Their choice can be fraught with complex philosophical dilemmas (Foster and Sen, 1997).
• There are many different methods for setting weights (see Decancq and Lugo, 2013 for a literature review), including:
1. Principal component and factor analyses,
2. Regression (stated preference or hedonic),
3. Data envelopment analysis,
4. Public opinion polls,
5. Budget allocation,
6. Analytic Hierarchy Process, and
7. Expert consultation, among others.
Weights & Aggregation: Pitfall 1
• Most composite indicators are linear, i.e., weighted arithmetic averages. Weights are understood to reflect variables' importance in the index (e.g., budget allocation).
• Caution: nominal weights are not always a measure of variables' statistical importance.
• Example: a dean wants to rank teachers based on 'hours of teaching' and 'number of publications', adding these two variables up.

Assume:
X1 = hours of teaching
X2 = number of publications
Y = 0.5 X1 + 0.5 X2
• ρ(X1, X2) = −0.151, σ(X1) = 614, σ(X2) = 116, σ(Y) = 162
• Estimated R²₁ = 0.075 and R²₂ = 0.826
• The teachers' ranking is mostly driven by the number of publications.
To address this, the dean replaces the model
Y = 0.5 X1 + 0.5 X2
with
Y = 0.7 X1 + 0.3 X2
• This weighting indeed leads to much more balanced correlations.
• Inasmuch as "importance" is taken to mean "correlation with the composite index", teaching and publications are now equally important.
• However, a professor comes by, looks at the last formula, and complains that publishing is disregarded in the department …
• See Paruolo et al. (2013) for a deeper discussion of weights as importance coefficients; a sketch of the basic check follows.
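A minimal sketch of that check, with a hypothetical DataFrame data holding the raw variables; squared Pearson correlation with the composite is used here as a simple stand-in for the importance measures discussed in Paruolo et al. (2013):

```python
import pandas as pd

def nominal_vs_statistical_importance(data: pd.DataFrame, weights: dict) -> pd.DataFrame:
    """Compare nominal weights with each variable's squared correlation with the composite."""
    composite = sum(w * data[col] for col, w in weights.items())
    r2 = data[list(weights)].corrwith(composite) ** 2
    return pd.DataFrame({"nominal_weight": pd.Series(weights), "R2_with_composite": r2})

# e.g., nominal_vs_statistical_importance(data, {"X1": 0.5, "X2": 0.5})
#       nominal_vs_statistical_importance(data, {"X1": 0.7, "X2": 0.3})
```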
Weights & Aggregation: Pitfall 2
• Unwanted trade-offs between dimensions may emerge. These are formalized via the marginal rate of substitution: the amount of one dimension needed to compensate for the loss of another.
• Consider the 2010 HDI, in which geometric aggregation was introduced. A study by Martin Ravallion (2012, Journal of Development Economics) found that the new version of the HDI inadvertently led to:
• an implicit monetary valuation of longevity that is sharply increasing in income;
• extremely high monetary valuations for schooling (again sharply increasing in income).

Source: Ravallion, 2012
Weights & Aggregation: Pitfall 3
• Environmental indices are deemed meaningful only if their recommendations are invariant to reasonable transformations of the data (Ebert and Welsch, 2004).
• E.g., suppose an index of environmental stress combines eutrophication and acidification.
• We would want the index rankings to be insensitive to whether we decide to measure eutrophication in terms of phosphorus or nitrogen (× 10) equivalents, a ratio-scale transformation; see the sketch below.
• This requirement implies that we must use geometric aggregation. Arithmetic aggregation will violate this property sooner or later!
• Ensuring meaningfulness for sets of non-comparable interval-scale transformations (e.g., Celsius to Fahrenheit) is impossible, unless we normalize variables by min-max rescaling (which, in turn, requires justification!).
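A minimal sketch of this point with made-up numbers for two entities: rescaling one indicator by a constant factor preserves the geometric-mean ranking but can flip the arithmetic-mean ranking:

```python
import numpy as np

eutro = np.array([1.0, 3.0])   # eutrophication for entities A and B (arbitrary units)
acid = np.array([5.0, 1.0])    # acidification for entities A and B

def ranking(scores: np.ndarray) -> list:
    return list(np.argsort(-scores))   # entity positions ordered by score

for scale in (1.0, 10.0):              # ratio-scale transformation: switch eutrophication units
    e = eutro * scale
    arithmetic = 0.5 * e + 0.5 * acid
    geometric = np.sqrt(e * acid)
    print(f"scale x{scale:g}: arithmetic ranking {ranking(arithmetic)}, geometric ranking {ranking(geometric)}")
# The arithmetic ranking flips between the two scales; the geometric ranking does not.
```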
Uncertainty Analysis
• Many modelling choices (input factors) going into the construction of a composite indicator are to a large degree subjective and open to debate:
  • weights,
  • aggregation,
  • normalization,
  • inclusion/exclusion of indicators,
  • imputation techniques,
  • …

Testing the robustness of an index to different modelling choices is of paramount importance.
As a result, an uncertainty analysis should include a careful mapping of all these uncertainties onto the space of the output. Two things can happen:
1. The space of the inference is still narrow enough to be meaningful. GREAT!
2. The space of the inference is too wide to be meaningful (what "too wide" means is open to interpretation …). In this case, revise the composite indicator or collect further indicators.
Uncertainty Analysis: EPI 2014 uncertain weights and aggregation
Two sources of uncertainty are considered:
I. Aggregation formula: reference = weighted arithmetic average; alternative = generalized weighted mean.
II. Weights: reference values = 0.4 for Environmental Health and 0.6 for Ecosystem Vitality; alternative = a distribution assigned for the robustness analysis.

[Figure: uncertainty analysis results for 2014 EPI country ranks, based on 500 weight-aggregation pairs.]
Source: Athanasoglou S, Weziak-Bialowolska D, Saisana M. Environmental Performance Index 2014 JRC Analysis and Recommendations, EUR 26623
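A minimal sketch of such an analysis under assumed inputs: objectives is a hypothetical DataFrame with one row per country and normalized columns "EH" and "EV"; the sampling ranges for the weight and for β are illustrative choices, not those used in the EPI audit:

```python
import numpy as np
import pandas as pd

def uncertainty_analysis(objectives: pd.DataFrame, n_draws: int = 500, seed: int = 0) -> pd.DataFrame:
    """Sample weight-aggregation pairs and return a countries x simulations matrix of ranks."""
    rng = np.random.default_rng(seed)
    ranks = []
    x = objectives[["EH", "EV"]].to_numpy()
    for _ in range(n_draws):
        w_eh = rng.uniform(0.3, 0.5)          # weight perturbed around the 0.4 reference
        w = np.array([w_eh, 1.0 - w_eh])
        beta = rng.uniform(0.0, 1.0)          # between geometric (0) and arithmetic (1)
        if np.isclose(beta, 0.0):
            scores = np.exp((w * np.log(x)).sum(axis=1))
        else:
            scores = ((w * x ** beta).sum(axis=1)) ** (1.0 / beta)
        ranks.append(pd.Series(scores, index=objectives.index).rank(ascending=False))
    return pd.concat(ranks, axis=1)

# R = uncertainty_analysis(objectives)
# Median simulated rank and a 90% interval per country:
# R.median(axis=1), R.quantile([0.05, 0.95], axis=1).T
```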
Sensitivity Analysis
• Given the results of an uncertainty analysis, it is important to understand which assumptions most drive the variation in index results.
• This is accomplished via a sensitivity analysis.
• A compelling aggregate measure of robustness is the average shift in rank, μ_ΔR, that the uncertain input factors lead to.
• Define the sensitivity index S_i as the fractional contribution of uncertain factor i to the sample variance of μ_ΔR. Put differently: the expected reduction in the variance of μ_ΔR that would be obtained if that factor could be fixed.
• Estimation of sensitivity indices is an active field of research; a simple binning-based sketch follows.
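A minimal, assumption-laden sketch of a first-order estimate: given the sampled factors and the resulting average rank shift per draw, approximate S_i = Var(E[μ_ΔR | factor i]) / Var(μ_ΔR) by binning each factor (more accurate estimators exist, as noted above):

```python
import pandas as pd

def first_order_sensitivity(factors: pd.DataFrame, mu_dR: pd.Series, n_bins: int = 10) -> pd.Series:
    """Crude first-order sensitivity indices via binned conditional means."""
    total_var = mu_dR.var(ddof=0)
    out = {}
    for col in factors.columns:
        bins = pd.qcut(factors[col], q=n_bins, duplicates="drop")
        cond_mean = mu_dR.groupby(bins, observed=True).mean()
        weights = mu_dR.groupby(bins, observed=True).size() / len(mu_dR)
        out[col] = float(((cond_mean - mu_dR.mean()) ** 2 * weights).sum() / total_var)
    return pd.Series(out)   # values near 1 mean the factor drives most of the variance
```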
Sensitivity Analysis: EPI 2014 index
• Example: the 2014 EPI with uncertain weights w and aggregation parameter β at the policy-objective level.

S_w = 0.04, S_β = 0.94 (Source: Athanasoglou et al., 2014)

• The choice of aggregation function accounts for 94% of the sample variance of μ_ΔR, while the weights account for just 4%. Clearly, EPI results are far more sensitive to the former than to the latter.
Thank You
References
Athanasoglou, S., D. Weziak-Bialowolska, and M. Saisana (2014), Environmental Performance Index 2014: JRC Analysis and Recommendations, EUR 26623.
Blackorby, C., and D. Donaldson (1982), “Ratio-scale and translation-scale full interpersonal comparability without domain restrictions,” International Economic Review, 23, 249-268.
Chakravarty, S. R. (2003), “A generalized human development index,” Review of Development Economics, 7, 99-114.
Decancq, K., and M. A. Lugo (2013), “Weights in multidimensional indices of wellbeing: An overview,” Econometric Reviews, 32, 7-34.
Ebert, U., and H. Welsch (2004), “Meaningful environmental indices: a social choice approach,” Journal of Environmental Economics and Management, 47(2), 270-283.
OECD and European Commission JRC (2008), Handbook on Constructing Composite Indicators, OECD Publications, Paris, France.
Paruolo, P., M. Saisana, and A. Saltelli (2013), “Ratings and rankings: voodoo or science?,” Journal of the Royal Statistical Society: Series A, 176(3), 609-634.
Ravallion, M. (2012), “Troubling tradeoffs in the Human Development Index,” Journal of Development Economics, 99, 201-209.
Ravallion, M. (2012), “Mashup indices of development,” World Bank Research Observer, 27, 1-32.
Saisana, M., A. Saltelli, and S. Tarantola (2005), “Uncertainty and sensitivity analysis as tools for the quality of composite indicators,” Journal of the Royal Statistical Society: Series A, 168, 1-17.