Key methods in constructing (spatial) composite indicators
INQUIMUS Workshop, Salzburg, 15-17 September 2014

Stergios Athanasoglou
stergios.athanasoglou@jrc.ec.europa.eu
(with input from Dorota Weziak-Bialowolska and Michaela Saisana)
European Commission Joint Research Centre, Econometrics and Applied Statistics Unit

JRC structure
• 8 Institutes in 5 Member States; ~2,750 staff; ~330 M€/year budget
• Econometrics and Applied Statistics Unit (formally in Brussels, physically in Ispra, by Lake Maggiore)

4 lines of activity of the JRC-COIN group
• Construction: Regional Europe 2020 index (included in the 2014 WEF report and in the EC 6th Cohesion Report), Regional HDI, Regional Multidimensional Poverty Index, …
• Validation: Global Innovation Index 2014, WJP Rule of Law Index 2014, Environmental Performance Index 2014, Corruption Perceptions Index 2012 (… over 100 requests)
• Methodology: in-house quality-control framework, sensitivity analysis, multi-criteria decision analysis, statistics and policy
• Training: 04/2013 WEF, Geneva; 07/2013 Istanbul; 22-26/09/2014 Ispra (12th year)
https://composite-indicators.jrc.ec.europa.eu
Collaborations with OECD, WEF, INSEAD, WIPO, UN-IFAD, FAO, Transparency International, World Justice Project, Harvard U., Yale U., Columbia U., Cornell U., …

OECD/JRC Handbook on composite indicators

Selected validation work (over 100 audit requests):
• 2014 Global Innovation Index (INSEAD, WIPO) …
• 2014/2012/2010/2008/2006 Environmental Performance Index (Yale and Columbia Universities)
• 2014/2012/2010 Rule of Law Index (World Justice Project)
• 2012 Corruption Perceptions Index (Transparency International)
• 2010 Multidimensional Poverty Assessment Tool (UN IFAD)
• 2009
Index of African Governance (Harvard Kennedy School) …
• 2008 European Lifelong Learning Index (Bertelsmann Foundation, CCL)

Overview
0. What is a composite indicator?
1. Theoretical framework
2. Indicator selection
3. Imputation and outlier treatment
4. Normalization
5. Statistical coherence
6. Weights and aggregation
7. Uncertainty and sensitivity analysis

What is a composite indicator?
Definition: A composite indicator is formed when individual indicators are compiled into a single index, on the basis of an underlying model of the multi-dimensional concept that is being measured. (OECD Glossary of Statistics)

Composite indicators: pros
• Summarize complex multidimensional information in a single number.
• Easy to interpret.
• Able to incorporate large amounts of data.
• Facilitate communication with the press and the public, potentially enhancing accountability.
• Allow rankings to be computed to assess comparative standing.

Composite indicators: cons
• Potential lack of clarity regarding the underlying concept being measured (e.g., the Newsweek “best country” index).
• Arbitrariness and lack of theoretical underpinnings in their construction.
• Possibly questionable trade-offs between components.
• Lack of robustness to changes in subjective choices regarding indicator selection, weights, aggregation, normalization, etc.
• Badly constructed composite indicators can form the basis of bad, misinformed policy.

Developing a theoretical framework
• Definition of the multidimensional phenomenon to be studied.
• Determination of its sub-dimensions.
• Identification of selection criteria for the underlying indicators.
Experts and stakeholders need to be consulted at this pivotal initial stage.
Be wary of logical inconsistencies: it is generally not good practice to mix inputs and outputs. E.g., if you want to measure environmental conditions (an output), it may not be a good idea to include, say, environmental subsidies (an input) in the index.

Selecting indicators
• Importance of an indicator: literature review, expert opinion, statistical analysis.
• Proper unit of analysis: country, region, household, individual (not all indicators are available at the unit of analysis).
• Sign of an indicator: e.g., for social cohesion the OECD lists 35 different indicators, such as the number of marriages and divorces or the number of foreigners. Q: Does a large foreign-born population decrease social cohesion? The answer (yes/no) determines the sign.
• Sound variables (merge different sources of information, survey data, different years, proxy variables, …): rely on sound data providers and inspect descriptive statistics on the indicators.
• Survey data: sampling size, representativeness, translation!
Suggestion: choose roughly 5-10 indicators per dimension.

Imputation of missing data: methods
• Hot-deck imputation: use actual data from other “similar” entities.
• Cold-deck imputation: use data from an external source.
• Unconditional sample mean/median/mode.
• Regression imputation: missing data are substituted by values predicted by regression (look for strong correlations).
• Expectation maximization (EM): uses concepts from maximum-likelihood estimation.
• Markov chain Monte Carlo (MCMC) multiple imputation.
Suggestions:
1. No more than 25% missing data per indicator, country, or dimension.
2. Prefer variations of the hot-deck method, or imputation based on correlations, to the unconditional sample mean.

Outlier detection and treatment
• Important: detect outliers before it is too late: they may distort the correlation structure and become unintended benchmarks during normalization.
• JRC rule of thumb: |skewness| > 2 AND kurtosis > 3.5 flag problematic indicators that need to be treated before the final index construction.

Normalization
Why normalize? To adjust for:
1. The different nature of indicators (positive vs. negative orientation towards the index).
2. Asymmetric distributions, different variances, and outliers.
3. Different units of measurement.
4. Different ranges of variation.

Different methods of normalization
1.
Linear scale: a) min-max rescaling; b) standardization (z-scores)
2. Ordinal scale: a) ranks; b) categorical scales
3. Ratio scale: a) distance from a benchmark peer (e.g., country); b) distance from a benchmark year

Which normalization method to use?
• Each method has pros and cons.
• The choice will depend on the context.
• However, it is important to note that composite indicators are often criticized on account of the potential arbitrariness of normalization, especially since different methods may lead to different rankings.
• That said, min-max rescaling can be justified axiomatically (see Theorem 1 in Chakravarty, 2003).

Statistical coherence
The statistical soundness of the composite indicator is studied in two basic ways:
1. Correlation analysis
• Detect excessive collinearity (suggestive of double counting) between indicators.
• Detect indicators that behave as noise.
• Detect indicators with problematic correlations (e.g., negative correlations where they are supposed to be positive).
2. Principal component analysis
• Verify the internal consistency of the CI within each pillar and sub-pillar.

Correlation analysis: suggestions
• Ensure index pillars are strongly positively and, as much as possible, evenly correlated with the composite index.
• Moreover, ensure that they are positively (and, as much as possible, evenly) correlated with each other, though not excessively so (to avoid double counting).
• Ensure the above at all consecutive levels of aggregation of the index (i.e., pillar/sub-pillar, sub-pillar/indicator, etc.).

Statistical coherence: suggestions
• Ensure that within each level of aggregation (index, pillar, sub-pillar) a single principal component explains most of the variance.
Especially important at the first level of aggregation.
• This confirms the conceptual framework of the index, suggesting that the components are all measuring the same concept; thus, aggregating them makes sense.
• It is desirable that indicator loadings within the principal factor are also roughly even.
• This supports the adoption of equal weights within the pillar, sub-pillar, etc.

Statistical coherence: EPI example, based on the JRC audit
• The EPI comprises two overarching policy objectives: Environmental Health and Ecosystem Vitality. These are made up of three and six issue areas, respectively. (Each issue area consists of 1 to 4 indicators.)
• Structure and weighting scheme (linear aggregation).

Statistical coherence: example correlations
[Table, not reproduced here: correlations of each indicator with its issue area, its policy objective, and the overall EPI, plus correlations of issue areas with objectives and of objectives with the EPI, across the nine issue areas (Health Impacts, Air Quality, Water & Sanitation, Water Resources, Agriculture, Forests, Fisheries, Biodiversity & Habitat, Climate & Energy). Flagged cells: red = wrong direction (e.g., Agricultural Subsidies: 0.57 with its issue area but −0.53 with its objective and −0.59 with the EPI); purple = noise (e.g., Average Exposure to PM2.5: 0.73 with its issue area but a non-significant 0.02 with its objective); orange = uneven correlations (with overlaps); ns = not significant.]
Source: Athanasoglou S, Weziak-Bialowolska D, Saisana M, Environmental Performance Index 2014: JRC Analysis and Recommendations, EUR 26623.

Statistical coherence: example PCA
• Principal component analysis:
1. The EH objective can be considered one-dimensional, with one latent variable explaining 71.1% of the variance in its three issue areas.
2. There are two latent dimensions among the six issue areas of the EV objective. The first captures 26% of the variance and is described by Water Resources, Agriculture, Climate & Energy, and (in part) Fisheries. The second captures 24% of the variance and is described by Forests, Biodiversity & Habitat, and (in part) Fisheries.
• It is unclear whether aggregating these into one policy objective is supported by the data.

Weighting & aggregation
• The choice of aggregation function and weights is among the most contentious in the construction of a composite indicator.
• Often these choices are not grounded in economic theory or a coherent normative framework, sparking backlash (Ravallion, 2012).
• In the field of environmental composite indices, the concept of meaningfulness is particularly important: we would like the index recommendations to be invariant to plausible transformations of the underlying variables (switching scales, for instance).

Aggregation
• Most of the time, simple linear aggregation is used.
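The linear aggregation just mentioned, and the generalized weighted mean discussed next, can be sketched in a few lines of Python. The scores and weights below are hypothetical illustrations, not EPI data:

```python
import numpy as np

def generalized_weighted_mean(x, w, beta):
    """Generalized weighted mean y_beta(x, w) = (sum_i w_i * x_i**beta)**(1/beta).

    beta = 1 gives the weighted arithmetic mean; the limit beta -> 0 gives
    the weighted geometric mean, handled here as a special case.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalize weights so they sum to 1
    if beta == 0:
        return float(np.exp(np.sum(w * np.log(x))))
    return float(np.sum(w * x**beta) ** (1.0 / beta))

# Hypothetical scores on two dimensions, weighted 0.4 / 0.6
scores, weights = [80.0, 40.0], [0.4, 0.6]
arithmetic = generalized_weighted_mean(scores, weights, beta=1)  # 56.0
geometric = generalized_weighted_mean(scores, weights, beta=0)   # approx. 52.8
```

Note that the geometric mean falls below the arithmetic mean here: with β < 1 the unbalanced profile (80, 40) is penalized relative to a balanced profile with the same arithmetic average.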
• Parameterized by β ∈ ℝ, the generalized weighted mean of a vector x given weights w is
  y_β(x, w) = (Σᵢ wᵢ xᵢ^β)^(1/β).
• When β = 1 (β → 0), it reduces to the weighted arithmetic (geometric) mean.
• β can be interpreted in terms of the elasticity of substitution σ between the different dimensions of the index, where σ = 1/(1 − β). The smaller the value of β, the lower the substitutability between the different dimensions of performance.
• For values of β < 1, generalized weighted means reflect a preference for balanced performance across the different dimensions of the index.

Weighting
• Generalized weighted means can be justified axiomatically (Blackorby and Donaldson, 1982).
• However, this still leaves open the issue of choosing weights. Their choice can be fraught with complex philosophical dilemmas (Foster and Sen, 1997).
• There are many different methods for setting weights (see Decancq and Lugo, 2013 for a literature review), including:
1. principal component and factor analyses,
2. regression (stated-preference or hedonic),
3. data envelopment analysis,
4. public opinion polls,
5. budget allocation,
6. the Analytic Hierarchy Process, and
7. expert consultation, among others.

Weights & aggregation: pitfall 1
• Most composite indicators are linear, i.e., weighted arithmetic averages. Weights are understood to reflect variables’ importance in the index (e.g., budget allocation).
• Caution: nominal weights are not always a measure of variables’ statistical importance.
• Example: a dean wants to rank teachers based on ‘hours of teaching’ and ‘number of publications’, adding these two variables up. Assume:
  X1 = hours of teaching, X2 = number of publications, Y = 0.5 X1 + 0.5 X2.
• ρ(X1, X2) = −0.151, σ(X1) = 614, σ(X2) = 116, σ(Y) = 162.
• The estimated squared correlations with the index are R₁² = 0.075 and R₂² = 0.826.
• The teachers’ ranking is mostly driven by the number of publications.

Weights & aggregation: pitfall 1 (continued)
• To address this, the dean substitutes the model Y = 0.5 X1 + 0.5 X2 with Y = 0.7 X1 + 0.3 X2.
• This weighting indeed leads to much more balanced correlations.
• Inasmuch as “importance” is taken to mean “correlation with the composite index”, teaching and publications are now equal.
• However, a professor comes by, looks at the last formula, and complains that publishing is disregarded in the department …
• See Paruolo et al. (2013) for a deeper discussion of weights as importance coefficients.

Weights & aggregation: pitfall 2
• Unwanted trade-offs between dimensions may emerge, formalized via the marginal rate of substitution: the amount of one dimension needed to compensate for the loss of another.
• Consider the 2010 HDI, in which geometric aggregation was introduced. A study by Martin Ravallion (2012, J. Dev. Econ.) found that the new version of the HDI inadvertently led to:
• an implicit monetary valuation of longevity that is sharply increasing in income;
• extremely high monetary valuations for schooling (again sharply increasing in income). Source: Ravallion, 2012.

Weights & aggregation: pitfall 3
• Environmental indices are deemed meaningful only if their recommendations are invariant to reasonable transformations of the data (Ebert and Welsch, 2004).
• E.g., suppose an index of environmental stress combines eutrophication and acidification.
• We would want the index rankings to be insensitive to whether we measure eutrophication in phosphorus or nitrogen (×10) equivalents, a ratio-scale transformation.
• This requirement implies that we must use geometric aggregation. Arithmetic aggregation will violate this property sooner or later!
• Ensuring meaningfulness for sets of non-comparable interval-scale transformations (e.g., Celsius to Fahrenheit) is impossible, unless we normalize variables by min-max rescaling (which, in turn, requires justification!).

Uncertainty analysis
• Many modelling choices (input factors) going into the construction of a composite indicator are to a large degree subjective and open to debate:
  • weights,
  • aggregation,
  • normalization,
  • inclusion/exclusion of indicators,
  • imputation techniques,
  • …
• Testing the robustness of an index to these different modelling choices is of paramount importance.
• As a result, an uncertainty analysis should include a careful mapping of all these uncertainties onto the space of the output. Two things can happen:
1. The space of the inference is still narrow enough to be meaningful. Great!
2. The space of the inference is too wide to be meaningful.
(What “too wide” means is open to interpretation…) In that case, revise the CI or collect further indicators.

Uncertainty analysis: EPI 2014 with uncertain weights and aggregation
I. Uncertainty in the aggregation formula: reference = weighted arithmetic average; alternative = generalized weighted mean.
II. Uncertainty in the weights: reference values = 0.4 (Environmental Health) and 0.6 (Ecosystem Vitality); a distribution is assigned to them for the robustness analysis.
[Figure, not reproduced: uncertainty analysis results for 2014 EPI country ranks, based on 500 weight-aggregation pairs.]
Source: Athanasoglou S, Weziak-Bialowolska D, Saisana M, Environmental Performance Index 2014: JRC Analysis and Recommendations, EUR 26623.

Sensitivity analysis
• Given the results of an uncertainty analysis, it is important to understand which assumptions most drive variation in index results. This is accomplished via a sensitivity analysis.
• A compelling aggregate measure of robustness is the average shift in ranks, R̄, that the uncertain input factors lead to.
• Define the sensitivity index Sᵢ to be the fractional contribution of uncertain factor i to the sample variance of R̄. Put differently: the expected reduction in the variance of R̄ that would be obtained if that factor could be fixed.
• The estimation of sensitivity indices is an active field of research.

Sensitivity analysis: EPI 2014
• Example: the 2014 EPI with uncertain weights w and aggregation parameter β at the policy-objective level:
  S_w = 0.04, S_β = 0.94 (Source: Athanasoglou et al., 2014).
• The choice of aggregation function accounts for 94% of the sample variance of R̄, while the weights account for just 4%. Clearly, EPI results are far more sensitive to the former than to the latter.
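The uncertainty-and-sensitivity workflow described above can be sketched end to end. Everything below is illustrative: the scores are random stand-ins for two policy-objective columns, the weight and β ranges are assumptions, and the first-order sensitivity indices are estimated with a crude binned correlation-ratio estimator, not the estimators used in the JRC audit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores for 20 units on two dimensions (stand-ins for
# Environmental Health and Ecosystem Vitality), already on a common scale.
unit_scores = rng.uniform(20, 90, size=(20, 2))

def aggregate(x, w, beta):
    """Generalized weighted mean across columns; beta near 0 -> geometric mean."""
    if abs(beta) < 1e-9:
        return np.exp(np.log(x) @ w)
    return (x**beta @ w) ** (1.0 / beta)

def ranks(v):
    """Rank 1 = best (highest index value)."""
    return np.argsort(np.argsort(-v)) + 1

# Reference scenario: arithmetic aggregation with weights (0.4, 0.6).
ref_ranks = ranks(aggregate(unit_scores, np.array([0.4, 0.6]), 1.0))

# Monte Carlo over the two uncertain input factors (500 weight-beta pairs).
n = 500
w1 = rng.uniform(0.3, 0.5, n)    # uncertain weight on the first dimension
beta = rng.uniform(0.0, 1.0, n)  # uncertain aggregation exponent
rbar = np.empty(n)               # average shift in ranks per draw
for k in range(n):
    r = ranks(aggregate(unit_scores, np.array([w1[k], 1.0 - w1[k]]), beta[k]))
    rbar[k] = np.mean(np.abs(r - ref_ranks))

def first_order_si(factor, y, bins=10):
    """Crude correlation-ratio estimate of a first-order sensitivity index:
    between-bin variance of conditional means over total variance."""
    edges = np.quantile(factor, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.digitize(factor, edges[1:-1]), 0, bins - 1)
    means = np.array([y[idx == b].mean() for b in range(bins)])
    counts = np.array([(idx == b).sum() for b in range(bins)])
    return float(np.sum(counts * (means - y.mean()) ** 2) / (len(y) * y.var()))

s_w, s_beta = first_order_si(w1, rbar), first_order_si(beta, rbar)
```

Here s_w and s_beta estimate the fraction of the variance of the average rank shift attributable to the weights and to the aggregation exponent, respectively; each lies between 0 and 1 by construction.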
Thank you

References
Athanasoglou, S., D. Weziak-Bialowolska, and M. Saisana (2014), Environmental Performance Index 2014: JRC Analysis and Recommendations, EUR 26623.
Blackorby, C., and D. Donaldson (1982), “Ratio-scale and translation-scale full interpersonal comparability without domain restrictions,” International Economic Review, 23, 249-268.
Chakravarty, S. R. (2003), “A generalized human development index,” Review of Development Economics, 7, 99-114.
Decancq, K., and M. A. Lugo (2013), “Weights in multidimensional indices of well-being: an overview,” Econometric Reviews, 32, 7-34.
Ebert, U., and H. Welsch (2004), “Meaningful environmental indices: a social choice approach,” Journal of Environmental Economics and Management, 47(2), 270-283.
OECD and European Commission JRC (2008), Handbook on Constructing Composite Indicators, OECD Publications, Paris, France.
Paruolo, P., M. Saisana, and A. Saltelli (2013), “Ratings and rankings: voodoo or science?” Journal of the Royal Statistical Society: Series A, 176(3), 609-634.
Ravallion, M. (2012), “Troubling tradeoffs in the Human Development Index,” Journal of Development Economics, 99, 201-209.
Ravallion, M. (2012), “Mashup indices of development,” World Bank Research Observer, 27, 1-32.
Saisana, M., A. Saltelli, and S. Tarantola (2005), “Uncertainty and sensitivity analysis techniques as tools for the quality assessment of composite indicators,” Journal of the Royal Statistical Society: Series A, 168, 1-17.