Assessing the Quality of the Quality of Government Data: A Sensitivity Test of the World Bank Governance Indicators
________________________________________________________________________
Abstract: This study aims at assessing the quality of one of the most often employed indicators of Quality of Government (QoG), namely the World Bank Governance Indicators (WGI). The WGI have faced numerous critiques in recent years and, while the authors have defended their data on many fronts, several empirical concerns remain unexplored. Concerns about the data’s precision, internal consistency, robustness and transparency have been raised but, until now, left untested. This analysis attempts to fill in these gaps by performing a sensitivity analysis on four of the WGI indicators for the year 2008 on a sample of 27 E.U. countries. We find that the WGI composite indicators are predominantly internally consistent and are remarkably robust to adjustments in the weighting and aggregation scheme of the underlying data and to the exclusion of any one underlying indicator.
Key words: governance, corruption, sensitivity testing, World Bank
Nicholas Charron, PhD
Research Fellow and Assistant Professor
The Quality of Government Institute
Department of Political Science
The University of Gothenburg, Sweden
Sprängkullsgatan 19
Box 711, SE-405 30 Göteborg, Sweden
Phone: +46 (0) 31 786 46 89
Fax: +46 (0) 31 786 44 56
Introduction
What is the state of current quality of government (henceforth QoG) empirical data in the fields of development economics and political science? This study seeks to contribute to this broad and challenging question by assessing several primary critiques of one of the most widely used quantitative indicators of QoG – the World Bank Governance Indicators (henceforth WGI) (Kaufmann, Kraay and Mastruzzi 2009, ‘KKM’) – for a sub-set of countries: the European Union 27. The WGI, one of the first sources to publish free QoG data for a world-wide sample starting back in 1996, is one of the most widely employed empirical sources of QoG assessment, used by scholars in prominent journals, the media, policy-makers and aid organizations, including the United States Millennium Challenge Account aid program.
Unlike other indicators of QoG, such as Transparency International’s Corruption Perception Index (CPI), which focuses on one specific QoG concept such as corruption, the WGI covers multiple areas of QoG. Due to its widespread use in scholarly work and policy circles, the WGI has come under intense scrutiny relative to other QoG data for a number of reasons (see Arndt and Oman 2006; Knack 2006; Kurtz and Schrank 2007; Pollitt 2008; Thomas 2009; Apaza 2009). This analysis empirically addresses several of those critiques.
Although there are a number of other important critiques of the WGI not addressed here (for a list and rebuttal of said critiques, see Kaufmann, Kraay and Mastruzzi 2006), this analysis attempts to deal with several salient lines of criticism. This study is the first to examine the uncertainty and sensitivity of four widely used indicators of the WGI. Since the WGI is a composite indicator – combining multiple data sources for each country and weighting them – a sensitivity test can directly answer questions about the WGI’s internal validity and precision with respect to cross-country comparisons. In addition, the sensitivity test can confront potential problems stemming from the weighting or aggregation scheme of the WGI. Finally, the sub-sample of E.U. countries is selected intentionally to address the matter of ‘common sources’. Unlike many developing areas, the E.U. is a region where much QoG data is available, and the WGI draws on many of the same sources for the 27 countries, which is necessary to perform several of the tests. Moreover, there is also considerable variation within the region with respect to QoG. We take advantage of this and show the results of the sensitivity test for the 27 countries.
With regard to uncertainty and sensitivity testing, this analysis follows the advice of the JRC and OECD Handbook on Constructing Composite Indicators (2008). The results of the sensitivity test show that, for the E.U. countries in the most recent year of the data (2008), the WGI are remarkably robust to changes in the weighting scheme and aggregation method and to the removal of underlying data sources. They are also internally consistent. Although this analysis does not address all the critiques of the WGI, it does firmly address several concerns that to this point have not been answered empirically. The scope of this study is also limited to the E.U. Even so, the paper makes a meaningful contribution in that it should ease some of the trepidation of those concerned about the internal validity, transparency and precision of the WGI.
The remainder of this paper proceeds as follows. First, a brief presentation of the WGI and a definition of the individual indicators is provided. Next, the recent criticisms of the WGI are briefly summarized. We then show the rank order of E.U. countries on each indicator according to the current WGI data. Following this, a test of the data’s internal consistency is performed using principal component analysis (PCA). We then group countries into cluster-groups according to QoG via a cluster analysis. Next, uncertainty and sensitivity tests are performed on each of the four WGI indicators and the results are reported. A final discussion of the data concludes the paper.
Brief Description of the WGI and the Indicators of QoG
Launched by the World Bank in 1996 (Kaufmann, Kraay and Mastruzzi 2009), the WGI was one of the first sources to offer a global dataset freely available to scholars and practitioners researching in the field of QoG. It is a ‘composite index’ that was published bi-annually until 2002 and annually from 2002 to 2009. It employs a wide scope of data for each indicator, and KKM argue that “each of the individual data sources we have provides an imperfect signal of some deep underlying notion of governance that is difficult to observe directly” (Kaufmann, Kraay and Mastruzzi 2008: 13).

1 The exceptions are the cases of Cyprus and Malta, where data is imputed for a few sources.

The data contain 6 ‘pillars’ of governance:
1. Control of Corruption
KKM define corruption broadly, as simply “the abuse of public power for personal gain”. Of the ‘control of corruption’ indicator they publish annually, they write, for example, that it measures “the extent to which public power is exercised for private gain, including both petty and grand forms of corruption, as well as ‘capture’ of the state by elites and private interests.” Examples of the variation in the ways each source captures corruption include: the Business Environment and Enterprise Performance Survey (BEEPS), which is undertaken in all former Eastern-bloc countries within the EU, asks firms “How common is it for firms to have to pay irregular additional payments to get things done” and “How problematic is corruption for the growth of your business.” Another firm survey included in the composite index, the World Economic Forum Global Competitiveness Survey (GCS), asks business leaders specific questions such as how frequently a firm might have to make extra payments in connection with trade permits, loan applications, taxes, and obtaining public contracts. On the other hand, the Bertelsmann Transformation Index collects data on the same sample of EU states as the BEEPS, but its data on corruption only capture the extent to which an anti-corruption agency has been established and how effective it is perceived to be in carrying out its mandate.
2. Rule of Law
The underlying individual data for ROL include concepts such as judicial independence, property rights, the level of organized crime, respect for contracts, human trafficking, money laundering, and trust in the police and the courts. For example, the World Bank uses the Cingranelli and Richards measure as one of its ‘public sector data providers’ for ROL; it has a narrow definition of ROL based solely on the independence of the judiciary, scored as 0, 1 or 2 (from ‘not independent’ to ‘generally independent’).
2 See http://ciri.binghamton.edu/documentation/ciri_variables_short_descriptions.pdf for more information on this particular dataset.
Conversely, several of the business-oriented sources employed by the WB, such as the Heritage Foundation or the Business Environment Risk Intelligence, focus their measurement of the ROL on property rights or contract enforcement by businesses and individuals. Further, household surveys such as the Gallup World Poll include questions regarding individuals’ confidence in the police force and the judiciary, or whether they have ever been the victim of a crime.
3. Government Effectiveness
KKM define this indicator as “capturing perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies” (Kaufmann, Kraay & Mastruzzi 2009: 6). The composite index comprises such data as the Gallup World Poll’s questions, which focus exclusively on citizens’ satisfaction with public education, public transit, and roads and highways. In the World Economic Forum’s GCS survey, leaders of firms respond to questions about a country’s infrastructure, along with how much time business leaders spend interacting with government officials in the civil service. The Economist Intelligence Unit (EIU) asks experts to assess the amount of bureaucratic excess (‘red tape’) and the ‘institutional effectiveness’ of a country’s civil service.
4. Voice & Accountability
KKM describe this pillar of QoG as “capturing perceptions of the extent to which a country's citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media” (KKM 2009: 6). Examples of the different underlying data measuring this concept include: the Gray Area Dynamics (GAD) expert assessment, which measures religious freedom, the military’s involvement in politics, political patronage, and the role of the opposition – a fairly broad framework. The EIU uses 5 criteria to rate each country, based on human rights, accountability of political figures, ‘vested interests’, freedom of association and a ‘democracy index’. On the other hand, the Gallup World Poll (GWP) and Reporters without Borders (RSF) data used in the index are more narrowly focused – the former asks respondents about the fairness/freeness of elections and the absence of political violence, while the latter focuses exclusively on freedom of the press from the government.

3 The World Bank’s annual report on their governance indicators carefully elucidates which sources capture which aspects of the variables in question. See http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1148386 for Kaufmann, Kraay and Mastruzzi’s (2008) full description of each measured concept.
Two other indicators are also included in the WGI data:

5. Political Stability & the Absence of Violence
6. Regulatory Quality

The ‘composite’ aspect of the data means that it aggregates and amalgamates survey data from firms, experts, and citizens from countries world-wide. The underlying data are collected from four main types of sources: non-governmental organizations (NGOs), firm and citizen surveys, public sector agencies, and commercial interest/risk assessment providers for businesses. All data have been standardized for each year, so that the world average is equal to ‘0’ with a standard deviation of ‘1’. KKM also provide margins of error for all indicators for each country-year. In this analysis, only the first four pillars are taken into consideration in the sensitivity analysis for the E.U. sample.
A Brief Overview of the Critiques of the WGI
As noted, the WGI is commonly used by academics, the media, NGOs and aid organizations across the world to assess different aspects of QoG, and thus it has rightfully drawn a number of recent criticisms. Some, such as Pollitt (2008), have questioned the entire endeavour of KKM, calling their effort to quantify QoG “fruitless” and too difficult for “lay people” to understand or interpret correctly. Other critiques focus on more specific aspects of the data itself, such as Kurtz and Schrank (2007), who try to demonstrate that the data is biased towards the interests of international business elites and that countries are rewarded for economic growth and not necessarily for improving QoG.
4 For reasons due to lack of variation in the data for the sample (in particular for political stability), along with time constraints, this analysis does not include ‘Political Stability & the Absence of Violence’ or ‘Regulatory Quality’.
5 For a more thorough description of the data and background of the WGI, see Kaufmann, Kraay and Mastruzzi 2009.
6 For a deeper look into the criticisms of the WGI, KKM (2006) have summarized them into 11 separate categories.
Knack (2006) and Arndt and Oman (2006) argue that there are problems in using the data to compare QoG across space and time due to different underlying sources or imprecision in the data.
The weighting scheme is also argued to be problematic because risk assessment agencies may make correlated errors (essentially by free-riding on one another); thus the underlying data are not uncorrelated as assumed by KKM, and this results in overvaluing certain data sources in the weighting scheme (Arndt and Oman 2006; Knack 2006). Thomas (2009) questions the “construct validity” of the data, arguing that the scope of the definition of the QoG indicators themselves is too vague to be meaningful. She also criticises the WGI for lacking “convergent and discriminant validity”. The former means that the underlying data which make up each composite indicator need to be sufficiently correlated, while the separate indicators themselves, such as corruption or rule of law, should be sufficiently uncorrelated. Finally, she critiques the WGI for a lack of transparency (Thomas 2009).
KKM have directly addressed several of these and other critiques in recent articles (KKM 2006, 2007), and continue to defend the WGI in the face of concerns about the data’s relevance or definitional validity. However, a few critiques remain empirically unexplored. First, the claim that the data are not sufficiently internally valid or consistent is taken up directly in this analysis. Second, the concern over whether the weighting scheme or certain underlying data are driving the results is also assessed. Finally, issues of the data’s precision with respect to cross-country comparisons are addressed.
QoG in the E.U. According to the WGI
Table 1 shows the current rankings of the E.U. 27 on each indicator based on the 2008 WGI data. A quick glance across the four areas of QoG demonstrates that country positions are relatively stable – the Scandinavian and Northern European countries rank consistently high, Southern-central Europe generally scores in the middle of the group, and the New Member States (NMS), along with Italy and Greece, generally score low within the E.U. sample.
7 KKM (2007) answer this critique directly in their publication in the Journal of Politics.
***Table 1 about here***
Yet how stable and robust are these rankings for E.U. countries? The rest of this analysis proceeds as follows: we first test whether the underlying data in the 4 pillars of QoG are internally consistent from a statistical point of view. Next, we perform a cluster analysis so as to place EU countries in appropriate groups, from which more realistic comparisons can be established. Following this, we perform a multi-model uncertainty and sensitivity analysis to see whether the rankings derived from the WGI are significantly altered or whether they remain robust to changes. More specifically, we challenge some of the initial assumptions made by the creators of the data with respect to aggregation and weighting schemes. We also test the impact of each individual indicator by removing it from each pillar one by one.
The Sensitivity Test
“…it is hard to imagine that the debate on the use of composite indicators will ever be settled […] official statisticians may tend to resent composite indicators, whereby a lot of work in data collection and editing is “wasted” or “hidden” behind a single number of dubious significance. On the other hand, the temptation of stakeholders and practitioners to summarise complex and sometimes elusive processes (e.g. sustainability, single market policy, etc.) into a single figure to benchmark country performance for policy consumption seems likewise irresistible.” – Andrea Saltelli, JRC
The WGI publishes all of its underlying data for each country-year on its website, along with the weighting scheme for each country. All underlying indicators for each country have been taken for 2008 directly from the website. In the following sections, we test the internal consistency of the data, and the sensitivity of the data in terms of how much the overall E.U. rankings change when departures are made from KKM’s original road-map for building the indicators from the underlying data. It is hoped that in addressing these primary matters related to the internal structure of the data itself, several of the aforementioned critiques will be aptly addressed.
Internal Consistency of the Data
8 http://info.worldbank.org/governance/wgi/index.asp
We begin with the following question: is each of the individual indicators itself internally consistent? To test this, we use a method commonly employed in the literature – namely, Principal Component Analysis (PCA). The World Bank classifies each of the pillars of governance independently from one another and does not specify any ‘sub-pillars’ used in building such indices as ‘rule of law’ or ‘control of corruption’. The internal consistency checks should demonstrate whether this is the case.

We begin by testing the internal consistency of each of the 4 WGI individually. Since the World Bank does not alert users of the data to any ‘sub-pillars’ – e.g. significant clusters in the data that might need to be accounted for in the weighting scheme – we expect that each of the four measures of QoG will contain only one significant factor. Because of the need for common data sources, we were forced to drop the 2-3 sources that are region-specific to the former Eastern group and rely on the sources that include all MS’s. Although the number of sources varies from country to country in the dataset covering all countries in the world (some small island states have only one source for a given pillar, while some states have more than 15), we have selected the countries of the E.U. for this particular reason – there are at least 9 common sources for each individual indicator of QoG in the WGI and, in the case of Rule of Law (RL), there are 12. Moreover, testing the internal consistency of the data with common sources directly addresses a main concern of the critics of the data, namely that comparisons are difficult to make when the underlying data sources differ from country to country. Table 2 shows a PCA for each of the four indicators of QoG individually.
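For readers who wish to reproduce this kind of check, the following is a minimal sketch of the Kaiser criterion applied to a matrix of standardized country-by-source scores. The function name and the random example data are our own illustration, not the WGI files or the exact software used here.

```python
import numpy as np

def kaiser_significant_factors(X):
    """Return eigenvalues of the correlation matrix and flag 'significant'
    factors: eigenvalue > 1 and at least 10% of total variance explained."""
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of the sources
    eigvals = np.linalg.eigvalsh(R)[::-1]     # eigenvalues, largest first
    share = eigvals / eigvals.sum()           # proportion of variance explained
    significant = (eigvals > 1.0) & (share >= 0.10)
    return eigvals, share, significant

# Illustrative use: 27 countries x 9 underlying sources of standardized scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(27, 9))
eigvals, share, significant = kaiser_significant_factors(X)
print(np.round(eigvals, 2), np.round(share, 2), significant)
```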
***Table 2 about here***
Government Effectiveness
• Using the standard rules of the Kaiser criterion – factors with Eigenvalues greater than ‘1’ that explain 10% or more of the variance – we find only one significant factor in the 9 underlying indicators that make up this pillar. The single factor has an Eigenvalue of 5.41 and explains over 60% of the variance of the data.
• The next largest factor was close – explaining over 8% of the variance with an Eigenvalue of 0.9.
• This demonstrates strong internal consistency in the data, which corroborates the World Bank’s decision to condense these data into a single index. When aggregating the underlying data to form this pillar – as well as when using GE to generate the total QoG composite index – we can be fairly certain that significant clusters in the underlying data do not exist.

9 We took only sources that included 24 or more MS’s; thus, at times, we had to impute missing data, in particular for Malta and Luxembourg. We did so with the simple method of ‘mean substitution’. See the appendix for a full list of the individual data sources used in the E.U. 27 sample for the 4 pillars of WGI.
Control of Corruption
• Only one significant factor was found in the data (9 underlying sources). The factor has an Eigenvalue of 6.9 and by itself explains over three-fourths of the total variance.
• The next largest factor had an Eigenvalue under 0.7 and explained less than 6% of the variation.
• The results of the PCA point to a very internally consistent and related set of underlying indicators.
Rule of Law
• The Rule of Law pillar exhibited two significant factors in the underlying data. The first, which by itself explains over 60% of the variance and has an Eigenvalue of 7.52, is clearly the most important factor in the data. Factor 2 has an Eigenvalue of 1.41 and explains 12% of the variance. Together, the two factors explain about three-fourths of the total variance in the data.
• After rotating the factors (Varimax method), we find that 9 of the 12 indicators load onto factor 1, while the other 3 (DRI, GAD and TPR) load onto factor 2.
• The latter three indicators share a common thread in that they focus heavily on human trafficking and organized crime – rather than on judicial independence or property rights, as many of the other indicators do.
• This suggests testing the sensitivity of the RoL indicator using two separately weighted factors instead of a single index. However, this may not be a serious problem in that factor 1 in and of itself has a relatively high Eigenvalue and explains over 0.6 of the proportion of variance, with all of the underlying variables loading positively on it with a weight of at least 0.3 – suggesting a certain degree of internal consistency among all 12 indicators.
Voice & Accountability
• Two significant factors were also found in the VA pillar. Again, the first explains about 60% of the total variance and has an Eigenvalue of about 6, while the second factor is significant according to the Kaiser rule in that it exceeds the Eigenvalue threshold of 1 and explains over 10% of the total variance.
• After rotating the factors (Varimax method), we find that 8 of the 10 indicators load onto factor 1, while the other 2 (RSF & HUM) load onto factor 2.
• Although there is no direct common theoretical thread in the underlying make-up of the two indicators in factor 2 (RSF asks exclusively about press freedom, while HUM deals with wider human rights issues), they are the underlying variables with the least amount of variance for the MS’s – most of the EU 27 rank very high on both measures.
• This suggests testing the sensitivity of the VA indicator using two separately weighted factors instead of a single index. Although two factors were found, again, as with the RoL, all 10 indicators load positively onto factor 1 with weights of at least 0.35. Thus internal consistency of the data is present to a certain degree.
Table 3 explores the relationship between the underlying data and their respective pillars. To demonstrate 100% internal consistency of the underlying variables, each indicator would have a positive and significant relationship with its pillar. In fact, this is nearly the case. Of the 59 separate underlying indicators across the 4 indicators of QoG, only 4 run in the opposite (negative) direction from their respective pillar. However, their relationship appears random (indistinguishable from zero), as their p-values do not reach even the 90% level of confidence. These variables come from two sources:

• The BEEPS GE and CC scores (BPS), and the Global Integrity Index for CC and RL (GII).

We believe that this is less of a problem than it might appear, in that these sources only pertain to the sub-set of former Eastern-bloc states and are thus weighted less in the original data than the common sources. Of the remaining data, 6 sources are insignificantly related to their respective pillars. These data come from the following sources:

• The Bertelsmann Index for GE (BTI), the BEEPS for RL (BPS), the Institutional Profile Database for RL (IPD), the Global Integrity Index for VA (GII) and the Media Sustainability Index for VA (MSI). The MSI result is understandable as it only has 2 observations. In addition, the Gray Area Dynamics measure for RL (GAD) is insignificantly related to the RL pillar.
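Each cell of Table 3 is a simple bivariate correlation of this kind. As an illustration, the sketch below computes a Pearson correlation and its p-value with scipy; the two arrays are placeholders standing in for one underlying source and its pillar score, not the actual WGI columns.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder vectors for one underlying source and its pillar score across
# the 27 EU countries (the real values come from the WGI download).
rng = np.random.default_rng(1)
pillar = rng.normal(size=27)
source = 0.8 * pillar + rng.normal(scale=0.5, size=27)

r, p = pearsonr(source, pillar)
stars = "***" if p < 0.01 else "**" if p < 0.05 else ""
print(f"r = {r:.2f}{stars} (p = {p:.3f})")
```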
Cluster Analysis
***Figure 1 about here***
Having shown that the underlying data of the four indicators are mostly internally consistent, we perform a cluster analysis on the data for the E.U. 27. What can we learn from cluster groupings? Although several EU states demonstrate similar QoG scores, there might be patterns in the underlying data that distinguish countries – suggesting different challenges despite a similar overall score. As has been noted by KKM, the WGI is a tool that scholars and practitioners can use to assess the relative position of countries (rather than to rank countries with pinpoint accuracy). For example, while finding any existing measure of QoG that can reliably distinguish between Sweden and Denmark or Romania and Bulgaria would be an admittedly all but impossible task, we can use existing measures to accurately point out relative standing compared with other groups of countries. Given that almost all governance data are subjective in nature, and thus estimates of the true value of governance rather than the ‘real value’, we apply the cluster analysis to elucidate peer countries that are similarly ranked in the 4 pillars of QoG. The cluster groupings can serve as a helpful tool to identify EU member states which share common challenges to building QoG at the national level.
For the sake of simplicity, we take a simple mean of the four QoG indicators for each country in 2008 and employ hierarchical clustering, using Ward’s method and squared Euclidean distances on the four indicators of QoG, to identify the appropriate number of cluster groupings. Three distinct groups were detected in the analysis. We then used k-means clustering with squared Euclidean distances to assign each country to a cluster.
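A minimal sketch of this two-step procedure, assuming a 27 x 4 array of the 2008 pillar scores, is given below. It uses scipy's Ward linkage and scikit-learn's k-means, which mirror the methods named above but are not necessarily the software used by the author, and the input data here are random placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

# scores: 27 EU countries x 4 WGI pillars (placeholder random data here).
rng = np.random.default_rng(2)
scores = rng.normal(size=(27, 4))

# Step 1: Ward's hierarchical clustering (built on squared Euclidean distances)
# to inspect how many groups are appropriate; here the tree is cut at 3.
Z = linkage(scores, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# Step 2: k-means with k = 3 to assign each country to a final cluster.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
print(hier_labels, kmeans.labels_)
```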
Although there are geopolitical and historical similarities among the groups, they are not entirely driven by such factors. For example, the new MS’s of the former eastern bloc Estonia and Slovenia are grouped with EU-15 MS’s such as Spain, Belgium and France, while the EU-15 MS’s Italy and Greece belong to the third cluster grouping. However, the group containing the top nine performers with respect to QoG is made up entirely of EU-15 states from Northern and Central Europe. Figure 1 shows the distribution of states into the three groups.
Again, while we are not claiming that these groups are ‘set in stone’, the pattern tells us that cluster 1 countries exhibit relatively high levels of QoG, while clusters 2 and 3 show good and moderate performance, respectively. There is also considerable variation within the E.U. (compared with other countries of the world as well) on several of the indicators. Several states in the top group – in particular Denmark, Sweden, Finland and the Netherlands – rank consistently in the top four to five countries in Europe on almost all pillars and underlying indicators, and in the top 5% worldwide on almost all QoG data. It is also worth noting that, according to the World Bank data, all MS’s are found in the top half of the world ranking in ‘voice and accountability’. On the other 3 pillars of QoG, there are times at which a few countries’ scores are indistinguishable from the 3rd quartile of countries in the world (those that rank between 25% and 50%) and the 2nd quartile (50% to 75%), such as Bulgaria, Romania, Greece, Latvia and Italy – particularly on corruption scores. However, none of the MS’s is ever found in the 4th quartile (the lowest 25%), and all are statistically distinguishable from this group on all four pillars according to the margins of error provided by KKM.

10 In addition, we clustered countries with a ‘maximum between-cluster difference’ method and found that only France and the Czech Republic would belong to a different group (groups 1 and 2 respectively). After carefully looking at the data and their relative proximity to their neighbors within both groups, we made the judgment that the placement from the squared Euclidean clustering was the more appropriate choice.
11 Applying the 90% confidence intervals provided by Kaufmann et al (2009), one can calculate the relative standing of a state with respect to any one of the 4 ‘quartiles’ with relative ease, as scores are normalized with a mean of ‘0’ and an SD of ‘1’.
In Figure 2, we show the pillar averages for the 3 cluster groupings. Although the 3 groups clearly distinguish themselves on all 4 components of QoG as well as on the total scores, the differences vary across pillars. For example, the 3 cluster groups are most clearly distinguished with respect to ‘control of corruption’ scores, while the MS’s scores converge on the measure for the ‘rule of law’. On both VA and GE, the mean cluster scores more or less resemble the overall QoG scores. On none of the individual pillars do any of the groups overlap – suggesting strong internal consistency of the cluster groupings.
Uncertainty and Sensitivity Analyses
In any composite indicator there are a number of assumptions and creative decisions that the modellers make in order to combine the various underlying indicators into a single number. As Cherchye et al (2008) argue, “there is no recipe for building composite indicators that is at the same time universally applicable and sufficiently detailed.” Thus, in any composite index, modellers face potential criticism about what they ‘could have done’ when combining the underlying variables and how basic changes in the assumptions about weighting or aggregating them might alter the results. In ranking E.U. MS’s on any of the QoG indicators of the WGI, which are based on such composite indices, it is important to utilize as many existing alternative methods of building a composite measure as possible, so as to reduce the level of skewness or bias towards certain countries arising from the original assumptions of the index. We therefore address several questions about how stable the index is (e.g. how much the rankings shift for any one of the indicators). We then point out those countries that are particularly prone to volatility under alterations to the original model.
In this section, we run several uncertainty analyses (UA), which evaluate the effect of alternative models on the ranking of countries in each of the 4 pillars of QoG. In particular, we focus on the choice of weights by the original modellers as well as the aggregation method. Of course, there is no way to establish the ‘true ranking’ for each individual country, as each of the 4 pillars of QoG is only an estimate of the perceived level of corruption, rule of law, etc. However, by testing a number of alternative assumptions about the weighting and aggregation schemes, we can evaluate the robustness of the measure used to calculate the rankings. Uncertainty tests, while still relatively new in the field, have been used by previous studies evaluating such composite indices as the Environmental Performance Index (Saisana & Saltelli 2008) and the Ibrahim Index of African Governance (Saisana, Annoni & Nardo 2009). As these studies do, we follow the guidelines and advice for UA set forth by the OECD (2008).
A brief description of the original data’s assumptions
The WB’s governance indicators are built using an Unobserved Components Model (UCM) because of the uneven number of underlying indicators each country may contain in any given pillar of QoG. Each underlying indicator is normalized using a min-max method so that it is bounded between 0 and 1, with higher scores equating to better QoG. The scores are then standardized by subtracting the mean value of indicator ‘i’ from each country’s score and dividing by the standard deviation of indicator ‘i’. The weights are essentially ‘country specific’ in that each indicator is weighted uniquely (outlier variables are weighted less, for instance, than those that are closer to the other indicators); yet, because each country potentially has a different number of underlying indicators, the adjusted weights can have a country-specific effect. Such a method allows for the use of regional-specific data as well as survey data with less than global coverage.

The indicators are then aggregated arithmetically, taking the sum of the weights multiplied by the standardized indicators and then dividing each country’s score by the sum of the weights of the indicators it has, in order to compensate for differences in the raw number of indicators. Once the indicators are aggregated, they are again standardized so that the global average for each year is ‘0’ with a standard deviation of ‘1’.
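The full UCM is beyond a short example, but the normalization, standardization and weighted arithmetic aggregation steps described above can be sketched as follows. The weights and scores are invented for illustration (this is not KKM's actual estimation code), and missing sources are simply skipped in the weighted average.

```python
import numpy as np

def min_max(x):
    """Rescale a source to the 0-1 range (higher = better QoG)."""
    return (x - np.nanmin(x)) / (np.nanmax(x) - np.nanmin(x))

def standardize(x):
    """Z-score a source: subtract its mean and divide by its std. deviation."""
    return (x - np.nanmean(x)) / np.nanstd(x)

def arithmetic_aggregate(Z, w):
    """Weighted average of the available (non-missing) standardized sources."""
    scores = []
    for row in Z:
        mask = ~np.isnan(row)
        scores.append(np.sum(w[mask] * row[mask]) / np.sum(w[mask]))
    return np.array(scores)

# Illustration: 27 countries x 9 sources, with one missing value.
rng = np.random.default_rng(3)
raw = rng.normal(size=(27, 9))
raw[0, 2] = np.nan
Z = np.column_stack([standardize(min_max(raw[:, j])) for j in range(raw.shape[1])])
w = np.ones(raw.shape[1])                          # equal weights for the sketch
pillar = arithmetic_aggregate(Z, w)
pillar = (pillar - pillar.mean()) / pillar.std()   # re-standardize the aggregate
print(np.round(pillar, 2))
```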
While this method allows different countries to have different numbers of underlying indicators, it rewards conformity of the data, so to speak, in that it minimizes the impact of outliers (OECD 2008: 101). Moreover, with highly correlated underlying indicators, identification problems could arise (OECD 2008: 101).

12 See a more detailed description of the UCM in the appendix of the final report.
Evaluation of the Data: A Multi-Modelling Approach
We gather all underlying data for the four indicators of the WGI for 2008. After successfully replicating the original estimates, we explore three key aspects of the model’s assumptions in the uncertainty analysis. They address: 1) the aggregation method, 2) the weighting method and 3) the number of underlying sources included in each indicator.
1. The aggregation method
The World Bank data are aggregated using a linear, additive method. Though this method is very common in contemporary indices, it has potential drawbacks. First, a strong assumption is made about the independence of the underlying indicators. In essence, the underlying variables are taken independently, and thus a poor score on one indicator can be offset by a relatively high score on another. Nardo et al (2005: 79-80) give the following example: “if a hypothetical composite were formed by inequality, environmental degradation, GDP per capita and unemployment, two countries, one with values 21, 1, 1, 1; and the other with 6,6,6,6 would have equal composite if the aggregation is additive. Obviously the two countries would represent very different social conditions that would not be reflected in the composite.”

We take this warning seriously and therefore use the recommended geometric aggregation method as an alternative – which in the above example would give country A a score of 2.14 and country B a score of 6. Thus one high score (which could potentially be a misleading outlier) does not skew the end result as much as under the linear additive method.
For geometric aggregation, we linearly transform the indicators so that all figures are strictly positive (>0) and employ the following formula, where z_{q,x} is country x's value on underlying indicator q and w_q is the weight assigned to that indicator:

WGI_x = \prod_{q=1}^{Q} z_{q,x}^{w_q}
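A sketch of this geometric aggregation is given below; it reproduces the Nardo et al. example from the previous paragraph and assumes the indicators have already been shifted so that all values are strictly positive. The normalization of the weights to sum to one is our own illustrative choice.

```python
import numpy as np

def geometric_aggregate(Z, w):
    """Geometric aggregation: product of z_q^w_q over the Q sources,
    computed in log space for numerical stability."""
    w = w / w.sum()                              # normalize weights to sum to 1
    return np.exp(np.sum(w * np.log(Z), axis=1))

# Nardo et al.'s example: country A = (21, 1, 1, 1), country B = (6, 6, 6, 6).
Z = np.array([[21.0, 1.0, 1.0, 1.0],
              [6.0, 6.0, 6.0, 6.0]])
w = np.ones(4)
print(np.round(geometric_aggregate(Z, w), 2))    # -> [2.14, 6.0]
```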
2. The Weighting Scheme
13 From the “Handbook on Constructing Composite Indicators” (2008: 104).
As noted, the weighting scheme is based on the UCM, which can potentially produce individual country-specific weights. We employ two alternative weighting schemes in order to test the robustness of the original data. First, we use simple equal weighting (EW), so that the correlations among the underlying indicators are not taken into account as they are in the original weighting scheme. Second, we employ a new set of weights based on principal component analysis (PCA), including only the common underlying data for each pillar. When necessary, we use factor rotation and square the factor loadings to derive the new weights. All weights for each indicator are listed in the appendix.
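A sketch of how PCA-based weights of this kind can be derived from squared loadings on the first factor is shown below. The rotation step is omitted and the toy data are our own; the exact weights used in the paper are those listed in the appendix.

```python
import numpy as np

def pca_weights(X):
    """Weights from the first principal component of the correlation matrix:
    squared loadings, normalized so that they sum to one."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    first = eigvecs[:, np.argmax(eigvals)]       # loadings on the largest factor
    w = first ** 2
    return w / w.sum()

# Illustration: 27 countries x 9 standardized common sources.
rng = np.random.default_rng(4)
X = rng.normal(size=(27, 9))
print(np.round(pca_weights(X), 3))
```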
3. Inclusion/Exclusion of Individual Indicators
For each pillar, we further test the robustness of the index by excluding each indicator relevant to the EU 27, giving it essentially a ‘0’ weight in the simulation. We do this to find out whether the absence of any one data source would significantly alter the results and, if so, in which weighting scheme and aggregation context. We report the ten most extreme simulations for each of the four indicators.
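The overall simulation design – every combination of aggregation rule, weighting scheme and single-source exclusion – can be organized as in the sketch below. The helper functions, the 40-run example grid and the placeholder data are illustrative only; the paper's own grids contain 60-78 simulations per pillar and use the original, EW and PCA weights.

```python
import numpy as np

def aggregate(Z, w, method):
    """Weighted arithmetic or geometric aggregate of standardized sources Z."""
    w = w / w.sum()
    if method == "arithmetic":
        return Z @ w
    shifted = Z - Z.min() + 1.0                  # shift so all values are positive
    return np.exp(np.log(shifted) @ w)

def simulate_rankings(Z, weight_sets, methods):
    """Rank countries under every combination of weighting scheme,
    aggregation method and single-source exclusion ('leave one out')."""
    n_countries, n_sources = Z.shape
    results = {}
    for wname, w in weight_sets.items():
        for method in methods:
            for dropped in [None] + list(range(n_sources)):
                w_run = w.copy()
                if dropped is not None:
                    w_run[dropped] = 0.0         # exclusion = zero weight
                score = aggregate(Z, w_run, method)
                rank = (-score).argsort().argsort() + 1   # rank 1 = best score
                results[(wname, method, dropped)] = rank
    return results

# Illustration: 27 countries x 9 sources, equal vs. arbitrary 'PCA-style' weights.
rng = np.random.default_rng(5)
Z = rng.normal(size=(27, 9))
weight_sets = {"equal": np.ones(9), "pca": rng.uniform(0.5, 1.5, size=9)}
runs = simulate_rankings(Z, weight_sets, ["arithmetic", "geometric"])
print(len(runs))                                 # 2 x 2 x (1 + 9) = 40 simulations
```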
A Brief Note about Missing Data
In the simulations for the FA and geometric weighting schemes, we employ only common data sources, meaning we remove the 2-3 region-specific indicators for former Eastern and Central Europe for each pillar and any data covering fewer than 24 EU countries. The vast majority of EU MS’s have between 9 and 12 sources in common, depending on the pillar. The main exception is Malta (which has an average of 6 indicators per pillar), though Luxembourg and Cyprus also have 1-2 missing data points per pillar compared with other EU countries. In order to maximize the number of common underlying data sources for each indicator, we employ the most simplistic method (mean substitution) for each missing data point during these simulations, so that it takes the value of the EU 27 mean (as opposed to the world mean).
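A sketch of this simple imputation rule – replacing a missing cell with the EU 27 mean of that source – is given below, with invented data standing in for the WGI matrix.

```python
import numpy as np

def mean_substitute(X):
    """Replace each missing value with the column (source) mean
    computed over the available EU countries."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    missing = np.isnan(X)
    X[missing] = np.take(col_means, np.where(missing)[1])
    return X

# Illustration: 27 countries x 9 sources with two missing cells.
rng = np.random.default_rng(6)
X = rng.normal(size=(27, 9))
X[3, 1] = np.nan
X[12, 7] = np.nan
print(np.isnan(mean_substitute(X)).any())        # -> False
```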
Uncertainty Test Results
In this section, the robustness of each indicator is tested and discussed one at a time.
14 For a full list of the simulation results, please contact the author.
1. Rule of Law (RL)
Kaufmann et al (2009) provide a 90% confidence interval for each country’s QoG pillar in each year of the data. Figure 3 shows the 2008 rankings for the EU 27 along with the original confidence intervals.

According to Kaufmann et al (2009), we should only interpret differences between two countries as significant if their confidence intervals do not overlap. Thus the first state whose score is significantly different from top-ranked Denmark is France. In other words, although Denmark ranks number 1 in the original data, its score is statistically indistinguishable from those of other high-ranking states such as Austria, Sweden and Finland. For this pillar, Bulgaria and Romania can be distinguished from all other states in the sample. The line in the middle represents the EU average; all states from Belgium upward are significantly above it, while all states from Slovenia downward are significantly below it.
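The non-overlap rule used here can be written as a small helper; the point estimates and margins of error in the example are placeholders, not the published 2008 values.

```python
def significantly_different(est_a, moe_a, est_b, moe_b):
    """Two countries are treated as significantly different only if their
    90% confidence intervals (estimate +/- margin of error) do not overlap."""
    low_a, high_a = est_a - moe_a, est_a + moe_a
    low_b, high_b = est_b - moe_b, est_b + moe_b
    return high_a < low_b or high_b < low_a

# Placeholder example: two hypothetical country estimates on one pillar.
print(significantly_different(1.90, 0.15, 1.45, 0.20))   # -> True
print(significantly_different(1.90, 0.15, 1.70, 0.20))   # -> False
```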
***Figure 3 about here***
We test the robustness of the rankings in Figure 3 with 78 simulations based on the different aggregation and weighting schemes, along with removing each indicator one at a time. The results in Table 4 show the frequencies of each country’s rank across the 78 simulations. For example, Denmark ranks in the top two positions 90% of the time and is either #3 or #4 10% of the time, while Bulgaria and Romania share the bottom two positions in 100% of the simulations, irrespective of any change in the assumptions of the index or the removal of any indicator. The bold numbers indicate that a state was found in the same place in a majority of the simulations; there are 12 such states, 7 of them belonging to group 1. Only 6 countries – Malta, Cyprus, Estonia, Hungary, Poland and Italy – were found to rank at least 3 places above or below their original rank in a majority of cases. The other 9 cases fell within two spots of their original rank, such as the U.K. and Spain. The two countries with the most volatility within the RL pillar were the Czech Republic and Cyprus, which were found spanning 7-8 different rank positions each (both 13-20) in at least 5% of the simulations. The vast majority, however, remained within 2-3 positions of their original rank, and all states (except possibly Hungary) remained within their respective cluster blocs.
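A frequency matrix of this kind can be built from the simulated rankings by counting, for each country, how often it falls into each rank bracket. The sketch below uses invented simulated ranks rather than the paper's actual 78 runs.

```python
import numpy as np

def rank_frequencies(ranks, brackets):
    """ranks: (n_simulations x n_countries) array of rank positions.
    Returns the share of simulations (in %) each country spends in each bracket."""
    n_sims, n_countries = ranks.shape
    freq = np.zeros((n_countries, len(brackets)))
    for j, bracket in enumerate(brackets):
        freq[:, j] = np.isin(ranks, bracket).sum(axis=0) / n_sims * 100
    return np.round(freq)

# Illustration: 78 simulations of 27 country ranks (placeholder random ranks).
rng = np.random.default_rng(7)
ranks = np.array([rng.permutation(27) + 1 for _ in range(78)])
brackets = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14),
            (15, 16), (17, 18), (19, 20), (21, 22), (23, 24, 25), (26, 27)]
print(rank_frequencies(ranks, brackets)[0])      # bracket shares for country 1
```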
***Table 4 about here***
Sensitivity Results
We complement the uncertainty analysis with a sensitivity analysis for each pillar, beginning with the rule of law. Here we show the results of the 10 cases that differ most from the original rankings according to the Spearman rank coefficient. We report the median and maximum changes and a Spearman Rank Correlation Coefficient, comparing the altered data to the original RoL index. Table 5 shows the results.
We use the full data and then remove each indicator one at a time under the 6 different weighting/aggregation combinations. Overall, the results are remarkably robust and the original rankings do not appear to be significantly biased at all, with the lowest Spearman coefficient (simulation 30) being .931. Only 6 of the 78 simulations have a median country shift of 2, and the vast majority have a median shift of only 1. Eight of the 10 most extreme cases use geometric aggregation, while 6 of the top ten cases use an equal weighting scheme. In cases where the maximum shift of at least one country was 5 or more places from its original rank, we report that particular country. The largest shift of any one state is Slovenia in simulation 30, with FA weighting and arithmetic aggregation when the EIU variable is removed. Otherwise, the largest shift is 5 places, with Hungary and Italy doing so on multiple occasions. It is clear that these two countries, for example, are at a relative disadvantage under the original weighting and aggregation in the RL data, as they generally rank at least 3-5 places higher than their original ranking (25 for Italy, 19 for Hungary) in the majority of the simulations.
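The summary statistics reported for each scenario – the Spearman rank correlation with the original ranking, plus the median and maximum absolute rank shift – can be computed as in the sketch below, again with placeholder rankings rather than the actual simulation output.

```python
import numpy as np
from scipy.stats import spearmanr

def scenario_summary(original_rank, new_rank):
    """Spearman correlation with the original ranking, and the median and
    maximum absolute shift in rank position across countries."""
    rho, _ = spearmanr(original_rank, new_rank)
    shifts = np.abs(np.asarray(original_rank) - np.asarray(new_rank))
    return round(float(rho), 3), int(np.median(shifts)), int(shifts.max())

# Placeholder example: 27 original ranks and a mildly perturbed alternative.
original = np.arange(1, 28)
new = original.copy()
i, j = 4, 9
new[i], new[j] = new[j], new[i]                  # swap two countries' positions
print(scenario_summary(original, new))           # (Spearman, median shift, max shift)
```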
***Table 5 about here***
2. Government Effectiveness (GE)
***Figure 4 about here***
We start with a distribution of the original scores, using the WB’s confidence intervals to show significant differences between EU states. Again, the top four countries – Denmark, Sweden, Finland and the Netherlands – are statistically indistinguishable from one another based on the estimates and the confidence intervals surrounding them. However, Denmark is statistically higher than all EU states from Austria downward, while Sweden is statistically higher than all states from Ireland downward. All countries from #11 (Belgium) upward distinguish themselves from the mean EU score, and all from #19 (Slovakia) downward are clearly below the EU average, which reinforces the cluster groupings from the previous section. The states in the middle, ranked between 12 and 18 (Malta, Cyprus, Estonia, Portugal, Czech Rep., Slovenia and Spain), are virtually indistinguishable from one another. In Table 6, the uncertainty analysis of these relative rankings is shown.
***Table 6 about here***
Table 7 reports the sensitivity analysis results for the 10 cases that differ most from the original results according to the Spearman rank coefficient, out of the 60 simulations for the GE pillar. The overall results again show impressive robustness, as the median shift is never higher than 2 places and the Spearman coefficient does not fall below .90 in any of the 60 simulations (the low is #47 and #37 at .907). Under alternative aggregation and weighting scenarios – in particular geometric aggregation, which 9 of the 10 most different cases use – several states show a pattern of shifting up or down in the rankings by about 5 or 6 places. Italy, Spain, Poland and Belgium all end up 5 or 6 places higher than their original rankings on several occasions, while the initial GE data seem to have benefitted Slovakia the most, with Hungary and Latvia also ending up lower than their original rank in multiple scenarios. The largest jumps occur in simulation 38 (Slovenia drops 8 spots), with Ireland moving from 10th place up to 3rd in simulations 17 and 47, and Italy jumping 7 spots four times in the geometrically aggregated simulations under both the original and equal weighting, in scenarios 34, 39, 44 and 49. Despite these shifts in a handful of countries, a shift from one cluster to another is very rare – for example, Italy’s frequent move up to rank #20 or #19 from #25 is still within the bounds of group 3 – demonstrating that most of the variation in state rankings for GE is within, and not between, clusters of EU states.
***Table 7 about here***
3. Control of Corruption
The results for the control of corruption pillar show more significant variation between countries than in the previous two indicators. Group 1 (from the cluster analysis) almost completely distinguishes itself statistically from groups 2 and 3. Moreover, the gap between Estonia and Slovakia (border states of clusters 2 and 3) is significant at the 90% level of confidence; thus group 2 completely distinguishes itself from group 3 as well. The top 4 countries in particular show remarkably high scores, significantly over ‘2’ in the original WB data, meaning that they are at least two standard deviations above the world mean score. Bulgaria and Romania, however, perform relatively poorly by world standards, as they are significantly below the world average according to the 90% confidence interval.
***Figure 5 about here***
We report the results of the uncertainty tests for the 60 simulations for the control of corruption indicator. Again, the proportion of simulations in which each country falls at each rank across the 60 simulations is shown in Table 8. The three cluster groupings show remarkable robustness, as every country in the sample stays within its cluster-grouping range. Further, 19 of the 27 states – including the top and bottom 6 ranked – are found in their original place at least 50% of the time (bold numbers), while those that show up below or above their original ranking in a majority of simulations (for example, the U.K., originally ranked 9, was found between 5th and 8th place 83% of the time) were only a few places above or below. The uncertainty tests show no signs of the original data negatively biasing any individual country. The largest range is for Slovenia, which was found anywhere from 12th to 18th in the rankings – again, within the scope of the second cluster grouping.
***Table 8 here***
The sensitivity analysis results for each individual simulation demonstrate that the CC scores are, by comparison with the other 3 indicators, the most robust of the WGI data. In fact, in 23 of the 60 simulations the median state does not shift even one place in the rankings, and in the other 37 cases the median shift is only equal to 1. Moreover, the Spearman rank coefficient does not drop below 0.97 in any of the cases, and in no simulation did we find a maximum shift larger than four places from the original rankings. We are therefore quite confident of the robustness of the CC rankings from the World Bank data – there is apparently no systematic bias from the original aggregation or weighting methods that hurts or helps any EU state in any significant way relative to the initial rank order.
***Table 9 here***
4. Voice & Accountability (VA)
***figure 6 about here***
The final pillar displays a similar rank order to the first three but has a much smaller range of scores. All EU states are significantly over the world average, yet the higher ranking states are not 2 standard deviations over the mean as in the other pillars, but approximately 1.5 over the mean, making for a tighter grouping in the VA pillar. Therefore the two highest ranking EU states, Sweden and the Netherlands, are statistically indistinguishable all the way down to the 11th ranked country (France). Spain’s (ranked 14th) confidence interval overlaps with both Belgium’s and Poland’s, meaning there is a 90% chance that Spain’s ‘true’ VA estimate could be anywhere between 7th and 23rd within the EU, for example, according to KKM.
***Table 10 about here***
Due to the tighter score-groupings in the VA pillar, we would expect a bit more volatility in the rankings than in the previous three pillars. We found this indeed to be the case. Only 8 countries are found in their original rank at least 50% of the time across all simulations. We find that, after the 66 simulations, 7 countries (Germany, Malta, Estonia, Cyprus, Hungary, Greece and Poland) ended up in four different boxes at least 5% of the time, meaning that they could plausibly be in one of eight different places in the rankings relative to where they were in the original VA data, depending on the assumptions of the model. Cyprus is the most extreme case. Originally ranked 19th, it is only found in the 19th or 20th ranking in 11% of the simulations, while it reaches the 9th or 10th ranking 17% of the time and even the 8th spot in the rankings in 6% of the simulations. Poland also appears to have been affected negatively by the original assumptions of the data, and ends up ranked either 19th or 20th in 60% of the simulations, about 4 to 5 places above its original ranking. Even Romania, which up until the VA pillar has been in the bottom two spots in almost every simulation, ends up a couple of places higher in a majority of cases.
Notable countries that seem to significantly benefit from the original weights and additive aggregation schemes (at least 4-5 spots ahead of the majority of the simulation outcomes) include Hungary and possibly Germany.
***Table 11 about here***
It is clear from the results of the sensitivity analysis that one state in particular, namely Cyprus, is disadvantaged by the original weights and aggregation methods. Across the 66 simulations, Cyprus jumps between 6 and 11 spots in all but a handful of cases, all with either FA or EW weighting schemes under arithmetic aggregation. Poland seems to benefit from geometric aggregation of the indicators, while Latvia jumps up 6 places in the rankings when the Gallup World Poll data is taken out in additive aggregation schemes. Surprisingly, Germany ends up ranked between 13th and 15th in about 29% of the simulations. Aside from those few cases, however, the remaining countries are again quite stable relative to their original ranking – in not one case does the median rank shift rise above 2. Furthermore, the lowest Spearman rank coefficient is still above .91 (simulation 31, with additive aggregation and factor weights, is .916).
In the case of all four indicators, we find that the World Bank data provide reliable estimates of QoG in EU states for the year 2008.
Discussion
The WGI has been one of, if not the, most important sources of empirical QoG data in the last 15 years. Its scope of use ranges from academia to the media to the assessment of international aid by countries such as the U.S. and the Netherlands. Thus questions of validity and reliability are of utmost importance for scholars and policy-makers interested in QoG research. Several recent critiques of the WGI have questioned the data’s internal validity, argued that without common sources the data cannot be used for cross-country comparison, pointed out that the different weighting schemes for each country could be problematic and, finally, questioned the precision of the data. All of these critiques have been addressed directly here by employing several internal consistency, uncertainty and sensitivity analyses for a sub-group of countries, the E.U. 27. Four ‘pillars’ of QoG were analysed. The data prove to be remarkably internally consistent, in particular the GE and CC indicators. With the help of cluster analysis, we find that the patterns that emerge from the original rankings are robust to changes in the assumptions of the original data, namely after altering the weighting and aggregation methods and removing each of the underlying data sources in turn to test whether this has a significant impact. The answer is that the data are solidly robust, internally reliable and consistent for EU countries. Furthermore, although time consuming, it was indeed possible to replicate the estimates of the WGI by simply taking the underlying data sources from their website, which speaks to the level of transparency and replicability of the data.
The sensitivity tests performed in this analysis are certainly not exhaustive, but rather a basic series of tests. Moreover, the analyses employed only EU states, and thus a drawback of this study is that it cannot speak to the robustness of the world-wide data. Time constraints simply did not allow for such an analysis, nor could we have performed several of the internal consistency or sensitivity tests without a significant amount of common underlying data sources for each indicator across the sample. The advantage of the EU sample employed here is thus that there is ample data available for each country and a significant amount of variation across the sample in all four indicators with which to test the stability of the data. While it is outside the scope of this analysis to answer questions of ‘conceptual relevance’, definitional concerns or the ‘appropriateness’ of all sources of underlying data, what this study does show is that – at least within the EU – no single data source or methodological decision in constructing the four indicators, such as the weights or aggregation, drives the results in a significantly biased way. Other criticisms, such as whether perceptions data are appropriate for measuring QoG, time-series issues, or whether the data are driven by recent economic success or business interests, have been taken up by KKM themselves. However, based on the results here, researchers who employ this data – in particular for comparisons within the EU – can be more assured that the quality of this particular QoG data is reliable and internally sound when making cross-country comparisons.
Sources
Apaza, Carmen (2009). "Measuring Governance and Corruption through the Worldwide Governance Indicators: Critiques, Responses, and Ongoing Scholarly Discussion." PS: Political Science & Politics 42:1 (January), 139-143.

Arndt, Christiane and Charles Oman (2006). "Uses and Abuses of Governance Indicators." OECD Development Centre Study.

Kaufmann, D., A. Kraay and M. Mastruzzi (2006). "Growth and Governance: A Reply." Journal of Politics, Vol. 69, Issue 2.

Kaufmann, D., A. Kraay and M. Mastruzzi (2008). "Governance Matters VII: Aggregate and Individual Governance Indicators: 1996-2007."

Knack, Steven (2006). "Measuring Corruption in Eastern Europe and Central Asia: A Critique of the Cross-Country Indicators." World Bank Policy Research Department Working Paper 3968.

Kurtz, M.J. and A. Schrank (2007). "Growth and Governance: Models, Measures, and Mechanisms." Journal of Politics, Vol. 69 (2).

OECD (2008). "Handbook on Constructing Composite Indicators: Methodology and User Guide." Joint publication by the OECD and the JRC European Commission.

Pollitt, C. (2008). "'Moderation in All Things': Governance Quality and Performance Information." Presented at the SoG conference, Göteborg, Sweden, 2008.

Thomas, Melissa (2009). "What Do the Worldwide Governance Indicators Measure?" European Journal of Development Research.
Table 1: Overall Rankings of the E.U. States for the 4 WGI QoG Indicators

Rank  1. Government Effectiveness   2. Control of Corruption   3. Rule of Law       4. Voice & Accountability
1     Denmark                       Finland                    Denmark              Sweden
2     Sweden                        Denmark                    Austria              Netherlands
3     Finland                       Sweden                     Sweden               Luxembourg
4     Netherlands                   Netherlands                Finland              Denmark
5     Austria                       Luxembourg                 Luxembourg           Finland
6     United Kingdom                Austria                    Netherlands          Ireland
7     Germany                       Germany                    Ireland              Belgium
8     Luxembourg                    Ireland                    Germany              Austria
9     France                        United Kingdom             United Kingdom       Germany
10    Ireland                       France                     Malta                United Kingdom
11    Belgium                       Belgium                    France               France
12    Malta                         Spain                      Belgium              Malta
13    Cyprus                        Portugal                   Spain                Portugal
14    Estonia                       Cyprus                     Cyprus               Spain
15    Portugal                      Malta                      Portugal             Estonia
16    Czech Republic                Slovenia                   Estonia              Czech Republic
17    Slovenia                      Estonia                    Slovenia             Slovenia
18    Spain                         Hungary                    Czech Republic       Hungary
19    Slovakia                      Slovakia                   Hungary              Cyprus
20    Hungary                       Poland                     Greece               Italy
21    Lithuania                     Czech Republic             Lithuania            Slovakia
22    Latvia                        Latvia                     Latvia               Greece
23    Greece                        Lithuania                  Poland               Latvia
24    Poland                        Italy                      Slovakia             Poland
25    Italy                         Greece                     Italy                Lithuania
26    Bulgaria                      Romania                    Romania              Bulgaria
27    Romania                       Bulgaria                   Bulgaria             Romania
Table 2: Principal Component Analysis - Check of the Underlying Data Consistency

Pillar (common sources)          Sig. Factor   Eigenvalue   Proportion
Government Effectiveness (9)
Control of Corruption (9)
Rule of Law (12)                 1             7.52         0.63
                                 2             1.41         0.12
Voice & Accountability (10)      2             1.17         0.12
Note: number of common sources in parentheses.
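As a rough illustration of the consistency check summarised in Table 2, the snippet below computes the eigenvalues of the correlation matrix of one pillar's common underlying sources and the share of variance each factor explains. The data are randomly generated stand-ins, and the eigenvalue-greater-than-one cut-off is an assumption made for illustration, not a restatement of the authors' exact procedure.

```python
# Sketch of a PCA-based consistency check: eigenvalues of the correlation
# matrix of a pillar's common sources and the proportion of total variance
# explained by each retained factor (eigenvalue > 1). Data are placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_sources = 27, 12                  # e.g. Rule of Law draws on 12 common sources
latent = rng.normal(size=(n_countries, 1))       # one underlying 'quality of government' factor
X = latent + 0.4 * rng.normal(size=(n_countries, n_sources))   # noisy stand-in source scores

corr = np.corrcoef(X, rowvar=False)              # source-by-source correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
proportions = eigvals / n_sources                # eigenvalues of a correlation matrix sum to n_sources

for i, (ev, pr) in enumerate(zip(eigvals, proportions), start=1):
    if ev > 1:                                   # Kaiser criterion
        print(f"factor {i}: eigenvalue {ev:.2f}, proportion {pr:.2f}")
```

If the underlying sources of a pillar are internally consistent, the first factor should account for most of the shared variance.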
Table 3: Pearson's Correlation Coefficients between Underlying Data, Pillar and QoG Index

Pillar                      Availability    Indicator (correlation with pillar)
Government Effectiveness    All (9)         BRI .76***, DRI .56***, EIU .92***, PRS .88***, WMO .81***, GCS .84***, WCY .84***, GWP .71***, EGV .39**
                            Regional (2)    BPS -.33, BTI .53
                            Limited (1)     IPD .81***
Control of Corruption       All (9)         BRI .83***, DRI .60***, EIU .93***, GAD .82***, PRS .94***, WMO .86***, GCS .96***, GWP .86***, WCY .95***
                            Regional (4)    BPS -.21, BTI .70**, FRH .93***, GII -.55
                            Limited (2)     IPD .92***, GCB .76***
Rule of Law                 All (12)        BRI .81***, DRI .46***, EIU .87***, GAD .28, PRS .73***, WMO .67***, GCS .80***, GWP .68***, WCY .72***, HER .78***, HUM .63***, TPR .38**
                            Regional (4)    BPS .02, BTI .74**, GII -.51
                            Limited (1)     IPD .82
Voice & Accountability      All (10)        WMO .90***, FRH .85***, PRS .62***, EIU .92***, RSF .65***, HUM .38**, WCY .80***, GCS .86***, GAD .77***, GWP .70***
                            Regional (3)    BTI .81***, GII .45, OBI .69**
                            Limited (2)     IPD .77***, MSIª .99

ª only available for Bulgaria & Romania.
Note: The number of sources is given in parentheses under 'Availability'. 'All' means the data are available for all member states, 'regional' refers to data mainly available for only one area within the E.U., and 'limited' refers to data available for fewer than 4 countries. For a full list and description of the variable sources, see the Appendix.
*** p<.01, ** p<.05
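The check in Table 3 amounts to correlating each underlying source with the aggregate pillar it feeds into. A minimal sketch with placeholder data and hypothetical source codes is given below.

```python
# Sketch of the source-to-pillar check: Pearson correlation of each underlying
# source with the aggregate pillar score, with conventional significance stars.
# The scores and the source codes are illustrative placeholders.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
countries = [f"country_{i}" for i in range(27)]
sources = pd.DataFrame(
    rng.normal(size=(27, 5)),
    index=countries,
    columns=["SRC_A", "SRC_B", "SRC_C", "SRC_D", "SRC_E"],   # hypothetical codes
)
pillar = sources.mean(axis=1) + 0.2 * rng.normal(size=27)    # stand-in aggregate pillar score

def stars(p):
    return "***" if p < 0.01 else "**" if p < 0.05 else ""

for code in sources.columns:
    r, p = pearsonr(sources[code], pillar)
    print(f"{code}: r = {r:.2f}{stars(p)}")
```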
Figure 1 – Results of the Cluster Analysis
Note: Malta (not visible on the map) falls in the 'average QoG' cluster, the same color as France and Spain.
Figure 2 – QoG (total), Control of Corruption, Voice & Accountability, Rule of Law and Government Effectiveness values by cluster (Clusters 1, 2 and 3)
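The grouping in Figures 1 and 2 can be reproduced in spirit with a standard cluster analysis of the four pillar scores. The sketch below runs k-means with three clusters on randomly generated scores; both the algorithm choice and the data are illustrative assumptions, not the exact procedure behind the figures.

```python
# Sketch of a cluster analysis of the four pillar scores (GE, CC, RoL, VA)
# into three groups of member states. Scores are random placeholders and
# k-means with k=3 is an illustrative choice.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
countries = [f"country_{i}" for i in range(27)]
pillars = pd.DataFrame(
    rng.normal(size=(27, 4)),
    index=countries,
    columns=["GE", "CC", "RoL", "VA"],
)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pillars.values)
pillars["cluster"] = km.labels_

# mean pillar value per cluster, the kind of quantity plotted in Figure 2
print(pillars.groupby("cluster").mean().round(2))
```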
Figure 3 – Rule of Law in E.U. Member States (mean RoL value with 90% confidence interval range)
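Figures 3 through 6 summarise, for each country, its score under the alternative scenarios as a mean with a 90 per cent range. One simple way to build such a summary is sketched below with a placeholder matrix of simulated scores; whether the plotted interval comes from the simulation distribution or from the WGI's reported margins of error is an assumption left open in this sketch.

```python
# Sketch of the country-level summary behind Figures 3-6: for each country,
# the mean simulated score and an empirical 90% interval (5th-95th percentile)
# across all alternative scenarios. The simulated scores are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
countries = [f"country_{i}" for i in range(27)]
n_scenarios = 78                                   # e.g. the Rule of Law exercise uses 78 scenarios
sims = pd.DataFrame(rng.normal(size=(n_scenarios, 27)), columns=countries)

summary = pd.DataFrame({
    "mean": sims.mean(),
    "p05": sims.quantile(0.05),
    "p95": sims.quantile(0.95),
})
print(summary.sort_values("mean", ascending=False).round(2).head())
```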
Table 4: Frequency Matrix of an EU Country's Rankings: Rule of Law
Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27
1
2
3
Denmark
Austria
Sweden
4 Finland
5 Lux
6 Netherlands
7 Ireland
90 10
81 18
10 83 5
18 78
5
8 Germany
9 United Kingdom
10 Malta
11 France
65 32
66 32
5
12 Belgium
13 Spain
14 Cyprus
15 Portugal
16 Estonia
17 Slovenia
18 Cz. Rep
19 Hungary
94
16 16 61 6
9 77 13
13 73 12
61 23 14
27 29 40
25 64 9 12
21 65 13
13 83
20 Greece
21 Latvia
22 Lithuania
23 Poland
24 Slovakia
25 Italia
26 Romania
27 Bulgaria
5 21 71
57 27 14
100
54 30 16
100
100
Note: Frequencies are calculated from the 78 simulations with alternative aggregation, weighting, and the removal of one indicator at a time. For example, Finland ranks in the 3rd or 4th position 78% of the time and in the 1st or 2nd position 18% of the time. Frequencies under 5% are not shown. All figures are rounded to the nearest percent. Bold numbers represent a majority of simulation results equaling a country's original rank position.
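The frequency matrices in Tables 4, 6, 8 and 10 simply count how often each country lands in each pair of rank positions across the simulations. A small sketch of that bookkeeping, using placeholder simulated rankings, is shown below.

```python
# Sketch of the rank-frequency matrices (Tables 4, 6, 8, 10): the share of
# simulations in which each country falls into each rank bin (1-2, 3-4, ...).
# The simulated rankings are random placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
countries = [f"country_{i}" for i in range(27)]
n_scenarios = 78

# one simulated ranking (a permutation of 1..27) per scenario
ranks = pd.DataFrame(
    [rng.permutation(np.arange(1, 28)) for _ in range(n_scenarios)],
    columns=countries,
)

bins = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 25, 27]
labels = ["1,2", "3,4", "5,6", "7,8", "9,10", "11,12", "13,14",
          "15,16", "17,18", "19,20", "21,22", "23,24,25", "26,27"]

freq = pd.DataFrame(0.0, index=countries, columns=labels)
for country in countries:
    binned = pd.cut(ranks[country], bins=bins, labels=labels)
    freq.loc[country] = binned.value_counts(normalize=True).reindex(labels).fillna(0) * 100

print(freq.round(0).head())
```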
Table 5: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of Single Indicators on Rule of Law (10 Most Extreme Scenarios)

Scenario   Aggregation   Weighting   Excluded Indicator   Median   Max                 Spearman Rank Coefficient
30         Arithmetic    FA          EIU                  1        5 (Italy+)          0.943
71         Geometric     FA          PRS                  1        5 (Italy+)          0.949
45         Geometric     Original    PRS                  2        5 (Italy+)          0.949
58         Geometric     Equal       PRS                  1        5 (Italy+)          0.95
19         Arithmetic    Equal       PRS                  1        4                   0.951
54         Geometric     Equal       BRI                  1        5 (Hungary+)        0.954
22         Arithmetic    Equal       GWP                  2        5 (Italy, Hun.+)    0.955
46         Geometric     Original    WMO                  1        5                   0.955
57         Geometric     Equal       GAD                  2        5 (Italy+)          0.955
59         Geometric     Equal       WMO
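Tables 5, 7, 9 and 11 score each alternative scenario by how far it moves the ranking away from the published ordering: the Spearman rank correlation with the original ranking, plus the median and maximum absolute rank shift, keeping the ten scenarios with the lowest correlations. A sketch of that scoring with placeholder rankings follows.

```python
# Sketch of the scenario scoring behind Tables 5, 7, 9 and 11: for each
# alternative scenario, the Spearman correlation with the original ranking and
# the median / maximum absolute rank shift. Rankings here are placeholders.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(6)
n_countries, n_scenarios = 27, 78
original = np.arange(1, n_countries + 1)                  # published ranking 1..27

rows = []
for scenario in range(1, n_scenarios + 1):
    simulated = rng.permutation(original)                 # placeholder alternative ranking
    shift = np.abs(simulated - original)
    rho, _ = spearmanr(original, simulated)
    rows.append({"scenario": scenario,
                 "median_shift": np.median(shift),
                 "max_shift": int(shift.max()),
                 "spearman": round(rho, 3)})

extreme = pd.DataFrame(rows).sort_values("spearman").head(10)   # the 10 most extreme scenarios
print(extreme.to_string(index=False))
```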
Figure 4 – Government Effectiveness (mean GE value with 90% confidence interval range)
Table 6: Frequency Matrix of an EU Country's Government Effectiveness Rank
Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27
1 DENMARK
2 SWEDEN
3 FINLAND
4 NETHERLANDS
5 AUSTRIA
100
6 UNITED KINGDOM
16
48
75 8
21 66 8
45
7 GERMANY
8 LUXEMBOURG
25
18
28
63
9 FRANCE
10 IRELAND
11 BELGIUM
12 MALTA
5
5
6
35
33
16
13 CYPRUS
14 ESTONIA
15 PORTUGAL
16 CZECH REPUBLIC
17 SLOVENIA
18 SPAIN
11 63 25
13 31 50 5
15 11 73
6 80 13
15 65 13 5
71 5 18
19 SLOVAKIA
20 HUNGARY
21 LITHUANIA
22 LATVIA
23 GREECE
24 POLAND
5 70 25
18
25 ITALY
26 BULGARIA
8 23 15 53
100
27 ROMANIA 100
Note: Frequencies are calculated from the 78 simulations with alternative aggregation, weighting, and the removal of one indicator at a time. For example, the Netherlands ranks in the 3rd or 4th position 75% of the time and in the 5th or 6th position 24% of the time. Frequencies under 5% are not shown.
Table 7: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of Single Indicators on Government Effectiveness (10 Most Extreme Scenarios)

Scenario   Aggregation   Weighting   Excluded Indicator   Median   Max                   Spearman Rank Coefficient
37         Geometric     Original    GCS                  2        7 (Ireland+)          0.907
47         Geometric     Equal       GCS                  2        7 (Ireland+)          0.907
17         Arithmetic    Equal       GCS                  2        7 (Ireland+)          0.914
34         Geometric     Original    EIU                  1        7 (Italy+)            0.925
44         Geometric     Equal       EIU                  2        7 (Italy+)            0.925
39         Geometric     Original    WCY                  2        7 (Italy+)            0.928
49         Geometric     Equal       WCY                  2        7 (Italy+)            0.928
59         Geometric     FA          WCY                  2        5 (Spain+)            0.928
36         Geometric     Original    WMO                  2        6 (Slovak-, Pol.+)    0.932
46         Geometric     Equal       WMO                  2        6 (Poland+)           0.932
Figure 5 – Control of Corruption (CC estimate with 90% confidence interval range)
Table 8: Frequency Matrix of an EU Country's Corruption Rank
Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27
1 FINLAND 97
2 DENMARK 92 7
3 SWEDEN 9 90
4 NETHERLANDS 99
5 LUXEMBOURG 99
6 AUSTRIA 50 44
7 GERMANY 54 10
8 IRELAND
9 UNITED
10 FRANCE
11 BELGIUM
12 SPAIN
18 HUNGARY
19 SLOVAKIA
20 POLAND
88 12
13 PORTUGAL
14 CYPRUS
68 28
35 60
15 MALTA 13 57 30
16 SLOVENIA 10 62 20 5
17 ESTONIA 7 30 62
100
95
21 CZECH
22 LATVIA
23 LITHUANIA
24 ITALY
25 GREECE
26 ROMANIA
7 82
17
5
99
100
83
100
27 BULGARIA 100
Note: Frequencies are calculated from the 60 simulations with alternative aggregation, weighting, and the removal of one indicator at a time. Frequencies under 5% are not shown. Bold numbers represent a majority of simulation results equaling a country's original rank position.
Table 9: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of Single Indicators on Corruption (10 Most Extreme Scenarios)

Scenario   Aggregation   Weighting   Excluded Indicator   Median   Max   Spearman Rank Coefficient
36         Geometric     Original    PRS                  1        4     0.971
58         Geometric     FA          GCS                  1        3     0.971
46         Geometric     Equal       PRS                  1        4     0.972
26         Arithmetic    FA          PRS                  1        4     0.974
16         Arithmetic    Equal       PRS                  1        4     0.975
12         Arithmetic    Equal       BRI                  1        4     0.981
54         Geometric     FA          EIU                  1        3     0.981
28         Arithmetic    FA          GCS                  1        3     0.982
34         Geometric     Original    EIU                  1        3     0.982
38         Geometric     Original    GCS                  1        3     0.982
Figure 6 – Voice & Accountability (mean VA value with 90% confidence interval range)
Table 10: Frequency Matrix of an EU Country's Voice & Accountability Rank
Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27
1 Sweden
2 Netherlands
3 Lux
4 Denmark
5 Finland
6 Ireland
7 Belgium
99
86 14
8 Austria
9 Germany
10 United
93 6
9
60 33 6
11 France
12 Malta
13 Portugal
14 Spain
15 Estonia
17 Slovenia
5 74 16 5
14 49 37
59 35
8 12 67 13
9 90
84 9 6
18 Hungary
19 Cyprus
20 Italia
21 Slovakia
22 Greece
23 Latvia
24 Poland
25 Lithuania
100
6
15 84
26 Bulgaria
27 Romania
25 75
8 92
66 34
Note: Frequencies are calculated from the 66 simulations with alternative aggregation, weighting, and the removal of one indicator at a time. Frequencies under 5% are not shown. Bold numbers represent a majority of simulation results equaling a country's original rank position.
Table 11: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of Single Indicators on Voice & Accountability (10 Most Extreme Scenarios)

Scenario   Aggregation   Weighting   Excluded Indicator   Median   Max             Spearman Rank Coefficient
20         Arithmetic    Equal       GCS                  2        9 (Cyprus+)     0.916
31         Arithmetic    FA          GCS                  2        8 (Cyprus+)     0.916
29         Arithmetic    FA          HUM                  2        10 (Cyprus+)    0.918
18         Arithmetic    Equal       HUM                  2        9 (Cyprus+)     0.919
13         Arithmetic    Equal       WMO                  2        11 (Cyprus+)    0.92
19         Arithmetic    Equal       WCY                  2        10 (Cyprus+)    0.921
24         Arithmetic    FA          WMO                  2        11 (Cyprus+)    0.921
30         Arithmetic    FA          WCY                  1        11 (Cyprus+)    0.921
16         Arithmetic    Equal       EIU                  2        10 (Cyprus+)    0.922
27         Arithmetic    FA          EIU                  2        10 (Cyprus+)    0.922
Appendix – List of sources
List of Underlying Sources of World Bank Governance Data for E.U. Countries
BRI    Business Environment Risk Intelligence
DRI    Global Insight Global Risk Service
EIU    Economist Intelligence Unit
PRS    Political Risk Services International Country Risk Guide
WMO    Global Insight Business Condition and Risk Indicators
GCS    Global Competitiveness Report
WCY    Institute for Management Development World Competitiveness Yearbook
GWP    Gallup World Poll
EGV    Global E-Governance Index
BPS    Business Enterprise Environment Survey
BTI    Bertelsmann Transformation Index
IPD    Institutional Profiles Database
FRH    Freedom House
GII    Global Integrity Index
GCB    Global Corruption Barometer Survey
HER    Heritage Foundation Index of Economic Freedom
HUM    Cingranelli Richards Human Rights Database
TPR    US State Department Trafficking in People report
OBI    International Budget Project Open Budget Index
MSIª   International Research & Exchanges Board Media Sustainability Index
For a more thorough look at each individual indicator, see: http://info.worldbank.org/governance/wgi/sources.htm