Assessing The Quality of the Quality of Government Data: A Sensitivity Test of the World Bank Government Indicators.

________________________________________________________________________

Abstract: This study assesses the quality of one of the most often employed indicators of Quality of Government (QoG), namely the World Bank Government Indicators (WGI). The WGI has faced numerous critiques in recent years, and while the authors have defended their data on many fronts, several empirical concerns remain unexplored. The data’s precision, internal consistency, robustness and transparency have been questioned but, until now, not tested. This analysis attempts to fill these gaps by performing a sensitivity analysis on four of the WGI indicators for the year 2008 on a sample of 27 E.U. countries. We find that the WGI composite indicators are predominantly internally consistent and are remarkably robust to adjustments in the weighting and aggregation scheme of the underlying data and to the exclusion of any one underlying indicator.

Key words: governance, corruption, sensitivity testing, World Bank

Nicholas Charron, PhD

Research Fellow and Assistant Professor

The Quality of Government Institute

Department of Political Science

The University of Gothenburg, Sweden

Sprängkullsgatan 19

Box 711, SE-405 30 Göteborg, Sweden

Phone: +46 (0) 31 786 46 89

Fax: +46 (0) 31 786 44 56


Introduction

What is the current state of empirical quality of government (henceforth QoG) data in the fields of development economics and political science? This study contributes to this broad and challenging question by assessing several primary critiques of one of the most widely used quantitative QoG indicators – the World Bank Governance Indicators (henceforth WGI) (Kaufmann, Kraay and Mastruzzi 2009, ‘KKM’) – for a sub-set of countries, the European Union 27. The WGI, one of the first sources to publish free QoG data for a world-wide sample, starting in 1996, is one of the most widely employed empirical sources of QoG assessment, used by scholars in prominent journals, media, policy-makers and aid organizations, including the United States Millennium Challenge Account aid program.

Unlike other indicators of QoG, such as Transparency International’s Corruption Perception Index (CPI), that focus on one specific QoG concept such as corruption, the WGI covers multiple areas of QoG. Due to its widespread use in scholarly work and policy circles, the WGI has come under more intense scrutiny than other QoG data (see Arndt and Oman 2006; Knack 2006; Kurtz and Shrank 2007; Pollitt 2008; Thomas 2009; Apaza 2009). This analysis empirically addresses several of those critiques.

Although there are a number of other important critiques of the WGI not addressed here (for a list and rebuttal of said critiques, see Kaufmann, Kraay and Mastruzzi 2006), this analysis deals with several salient lines of criticism. This study is the first to demonstrate the uncertainty and sensitivity of four widely used indicators of the WGI. Since the WGI is a composite indicator – combining multiple data sources for each country and weighting them – a sensitivity test can directly answer questions about the WGI’s internal validity and precision with respect to cross-country comparisons. In addition, the sensitivity test can confront potential problems stemming from the weighting or aggregation scheme of the WGI. Finally, the sub-sample of E.U. countries is selected intentionally to address the matter of ‘common sources’. Unlike many developing areas, the E.U. is a region where much QoG data is available, and the WGI provides many of the same sources for all 27 countries, which is necessary to perform several of the tests. Moreover, there is much variation within the region with respect to QoG. We take advantage of this and show the results of the sensitivity test for the 27 countries¹.

With regard to uncertainty and sensitivity testing, this analysis follows the advice of the JRC and OECD Handbook on Constructing Composite Indicators (2008). The results of the sensitivity test show that for the E.U. countries in the most recent year of the data (2008), the WGI are remarkably robust to changes in the weighting scheme, aggregation and removal of underlying data sources. They are also internally consistent. Although this analysis does not address all the critiques of the WGI, it does firmly address several concerns that to this point have not been answered empirically. The scope of this study is also limited to the E.U. Even so, the paper makes a meaningful contribution in that it should ease some of the trepidation of those concerned about the internal validity, transparency and precision of the WGI.

The remainder of this paper proceeds as follows. First, a brief presentation of the WGI and a definition of the individual indicators are provided. Next, the recent criticisms of the WGI are summarized. We then show the rank order of E.U. countries on each indicator according to the current WGI data. Following this, a test of the data’s internal consistency is performed using principal component analysis (PCA). We then group countries into clusters according to QoG using a cluster analysis. Next, uncertainty and sensitivity tests are performed on each of the four WGI indicators in this analysis and the results are reported. A final discussion of the data concludes the paper.

Brief Description of the WGI and the Indicators of QoG

1 The exceptions are the cases of Cyprus and Malta, where data is imputed for a few sources.

Launched by the World Bank in 1996 (Kaufmann, Kraay and Mastruzzi 2009), the WGI was one of the first sources to offer a global dataset freely available to scholars and practitioners researching in the field of QoG. It is a ‘composite index’ which was available biennially until 2002 and has been published annually from 2002 to 2009. It employs a wide scope of data for each indicator, and KKM argue that “each of the individual data sources we have provides an imperfect signal of some deep underlying notion of governance that is difficult to observe directly” (Kaufmann, Kraay and Mastruzzi 2008: 13). The data contains 6 ‘pillars’ of governance:

1. Control of Corruption

KKM define corruption broadly as “the abuse of public power for personal gain”. Of the ‘control of corruption’ indicator they publish annually, they write, for example, that it measures “the extent to which public power is exercised for private gain, including both petty and grand forms of corruption, as well as "capture" of the state by elites and private interests.” Examples of variation in the ways each source captures corruption include the Business Environment and Enterprise Performance Survey (BEEPS), which is undertaken in all former Eastern-bloc countries within the EU and asks firms “How common is it for firms to have to pay irregular additional payments to get things done” and “How problematic is corruption for the growth of your business.” Another firm survey included in the composite index, the World Economic Forum Global Competitiveness Survey (GCS), asks business leaders specific questions such as how frequently a firm might have to make extra payments in connection with trade permits, loan applications, taxes, and public contracts. On the other hand, the Bertelsmann Transformation Index collects data on the same sample of EU states as the BEEPS, but its data on corruption only captures the extent to which an anti-corruption agency has been established and how effective it is perceived to be in carrying out its mandate.

2. Rule of Law

The underlying individual data of ROL include concepts such as: judicial independence, property rights, the level of organized crime, respect for contracts, human trafficking, money laundering, and trust in the police and the courts. For example, the World Bank uses the Cingranelli and Richards² measure as part of their ROL for ‘public sector data providers’, which has a narrow definition of ROL based solely on the independence of the judiciary, scored as 0, 1 or 2 (from ‘not independent’ to ‘generally independent’).

2 See http://ciri.binghamton.edu/documentation/ciri_variables_short_descriptions.pdf for more information on this particular dataset.

Conversely, several of the business firm surveys employed by the WB, such as the Heritage Foundation or the Business Environment Risk Intelligence, focus their measurement of the ROL on property rights or contract enforcement by businesses and individuals. Further, household surveys such as the Gallup World Poll include questions regarding individuals’ confidence in the police force and the judiciary, or whether they have ever been the victim of a crime³.

3. Government Effectiveness

KKM define this indicator as “capturing perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies” (Kaufmann, Kraay & Mastruzzi 2009: 6). The composite index draws on data such as the Gallup World Poll, which focuses exclusively on citizens’ satisfaction with public education, public transit, and roads and highways. In the World Economic Forum’s survey (GCS), leaders of firms respond to questions about a country’s infrastructure, along with how much time business leaders spend interacting with government officials in the civil service. The Economist Intelligence Unit (EIU) asks experts to assess the amount of bureaucratic excess (red tape) and the ‘institutional effectiveness’ of a country’s civil service.

4. Voice & Accountability

KKM describe this pillar of QoG as “capturing perceptions of the extent to which a country's citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media” (KKM 2009: 6). Examples of different underlying data measuring this concept are: the Gray Area Dynamics (GAD) expert assessment, which measures religious freedom, the military’s involvement in politics, political patronage, and the role of the opposition – a fairly broad framework. The EIU uses 5 criteria to rate each country, based on human rights, accountability of political figures, ‘vested interests’, freedom of association and a ‘democracy index’. On the other hand, the Gallup World Poll (GWP) and Reporters without Borders (RSF) data used in the index are more narrowly focused – the former asks respondents about the fairness/freeness of elections and the absence of political violence, while the latter focuses exclusively on freedom of the press from the government.

3 The World Bank’s annual report on their governance indicators carefully elucidates which sources capture which aspects of the variables in question. See: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1148386 for Kaufmann, Kraay and Mastruzzi’s (2008) full description of each measured concept.

Two other indicators are also included in the WGI data⁴:

5. Political Stability & the Absence of Violence

6. Regulatory Quality

The ‘composite’ aspect of the data means that it aggregates survey data from firms, experts, and citizens from countries world-wide. The underlying data are collected from four main types of sources: non-governmental organizations (NGOs), firm and citizen survey data, public sector agencies, and commercial interest/risk assessment providers for businesses⁵. All data has been standardized for each year, so that the world average is equal to ‘0’ with a standard deviation of ‘1’. KKM also provide margins of error for all indicators for each country-year. In this analysis, only the first four pillars are included in the sensitivity analysis for the E.U. sample.

A Brief Overview of the Critics of the WGI

As noted, the WGI is commonly used by academics, media, NGOs and aid organizations across the world to assess different aspects of QoG, and thus it has rightfully drawn a number of recent criticisms⁶. Some, such as Pollitt (2008), have questioned the entire endeavour of KKM, calling their effort to quantify QoG “fruitless” and too difficult for “lay people” to understand or interpret correctly. Other critiques focus on more specific aspects of the data itself, such as Kurtz and Shrank (2007), who try to demonstrate that the data is biased towards the interests of international business elites and that countries are rewarded for economic growth and not necessarily for improving QoG⁷. Knack (2006) and Arndt and Oman (2006) argue that there are problems in using the data to compare QoG across space and time due to different underlying sources or imprecision in the data. The weighting scheme is also argued to be problematic because risk assessment agencies may make correlated errors (essentially by free riding on one another); the underlying data are thus not uncorrelated as assumed by KKM, which results in overvaluing certain data sources in the weighting scheme (Arndt and Oman 2006; Knack 2006). Thomas (2009) questions the “construct validity” of the data, arguing that the scope of the definition of the QoG indicators themselves is too vague to be meaningful. She also criticises the WGI for lacking “convergent and discriminant validity”. The former means that the underlying data which make up each composite indicator need to be sufficiently correlated, while the latter means that the separate indicators themselves, such as corruption or rule of law, should be sufficiently uncorrelated. Finally, she critiques the WGI for a lack of transparency (Thomas 2009).

4 For reasons due to lack of variation of the data in the sample (in particular for political stability), along with time constraints, this analysis does not include ‘Political Stability & the Absence of Violence’ or ‘Regulatory Quality’.

5 For a more thorough description of the data and background of the WGI, see Kaufmann, Kraay and Mastruzzi 2009.

6 For a deeper look into the criticisms of the WGI, KKM (2006) have summarized them into 11 separate categories.

KKM have specifically addressed several of these and other critiques in recent articles (KKM 2006, 2007), and continue to defend the WGI in the face of concerns about the data’s relevance or definitional validity. However, a few critiques remain empirically unexplored. First, the idea that the data are not sufficiently internally valid or consistent is taken up directly in this analysis. Second, the concern of whether the weighting scheme or certain underlying data are driving the results is assessed. Finally, issues of the data’s precision with respect to cross-country comparisons are addressed.

QoG in the E.U. According to the WGI

Table 1 shows the current rankings of the E.U. 27 on each indicator, based on the 2008 WGI data. A quick glance across the four areas of QoG shows that the countries’ positions are relatively stable – the Scandinavian and Northern European countries rank consistently high, while Southern-central Europe generally scores in the middle of the group and the New Member States (NMS), along with Italy and Greece, generally score low within the E.U. sample.

7 KKM (2007) answer this critique directly in their publication in the Journal of Politics.

***Table 1 about here***

Yet how stable and robust are these rankings for E.U. countries? The rest of this analysis proceeds as follows. We first test whether the underlying data in the 4 pillars of QoG are internally consistent from a statistical point of view. Next, we perform a cluster analysis so as to place EU countries in appropriate groups, from which more realistic comparisons can be established. Following this, we perform a multi-modelled uncertainty and sensitivity analysis to see whether the rankings derived from the WGI are significantly altered or remain robust to changes. More specifically, we challenge some of the initial assumptions made by the creators of the data with respect to aggregation and weighting schemes. We also test the impact of each individual indicator by removing it from its pillar, one indicator at a time.

The Sensitivity Test

“…it is hard to imagine that the debate on the use of composite indicators will ever be settled […] official statisticians may tend to resent composite indicators, whereby a lot of work in data collection and editing is “wasted” or “hidden” behind a single number of dubious significance. On the other hand, the temptation of stakeholders and practitioners to summarise complex and sometime elusive processes (e.g. sustainability, single market policy, etc.) into a single figure to benchmark country performance for policy consumption seems likewise irresistible.”

Andrea Saltelli, JRC

The WGI publishes all of its underlying data for each country-year on its website, along with the weighting scheme for each country. All underlying indicators for each country for 2008 have been taken directly from that website⁸. In the following sections, we test the internal consistency of the data and its sensitivity, in terms of how different the overall E.U. rankings become when changes are made to KKM’s original road-map for building the indicators from the underlying data. It is hoped that in addressing these primary matters related to the internal structure of the data itself, several of the aforementioned critiques will be addressed.

Internal Consistency of the Data

8 http://info.worldbank.org/governance/wgi/index.asp


We begin with the following question: is each of the individual indicators itself internally consistent? To test this, we use a method commonly employed in the literature, namely Principal Component Analysis (PCA). The World Bank classifies each of the pillars of governance independently from one another and does not specify any ‘sub-pillars’ within indices such as ‘rule of law’ or ‘control of corruption’. The internal consistency checks on the data should demonstrate whether this is the case.

We begin by testing the internal consistency of each of the 4 WGI indicators individually. Since the World Bank data does not alert users to any ‘sub-pillars’ – e.g. significant clusters in the data that might need to be accounted for in the weighting scheme – we expect that each of the four measures of QoG will contain only one significant factor. Because of the need for common data sources, we were forced to drop the 2-3 sources that are region-specific to the former Eastern group and rely on the sources which included all member states (MS’s)⁹. The number of sources varies from country to country in the dataset covering all countries in the world (some small island states, for example, have only one source for a given pillar, while some states have more than 15). We have selected the countries of the E.U. for this particular reason: there are at least 9 common sources for each individual indicator of QoG in the WGI, and in the case of Rule of Law (RL) there are 12. Moreover, testing the internal consistency of the data with common sources only addresses a main concern of the critics of the data, namely that comparisons are difficult to make when the underlying data sources differ from country to country. Table 2 shows a PCA for each of the four indicators of QoG individually.

***Table 2 about here***

Government Effectiveness

Using the standard rules of the Kaiser criterion – factors with Eigenvalues greater than ‘1’ that explain 10% or more of the variance – we find only 1 significant factor in the 9 underlying indicators that make up this pillar. The single factor has an Eigenvalue of 5.41 and explains over 60% of the variance of the data.

The next largest factor was close – explaining over 8% of the variance with an Eigenvalue of 0.9.

This demonstrates strong internal consistency in the data, which corroborates the World Bank’s decision to condense this data into a single index. When aggregating the underlying data to form this pillar – as well as using GE to generate the total QoG composite index – we can be fairly certain that significant clusters in the underlying data do not exist.

9 We took only sources that included 24 or more MS’s; thus, at times, we had to impute missing data, in particular for Malta and Luxembourg. We did so with the simple method of ‘mean substitution’. See the appendix for a full list of the individual data sources used in the E.U. 27 sample for the 4 pillars of WGI.

Control of Corruption

Only one significant factor was found in the data (9 underlying sources). The factor has an Eigenvalue of 6.9 and by itself explains over three-fourths of the total variance.

The next largest factor had an Eigenvalue under 0.7 and explained less than 6% of the variation.

Results of the PCA point to a very internally consistent and related set of underlying indicators.

Rule of Law

The Rule of Law pillar exhibited two significant factors in the underlying data. The first, which by itself explains over 60% of the variance and has an Eigenvalue of 7.52, is clearly the most important factor in the data. Factor 2 has an Eigenvalue of 1.41 and explains 12% of the variance. Together, the two factors explain about three-fourths of the total variance in the data.

After rotating the factors (Varimax method), we find that 9 of the 12 indicators load onto factor 1, while the other 3 (DRI, GAD and TPR) load onto factor 2.

The latter three indicators share a common thread in that they focus heavily on human trafficking and organized crime – rather than judicial independence or property rights as many of the other indicators do.


This suggests the need to test the sensitivity of the RoL indicator using two separately weighted factors instead of the single index. However, this may not be a serious problem, in that factor 1 in and of itself has a relatively high Eigenvalue and explains over 0.6 of the variance, with all of the underlying variables loading positively on it with a weight of at least 0.3, suggesting a certain degree of internal consistency among all 12 indicators.
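The significance rule applied in this section can be checked with simple arithmetic. The following is a minimal sketch, assuming that the PCA is run on standardized variables (a correlation matrix), so that total variance equals the number of underlying indicators; this assumption and the function name are illustrative and are not taken from KKM. It uses the two Rule of Law Eigenvalues reported above.

```python
def significant_factors(eigenvalues, n_indicators, min_eigenvalue=1.0, min_share=0.10):
    """Return (eigenvalue, variance share) pairs that pass both thresholds.

    Assumption: PCA on standardized variables, so the total variance
    equals the number of underlying indicators.
    """
    kept = []
    for ev in eigenvalues:
        share = ev / n_indicators  # proportion of total variance explained
        if ev > min_eigenvalue and share >= min_share:
            kept.append((ev, share))
    return kept


# Rule of Law: 12 underlying indicators, two largest Eigenvalues from the text.
rol = significant_factors([7.52, 1.41], n_indicators=12)
for ev, share in rol:
    print(f"Eigenvalue {ev:.2f} explains {share:.1%} of the variance")
print(f"Cumulative share: {sum(share for _, share in rol):.1%}")
```

Under this assumption both Rule of Law factors pass the rule, and their cumulative share is in line with the ‘about three-fourths’ reported above.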

Voice & Accountability

Two significant factors were also found in the VA pillar. Again, the first explains about 60% of the total variance and has an Eigenvalue of about 6, while the second factor is significant according to the Kaiser rule in that it exceeds the Eigenvalue threshold of 1 and explains over 10% of the total variance.

After rotating the factors (Varimax method), we find that 8 of the 10 indicators load onto factor 1, while the other 2 (RSF & HUM) load onto factor 2.

Although there is no direct common theoretical thread in the underlying make-up of the two indicators in factor 2 (RSF asks exclusively about press freedom, while HUM deals with wider human rights issues), they are the underlying variables with the least variance across the MS’s – most of the EU 27 rank very high on both of these measures.

This suggests the need to test the sensitivity of the VA indicator using two separately weighted factors instead of the single index. Although two factors were found, again, as with the RoL, all 10 indicators load positively onto factor 1 with weights of at least 0.35. Thus internal consistency of the data is present to a certain degree.

Table 3 explores the relationship between the underlying data and its respective pillar. To demonstrate 100% internal consistency of the underlying variables, each indicator would have a positive and significant relationship with its pillar. In fact, this is nearly the case. Of the 59 separate underlying indicators of the 4 indicators of QoG, only 4 run in the opposite direction to their respective pillar. However, their relationship appears random (indistinguishable from zero), as they do not reach even the 90% level of confidence. These variables come from two sources, which are:

The BEEPS GE and CC scores (BPS) and the Global Integrity Index for CC and RL (GII).

We believe that this is less of a problem than it might appear, in that these sources only pertain to the sub-set of former Eastern-bloc states and are thus weighted less in the original data than the common sources. Of the remaining data, 6 sources are insignificantly related to their respective pillars. These data come from the following sources:

The Bertelsmann Index for GE (BTI), the BEEPS for RL (BPS), the Institutional Profile Database for RL (IPD), the Global Integrity Index for VA (GII) and the Media Sustainability Index for VA (MSI). The MSI result is understandable as it only has 2 observations. In addition, the Gray Area Dynamics measure for RL (GAD) is insignificantly related to the RL pillar.
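The kind of check summarized in Table 3 can be illustrated with a short sketch. It assumes that the relationship between a source and its pillar is measured with a Pearson correlation, which the text does not state explicitly; the scores below are hypothetical and the function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation is zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical pillar scores and one underlying source for six made-up countries.
pillar = [1.9, 1.4, 1.0, 0.6, 0.2, -0.1]
source = [0.9, 0.8, 0.6, 0.5, 0.2, 0.3]

r = pearson_r(source, pillar)
t = t_statistic(r, len(pillar))
print(f"r = {r:.3f}, t = {t:.2f} with {len(pillar) - 2} degrees of freedom")
```

A source counts as consistent with its pillar when r is positive and the absolute t statistic exceeds the critical value for the chosen confidence level; a source with only 2 observations, such as the MSI, leaves no degrees of freedom for such a test.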

Cluster Analysis

***Figure 1 about here***

Having shown that the underlying data of the four indicators are mostly internally consistent, we perform a cluster analysis on the data for the E.U. 27. What can we learn from cluster groupings? Although several EU states demonstrate similar QoG scores, there might be patterns in the underlying data that distinguish countries, suggesting different challenges despite a similar overall score. As has been noted by KKM, the WGI is a tool that scholars and practitioners can use to assess the relative position of countries (and not to rank countries with pinpoint accuracy). For example, while finding any existing measure of QoG that can reliably distinguish between Sweden and Denmark or Romania and Bulgaria would admittedly be an all but impossible task, we can use existing measures to point out their standing relative to other groups of countries. Because almost all governance data is subjective in nature, and thus an estimate of the true value of governance rather than the ‘real value’, we apply the cluster analysis to identify peer countries which are similarly ranked in the 4 pillars of QoG. The cluster groupings can serve as a helpful tool to identify EU member states which share common challenges in building QoG at the national level.

For the sake of simplicity, we take a simple mean of the four QoG indicators for each country in 2008 and employ hierarchical clustering, using Ward’s method and squared Euclidean distance on the four indicators of QoG, to identify the appropriate number of cluster groupings. Three distinct groups were detected in the analysis. We then used k-means clustering with squared Euclidean distance to assign each country to a cluster¹⁰.
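The assignment step can be sketched as follows. This is a minimal illustration of k-means with squared Euclidean distance only; it does not reproduce the Ward’s method step used to choose the number of clusters. The country labels, scores and starting centroids are hypothetical and are not WGI data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in init_centroids]  # copy, do not mutate the input
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scores on the four pillars (CC, RL, GE, VA) for six made-up countries.
scores = {
    "A": [2.2, 1.9, 2.0, 1.6],
    "B": [2.1, 1.8, 1.9, 1.5],
    "C": [1.1, 1.0, 1.2, 1.1],
    "D": [0.9, 1.1, 1.0, 1.0],
    "E": [0.1, 0.3, 0.4, 0.7],
    "F": [-0.1, 0.2, 0.3, 0.6],
}
points = list(scores.values())
# Three clusters, seeded with three of the points (an arbitrary illustrative choice).
labels, centroids = kmeans(points, init_centroids=[points[0], points[2], points[4]])
clusters = dict(zip(scores, labels))
print(clusters)
```

Because k-means depends on its starting centroids, a fuller analysis would typically repeat the procedure from several starting points and compare the resulting groupings.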

Although there are geopolitical and historical similarities within the groups, the groupings are not entirely driven by such factors. For example, Estonia and Slovenia, new MS’s of the former Eastern bloc, are grouped with EU-15 MS’s such as Spain, Belgium and France. Other EU-15 MS’s, Italy and Greece, belong to the third cluster grouping. However, the group containing the top nine performers with respect to QoG is made up entirely of EU-15 states from Northern and Central parts of Europe. Figure 1 shows the distribution of states into the three groups.

Again, we are not claiming with certainty that these groups are ‘set in stone’; the pattern nevertheless tells us that cluster 1 countries exhibit relatively high levels of QoG, while cluster 2 and cluster 3 show good and moderate performance respectively. There is also considerable variation within the E.U. (compared with other countries of the world as well) on several of the indicators. Several states in the top group – in particular Denmark, Sweden, Finland and the Netherlands – rank consistently in the top four to five countries in Europe on almost all pillars and underlying indicators, and in the top 5% worldwide on almost all QoG data. It is also worth noting that, according to the World Bank data, all MS’s are found in the top half of the world ranking in ‘voice and accountability’¹¹. On the other 3 pillars of QoG, there are times at which the scores of a few countries, such as Bulgaria, Romania, Greece, Latvia and Italy, are indistinguishable from the 3rd quartile of countries in the world (the ones that rank between 25% and 50%) and the 2nd quartile (50% to 75%) – particularly on corruption scores. However, none of the MS’s are ever found in the 4th quartile (the lowest 25%), and all are statistically distinguishable from this group on all four pillars according to the margins of error provided by KKM.

10 In addition, we clustered countries with a ‘maximum between cluster difference’ method and found that only France and Czech Republic would belong to a different group (group 1 and 2 respectively). After carefully looking at the data and their relative proximity to their neighbors within both groups, we made the judgment that the placement in the squared Euclidean clustering was the more appropriate choice.

11 Applying the 90% confidence intervals provided by Kaufmann et al (2009), one can calculate the relative standing of a state with respect to any one of the 4 ‘quartiles’ with relative ease, as scores are normalized with a mean of ‘0’ and SD of ‘1’.
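The quartile check described in note 11 can be sketched as follows. The sketch assumes that the standardized scores are approximately normally distributed across countries, so that quartile cut-offs can be read from the standard normal distribution; this is an assumption made for illustration, and the estimates and standard errors used below are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ci90(estimate, std_error):
    """Two-sided 90% confidence interval around a standardized score."""
    z = STANDARD_NORMAL.inv_cdf(0.95)  # about 1.645
    return estimate - z * std_error, estimate + z * std_error


def clears_bottom_quartile(estimate, std_error):
    """True if the whole 90% interval lies above the 25th-percentile cut-off."""
    cutoff = STANDARD_NORMAL.inv_cdf(0.25)  # about -0.674 under normality
    lower, _ = ci90(estimate, std_error)
    return lower > cutoff


# Hypothetical estimates and standard errors, not actual WGI values.
print(clears_bottom_quartile(-0.17, 0.15))
print(clears_bottom_quartile(-0.50, 0.20))
```

A country whose entire interval lies above the cut-off is statistically distinguishable from the lowest quartile in the sense used above.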

In Figure 2, we show the pillar averages for the 3 cluster groupings. Although the 3 groups clearly distinguish themselves on all 4 components of QoG along with the total scores, the differences vary by pillar. For example, the 3 cluster groups are most clearly distinguished with respect to ‘control of corruption’ scores, while the MS’s scores converge when looking at the measure for the ‘rule of law’. On both VA and GE, the mean cluster scores more or less resemble the overall QoG scores. On none of the individual pillars do any of the groups overlap, suggesting strong internal consistency of the cluster groupings.

Uncertainty and Sensitivity Analyses

In any composite indicator there are a number of assumptions and creative decisions that the modellers make in order to put together the various underlying indicators into a single number. As Cherchye et al (2008) argue, “there is no recipe for building composite indicators that is at the same time universally applicable and sufficiently detailed.” Thus in any composite index, modellers face potential criticisms about what they ‘could have done’ when building the underlying variables and how basic changes in the assumptions of weighting variables or aggregating them together might alter the results.

In ranking E.U. MS’s on any of the QoG indicators of the WGI, which are based on such a composite index, it is important to apply as many existing alternative methods of building a composite measure as possible, so as to reduce any skew or bias towards certain countries arising from the original assumptions of the index. We therefore address several questions about how stable the index is (e.g. how much the rankings shift for any one of the indicators). We then point out those countries that are particularly prone to volatility under alterations to the original model.

In this section, we run several uncertainty analyses (UA), which evaluate the effect of alternative models on the ranking of countries in each of the 4 pillars of QoG. In particular, we focus on the choice of weights by the original modellers as well as the aggregation method. Of course, there is no way we can establish the ‘true ranking’ for each individual country, as each indicator of the 4 pillars of QoG is only an estimate of the perceived level of corruption, rule of law, etc. However, by testing a number of alternative assumptions about the weighting and aggregation schemes, we can evaluate the robustness of the measure used to calculate the rankings. Uncertainty tests, while still relatively new in the field, have been used by previous studies evaluating such composite indices as the Environmental Performance Index (Saisana & Saltelli 2008) and the Ibrahim Index of African Governance (Saisana, Annoni & Nardo 2009). As these studies do, we follow the guidelines and advice for UA set forth by the OECD (2008).

A brief description of the original data’s assumptions

The WB’s governance indicators are built using an Unobserved Components Model (UCM) due to the uneven number of underlying indicators each country may have in any given pillar of QoG. Each underlying indicator is normalized using a min-max method so that it is bounded between 0 and 1, with higher scores equating to better QoG. The scores are then standardized by subtracting the mean value of indicator ‘i’ from each country’s score on that indicator and dividing the result by the standard deviation of indicator ‘i’. The weights are essentially ‘country specific’: each indicator is weighted uniquely (outlier variables, for instance, are weighted less than those that are closer to other indicators), yet because each country potentially has a different number of underlying indicators, the adjusted weights can have a country-specific effect.¹² Such a method allows for the use of regional-specific data as well as survey data with less than global coverage.

The indicators are then aggregated arithmetically, taking the sum-product of the weights and the standardized indicators and then dividing each country’s score by the number of indicators it has multiplied by the weights, in order to compensate for differences in the raw number of indicators. Once the indicators are aggregated, they are again standardized so that the global average for each year is ‘0’ with a standard deviation of ‘1’.
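The normalization, standardization and arithmetic aggregation steps described above can be sketched roughly as follows. This is a minimal illustration, not the World Bank's UCM code: the indicator names, raw scores and weights are hypothetical, and dividing by the sum of the weights actually used is our reading of the compensation for missing indicators.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale a list of raw scores to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def weighted_average(scores, weights):
    """Weighted arithmetic aggregate over the indicators a country has.

    `scores` may contain None for a missing indicator; the sum-product is
    divided by the sum of the weights actually used (an assumption).
    """
    pairs = [(s, w) for s, w in zip(scores, weights) if s is not None]
    return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)

# Hypothetical raw scores for three countries on two indicators.
raw = {"ind_a": [2.0, 5.0, 8.0], "ind_b": [10.0, 30.0, 20.0]}
z = {name: standardize(min_max(vals)) for name, vals in raw.items()}
weights = [0.6, 0.4]  # illustrative weights, not the UCM weights
composite = [
    weighted_average([z["ind_a"][c], z["ind_b"][c]], weights) for c in range(3)
]
final = standardize(composite)  # re-standardize to mean 0, sd 1
```

The final call mirrors the last step in the text: after aggregation the composite is standardized again so that the sample mean is 0 and the standard deviation is 1.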

While this method allows for different countries to have different numbers of underlying indicators, it rewards conformity of the data, so to speak, in that it minimizes the impact of outliers (OECD 2008: 101). Moreover, with highly correlated underlying indicators, identification problems could arise (OECD 2008: 101).

¹² See a more detailed description of UCM in the appendix of the final report.

Evaluation of the Data: A Multi-Modelling Approach

We gather all underlying data for the four indicators of the WGI for 2008. After successfully replicating the original estimates, we explore three key aspects of the model’s assumptions in the uncertainty analysis. They address: 1) the aggregation method, 2) the weighting method and 3) the number of underlying indicators included in each composite indicator.

1. The aggregation method

The World Bank data is aggregated using a linear, additive method. Though this method is very common in contemporary indices, it has potential drawbacks. First, there is a strong assumption made of the independence of the underlying indicators. In essence, the underlying variables are taken independently and thus a poor score in one indicator can be offset by a relatively high score in another indicator. Nardo et al (2005: 79-80) give the following example: “if a hypothetical composite were formed by inequality, environmental degradation, GDP per capita and unemployment, two countries, one with values 21, 1, 1, 1; and the other with 6,6,6,6 would have equal composite if the aggregation is additive. Obviously the two countries would represent very different social conditions that would not be reflected in the composite.”

We take this warning seriously and therefore use the recommended geometric aggregation method as an alternative - which in the above example would give country A a score of 2.14 and country B a 6. Thus one high score (which could potentially be a misleading outlier) does not skew the end result as much as the linear additive method.
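The arithmetic behind this example can be written out as follows, assuming equal weights of 1/4 on each of the four components (an assumption that reproduces the quoted figures but is not stated in the quotation itself):

```latex
\begin{align*}
\text{Additive, country A:}  &\quad \tfrac{1}{4}(21 + 1 + 1 + 1) = 6 \\
\text{Additive, country B:}  &\quad \tfrac{1}{4}(6 + 6 + 6 + 6) = 6 \\
\text{Geometric, country A:} &\quad (21 \cdot 1 \cdot 1 \cdot 1)^{1/4} = 21^{1/4} \approx 2.14 \\
\text{Geometric, country B:} &\quad (6 \cdot 6 \cdot 6 \cdot 6)^{1/4} = 6
\end{align*}
```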

For geometric aggregation, we linearly transform the indicators so that all figures are positive (> 0) and employ the following formula, where ‘z’ is each QoG underlying data point, ‘w’ is the weight assigned to that data point, ‘q’ indexes the Q underlying indicators and ‘x’ is the country:¹³

\[
WGI_{\text{country } x} = \prod_{q=1}^{Q} z_{q,x}^{\,w_q}
\]
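A weighted geometric aggregation of this kind could be sketched as below. The function names and the choice of shift constant are hypothetical; the text says only that the indicators are linearly transformed to be positive, so shifting every value by one common offset is an assumption made for illustration.

```python
import math

def shift_positive(columns, floor=1.0):
    """Shift all indicators by one constant so the smallest value equals `floor`.

    The common offset and the default floor of 1.0 are assumptions.
    """
    lowest = min(v for col in columns for v in col)
    offset = floor - lowest
    return [[v + offset for v in col] for col in columns]

def geometric_aggregate(scores, weights):
    """Product of z_q ** w_q, with the weights rescaled to sum to 1."""
    total = sum(weights)
    log_sum = sum((w / total) * math.log(z) for z, w in zip(scores, weights))
    return math.exp(log_sum)

# The example quoted from Nardo et al, with equal weights.
country_a = geometric_aggregate([21, 1, 1, 1], [1, 1, 1, 1])
country_b = geometric_aggregate([6, 6, 6, 6], [1, 1, 1, 1])
```

Working in logs avoids overflow when many indicators are multiplied together and makes clear why every input must be strictly positive.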

2. The Weighting Scheme

¹³ From the “Handbook on Constructing Composite Indicators” (OECD 2008: 104).

As noted, the weighting scheme is based on the UCM, which can potentially produce individual country-specific weights. We employ two alternative weighting schemes in order to test the robustness of the original data. First, we use simple equal weighting (EW), so that the correlations among the underlying indicators are not taken into account as they are in the original weighting scheme. Second, we employ a new set of weights based on principal component analysis (PCA), including only common underlying data for each pillar. When necessary, we use factor rotation and square the factor loadings to derive the new weights. All weights for each indicator are listed in the appendix.
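The two alternative weighting schemes can be illustrated with the sketch below. It is a simplification under stated assumptions: only the first, unrotated principal component is used, found by power iteration on the correlation matrix, whereas the analysis above also applies factor rotation when necessary. The function names and test data are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for equally long indicator columns."""
    def center(col):
        m = sum(col) / len(col)
        return [v - m for v in col]
    centered = [center(c) for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

def equal_weights(k):
    """Equal weighting (EW): every indicator gets 1/k."""
    return [1.0 / k] * k

def pca_weights(columns):
    """Squared first-component loadings, rescaled to sum to 1."""
    vec = first_component(correlation_matrix(columns))
    squared = [v * v for v in vec]
    total = sum(squared)
    return [s / total for s in squared]
```

Because a loading is the eigenvector entry scaled by the square root of the eigenvalue, the squared loadings rescaled to sum to 1 reduce to the squared eigenvector entries, which is what `pca_weights` returns.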

3. Inclusion/Exclusion of Individual Indicators

For each pillar, we further test the robustness of the index by excluding each indicator relevant to the EU 27 in turn, essentially giving it a ‘0’ weight in the simulation. We do this to find out whether the absence of any one data source would significantly alter the results and, if so, in what weighting scheme and aggregation context. We report the ten most extreme simulations for each of the four indicators.¹⁴
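The simulation design implied by the three steps above (two aggregation methods, three weighting schemes, and the full data plus each indicator excluded in turn) can be enumerated as in the sketch below. The scenario ordering and the placeholder source names are hypothetical; the number of common sources per pillar is taken from the parentheses in Table 2.

```python
from itertools import product

AGGREGATIONS = ["arithmetic", "geometric"]
WEIGHTINGS = ["original", "equal", "factor"]

def build_scenarios(indicators):
    """Every aggregation/weighting pair, on the full data and with each indicator dropped."""
    exclusions = [None] + list(indicators)  # None means no indicator is excluded
    return [
        {"aggregation": a, "weighting": w, "excluded": e}
        for a, w, e in product(AGGREGATIONS, WEIGHTINGS, exclusions)
    ]

# Number of common sources per pillar, as listed in Table 2.
common_sources = {"GE": 9, "CC": 9, "RL": 12, "VA": 10}
counts = {
    pillar: len(build_scenarios([f"source_{i}" for i in range(n)]))
    for pillar, n in common_sources.items()
}
```

Under this reading the grid has 6 x (n + 1) scenarios for a pillar with n common sources, which gives 78 for RL, 60 for CC and 66 for VA, the simulation counts reported for those pillars.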

A Brief Note about Missing Data

In the simulations using FA weighting and geometric aggregation, we employ only common data sources, meaning we remove, for each pillar, the 2-3 region-specific indicators for former Eastern and Central Europe and any data covering fewer than 24 EU countries. The vast majority of EU member states have between 9 and 12 sources in common, depending on the pillar. The exception is mainly Malta (which has an average of 6 indicators per pillar), though Luxembourg and Cyprus also have 1-2 missing data points for each pillar compared with other EU countries. In order to maximize the number of common underlying data sources for each indicator, we employ the simplest method (mean substitution) for each missing data point during these simulations, so that missing values take the value of the EU 27 mean (as opposed to the world mean).
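Mean substitution as described here amounts to the following minimal sketch, where `None` marks a missing data point and the column holds one indicator's values for the EU sample (the values shown are hypothetical):

```python
def mean_substitute(column):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]

# Hypothetical indicator with one missing country.
filled = mean_substitute([1.0, None, 3.0])
```

Since the column contains only EU countries, the fill value is the EU sample mean rather than the world mean.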

Uncertainty Test Results

In this section, the robustness of each indicator is tested and discussed one at a time.

¹⁴ For a full list of the simulation results, please contact the author.

1. Rule of Law (RL)

Kaufmann et al (2009) provide a 90% confidence interval for each country’s QoG pillar in each year of the data. Figure 3 shows the 2008 rankings for the EU 27 along with the original confidence intervals.

According to Kaufmann et al (2009), we should only interpret differences between two countries as significant if their confidence intervals do not overlap. Thus the first significant difference between Denmark and the next highest ranking state would be with France. On the other hand, although Denmark ranks number 1 in the original data, its score is all but indistinguishable statistically from other high ranking states such as Austria, Sweden and Finland. For this pillar, Bulgaria and Romania can be distinguished from all other states in the sample. The line in the middle represents the EU average, which all states starting with Belgium (and ranked higher) are significantly above, while all states from Slovenia and below are significantly under.
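The overlap rule can be expressed in a few lines. The sketch below assumes normal-based 90% intervals (estimate plus or minus 1.645 standard errors); the estimates and standard errors are hypothetical and are not taken from the WGI data.

```python
def confidence_interval(estimate, std_error, z=1.645):
    """Two-sided 90% interval under a normal approximation (an assumption)."""
    return estimate - z * std_error, estimate + z * std_error

def intervals_overlap(interval_a, interval_b):
    """True when two (low, high) intervals share at least one point."""
    low_a, high_a = interval_a
    low_b, high_b = interval_b
    return low_a <= high_b and low_b <= high_a

# Hypothetical scores for three countries.
top = confidence_interval(1.9, 0.15)
close = confidence_interval(1.8, 0.15)
far = confidence_interval(1.3, 0.15)
```

Here `top` and `close` overlap and so would be treated as statistically indistinguishable, while `top` and `far` do not overlap and would be read as significantly different.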

***Figure 3 about here***

We test the robustness of the rankings in Figure 3 with 78 simulations based on the different aggregation and weighting schemes, along with removing each indicator one at a time. Table 4 shows the frequencies of each country’s rank across the 78 simulations. For example, Denmark ranks in the top two positions 90% of the time and is either #3 or #4 10% of the time, while Bulgaria and Romania share the bottom two positions in 100% of the simulations, irrespective of any change in the assumptions of the index or the removal of any indicator. The bold numbers indicate that a state was found in the same place in a majority of the simulations; there are 12 such states, with 7 of them belonging to group 1. Only 6 countries (Malta, Cyprus, Estonia, Hungary, Poland and Italy) were found to rank at least 3 places above or below their original rank in a majority of cases. The other 9 cases, such as the U.K. and Spain, fell within two spots of their original rank. The two countries with the most volatility within the RL pillar were the Czech Republic and Cyprus, each of which spanned 7-8 different rank positions (both 13-20) in at least 5% of the simulations. The vast majority, however, remained within 2-3 positions of their original rank, and all states (except possibly Hungary) remained within their respective cluster blocs.
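A frequency matrix of this kind can be built from the simulated ranks as in the sketch below. The rank bins follow the column headers of Table 4; the function name and the example ranks are hypothetical.

```python
from collections import Counter

# Bins from the header of Table 4: 1-2, 3-4, ..., 21-22, 23-25, 26-27.
BINS = [(i, i + 1) for i in range(1, 23, 2)] + [(23, 25), (26, 27)]

def rank_bin_shares(ranks, bins=BINS):
    """Percentage of simulations in which a country's rank falls in each bin."""
    counts = Counter()
    for r in ranks:
        for lo, hi in bins:
            if lo <= r <= hi:
                counts[(lo, hi)] += 1
                break
    return {b: round(100 * counts[b] / len(ranks)) for b in bins}

# Hypothetical country ranked 1st in nine of ten simulations and 3rd once.
shares = rank_bin_shares([1] * 9 + [3])
```

Each row of the matrix is one call of `rank_bin_shares` on a country's ranks across all simulations; cells under 5% would then be suppressed for display, as in the tables.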

***Table 4 about here***

Sensitivity Results

We complement the uncertainty analysis with a sensitivity analysis for each pillar, beginning with the rule of law. Here we show the results of the 10 cases that differ most from the original rankings according to the Spearman rank coefficient. We report the median and maximum changes and a Spearman Rank Correlation Coefficient, comparing the altered data to the original RoL index. Table 5 shows the results.
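The three summary statistics reported for each simulation can be computed as in this sketch, which assumes rankings without ties so that the closed-form Spearman formula applies. The country labels and ranks are hypothetical.

```python
from statistics import median

def rank_shifts(original, simulated):
    """Absolute change in rank for every country."""
    return [abs(simulated[c] - original[c]) for c in original]

def spearman(original, simulated):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(original)
    d2 = sum((simulated[c] - original[c]) ** 2 for c in original)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical four-country example in which A and B swap places.
original = {"A": 1, "B": 2, "C": 3, "D": 4}
simulated = {"A": 2, "B": 1, "C": 3, "D": 4}
shifts = rank_shifts(original, simulated)
summary = (median(shifts), max(shifts), spearman(original, simulated))
```

A coefficient of 1 means the simulated ranking is identical to the original, so values in the .90s indicate that only a few countries move, and by small amounts.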

We use the full data, then remove each indicator one at a time in the 6 different weighting/aggregation combinations. Overall the results are remarkably robust and the original rankings do not appear to be significantly biased at all, with the lowest Spearman coefficient (simulation 30) being .931. Only 6 of the 78 simulations have a median country shift of 2, and the vast majority have a median shift of only 1. Eight of the 10 most extreme cases use geometric aggregation, while 6 of the top ten cases use an equal weighting scheme. In cases when the max shift of at least one country was 5 or more places from its original rank, we report that particular country. The largest shift of any one state is that of Slovenia in simulation 30, with FA weighting and arithmetic aggregation when the EIU variable is removed. Otherwise, the largest shift is 5 places, with Hungary and Italy doing so on multiple occasions. It is clear that these two countries, for example, are at a relative disadvantage under the original weighting and aggregation in the RL data, as they generally rank at least 3-5 places higher than their original ranking (25 for Italy, 19 for Hungary) in the majority of the simulations.

***Table 5 about here***

2: Government Effectiveness (GE)

***Figure 4 about here***

We start with a distribution of the original scores, using the WB’s confidence intervals to show significant differences between EU states. Again the top four countries (Denmark, Sweden, Finland and the Netherlands) are statistically indistinguishable from one another based on the estimates and the confidence intervals surrounding them. However, Denmark is statistically higher than all EU states from Austria downward, while Sweden is so for states from Ireland downward. All countries from #11 (Belgium) and up distinguish themselves from the mean EU score, and those from #19 (Slovakia) and down are clearly below the EU average, which reinforces the cluster groupings from the previous section. States in the middle, ranked between 12 and 18 (Malta, Cyprus, Estonia, Portugal, Czech Rep., Slovenia and Spain), are virtually indistinguishable from one another. Table 6 shows the uncertainty analysis of these relative rankings.

***Table 6 about here***

Table 7 reports the sensitivity analysis results for the 10 cases that differ most from the original results, according to the Spearman rank coefficient, from the 60 simulations for the GE pillar. The overall results again show impressive robustness, as the median shift is never higher than 2 places and the Spearman coefficient does not fall below .90 in any of the 60 simulations (the lowest values are simulations #37 and #47, at .907). Under alternative aggregation and weighting scenarios, in particular geometric aggregation, which 9 of the 10 most divergent cases use, several states show a pattern of shifting up or down in the rankings by about 5 or 6 places. Italy, Spain, Poland and Belgium all end up 5 or 6 places higher than their original rankings on several occasions, while the initial GE data seem to have benefitted Slovakia the most, with Hungary and Latvia also ending up lower than their original rank in multiple scenarios. The largest jumps are Slovenia’s drop of 8 spots in simulation 38, Ireland’s move from 10th place up to 3rd in simulations 17 and 47, and Italy’s jump of 7 spots four times in the geometrically aggregated simulations under both the original and equal weighting (scenarios 34, 39, 44 and 49).

Despite these shifts in a handful of countries, the shift from one cluster to another is very rare - for example, Italy’s frequent shifts up to rank #20 or #19 up from #25 is still within the bounds of group 3 – demonstrating that most of the variation of state rankings for GE is within – and not between – clusters of EU states.

***Table 7 about here***

3. Control of Corruption

The results for the control of corruption pillar show that there is more significant variation between countries than in the previous two indicators. Group 1 (from the cluster analysis) almost completely distinguishes itself statistically from both groups 2 and 3. Moreover, the gap between Estonia and Slovakia (border states in clusters 2 and 3) is significant at the 90% level of confidence, thus group 2 completely distinguishes itself from group 3 as well. The top 4 countries in particular show remarkably high scores, significantly over ‘2’ in the original WB data, meaning that they are at least two standard deviations above the world mean score. Bulgaria and Romania, however, perform poorly relative to world rankings, as they are significantly below world averages according to the 90% confidence interval.

***Figure 5 about here***

We report the results of the uncertainty tests for the 60 simulations for the control of corruption indicator. Again, the proportion of simulations in which each country holds a given ranking is shown in Table 8. The three cluster groupings show remarkable robustness, as every country in the sample stays within its cluster-grouping range. Further, 19 of the 27 states, including the top and bottom 6 ranked, are found in their original place at least 50% of the time (bold numbers), while those that show up below or above their original ranking in a majority of simulations were only a few places above or below (for example, the U.K., originally ranked 9th, was found between 5th and 8th place 83% of the time). The uncertainty tests show no signs of the original data negatively biasing any individual country. The largest range is that of Slovenia, which was found anywhere from 12th to 18th in the rankings, which is, again, within the scope of the second cluster grouping.

***Table 8 here***

The sensitivity analysis results for each individual simulation demonstrate that the CC scores are, by comparison to the other 3 indicators, the most robust of the WGI data. In fact, in 23 of the 60 simulations the median state does not shift even one place in the rankings, and in the other 37 cases the median shift is only equal to 1. Moreover, the Spearman rank coefficient does not drop below 0.97 in any of the cases, and in no simulation did we find that the maximum shift was larger than four places from the original rankings. We are therefore quite confident of the robustness of the CC rankings from the World Bank data: there is apparently no systematic bias from the original aggregation or weighting methods that hurts or helps any EU state in any significant way relative to the initial rank order.

***Table 9 here***

4. Voice & Accountability (VA)

***figure 6 about here***

The final pillar displays a similar rank order to the first three yet has a much smaller range of scores. All EU states are significantly over the world average, yet the higher-ranking states are not 2 standard deviations over the mean, as they were in the other pillars, but approximately 1.5 over the mean, thus making for a tighter grouping in the VA pillar. Therefore the two highest-ranking EU states, Sweden and the Netherlands, are statistically indistinguishable from all states down to the 11th-ranked country (France). Spain’s (ranked 14th) confidence interval overlaps with both Belgium’s and Poland’s, meaning, for example, that according to KKM there is a 90% chance that Spain’s ‘true’ VA estimate could place it anywhere between 7th and 23rd within the EU.

***Table 10 about here***

Due to the tighter score-groupings in the VA pillar, we would expect a bit more volatility in the rankings than in the previous three pillars. We discovered this indeed to be the case. Only 8 countries are found in their original rank at least 50% of the time after all simulations. After the 66 simulations, 7 countries (Germany, Malta, Estonia, Cyprus, Hungary, Greece and Poland) ended up in four different boxes at least 5% of the time, meaning that they could plausibly be in one of eight different places in the rankings relative to where they were in the original VA data, depending on the assumptions of the model. Cyprus is the most extreme case. Originally ranked 19th, it is found in the 19th or 20th ranking in only 11% of the simulations, while it reaches the 9th or 10th ranking 17% of the time and even the 8th spot in the rankings in 6% of the simulations. Poland also appears to have been affected negatively by the original assumptions of the data, and ends up ranked either 19th or 20th in 60% of the simulations, about 4 to 5 places above its original ranking. Even Romania, which until the VA pillar has been in the bottom two spots in almost every simulation, ends up a couple of places higher in a majority of cases. Notable countries that seem to benefit significantly from the original weights and additive aggregation scheme (at least 4-5 spots ahead of the majority of the simulation outcomes) include Hungary and possibly Germany.

***Table 11 about here***

It is clear from the results of the sensitivity analysis that one state in particular, namely Cyprus, is disadvantaged by the original weights and aggregation methods. In all but a handful of the 66 simulations, we see Cyprus jump between 6 and 11 spots, all with either FA or EW weighting schemes in arithmetic aggregation. Poland seems to benefit from geometric aggregation of the indicators, while Latvia jumps up 6 places in the rankings when the Gallup World Poll data is taken out in additive aggregation schemes. Surprisingly, Germany ends up ranked between 13th and 15th in about 29% of the simulations. Aside from those few cases, however, the remainder of the countries are again quite stable relative to their original ranking: in not one case does the median rank shift rise above 2. Furthermore, the lowest Spearman rank coefficient is still above .91 (simulation 31, with additive aggregation and factor weights, is .916).

In the case of all four indicators, we find that the World Bank data provide reliable estimates of QoG in EU states for the year 2008.

Discussion

The WGI has been one of the most important sources of empirical QoG data, if not the most important, in the last 15 years. Its scope of use ranges from academia to media to the assessment of international aid by countries such as the U.S. and the Netherlands. Thus questions of validity and reliability are of utmost importance for scholars and policy-makers interested in QoG research. Several recent critiques of the WGI have questioned the data’s internal validity; argued that without common sources the data cannot be used for cross-country comparison, since this leads to different weighting schemes for each country that could be problematic; and, finally, questioned the precision of the data. All of these critiques have been addressed directly by employing several internal consistency, uncertainty and sensitivity analyses for a sub-group of countries, the E.U. 27. Four ‘pillars’ of QoG were analysed. The data prove to be remarkably internally consistent, in particular the GE and CC indicators. With the help of cluster analysis, we find that the patterns that emerge from the original data rankings are robust to changes in the assumptions of the original data, namely altering the weighting and aggregation methods and removing each of the underlying data sources in turn to test whether this had a significant impact on the original data. The answer is that the data are solidly robust, internally reliable and consistent for EU countries. Furthermore, although time consuming, it was indeed possible to replicate the estimates of the WGI by simply taking the underlying data sources from their website, which speaks to the level of transparency and replicability of the data.

The sensitivity tests performed in this analysis are certainly not exhaustive, but simply a basic series of tests. Moreover, the analyses employed only EU states, and thus a drawback of this study is that it cannot speak to the robustness of the world-wide data. Time constraints simply did not allow for such an analysis, nor could we perform several of the internal consistency or sensitivity tests without having a significant amount of common underlying data sources for each indicator across the sample. Thus the advantage of the EU sample employed here is that there is ample data available for each country and that there is a significant amount of variation across the sample in all four indicators to test the stability of the data. While it is outside the scope of this analysis to answer questions of ‘conceptual relevance’, definitional concerns or the ‘appropriateness’ of all sources of underlying data, what this study does show is that, at least within the EU, no single data source or methodological decision in constructing the four indicators, such as weights or aggregation, drives the results in a significantly biased way. Other criticisms, such as whether perceptions data are appropriate for measuring QoG, time series issues, or whether the data are driven by recent economic success or business interests, have been taken up by KKM themselves. However, based on the results here, researchers who employ this data, in particular for comparisons within the EU, can be more assured that the quality of this particular QoG data is reliable and internally sound when making cross-country comparisons.


Sources

Apaza, Carmen. 2009. Measuring Governance and Corruption through the Worldwide Governance Indicators: Critiques, Responses, and Ongoing Scholarly Discussion. PS: Political Science & Politics 42:1 (January), 139-143.

Arndt, Christiane and Charles Oman. 2006. "Uses and Abuses of Governance Indicators". OECD Development Center Study.

Kaufmann, D., A. Kraay and M. Mastruzzi. 2006. Growth and Governance: A Reply. Journal of Politics. Vol. 69, Issue 2.

Kaufmann, D., A. Kraay and M. Mastruzzi. 2008. Governance Matters VII: Aggregate and Individual Governance Indicators: 1996-2007.

Knack, Steven. 2006. "Measuring Corruption in Eastern Europe and Central Asia: A Critique of the Cross-Country Indicators". World Bank Policy Research Department Working Paper 3968.

Kurtz, M.J. and A. Schrank. 2007. Growth and Governance: Models, Measures, and Mechanisms. Journal of Politics. Vol. 69, (2):

OECD. 2008. “Handbook on Constructing Composite Indicators: Methodology and User Guide”. Joint publication by the OECD and JRC European Commission.

Pollit, C. 2008. ’Moderation in All Things’: Governance Quality and Performance Information. Presented at the SoG conference, Göteborg, Sweden 2008.

Thomas, Melissa. 2009. What Do the Worldwide Governance Indicators Measure? (July 16, 2009). European Journal of Development Research.

Table 1: Overall Rankings of the E.U. States for the 4 WGI QoG Indicators

Rank  1. Government Effectiveness  2. Control of Corruption  3. Rule of Law  4. Voice & Accountability
1     DENMARK                      FINLAND                   DENMARK         SWEDEN
2     SWEDEN                       DENMARK                   AUSTRIA         NETHERLANDS
3     FINLAND                      SWEDEN                    SWEDEN          LUXEMBOURG
4     NETHERLANDS                  NETHERLANDS               FINLAND         DENMARK
5     AUSTRIA                      LUXEMBOURG                LUXEMBOURG      FINLAND
6     UNITED KINGDOM               AUSTRIA                   NETHERLANDS     IRELAND
7     GERMANY                      GERMANY                   IRELAND         BELGIUM
8     LUXEMBOURG                   IRELAND                   GERMANY         AUSTRIA
9     FRANCE                       UNITED KINGDOM            UNITED KINGDOM  GERMANY
10    IRELAND                      FRANCE                    MALTA           UNITED KINGDOM
11    BELGIUM                      BELGIUM                   FRANCE          FRANCE
12    MALTA                        SPAIN                     BELGIUM         MALTA
13    CYPRUS                       PORTUGAL                  SPAIN           PORTUGAL
14    ESTONIA                      CYPRUS                    CYPRUS          SPAIN
15    PORTUGAL                     MALTA                     PORTUGAL        ESTONIA
16    CZECH REPUBLIC               SLOVENIA                  ESTONIA         CZECH REPUBLIC
17    SLOVENIA                     ESTONIA                   SLOVENIA        SLOVENIA
18    SPAIN                        HUNGARY                   CZECH REPUBLIC  HUNGARY
19    SLOVAKIA                     SLOVAKIA                  HUNGARY         CYPRUS
20    HUNGARY                      POLAND                    GREECE          ITALY
21    LITHUANIA                    CZECH REPUBLIC            LITHUANIA       SLOVAKIA
22    LATVIA                       LATVIA                    LATVIA          GREECE
23    GREECE                       LITHUANIA                 POLAND          LATVIA
24    POLAND                       ITALY                     SLOVAKIA        POLAND
25    ITALY                        GREECE                    ITALY           LITHUANIA
26    BULGARIA                     ROMANIA                   ROMANIA         BULGARIA
27    ROMANIA                      BULGARIA                  BULGARIA        ROMANIA

Table 2: Principal Component Analysis - Check of the Underlying Data Consistency

Pillar Sig. Factor Eigenvalue Proportion

Government Effectiveness (9)

Control of Corruption (9)

Rule of Law (12) 1

2

Voice & Accountability (10)

Note: number of common sources in parentheses

2 1.17

7.52 0.63

1.41 0.12

0.12


Table 3: Pearson's Correlation Coefficients between Underlying Data, Pillar and QoG Index

Pillar Availability Indicator Pillar Pillar Availability Indicator Pillar

Correlation Correlation

Government

Effectiveness

Control of

Corruption

All (9)

Regional (2)

Limited (1)

All (9)

Regional (4)

BRI

DRI

EIU

PRS

WMO

GCS

WCY

GWP

EGV

BPS

BTI

IPD

BRI

DRI

EIU

GAD

PRS

WMO

GCS

GWP

WCY

BPS

BTI

FRH

GII

.76***

.56***

.92***

.88***

.81***

.84***

.84***

0.71***

.39**

-0.33

0.53

.81***

0.83***

0.60***

0.93***

0.82***

0.94***

0.86***

0.96***

0.86***

0.95***

-0.21

0.70**

0.93***

-0.55

Rule of Law All (12) BRI

DRI

EIU

GAD

PRS

WMO

GCS

GWP

WCY

HER

HUM

TPR

Regional (4) BPS

BTI

GII

Limited (1) IPD

Voice &

Accountability

All (10) WMO

FRH

PRS

EIU

RSF

HUM

WCY

GCS

GAD

.81***

.46***

.87***

0.28

.73***

.67***

.80***

.68***

Limited (2) IPD 0.92*** GWP .70***

GCB 0.76*** Regional (3) BTI

GII

OBI

Limited (2) IPD

MSIª

.81***

0.45

.69**

.77***

0.99

ªonly available for Bulgaria & Romania.

Number of sources in parentheses under ‘availability’. For a full list and description of variable sources, see the Appendix.

‘All’ means that the data is available for all states, ‘regional’ refers to data being mainly available for only one area within the E.U., while ‘limited’ refers to data that is available for fewer than 4 countries.

*** p<.01, **p<.05

.72***

.78***

.63***

.38**

0.02

0.74**

-0.51

0.82

.90***

.85***

.62***

.92***

.65***

.38**

.80***

.86***

.77***


Figure 1 – Results of the Cluster Analysis

Note: Malta (not seen) is ‘average QoG’, same color as France or Spain

Figure 2: Cluster Means Over the 4 Pillars of QoG. Categories: QoG (total), CC, VA, RoL, GE. Series: Cluster 1, Cluster 2, Cluster 3. Scale: 0 to 2.5.

Figure 3: Rule of Law in E.U. Member States (legend: mean RoL value, 90% c.i. range)

Table 4: Frequency Matrix of a EU Country's Rankings: Rule of Law

Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27

1

2

3

Denmark

Austria

Sweden

4 Finland

5 Lux

6 Netherlands

7 Ireland

90 10

81 18

10 83 5

18 78

5

8 Germany

9 United Kingdom

10 Malta

11 France

65 32

66 32

5

12 Belgium

13 Spain

14 Cyprus

15 Portugal

16 Estonia

17 Slovenia

18 Cz. Rep

19 Hungary

94

16 16 61 6

9 77 13

13 73 12

61 23 14

27 29 40

25 64 9 12

21 65 13

13 83

20 Greece

21 Latvia

22 Lithuania

23 Poland

24 Slovakia

25 Italia

26 Romania

27 Bulgaria

5 21 71

57 27 14

100

54 30 16

100

100

Note: Frequencies are calculated from the 78 simulations with alternative aggregation, weighting and removing one indicator at a time. For example, Finland ranks in either the 3rd or 4th position 78% of the time, and 1st or 2nd 18% of the time. Frequencies under 5% are not shown. All figures rounded to the nearest percent. Bold numbers represent a majority of simulation results equaling a country’s original rank position.

Table 5: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of

Single Indicators on Rule of Law (10 Most Extreme Scenarios)

Scenario Aggregation Weighting Excluded Median Max

Indicator

Spearman

Rank Coefficient

30 Arithmetic

71 Geometric

45 Geometric

58 Geometric

19 Arithmetic

54 Geometric

22 Arithmetic

46 Geometric

57 Geometric

59 Geometric

FA

FA

Original

Equal

Equal

Equal

Equal

Original

Equal

Equal

Eiu

Prs

Prs

Prs

Prs

Bri

Gwp

Wmo

Gad

Wmo

1

1

2

1

1

1

2

5 (Italy+)

5 (Italy+)

5 (Italy+)

5 (Italy+)

0.943

0.949

0.949

0.95

4

5 (Hungary+)

0.951

0.954

5 (Italy, Hun. +) 0.955

1 5 0.955

2 5 (Italy+) 0.955

Figure 4: Government Effectiveness in EU Member States (legend: mean GE value, 90% c.i. range)

Table 6: Frequency Matrix of a EU Country's Rank

Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27

1 DENMARK

2 SWEDEN

3 FINLAND

4 NETHERLANDS

5 AUSTRIA

100

6 UNITED KINGDOM

16

48

75 8

21 66 8

45

7 GERMANY

8 LUXEMBOURG

25

18

28

63

9 FRANCE

10 IRELAND

11 BELGIUM

12 MALTA

5

5

6

35

33

16

13 CYPRUS

14 ESTONIA

15 PORTUGAL

16 CZECH REPUBLIC

17 SLOVENIA

18 SPAIN

11 63 25

13 31 50 5

15 11 73

6 80 13

15 65 13 5

71 5 18

19 SLOVAKIA

20 HUNGARY

21 LITHUANIA

22 LATVIA

23 GREECE

24 POLAND

5 70 25

18

25 ITALY

26 BULGARIA

8 23 15 53

100

27 ROMANIA 100

Note: Frequencies are calculated from the 78 simulations with alternative aggregation, weighting and removing one indicator at a time. For example, Netherlands ranks in either the 3rd or 4th position 75% of the time, and 5th or 6th 24% of the time. Frequencies under 5% are not shown.

Table 7: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and

Exclusion of Single Indicators on Government Effectiveness (10 Most Extreme Scenarios)

Scenario Aggregation Weighting Excluded Median Max Spearman

Indicator Rank Coefficient

37 Geometric

47 Geometric

17 Arithmetic

34 Geometric

44 Geometric

39 Geometric

49 Geometric

59 Geometric

36 Geometric

46 Geometric

Original

Equal

Equal

Original

Equal

Original

Equal

FA

Original

Equal

Gcs

Gcs

Gcs

Eiu

Eiu

Wcy

Wcy

Wcy

Wmo

Wmo

2

2

2

1

2

2

2

2

2

2

7 (Ireland+)

7 Ireland +)

7 (Ireland+)

7 (Italy+)

7 (Italy+)

7 (Italy+)

7 (Italy+)

5 (Spain+)

6 (Slovak-, Pol.+)

6 (Poland+)

0.907

0.907

0.914

0.925

0.925

0.928

0.928

0.928

0.932

0.932

Figure 5: Corruption in E.U. Member States (legend: CC estimate, 90% c.i. range)

Table 8: Frequency Matrix of a EU Country's Corruption Rank

Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27

1 FINLAND 97

2 DENMARK 92 7

3 SWEDEN 9 90

4 NETHERLANDS 99

5 LUXEMBOURG 99

6 AUSTRIA 50 44

7 GERMANY 54 10

8 IRELAND

9 UNITED

10 FRANCE

11 BELGIUM

12 SPAIN

18 HUNGARY

19 SLOVAKIA

20 POLAND

88 12

13 PORTUGAL

14 CYPRUS

68 28

35 60

15 MALTA 13 57 30

16 SLOVENIA 10 62 20 5

17 ESTONIA 7 30 62

100

95

21 CZECH

22 LATVIA

23 LITHUANIA

24 ITALY

25 GREECE

26 ROMANIA

7 82

17

5

99

100

83

100

27 BULGARIA 100

Note: Frequencies are calculated from the 60 simulations with alternative aggregation, weighting and removing one indicator at a time. Frequencies under 5% are not shown. Bold numbers represent a majority of simulation results equaling a country’s original rank position.

Table 9: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of Single Indicators on Corruption (10 Most Extreme Scenarios)

Scenario  Aggregation  Weighting  Excluded Indicator  Median Rank  Max Rank  Spearman Coefficient
36        Geometric    Original   Prs                 1            4         0.971
58        Geometric    FA         Gcs                 1            3         0.971
46        Geometric    Equal      Prs                 1            4         0.972
26        Arithmetic   FA         Prs                 1            4         0.974
16        Arithmetic   Equal      Prs                 1            4         0.975
12        Arithmetic   Equal      Bri                 1            4         0.981
54        Geometric    FA         Eiu                 1            3         0.981
28        Arithmetic   FA         Gcs                 1            3         0.982
34        Geometric    Original   Eiu                 1            3         0.982
38        Geometric    Original   Gcs                 1            3         0.982


Figure 6: Voice & Accountability in E.U. Member States (legend: number, mean VA value, 90% confidence interval range)


Table 10: Frequency Matrix of an EU Country's Voice & Accountability Rank

Original Rank Country 1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16 17,18 19,20 21,22 23,24,25 26,27

1 Sweden

2 Netherlands

3 Lux

4 Denmark

5 Finland

6 Ireland

7 Belgium

99

86 14

8 Austria

9 Germany

10 United

93 6

9

60 33 6

11 France

12 Malta

13 Portugal

14 Spain

15 Estonia

17 Slovenia

5 74 16 5

14 49 37

59 35

8 12 67 13

9 90

84 9 6

18 Hungary

19 Cyprus

20 Italia

21 Slovakia

22 Greece

23 Latvia

24 Poland

25 Lithuania

100

6

15 84

26 Bulgaria

27 Romania

25 75

8 92

66 34

Note: Frequencies are calculated from the 66 simulations with alternative aggregation and weighting schemes and with one indicator removed at a time.

Frequencies under 5% are not shown. Bold numbers represent a majority of simulation results equaling a country’s original rank position.

Table 11: Sensitivity Analysis: Impact of Assumptions of Weighting, Aggregation and Exclusion of Single Indicators on Voice & Accountability (10 Most Extreme Scenarios)

Scenario  Aggregation  Weighting  Excluded Indicator  Median Rank  Max Rank       Spearman Coefficient
20        Arithmetic   Equal      GCS                 2            9 (Cyprus+)    0.916
31        Arithmetic   FA         GCS                 2            8 (Cyprus+)    0.916
29        Arithmetic   FA         HUM                 2            10 (Cyprus+)   0.918
18        Arithmetic   Equal      HUM                 2            9 (Cyprus+)    0.919
13        Arithmetic   Equal      WMO                 2            11 (Cyprus+)   0.920
19        Arithmetic   Equal      WCY                 2            10 (Cyprus+)   0.921
24        Arithmetic   FA         WMO                 2            11 (Cyprus+)   0.921
30        Arithmetic   FA         WCY                 1            11 (Cyprus+)   0.921
16        Arithmetic   Equal      EIU                 2            10 (Cyprus+)   0.922
27        Arithmetic   FA         EIU                 2            10 (Cyprus+)   0.922


Appendix – List of sources

List of Underlying Sources of World Bank Governance Data for E.U. Countries

BTI   Bertelsmann Transformation Index
IPD   Institutional Profiles Database
FRH   Freedom House
GII   Global Integrity Index
GCB   Global Corruption Barometer Survey
HER   Heritage Foundation Index of Economic Freedom
HUM   Cingranelli Richards Human Rights Database
TPR   US State Department Trafficking in People report
OBI   International Budget Project Open Budget Index
MSIª  International Research & Exchanges Board Media Sustainability Index
BRI   Business Environment Risk Intelligence
DRI   Global Insight Global Risk Service
EIU   Economist Intelligence Unit
PRS   Political Risk Services International Country Risk Guide
WMO   Global Insight Business Condition and Risk Indicators
GCS   Global Competitiveness Report
WCY   Institute for Management Development World Competitiveness Yearbook
GWP   Gallup World Poll
EGV   Global E-Governance Index
BPS   Business Enterprise Environment Survey

For a more thorough look at each individual indicator, see: http://info.worldbank.org/governance/wgi/sources.htm
