
Naming Bad Performance:
Can Performance Disclosure Drive Improvements?
Asmus Leth Olsen∗
Department of Political Science
University of Copenhagen
ajlo@ifs.ku.dk
Invited to revise and resubmit at
Journal of Public Administration Research and Theory (JPART).
Presented at the Midwest Political Science Association (MPSA).
Session: The Psychology and Perception of Performance Measures.
Chicago, 14th of April, 2012.
Abstract
If poor performance is disclosed, poorly performing organizations are likely to
face reputational damage and therefore improve their performance. This simple
hypothesis is tested with performance information on gender equality among Danish public organizations. The initiative aimed at measuring intra-organizational
gender policies and actual gender representation at different organizational levels.
A regression discontinuity design is proposed for estimating the causal effect of
the disclosed performance on subsequent improvements. The analysis finds no or
small effects of performance disclosure on subsequent improvements. Counter to
expectations, some results even indicate that feedback about poor performance
leads to further deterioration in performance. The results imply that performance
information in low-salience policy areas can be ignored by the organizations under scrutiny. The effect of disclosing performance information is thus likely to
be reliant on media coverage and public dissemination. Disclosing performance
information is by itself no quick fix to improving organizational performance.
Keywords: performance information · naming and shaming · reputation
∗ Many people have provided valuable comments and suggestions on earlier drafts of this manuscript, including Hanne Foss Hansen, Jacob Gerner Hariri, Kasper M. Hansen, Peter Thisted Dinesen, Kristian Anker-Moeller, and Simon Calmar Andersen.
Sunlight is said to be the best disinfectant: electric light the most efficient policeman.
–Brandeis (1914, 62)1
Does the disclosure of performance information affect organizational performance? The worldwide dissemination of public sector performance information is well documented and indicates
that policy makers believe in the wonders of performance disclosure (Pawson 2002; Hood
2007; Dixon et al. 2008). Some argue that disclosure, by itself, will lead to improvements in
performance: that is, even in the absence of formal incentives to improve, performance measures are believed to affect organizational performance solely by public disclosure (Pawson
2002; Bird et al. 2005; Mason and Street 2006). For instance, disclosure can affect organizational performance due to concerns over organizational reputation (Pawson 2002; Bevan and
Hood 2006), political pressure (Pawson et al. 2005; Van de Walle and Roberts 2008), public
image (Hibbard et al. 2003), embarrassment effects (Mayston 1985; Johnsen 2005), public
humiliation (Le Grand 2010, 61), or risk avoidance and scapegoating (James and John 2007;
Van de Walle and Roberts 2008). In the organizational classics, salient performance gaps
and organizational failures have long been believed to stimulate performance improvements
(Cyert and March 1963; Downs 1967). However, there are few empirical studies of the actual
causal impact of performance information disclosure in the public sector, and the empirical
findings are mixed (Heinrich 2002; Bird et al. 2005; Besley et al. 2009). The question remains:
can performance be improved by naming the under-performing organizations?
In this article, I test the causal effect of disclosing performance information on subsequent organizational performance. I use the publication of performance information on gender equality within Danish state and local institutions. From 2003 onwards, the Danish government initiated a biennial publication on gender equality within state institutions and local governments. Empirically, this article adds to the existing research in two important respects: 1) by separating disclosure from other performance-changing mechanisms of performance information, and 2) by illustrating a design which allows one to separate the effect of reputational damage from other confounding factors.
First, the case allows for separating reputational effects from other mechanisms through which performance information can affect organizational performance. These alternative mechanisms include formal sanctions or rewards and threats of exit or voice (Hirschman 1970). Traditionally, the publication of performance information has been restricted to high-salience policy fields. These include health and education, which are characterized by strong mechanisms of exit or voice, and where performance is often linked to formal sanctions and rewards (Hood 2007). The question is how effective naming and shaming is when introduced to less salient policy areas where the audience for keeping organizations accountable is more diffuse, the options and incentives to use exit or voice are unclear, and no formal sanctions or rewards are linked to the performance information. This is exactly the case for the gender equality performance measures in Denmark.
1
The quote by Justice Louis D. Brandeis has been highlighted by both Fisse and Braithwaite (1983) and Pawson (2002).
Second, a key obstacle to studying the effect of performance measures is their inherent
endogenous nature. Multiple organizational factors affect both the measured organizational
performance and the following potential performance change. For instance, management incentives and organizational resources are potential confounding, unobservable factors which
affect both initial disclosed performance and subsequent responses. To meet this methodological challenge, I exploit a natural experiment on how performance measures are calculated
and presented. Specifically, a regression discontinuity design (RDD) is applied (Thistlethwaite and Campbell 1960). With the RDD, I exploit sharp cut-offs in how the performance
on gender equality was publicized. By doing so, I am able to isolate the effect of poor performance feedback on subsequent performance from potential confounding factors. Current
studies of performance measures apply mostly observational data and few seem to address
the fundamental problems to causal inference. Some have used various experimental designs
in order to estimate the causal effect of performance measures on organizations and citizens
(James 2011; Hibbard et al. 2005, 2003). With the natural experimental RDD, a pragmatic
middle-ground is found between rigorous causal inference and ethical and political concerns
of experimental studies into performance measures.
The findings indicate that the isolated effect of performance information disclosure may be very limited or non-existent. Organizations outed as performing poorly in terms of gender equality saw no greater improvements in their performance than those who gained more favorable ratings. Theoretically, this challenges the idea of a performance-enhancing effect from solely naming poorly performing organizations. This is an important finding which indicates that performance information does not only produce performance-improving effects or unintended consequences. In fact, performance information can also be ignored and discarded by the organizations under scrutiny. From a societal point of view, the article questions whether the diffusion of performance disclosure to less salient policy areas is an effective way of enhancing performance and achieving policy goals.
The paper is structured in the following manner: first, I formulate a general hypothesis on
how performance disclosure via reputational damage can affect organizational performance.
Second, the empirical setting of the Danish gender equality reports is outlined. Third, a
regression discontinuity design is proposed for testing the effect of potential reputational
damage. Finally, the results are presented and implications for further studies of the effects
of performance measures are discussed.
Performance Disclosure, Reputational Damage, and Improvements
The performance literature has multiple accounts of how performance measures can affect organizations both indirectly via exit or voice and directly via formal rewards or punishments.
Citizens have been found to hold organizations accountable via performance information
at the ballot box or exit options (James and John 2007; Boyne et al. 2009; James 2011;
Lavy 2010). Managers and street-level bureaucrats have been affected by performance pay schemes (Heckman et al. 1997; Lavy 2009) or general career concerns (Propper et al. 2010). However, it is less clear whether disclosing performance measures by itself can improve organizational performance. There are few empirical studies of the causal impact of disclosing performance information, and their empirical findings are mixed (Heinrich 2002; Bird et al.
2005; Besley et al. 2009). Bird et al. (2005, 18) question the effectiveness of disclosing performance measures and call for further research into how different dissemination strategies
affect organizational behavior. Besley et al. (2009) look at “naming and shaming effects”
inherent in the English hospital star rating system and find some evidence of improvements.
In the case of school performance, Hanushek and Raymond (2005) found that there is only a
positive subsequent achievement effect if formal consequences are put in place. On the other
hand, Figlio and Rouse (2006) (also in a school context) find that the stigma of a low ranking
among local schools had greater impact on future improvements than the more formal threat
of exit via vouchers. The strongest case for the effect of naming poor performance has been
made by Hibbard et al. (2003, 2005) in an experimental study of US hospitals. One of the
findings is that managers of poorly performing hospitals care about hospital reputation even in the absence of threats of exit and voice options (Hibbard et al. 2005). The hypothesis put
forward here will draw on the notion that negative feedback from performance information
can be a source of reputational damage to organizations. Importantly, this potential reputational damage gives managers and policy makers an incentive to improve organizational
performance.
The often cited concept of “naming and shaming” points to the idea that the simple disclosure of performance information can affect organizations (Pawson 2002; Bird et al. 2005).
’Naming’ denotes the disclosure of performance measures which makes it possible to identify
poor performing organizations (Pawson 2002, 215). By publicizing organizations’ performance
it becomes easier to identify and point fingers at presumably low-performing organizations
(Pawson 2002). It has been argued that disclosing poor performance will potentially generate political pressure (Van de Walle and Roberts 2008), raise concern about public image (Hibbard et al. 2003), evoke an embarrassment effect (Mayston 1985; Johnsen 2005), humiliate the employees (Le Grand 2010, 61), or lead to scapegoating (Johnsen 2012).
Here I use reputational damage as a general description of the non-formal effects of
disclosing performance measures (Hibbard et al. 2003, 2005; Bevan and Hamblin 2009). Organizational theorists have no unified working definition of organizational reputation (Lange
et al. 2011, 155). However, many accept that organizational reputation denotes the belief
held by outsiders about what distinguishes an organization (Dutton et al. 1994). Following
Carpenter and Krause (2012), organizational reputation can be defined as, “a set of beliefs
about an organization’s capacities, intentions, history, and mission that are embedded in a
network of multiple audiences.” Reputation is a matter of being known for something: organizations hold a varied set of reputations, corresponding to the different issues that matter to different audiences. Reputational effects are the initial effect of
disclosing performance information (Bevan and Hood 2006, 519). In general, organizational
reputation is shaped by the historical behavior of the organization and will be updated by
new information (Lange et al. 2011, 154).
Performance disclosure can affect organizational performance through realized reputational costs as well as via potential reputational damage (Pawson 2002, 216). The organizational reaction to disclosed performance information can be seen as part of the blame
avoidance game (Weaver 1986; Hood and Dixon 2010). The negativity bias held by most
actors, not least citizens (Lau 1982) and the media (Soroka 2006), shifts the focus of policy
makers and managers toward avoiding reputational damage rather than seeking credit. Hood
and Dixon (2010) argue that public managers and policy makers care about blame if it
damages their reputation in ways that will harm their careers and possibilities of promotion.
Organizational studies find that damages to organizational reputation can affect employees
and that this effect is even stronger for reputational damage caused by public dissemination
(Dutton and Dukerich 1991; Sutton and Callahan 1987). For instance, Mannion et al. (2005)
and Bevan and Hamblin (2009) show how the English star rating system for hospitals induced a feeling of shame among employees at low-scoring hospitals. From this perspective, reputational damage does not only affect incentives to improve: improvements can also stem from how reputational damage activates professional norms about correcting poor performance (Hibbard et al. 2005, 1151).
In both cases, organizations will go to great lengths to repair reputational damage.
Given that the media, the public, and citizens are affected by a negativity bias, being named
as performing poorly will cause more profound reputational damage (Johnsen 2012). The
experience of failure has been suggested to be a highly salient feedback signal which attracts
organizational attention and increases search for solutions (March and Simon 1958; Cyert
and March 1963), reduces slack (Cyert and March 1963; Levinthal and March 1981), induces
risk-taking (Kahneman and Tversky 1979), affects motivation to adapt (Sitkin 1992) and
increases the likelihood of organizational change in general (Greve 1998). Downs (1967, 191)
has noted how performance gaps will provide a motivation to look for alternative actions
which are believed to satisfy aspirational levels (Johnsen 2012).
We have now established how disclosing poor performance measures can cause reputational damage and thereby activate incentives (or professional norms) for improving organizational performance. This leads to the following hypothesis: Organizations which are
publicly named for poor performance will improve their subsequent performance in order to
avoid reputational damage. This simple hypothesis captures the central reasoning underlying
government efforts to name and shame via performance disclosure. Thus, while it seems to
represent a simplistic view of public organizations, it is at the same time highly relevant to
policy making.
Empirical Setting: The Case of Danish Gender Reports
As noted in the introduction, it is difficult to isolate the effect of reputational damage from
other mechanisms through which performance information can affect organizational performance. In addition, there are confounding factors which affect both the disclosed performance,
the assignment of reputational damage, and subsequent performance improvements. Given
these difficulties I test the proposed hypothesis with data from the case of the Danish gender
reports. In the following, some context for this case is provided and it is explained how it
may be seen as a case for testing the isolated effect of disclosing poor performance.
Danish gender equality policy is located under the Ministry for Gender Equality.2 In 2011
the Department of Gender Equality had a budget of approximately 3 million USD (DKK
14.1 million) and administered funds for different gender equality purposes of more than 20
million USD. In addition to developing the government’s gender policy the department is
also responsible for collecting gender equality reports from state and local institutions. In
2001 the Danish parliament passed a law mandating local governments, counties and larger
state agencies to report a number of gender policy related figures on a two-year basis. This paper uses data for the period between 2003 and 2005. In 2003 it became mandatory
for municipalities to report a number of indicators to the Ministry for Gender Equality
every second year. The results for local and regional governments were published online in
December 2003 and results for central government institutions in February 2004. This was
repeated again in 2005 with new sets of indicators being published online.
2
The minister is typically also co-responsible for another larger ministerial portfolio, within which the designated Department of Gender Equality is organizationally integrated.
The different indicators of the gender equality reports are centered on four aspects. First, gender policy indicators capture the extent to which an organization has defined gender-specific policies. Second, content and initiative indicators are directed towards the concrete actions and decisions an organization has taken to strengthen gender matters. Third, leadership equality covers indicators for the degree of representation of
females at different executive levels of the organization. And finally, for organizations with an
independent political body (such as counties and municipalities), the degree of representation
at the political level is measured.
Hibbard et al. (2003) have proposed four requirements for a performance scheme to be effective via reputational pathways.3 First, the performance scheme should take the form of a ranking. In this case a simple color scheme was used to rank organizations with respect to their gender policy focus and achievements. The color red was given to the lowest scoring, yellow to the middle group, and green to the highest scoring.4 Second, it should be publicized and disseminated to the public. The gender reports were published on a website with the sole purpose of reporting the results.5 Third, it should be easily digestible by the public. The gender reports applied traffic-light colored maps which make the results visually digestible for a wider audience. These traffic-light colors are obviously not a coincidence: colors have both physiological and psychological effects.6 For municipalities and regional councils political maps are presented with the colors; at the state level organizational names are colored.7 The mapping of results fosters comparison: the maps gave visitors quick identification of specific municipalities and instantly provided a means for comparison with neighboring municipalities. Finally, the performance information initiative should entail a follow-up with subsequent periods of measurement. The gender policy reports were from the outset planned as an ongoing process with reporting every second year. To date, performance data has been published for 2003, 2005, 2007, 2009 and 2011. Participating organizations were therefore well aware that they would be measured again on their gender policy efforts. At face value, one may conclude that the case of the Danish gender equality reports lives up to the standards set out by Hibbard et al. (2003).
3
These requirements are also discussed extensively in Bevan and Hamblin (2009, 165).
4
Green colored organizations are described as working extensively with equality and as having achieved results on gender equality. Yellow colored organizations work to some extent with gender equality and/or have achieved some results on the matter. Red colored organizations focus to a lesser extent on gender equality and/or have achieved fewer results.
5
The main page today is: www.ligestillingdanmark.dk. The reports for 2003 and 2005 can be found under www.2003.ligestillingidanmark.dk and www.2005.ligestillingidanmark.dk.
6
For the psychological effects, the color red is among other things associated with negativity, anger and danger. Green on the other hand engenders feelings of comfort, relaxation and calmness. Color highlighting was also applied in the American Quality Counts reports of hospitals studied by Hibbard et al. (2003).
7
By clicking on the map or an organizational name the viewer could see the exact aggregate score underlying the coloring.
The case of the Danish gender policy reports represents a topic of lower policy salience than typically found in research on performance disclosure. Most studies of naming and shaming efforts have been conducted for public services with a clearly defined user group to hold those responsible accountable. These include settings such as hospital services, school performance, or general local government services (Hibbard et al. 2003; Lindenauer et al. 2007; Propper et al. 2010; Hemelt 2011). These policy areas have high salience, take up many resources, and see frequent interaction with a well-defined set of citizens (i.e. patients, kids, parents, etc.). Accordingly, citizens have incentives to vote with their feet or use their voice option as a response to the reported performance (Hirschman 1970). In addition, the central government can attach formal rewards or punishments to how organizations perform. This is not the case for the Danish gender policy reports. Thus, the case allows us to separate reputational effects of disclosing performance information from alternative mechanisms of exit or voice as well as formal rewards or sanctions. In addition, the performance scheme resembles the ones applied in more high-salience policy areas. The case thus presents a clear trade-off. On the one hand, the low salience implies that findings for this particular case are difficult to generalize to more traditional high-salience areas of performance information such as health and education. On the other hand, the low-salience characteristic is what allows us to isolate potential reputational effects from the alternative mechanisms by which performance measures work.
Research Design: Regression Discontinuity Design
It is inherently difficult to estimate the effect of performance measures on some outcome. One
can think of a number of confounding factors which affect to what extent an organization
is measured as performing poorly, and how the organization changes behavior in response
to such information. However, many of these factors cannot be observed or controlled for.
Therefore, some researchers have used experimental designs to study the effects of performance information. For instance, James (2011) uses both field and laboratory experiments to estimate the causal impact of performance information on citizens' service satisfaction
and vote intention. Hibbard et al. (2003) and Hibbard et al. (2005) use field experiments to
test the causal impact of performance reports among US hospitals on quality improvements,
market share, and reputation.
The case of the Danish Gender reports allows for a natural experimental approach to
dealing with confounding factors. Natural experiments (sometimes) offer a straightforward
and rigorous approach to causal inference and at the same time avoid some of the ethical
and political concerns surrounding experimental studies on performance measures. Here I
employ the regression discontinuity design (RDD) to separate the reputational effect on
subsequent performance from potential confounding factors (Thistlethwaite and Campbell
1960; Olsen 2012). The design idea draws on Hemelt (2011), who used an RDD to measure the effects of a school accountability system introduced in the US. Comparisons with experimental benchmarks indicate that the RDD may provide very similar results to actual experimental estimates (Green et al. 2009; Berk et al. 2010; Shadish 2011). Importantly, the way each organization's gender performance was calculated and summarized provides a good setting for using the RDD. As already noted, in the gender policy reports the color labels of red, yellow, and green are given to organizations on the basis of their score on a complex index. The index captures different gender policy-related issues. Red denoted the lowest scoring, yellow the middle group, and green the highest scoring. The assignment was determined by sharp thresholds in the index score. This is important, because the RDD is useful for settings where a continuous assignment variable (i.e. the index score) deterministically assigns the treatment of interest (i.e. the color labels) to the subjects under study. The case thereby allows me to estimate the causal effects of negative performance disclosure as given by the assignment of color labels to organizations. Specifically, the colors are assigned to organizations by the following index score thresholds:



\[
\text{Gender Equality Color Scheme} =
\begin{cases}
\text{Red label:} & \text{score} \le 29.99\\
\text{Yellow label:} & 30 \le \text{score} \le 54.99\\
\text{Green label:} & \text{score} \ge 55
\end{cases}
\]
That is, all organizations with an index score below 29.99 are red, those with a score between 30 and 54.99 are assigned to yellow, and finally the few scoring more than 55 points receive a green mark. Such a step function for performance is common for accountability systems (Burgess and Ratto 2003, 287). In the first year of 2003 the participating organizations scored an average of 38.3 with a standard deviation of 12.8 on the composite score. 93 organizations received a red label for their performance, 224 a yellow label, and 36 a green label. In the subsequent gender policy report of 2005 the average score was 36.4 with a standard deviation of 15.4. In terms of performance labels, 118 organizations were given the red label, 191 a yellow, and 44 a green label in 2005.8 Given the theoretical outline of naming and shaming, I define those obtaining the red label as suffering the most reputational damage. That is, for organizations in the vicinity of the index scores assigning them the red or yellow label, the red label will constitute a larger degree of reputational damage compared to those ending up with a yellow label. In addition, I also test for differences between getting the mediocre color labeling of yellow and the top color of green. Formally, I test this proposition in the RDD framework by modeling the subsequent organizational performance as a function of the color labeling and the index assignment score in 2003:
\[
\text{Gender Equality Score}_{i,2005} = \alpha + \beta_1\,\text{Red Label}_{i,2003} + \beta_2\,\text{Gender Equality Score}_{i,2003} + \epsilon_i
\]
The dependent variable, Gender Equality Score_{i,2005}, captures the gender policy score for each organization in 2005. The assignment variable Gender Equality Score_{i,2003} is the aggregated gender policy score from 2003 which deterministically allocates a color label to an organization. Red Label_{2003} constitutes the treatment dummy of receiving the gender policy label denoted by the color red (vs. yellow). A similar dummy is made to evaluate the difference between receiving a yellow color label and the green color label (Yellow Label_{2003}). Accordingly, β1 is the RDD estimate of the causal effect of falling below the threshold between a red and a yellow label.9 To give β1 a causal interpretation a number of different estimations must be made. First, the causal interpretation of β1 rests on a correctly specified functional form for the relationship between the assignment variable and the dependent variable. For instance, if the assignment score is modeled as a linear term but is in fact non-linear, β1 could reflect this and be biased. In order to avoid this, the model will be estimated with different functional assumptions about Gender Equality Score_{2003}, including a simple linear fit (as shown above), a quadratic, a cubic, and a 4th-order polynomial (Green et al. 2009). In addition, the models are also fitted with interactions allowing not only different functional forms for the assignment variable but also varying slopes on either side of the cut-off.10 Second, a number of specifications are made with different subsets of the data around the threshold in order to make the causal estimate less dependent on a correct specification of the functional form (Green et al. 2009). Here the validity of the causal effect becomes a matter of choosing a sample of the data in the neighborhood of the discontinuity. The narrower the bandwidth, the smaller the chance that omitted variables will bias the RDD estimate of the causal effect.11
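To make the estimation strategy concrete, the following is a minimal sketch of how these specifications could be run. It assumes the data sit in a pandas DataFrame df with hypothetical columns score_2003 and score_2005, and it illustrates the approach rather than reproducing the code behind the reported estimates.

    import statsmodels.formula.api as smf

    # Minimal sketch of the RDD specifications described above.
    # Assumed (hypothetical) columns: 'score_2003' (assignment variable)
    # and 'score_2005' (subsequent performance).
    def rdd_estimates(df, cutoff=30.0):
        d = df.copy()
        d["center"] = d["score_2003"] - cutoff             # center at the red/yellow threshold
        d["red"] = (d["score_2003"] < cutoff).astype(int)  # treatment dummy: red vs. yellow label

        specs = {
            "I: linear":       "score_2005 ~ red + center",
            "II: interaction": "score_2005 ~ red * center",
            "III: quadratic":  "score_2005 ~ red + center + I(center**2)",
            "IV: cubic":       "score_2005 ~ red + center + I(center**2) + I(center**3)",
            "V: quartic":      "score_2005 ~ red + center + I(center**2) + I(center**3) + I(center**4)",
        }
        results = {name: smf.ols(f, data=d).fit(cov_type="HC1")   # robust standard errors
                   for name, f in specs.items()}

        # Narrow bandwidths around the cut-off; here the outcome is the change in score,
        # and the estimate reduces to a local comparison of the two labeled groups.
        d["change"] = d["score_2005"] - d["score_2003"]
        for bw in (2.5, 5.0, 7.5):
            local = d[d["center"].abs() <= bw]
            results[f"+/- {bw}"] = smf.ols("change ~ red", data=local).fit(cov_type="HC1")
        return results

The coefficient on the red dummy in each fitted model corresponds to the discontinuity estimates reported in table 1; an analogous dummy for the yellow label (with the green group as reference) yields the estimates in table 2.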
In summary, figure 1 outlines the expectation that the color labeling affects reputation. On the y-axis there is a hypothetical variable for the damage to an organization's reputation. On the x-axis the gender policy index score is provided. The line represents the way in which the color labels translate the gender policy score into reputational damage. I expect that the reputational damage is more or less constant within color labels and only shifts as organizations move between the color labels.
Figure 1: Theoretical Treatment of Reputational Damage Caused by the Color Labeling
8
All the data used in the analysis was obtained directly from the Department of Gender Equality. In addition, the obtained data files were cross-checked with the actual results published online.
9
Importantly, as a local average treatment effect (LATE) the RDD estimate is restricted in its external validity to observations around the threshold.
10
In order to interpret the treatment dummy as the effect at the cut-off in the interactive models, the assignment variable has been centered around the threshold of 30 (Green et al. 2009).
11
In the neighborhood of the threshold all organizations should on average be equal on all relevant characteristics. On the other hand, the narrower the bandwidth, the smaller the amount of available data, and accordingly a less efficient estimate of the causal effect is derived. Here I apply three bandwidths of +/- 2.5, 5 and 7.5 composite points around the cut-off. For these estimations the dependent variable measures changes in gender equality to make it comparable with the models where a lagged dependent variable is included.
The central assumption of the RDD is that around the threshold, the observations are assigned to treatment ’as if random’. That is, the organizations should not be able to affect
whether they are given a value on the assignment variable just below or above the threshold
of treatment. Accordingly, observations at one side of the threshold serve as the counterfactual for the observations at the opposite site of the threshold. In figure 1 this equals
comparing organizations just in the vicinity of the color labeling thresholds. Here I see how
organizations with almost identical gender policy scores are treated with very different levels
of organizational blame given their assigned color label. In order to validate the assumption
of random assignment of organizations around the treatment labels, it is useful to look at the
distribution of organizations around these thresholds (McCrary 2008). In figure 2 the distribution of observations around the two thresholds (red/yellow and yellow/green) are shown.
The organizations are ordered in 1 point bins on the gender index score. If organizations were
able to affect their labeling, one would expect them to sort themselves to the right side of the threshold. Judging from the plots this is not the case. This fits well with the fact that the participating organizations did not know the exact threshold values when providing the
data for the reports. This supports the assumption of random assignment to color labels and provides a more credible causal interpretation of the effects of reputational damage from the color labels.
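As an illustration, such a check can be produced by binning the assignment variable in 1-point bins around a cut-off, as in the sketch below. This is a visual heuristic in the spirit of McCrary (2008) rather than his formal local-linear density test, and the input names are hypothetical.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_density_check(scores_2003, cutoff=30.0, window=10.0):
        """Plot the distribution of the assignment variable in 1-point bins near a cut-off."""
        scores = np.asarray(scores_2003, dtype=float)
        near = scores[np.abs(scores - cutoff) <= window]             # observations near the threshold
        bins = np.arange(cutoff - window, cutoff + window + 1, 1.0)  # 1-point bins
        plt.hist(near, bins=bins, density=True, edgecolor="black")
        plt.axvline(cutoff, linestyle="--")                          # mark the color label threshold
        plt.xlabel("Gender equality score 2003")
        plt.ylabel("Density")
        plt.show()

    # Bunching just above the cut-off would suggest that organizations could
    # manipulate their score; no such pattern appears in figure 2.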
(a) The Red-Yellow Threshold (n=177)    (b) The Yellow-Green Threshold (n=92)
Figure 2: Distribution of Organizations in 1 Point Bins in the Vicinity of the two Thresholds
Empirical Findings
First, I plot the lagged gender policy score against the change in gender policy score between
2003 and 2005. In figure 3a one can see a negative slope on the fitted line which indicates
that previously low-scoring organizations see larger increases in their gender policy score than
high-scoring ones. The coefficient for the fitted line is −0.23 and highly significant (p < 0.01).
This effect can also be evaluated in terms of differences in mean changes in the gender policy
score between organizations with different color labels. Organizations with a red label in 2003 improve on average by 5.80 (p < 0.05) gender policy points in 2005 compared with those achieving a green label.
For those with a yellow label the improvement is 2.17 (p = 0.32) points compared with
those with a green label. At face value this could indicate that the shameful red labels were
effective at improving performance. However, both the negative slope and the differences in
the average improvement between the different color labels can also be attributed to a simple
regression to the mean effect (Linden et al. 2006). In addition, I have a potential problem
with confounding variables as organizations with a red label are likely to be different from
the high performers with a green label. In other words, a simple comparison of color labels and subsequent change in performance tells us very little about the effect of disclosing poor performers.
Figure 3: (a) Simple linear regression plot; (b) Regression discontinuity plot
As already noted, these issues can be avoided in the RDD setup. With the RDD I avoid problems of regression to the mean because I only seek to identify jumps around specific thresholds and not along the whole spectrum of the data (Thistlethwaite and Campbell 1960, 315).12 In addition, discontinuities around color labels are less likely to be influenced by confounding factors, as organizations could not sort themselves into color labels in the close vicinity of the color label thresholds. Next I conduct a visual inspection of the possible discontinuities around the treatment thresholds of the various color labels. Any shifts in the line around the color label thresholds indicate that an effect exists in the subsequent change in gender equality score for organizations which initially received different color labels. In figure 3b the plot is made with separate smooth lines for all three color labels. Around the thresholds for the different colors there are no particular shifts in the lines. If anything, the organizations which narrowly obtained a better color label performed slightly better in 2005 compared to those who fell short of a more attractive color label. In table 1 the effects are tested more formally.
The different specifications vary the functional form of the assignment variable and the analyzed range of data around the color label thresholds.
12
See Linden et al. (2006) for a discussion of regression to the mean effects in the RDD.
Table 1: RDD estimates: effect of red labeling

Discontinuity samples:
                                  +/- 2.5         +/- 5           +/- 7.5
Intercept                         3.40 (2.86)     3.07 (1.84)     1.50 (1.50)
Red Label (ref: Yellow Label)     -2.83 (3.31)    -3.69 (2.39)    -2.38 (2.05)
N                                 45              79              112
adj. R2                           -0.01           0.02            0.00
Resid. sd                         10.99           10.58           10.95

Varying the functional form for assignment:
                                  I               II              III             IV               V
Intercept                         32.39∗∗ (1.36)  32.76∗∗ (1.62)  32.42∗∗ (1.39)  31.77∗∗ (1.74)   31.79∗∗ (1.83)
Red Label (ref: Yellow Label)     -4.73 (2.41)    -4.34 (2.30)    -4.48 (2.30)    -3.42 (2.92)     -3.38 (2.84)
Gender Policy Score               0.55∗∗ (0.12)   0.52∗∗ (0.15)   0.57∗∗ (0.11)   0.68∗∗ (0.23)    0.68∗∗ (0.22)
Red Label*Gender Policy Score                     0.14 (0.22)
Gender Policy Score²                                              0.002 (0.005)   -0.006 (0.005)   -0.00 (0.01)
Gender Policy Score³                                                              -0.0002 (0.0004) -0.00 (0.00)
Gender Policy Score⁴                                                                               0.00 (0.00)
N                                 317             317             317             317              317
adj. R2                           0.29            0.29            0.29            0.29             0.29
Resid. sd                         11.86           11.87           11.87           11.89            11.91

OLS regressions - I: Linear, II: Linear with interaction, III: Quadratic, IV: Cubic, V: Fourth order polynomial.
Robust standard errors in the parentheses. ∗∗ p < 0.01 and ∗ p < 0.05 (both two-tailed tests).
For the red label there are consistently insignificant, negative effects on the subsequent change in gender policy score when compared with organizations that narrowly received a yellow label. The results thereby provide no support for the hypothesized effect. If anything, the naming and shaming of the red labeling made the affected organizations perform worse compared with the counterfactual of those given the yellow label.
In table 2 the effects are tested for the yellow-green threshold. Here there is a similar pattern. The difference is negative for the less fortunate label but for the most part insignificant across the various specifications. The coefficient varies in magnitude from −4.81 to −13.95. That is, the depreciation in performance is slightly higher for organizations which fell short of the green label compared to organizations that narrowly received it. Overall the analysis does not lend support to the hypothesis that disclosing poor performance of public organizations stimulated subsequent improvements in performance.
The next step in the analysis is to test for heterogeneity in the effects. Specifically, the gender reports included a very varied set of institutions ranging from state-level ministries to local and county governments. For this reason, I set out to test whether the effects varied between these two very different groups of institutions. In figures 4a and 4b the effects are plotted for local and state organizations separately. For local organizations the aggregate pattern seems to hold, with no sign of shifts in the lines around the thresholds. However, for state organizations a shift is found around the yellow-green threshold. Here organizations marginally failing to obtain a green label see a larger deterioration in their subsequent 2005 score compared to those just receiving a green label (as indicated by the upward shift in the line above the threshold).
Table 2: RDD estimates: effect of yellow labeling

Discontinuity samples:
                                  +/- 2.5         +/- 5           +/- 7.5
Intercept                         0.35 (2.81)     -2.76 (2.12)    -1.58 (1.77)
Yellow Label (ref: Green Label)   -10.39 (6.19)   -8.28 (4.43)    -4.81 (3.06)
N                                 20              39              70
adj. R2                           0.04            0.04            0.01
Resid. sd                         16.63           15.77           15.04

Varying the functional form for assignment:
                                  I               II              III             IV              V
Intercept                         53.30∗∗ (2.00)  53.52∗∗ (2.42)  53.02∗∗ (2.42)  53.11∗ (2.37)   57.09∗∗ (2.84)
Yellow Label (ref: Green Label)   -7.64∗ (3.59)   -7.83∗ (3.43)   -7.26 (4.17)    -6.81 (4.84)    -13.95∗ (6.01)
Gender Policy Score               0.51∗∗ (0.14)   0.48 (0.33)     0.55∗ (0.23)    0.60 (0.32)     -0.43 (0.60)
Red Label*Gender Policy Score                     0.04 (0.37)
Gender Policy Score²                                              0.002 (0.000)   -0.00 (0.00)    -0.04∗ (0.02)
Gender Policy Score³                                                              -0.00 (0.00)    0.00 (0.00)
Gender Policy Score⁴                                                                              0.00∗ (0.00)
N                                 260             260             260             260             260
adj. R2                           0.25            0.25            0.25            0.24            0.26
Resid. sd                         12.07           12.09           12.09           12.11           12.01

OLS regressions - I: Linear, II: Linear with interaction, III: Quadratic, IV: Cubic, V: Fourth order polynomial.
Robust standard errors in the parentheses. ∗∗ p < 0.01 and ∗ p < 0.05 (both two-tailed tests).
Figure 4: (a) Local and county organizations; (b) State organizations
Formally, I therefore specify separate models for the subsets of local/county organizations
on one hand and state institutions on the other. In table 3 only the main effects of interest are
reported (i.e. the coefficients of the dummy variable for each color labeling). First, there are no
significant effects for the local/county organizations. In addition, most of the coefficients are negative, which indicates that local and county jurisdictions that got a red label performed
slightly worse than those who obtained a yellow label. A similar pattern is found for the
threshold between obtaining the yellow and green label. There is no significant improvement
effect of obtaining the less favorable rating of the yellow label and most coefficients are even
negative.
However, for the state institutions one can see some significant differences. As indicated by
the plot, organizations failing to obtain a green label seem to deteriorate much more in their
performance compared to those that obtained a green label. The coefficients are relatively
stable and vary across the different specifications from −9.39 to −27.83. This effect runs
contrary to our expectation and seems to suggest that getting very favorable performance
feedback can help sustain high performance. This being said, the initial plot indicated that
the effect is mostly driven by a small number of organizations which fell below the green
label threshold and saw sharp decreases in their 2005 gender policy score. I will therefore be cautious about drawing any strong conclusions from this finding, but rather emphasize that
the sub-level analysis also rejects the expected performance-enhancing effect of disclosing
poor performance results.
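Under the same assumptions as the earlier estimation sketch, and with a hypothetical org_type column distinguishing the two groups, the subgroup analysis amounts to running the specifications on each subset:

    # Hypothetical 'org_type' column; rdd_estimates() is the sketch shown earlier.
    results_local = rdd_estimates(df[df["org_type"].isin(["local", "county"])])
    results_state = rdd_estimates(df[df["org_type"] == "state"])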
Finally, a number of robustness checks were made. First, all models were also estimated as Tobit regressions. With Tobit models one can define whether the data are left censored, right censored, or both. In the case of the gender reports, the gender score was bounded in the interval between 0 and 100. That is, it is impossible for organizations to get negative gender policy scores or scores of more than 100. Tobit regressions for this interval reveal substantially similar results to the ones already reported. In addition, a sensitivity check was made to test for effects due to changes of organization names. From 2003 to 2005 around 11 organizations changed their names, in part because of organizational amalgamations. To make sure that dramatic changes in gender policy performance were not correlated with these changes, I estimated all the models without these 11 organizations. Again the models were substantially similar to the full models. Both the robustness checks and the sensitivity analysis can be obtained from the author.
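As an illustration of the Tobit check, the censored-normal log-likelihood for an outcome bounded in [0, 100] can be maximized directly along the following lines; this is a minimal sketch under stated assumptions (a NumPy outcome vector y and design matrix X containing an intercept, the label dummy, and the 2003 score), not the estimator used for the reported results.

    import numpy as np
    from scipy import optimize, stats

    def tobit_negloglik(params, y, X, lower=0.0, upper=100.0):
        """Negative log-likelihood of a two-sided (interval-censored) Tobit model."""
        beta, log_sigma = params[:-1], params[-1]
        sigma = np.exp(log_sigma)                     # keep the scale parameter positive
        xb = X @ beta
        at_lower, at_upper = y <= lower, y >= upper
        interior = ~(at_lower | at_upper)
        ll = np.empty_like(y, dtype=float)
        # Censored observations contribute the probability mass at the bound...
        ll[at_lower] = stats.norm.logcdf((lower - xb[at_lower]) / sigma)
        ll[at_upper] = stats.norm.logsf((upper - xb[at_upper]) / sigma)
        # ...and uncensored observations the usual normal density.
        ll[interior] = stats.norm.logpdf(y[interior], loc=xb[interior], scale=sigma)
        return -ll.sum()

    def fit_tobit(y, X):
        """Maximize the Tobit likelihood, starting from the OLS solution."""
        beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
        start = np.append(beta0, np.log((y - X @ beta0).std()))
        res = optimize.minimize(tobit_negloglik, start, args=(y, X), method="BFGS")
        return res.x[:-1], np.exp(res.x[-1])          # coefficients and sigma

Comparing the label coefficient from such a fit with the OLS estimates in tables 1-3 is the kind of check described in the paragraph above.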
Table 3: RDD estimates: subgroups of local/county and state organizations.
                                  Discontinuity Sample                           Parametric Specifications
                                  +/- 2.5        +/- 5          +/- 7.5         I              II             III            IV             V

Local and County Level:
Red Label (ref: Yellow Label)     -4.40 (4.18)   -4.51 (2.92)   -1.93 (2.40)    -2.74 (2.72)   -2.85 (2.70)   -3.88 (2.80)   -6.47 (3.46)   -4.80 (3.53)
N                                 32             59             84              227            227            227            227            227
adj. R2                           0.01           0.02           -0.00           0.35           0.35           0.35           0.35           0.36
Resid. sd                         11.46          11.11          11.03           11.09          11.11          11.09          11.08          11.07
Yellow Label (ref: Green Label)   3.68 (2.83)    -2.16 (4.58)   -3.01 (3.13)    -4.34 (4.11)   -3.91 (3.73)   -0.51 (4.31)   1.77 (4.60)    -1.35 (4.97)
N                                 9              22             43              181            181            181            181            181
adj. R2                           0.02           -0.04          -0.01           0.32           0.31           0.32           0.32           0.32
Resid. sd                         4.89           12.97          12.17           11.38          11.41          11.37          11.34          11.35

State Level:
Red Label (ref: Yellow Label)     1.65 (5.20)    -0.49 (3.71)   -3.36 (3.87)    -8.11 (4.61)   -6.51 (4.19)   -6.08 (4.24)   0.28 (4.95)    -0.42 (4.85)
N                                 13             20             28              90             90             90             90             90
adj. R2                           -0.08          -0.05          -0.01           0.18           0.19           0.20           0.22           0.21
Resid. sd                         9.94           8.71           10.66           13.44          13.35          13.28          13.16          13.21
Yellow Label (ref: Green Label)   -22.19 (9.84)  -17.85∗ (8.27) -9.39 (6.38)    -14.47∗ (6.43) -16.79∗ (6.40) -21.38∗ (7.74) -22.00∗ (8.43) -27.83∗ (10.28)
N                                 11             17             27              79             79             79             79             79
adj. R2                           0.20           0.17           0.03            0.16           0.15           0.17           0.16           0.17
Resid. sd                         19.04          17.70          18.50           13.14          13.19          13.03          13.10          13.07

The same set of control variables is included as in tables 1 and 2.
OLS regressions - I: Linear, II: Linear with interaction, III: Quadratic, IV: Cubic, V: Fourth order polynomial.
Robust standard errors in the parentheses. ∗∗ p < 0.01 and ∗ p < 0.05 (both two-tailed tests).
Conclusion and Limitations of the Study
The world-wide dissemination of public sector performance disclosure is well documented. However, the ability of such disclosure to improve organizational performance via reputational damage is mostly undocumented. Still, many policy makers believe in the wonders of performance disclosure and assume that pointing to areas for improvement by itself has beneficial effects. This
article tested this powerful line of thought with data from Danish gender equality reports.
The data allowed me to separate reputational effects from other mechanisms through which
performance information can affect organizational performance (i.e. formal sanctions or rewards, and threats of exit or voice). In addition, a novel regression discontinuity design made
it possible to make causal claims about the (non-) effect of reputational damage by reducing
the possibility of confounding factors and regression to the mean effects.
The findings showed that the effect of potential reputational damage on subsequent performance improvements was limited or non-existent. Organizations which narrowly received
poor performance results did not improve more in their subsequent performance compared
to organizations which narrowly received much more favorable performance feedback. This
leads to a rejection of the powerfully simple and highly policy-relevant hypothesis that reputational damage induced by naming the poor performing organizations should affect future
organizational efforts to improve.
However, one has to consider possible alternative explanations for the non-effects of disclosing poor performance. The question relates to how the empirical setting limits the scope
of our findings. As noted earlier, the case of the Danish gender reports is to be considered
a hard test of the potential performance-enhancing effects of reputational damage. Intra-organizational gender equality and gender policy are relatively low-salience issues. If I had
encountered strong effects, it would indicate that pure reputational damage could be effective
at encouraging poorly performing organizations to perform better in even low-salience settings. The low-salience character of the case allowed us to isolate the reputational concerns
from indirect and formal concerns of exit, voice, or punishments. However, the low-salience
status may also have affected the media attention and public dissemination of the performance results. Performance information is in a constant competition for attention with other
forms of feedback which organizations receive. Under bounded rationality attention is limited
and organizations are therefore not equally responsive to all types of feedback they get from
the environment (Cyert and March 1963). Low salience can also lower the reputational costs
of poor performance. If the reputational costs of being named and shamed are particularly
low, managers will tend to ignore it and focus their efforts on other matters (Hood and Dixon
2010).
Both the lack of organizational attention and low costs point to the importance of disseminating the results of performance information to the public. The politics of organizational
attention is shaped by the media attention given to policy issues (Baumgartner and Jones
2005). In fact, if one searches print and online news coverage of the gender policy reports,
the public dissemination of the results turns out to be fairly limited. In the Danish media
database ’Infomedia’, which contains all major print and online news outlets, one may search for specific key words within a given time period. The results for local and county governments were released on the 10th of December 2003, and the state results were released about
a month later. From the 10th of December 2003 to 1st of March 2004 around 694 articles
mentioned gender equality in some form. However, only 11 of them referred explicitly to
the gender equality reports and only six articles referred to the online address of the gender
equality reports (i.e. www.ligestillingidanmark.dk).
This very limited media attention to the issue of the gender equality reports suggests that low
reputational costs may account for the non-effects. While the poorly performing organizations
were disclosed to the public and thereby “named” for their poor performance, they were not
to any larger extent “shamed” or “blamed” by the media or the wider public. This is a weak
point for our conclusions, as current studies which find strong support for reputational effects all emphasize the importance of wide public dissemination of the results (Hibbard et al.
2003, 2005).
In addition, one can also think of a few other reasons for the non-effect of disclosing poor
performance. For instance, organizations may have relied on the blame-avoiding strategy of circling the wagons (Weaver 1986). Here organizations collectively cover their backs and provide political cover by ignoring potential blame attribution. One could say that putting effort
into changing a poor outcome is indirectly a way of acknowledging that there is a problem,
which by itself can attach blame. Staying under the radar and doing nothing, in case of
potential blame attribution, may be a better choice than engaging in improvement efforts.
Also, Pawson (2002) has put forward the idea that naming can fail if organizations either
passively accept the label they have been given, ignore it, or reject it.
In summary, while most studies focus on either the potential performance improvements
or the unintended consequences caused by performance information, one should perhaps
increasingly look at the extent to which performance information is acted upon or ignored by
organizations. The main conclusion of this study is that we should not automatically expect
the disclosure of performance information to affect organizational change efforts. Disclosing
performance information is by itself no quick fix to improving organizational performance.
References
Baumgartner, F. R. and B. D. Jones (2005). The politics of attention: How government
prioritizes problems. Chicago: The University of Chicago Press.
Berk, R., G. Barnes, L. Ahlman, and E. Kurtz (2010). When second best is good enough:
a comparison between a true experiment and a regression discontinuity quasi-experiment.
Journal of Experimental Criminology 6 (2), 191–208.
Besley, T. J., G. Bevan, and K. B. Burchardi (2009). Naming & Shaming: The impacts of different regimes on hospital waiting times in England and Wales. Centre for Economic Policy Research.
Bevan, G. and R. Hamblin (2009). Hitting and missing targets by ambulance services for emergency calls: effects of different systems of performance measurement within the UK. Journal of the Royal Statistical Society: Series A (Statistics in Society) 172 (1), 161–190.
Bevan, G. and C. Hood (2006). What’s measured is what matters: targets and gaming in
the English public health care system. Public administration 84 (3), 517–538.
Bird, S. M., D. Cox, V. T. Farewell, H. Goldstein, T. Holt, and P. C. Smith (2005). Performance indicators: good, bad, and ugly. Journal of the Royal Statistical Society: Series A (Statistics in Society) 168 (1), 1–27.
Boyne, G. A., O. James, P. John, and N. Petrovsky (2009). Democracy and Government
Performance: Holding Incumbents Accountable in English Local Governments. The Journal
of Politics 71 (04), 1273–1284.
Brandeis, L. D. (2009/1914). Other People’s Money - and How Bankers Use It. Mansfield Centre, CT, USA: Martino Publishing.
Burgess, S. and M. Ratto (2003). The role of incentives in the public sector: Issues and
evidence. Oxford review of economic policy 19 (2), 285–300.
Carpenter, D. P. and G. A. Krause (2012). Reputation and Public Administration. Public
Administration Review 72 (1), 26–32.
Cyert, R. M. and J. G. March (1963). A behavioral theory of the firm. Englewood Cliffs,
New Jersey: Prentice-Hall International Series in Management and Behavioral Sciences in
Business Series.
Dixon, R., C. Hood, and L. R. Jones (2008). Ratings and rankings of public service performance: Special issue introduction. International Public Management Journal 11 (3), 253–255.
Downs, A. (1967). Inside bureaucracy. Boston, MA: Little, Brown and Co.
Dutton, J. E. and J. M. Dukerich (1991). Keeping an Eye on the Mirror: Image and Identity
in Organizational Adaptation. The Academy of Management Journal 34 (3), 517–554.
Dutton, J. E., J. M. Dukerich, and C. V. Harquail (1994). Organizational Images and Member
Identification. Administrative Science Quarterly 39 (2), 239–263.
Figlio, D. and C. Rouse (2006). Do accountability and voucher threats improve low-performing schools? Journal of Public Economics 90 (1-2), 239–255.
Fisse, B. and J. Braithwaite (1983). The impact of publicity on corporate offenders. Albany:
State University of New York Press.
Green, D. P., T. Y. Leong, H. L. Kern, A. S. Gerber, and C. W. Larimer (2009). Testing the
Accuracy of Regression Discontinuity Analysis Using Experimental Benchmarks. Political
Analysis 17 (4), 400–417.
Greve, H. R. (1998). Performance, Aspirations, and Risky Organizational Change. Administrative Science Quarterly 43 (1), 58–86.
Hanushek, E. A. and M. E. Raymond (2005). Does school accountability lead to improved
student performance? J. Pol. Anal. Manage. 24 (2), 297–327.
Heckman, J., C. Heinrich, and J. Smith (1997). Assessing the Performance of Performance
Standards in Public Bureaucracies. The American Economic Review 87 (2), 389–395.
Heinrich, C. J. (2002). Outcomes-Based Performance Management in the Public Sector: Implications for Government Accountability and Effectiveness. Public Administration Review 62 (6), 712–725.
Hemelt, S. W. (2011). Performance effects of failure to make Adequate Yearly Progress
(AYP): Evidence from a regression discontinuity framework. Economics of Education
Review 30 (4), 702–723.
Hibbard, J. H., J. Stockard, and M. Tusler (2003). Does Publicizing Hospital Performance
Stimulate Quality Improvement Efforts? Health Affairs 22 (2), 84–94.
Hibbard, J. H., J. Stockard, and M. Tusler (2005). Hospital Performance Reports: Impact
On Quality, Market Share, And Reputation. Health Affairs 24 (4), 1150–1160.
Hirschman, A. O. (1970). Exit, voice, and loyalty: Responses to decline in firms, organizations, and states. Cambridge, MA: Harvard University Press.
Hood, C. (2007). What happens when transparency meets blame-avoidance? Public Management Review 9 (2), 191–210.
Hood, C. and R. Dixon (2010, July). The Political Payoff from Performance Target Systems:
No-Brainer or No-Gainer? Journal of Public Administration Research and Theory 20 (suppl
2), i281–i298.
James, O. (2011). Performance Measures and Democracy: Information Effects on Citizens
in Field and Laboratory Experiments. Journal of Public Administration Research and
Theory 21 (3), 399–418.
James, O. and P. John (2007). Public Management at the Ballot Box: Performance Information and Electoral Support for Incumbent English Local Governments. Journal of Public
Administration Research and Theory 17 (4), 567–580.
Johnsen, A. (2005). What Does 25 Years of Experience Tell Us About the State of Performance Measurement in Public Policy and Management? Public Money & Management 25 (1), 9–17.
Johnsen, A. (2012). Why Does Poor Performance Get So Much Attention in Public Policy?
Financial Accountability & Management 28 (2), 121–142.
Kahneman, D. and A. Tversky (1979). Prospect theory: An analysis of decision under risk.
Econometrica: Journal of the Econometric Society 47 (2), 263–291.
Lange, D., P. M. Lee, and Y. Dai (2011). Organizational reputation: A review. Journal of
Management 37 (1), 153–184.
Lau, R. R. (1982). Negativity in Political Perception. Political Behavior 4 (4), 353–377.
Lavy, V. (2009). Performance Pay and Teachers’ Effort, Productivity, and Grading Ethics. American Economic Review 99 (5), 1979–2011.
Lavy, V. (2010). Effects of free choice among public schools. Review of Economic Studies 77 (3), 1164–1191.
Le Grand, J. (2010). Knights and Knaves Return: Public Service Motivation and the Delivery
of Public Services. International Public Management Journal 13 (1), 56–71.
Levinthal, D. and J. G. March (1981). A model of adaptive organizational search. Journal
of Economic Behavior & Organization 2 (4), 307–333.
Linden, A., J. L. Adams, and N. Roberts (2006). Evaluating disease management programme
effectiveness: an introduction to the regression discontinuity design. Journal of Evaluation
in Clinical Practice 12 (2), 124–131.
Lindenauer, P. K., D. Remus, S. Roman, M. B. Rothberg, E. M. Benjamin, A. Ma, and D. W.
Bratzler (2007). Public reporting and pay for performance in hospital quality improvement.
New England Journal of Medicine 356 (5), 486–496.
Mannion, R., H. Davies, and M. Marshall (2005). Impact of star performance ratings in
English acute hospital trusts. Journal of Health Services Research & Policy 10 (1), 18–24.
March, J. G. and H. A. Simon (1958). Organizations. New York: Wiley.
Mason, A. and A. Street (2006). Publishing outcome data: is it an effective approach? Journal
of evaluation in clinical practice 12 (1), 37–48.
Mayston, D. J. (1985). Non-profit performance indicators in the public sector. Financial
Accountability & Management 1 (1), 51–74.
McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity
design: A density test. Journal of Econometrics 142 (2), 698–714.
Olsen, A. L. (2012). Regression Discontinuity Designs in Public Administration: The Case
of Performance Measurement Research. Paper presented at the ECPR Joint Sessions,
Workshop 5: Citizens and Public Service Performance: Demands, Responses and Changing
Service Delivery Mechanisms, Antwerp, 11-15 April 2012.
Pawson, R. (2002). Evidence and policy and naming and shaming. Policy Studies 23 (3),
211–230.
Pawson, R., T. Greenhalgh, G. Harvey, and K. Walshe (2005). Realist review: a new method
of systematic review designed for complex policy interventions. Journal of health services
research & policy 10, 21–35.
Propper, C., M. Sutton, C. Whitnall, and F. Windmeijer (2010). Incentives and targets in
hospital care: evidence from a natural experiment. Journal of Public Economics 94 (3-4),
318–335.
Shadish, W. R. (2011). Randomized Controlled Studies and Alternative Designs in Outcome
Studies. Research on Social Work Practice 21 (6), 636–643.
Sitkin, S. B. (1992). Learning through failure: The strategy of small losses. Volume 14 of
Research in organizational behavior, pp. 231–266. Greenwich, CT: JAI PRESS LTD.
Soroka, S. N. (2006). Good News and Bad News: Asymmetric Responses to Economic Information. Journal of Politics 68 (2), 372–385.
Sutton, R. I. and A. L. Callahan (1987). The stigma of bankruptcy: Spoiled organizational
image and its management. Academy of Management Journal 30 (3), 405–436.
Thistlethwaite, D. L. and D. T. Campbell (1960). Regression-Discontinuity Analysis: An
Alternative to the Ex Post Facto Experiment. The Journal of Educational Psychology 51 (6), 309–317.
Van de Walle, S. and A. S. Roberts (2008, November). Publishing Performance Information:
An Illusion of Control? Working paper.
Weaver, R. K. (1986). The Politics of Blame Avoidance. Journal of Public Policy 6 (4),
371–398.
Appendix: Map of the results from the Gender reports of 2003.