QUESTIONS OF EFFICIENCY ASSESSMENT IN PUBLIC SERVICES:
IS THE STATE OF THE ART REALLY “STATE OF THE ART”?
ABSTRACT: We revisit and reformulate the seminal 1957 Series A paper of M.J.Farrell, and look at how it was
“generalised” as “data envelopment analysis” (DEA). The closely related technique of “stochastic frontier regression”
(SFR) is also reviewed, and both techniques are subjected to some reality tests for the specific problem of police force
efficiency assessment. The possibility or necessity of a less technical approach based on explicit value judgements is
considered: we call this “the third way” (TTW). The methods are pedagogically compared on a small set of data
from 21 Spanish law courts. The ground is prepared for their comparative & realistic application to the massive
amount of data recorded by the 43 police forces of England & Wales.
PREFACE
[These notes were prepared by Mervyn Stone as background material for the meeting of the Official Statistics Section of the Royal
Statistical Society on July 19th MM.]
The topic of this meeting falls between academic study & political action, and is therefore inevitably subject to
governmental influences of one sort or another. However, if government is to get the sort of advice that will be in
the public interest, it is vital that the topic be explored openly (in the best traditions of this Society) without fear
of political disfavour and without the complementary tendency to “hype” any particular viewpoint for the benefit
of administrators who do not have the (increasingly rare) academic luxury of time to analyse apparent complexity.
For instance, these notes will not include either an “Executive Summary” or an appeal to authority of the sort that
graces the Spottiswoode (2000) report. They aim rather to pose questions that may induce total scepticism about
the approach recommended in that report, and preferably a disposition to favour some (if not “the”) “third way”.
The problem with appeal to authority lies in finding where “authority” lies: the Spottiswoode report says
“There was generally consistent advice from leading experts in the fields of economics and econometrics having practical
experience in efficiency measurement”
The report does not make clear that, even among micro-economists, opinions about the practical usefulness of
the techniques thus recommended are as inconsistent as can be—with some leading micro-economists being quite
dismissive about them. These notes will not end with a set of “conclusions” suggestive of some claim to authority:
it is better for the reader to sample the text than to rely on pruned & potted comments that could not accurately
convey a sense of the whole matter, just as reading a recipe is usually no substitute for tasting the dish. Readers
with the least time to spare might read Section 1 and then go straight to the Index or References to follow up items
that tempt their palate. Those with more time, but not enough to read the whole paper, should omit Sections 2.11,
2.12, and Section 3’s exposition of the stochastic frontier approach (since econometricians are very divided about the
latter’s value especially for the case of multiple outputs). For those for whom the topic is a necessary but rather dull
exercise, the informal status of these notes permits an occasional levity going beyond what would be acceptable in
any of the Society’s journals.
1. A GENERAL FRAMEWORK FOR THE GENERAL PROBLEM.
Given:
• A cross-sectional data-base from n operationally independent units, all engaged in the same range of productive
activities over the same time period [eg. the 21 High Courts of the Spanish Administrative Litigation Division in 1991, or
the 43 police forces of England & Wales in 1999. There will not be time or space to consider the use of potentially more informative
longitudinal (“panel”) data.]
• Measurements of r + s + t variables on each unit:
– inputs x = (x(1), ..., x(r))
– outputs y = (y(1), ..., y(s))
– environmentals z = (z(1), ..., z(t))
– data D = {(xi , yi , zi ); i = 1, ...n}
– (x,y,z) is the performance vector of a generic unit g in D
The problem is:
To generate from D an acceptably realistic assessment of the efficiency of an individual unit, with respect
to the totality of internal (undocumented) activities that produce y from x in an environment characterised
by z. The “holy grail” is the production of a single measure of efficiency for each unit, but a realistic
assessment may require more than one measure supplemented with a number of quality assurance indices.
No attempt will be made here to treat this problem in full generality [whatever that means] for the good reason that
an abstract general approach is unlikely to be sufficiently responsive to the special features of a practical realization.
There are however some relevant general considerations:
(i) Inputs are controllable variables that are “costs” for a unit in the generation of the outputs eg. salaries, capital
costs & depreciation. Most of this paper will be concerned with the case where multiple inputs can be individually
costed and aggregated into a single cost input x: in which case, “inefficiency” necessarily includes the “allocative
inefficiency” from non-optimal allocation of x to its individual components.
(ii) Outputs should be comprehensive and include everything that can be given a value, positive or negative, in the
outcomes of the activity associated with the inputs. [Most formulations of DEA consider only outputs that have positive value,
or that can be apparently transformed into positivity.] The degree of subdivision of outputs into different categories has
to be related to the extent to which there are associated differences in value. Such subdivision—so that outputs
are separately identifiable and weighted in the efficiency formula [for those techniques for which there is a formula to permit
this!]—may be necessary for the informed cooperation and motivation of units in improving efficiency. We will
distinguish between volumetric and non-volumetric outputs: a volumetric output measures the amount or quantity
of a particular output, whereas a non-volumetric output typically provides a numerical measure of quality.
(iii) It is desirable that any efficiency measure should be derived by a widely comprehensible method, and that, in
any particular application, its acceptability should be determinable by the intrinsic character & properties of the
measure (i.e. from the form of its functional dependence on x, y & z)— without appeal to other criteria such as the
number of Ph.D.s written about the method or the number of claims that “insight” may be gained into the problem
by its application.
(iv) There are broadly two types of efficiency measure. An intrinsic measure is one that could be calculated from
the generic unit g’s performance (x,y,z) alone, independently of the performances of the other n − 1 units in D. An
interactive measure is one whose determination is influenced by the other units’ performances, effectively positioning
the generic unit in the geometry of the n performance vectors in D.
(v) There are difficult questions in how to take account of environmentals z such as geography, social mix, unemployment level or other socio-economic factors. One approach is simply to ignore environmentals, while recognizing
that the efficiency measures may be influenced by them. A second approach is to stratify the n units into smaller
groups in each of which the environmental variables considered important do not vary very much. Another is to
try to allow for the influence of environment by adjustment of the input or output measures themselves, before
constructing any measure of efficiency. Yet another method is to adjust only an undoctored measure of efficiency
itself. Before considering such questions, we need to sort out the theoretical & practical difficulties that arise even
when environmentals are ignored. So we will initially exclude environmentals and take performance to be assessable
on the basis of inputs and outputs alone. The performance vector of our generic unit g is then simply (x, y), and we
abuse notation with the identification g = (x, y) .
Apart from such general & somewhat imprecise considerations, it can be argued that there is little to guide construction of efficiency measures apart from an intuitive sense of what might be judged appropriate for particular
realisations of the problem. It will therefore be necessary to use the example of the 43 police forces of England &
Wales (43PFs, for short) to test the realism of our thinking at every stage of the argument. However, the problem
with intuition is that it is a very personal matter. Those who have developed and applied the currently favoured
techniques for dealing with the above general problem [techniques described by some as “state of the art”] may well have been
motivated by well-formed intuitions. Since the conclusions of this paper will probably clash with these intuitions,
the issue had better be put as a question for an uncommitted intuition... one that touches deeply the philosophical
battle in the social sciences between empirical method and value-based approaches i.e. whether facts can be separated
from values (Jarvie, 1985):
Can one really believe that there are self-defining indicators of efficiency—functions of D alone and
determinable without reference to context —that can be straightforwardly extracted or “measured” by
some almost mechanical technique not significantly influenced either by prices of inputs when r > 1, or
by value judgements on outputs when s > 1?
[ Intuition does not work in a vacuum. For the 43 PFs in Appendix 2 Fig.11, it is helpful to know that the following inputs, outputs
& environmentals have been considered for entry into D. The outputs & first-listed environmentals are those selected by Spottiswoode
(2000) from a much wider field.
Inputs:
• staff costs
• operating costs
• consumption of capital costs.
Outputs:
• recorded crimes
• percentage of recorded crime detected
• domestic burglaries
• violent crimes
• theft of & from motor vehicles
• number of offenders dealt with for supplying Class A drugs
• public disorder incidents
• road traffic collisions involving death or serious injury
• level of crime (British Crime Survey)
• fear of crime (ditto)
• feelings of public safety (ditto)
Most of these outputs appear, among many more, in the listing & discussion of “best value performance indicators” in the consultation
document DETR (1999).
Environmentals:
• number of young men
• stock of goods available to be stolen
• changes in consumer expenditure.
These three environmentals were given as examples. The complexity of the environmentals problem is indicated by the fact that the
following ones have already been used in the Police Funding Formula (PFF; see Appendix 2) to determine the money thought appropriate
for police forces with different environmentals.
PFF environmentals:
• resident population
• daytime population
• population in terraced housing
• population in Class A residential neighbourhoods
• “striving” areas
• population in one-parent families
• households with only one adult
• households in rented accommodation
• population at a density of more than one per room
• population density
• sparsity of population
• length of built-up roads
• length of motorways.]
There is a burgeoning literature about techniques, such as Data Envelopment Analysis (DEA) & Stochastic Frontier
Regression (SFR) (eg. Schmidt, 1985; Norman & Stoker, 1991; Cooper, 2000; Khumbakar & Lovell, 1999). These
two approaches aim or claim to contribute in a major way to the solution of the above problem in its diverse
manifestations. DEA does more: it induces in its users a sense of freedom from the subjectivity of value judgement
& arbitrary specification that is well-expressed in the following quotation from Cooper et al (2000):
“In addition to avoiding a need for a priori choices of weights, DEA does not require specifying the form of the relation
between inputs and outputs in, perhaps, an arbitrary manner and, even more important, it does not require these relations
to be the same for each [unit].”
Not all supporters of DEA are as committed. The paper of Pedraja-Chaparro et al confirms many of the well-documented concerns about the practical usefulness of DEA. However, its attempt to understand, by simulation of
data sets from a known model, how they might be dealt with is of limited interest for multiple output efficiency
studies. (Their model is for multiple error-free inputs and a single output, so it does not treat the problems raised
in Section 4 for SFR.)
The literature on DEA is rooted in the seminal econometric work of Farrell (1957) which will now be considered in
slightly amended form.
2. FARRELL’S ECONOMETRIC APPROACH.
2.1. GENERAL DESCRIPTION.
At the heart of the problem is the difficulty, in the applications we have in mind, of comparing any two performances
(x, y) & (x′, y′). How can we (pessimistically) judge whether or not (x, y) is worse than (x′, y′)? For “positive”
outputs (“the more the better”), there are two closely related criteria (expressions of “Pareto optimality”) that
no-one is likely to dispute:
(x, y) is certainly worse than (x′, y′) if (a) x ≥ x′ & y ≤ y′ and (b) x > x′ or y < y′
[where x ≥ x′ means x(j) ≥ x′(j), j = 1, ..., r, y ≤ y′ means y(j) ≤ y′(j), j = 1, ..., s, and likewise for the strict
inequalities x > x′ and y < y′.]
If condition (b) is not imposed, we have a weaker criterion—that of worse without the certainty.
The condition x > x′ or y < y′ means that we are not prepared to say that (x, y) is certainly worse than (x′, y′) if
only some of the inputs of (x′, y′) are increased and only some of the outputs of (x′, y′) are decreased.
Weak though it may be, the first criterion is immediately useful in defining a superior subset of any given set S of
performance vectors (points for short):
The efficiency frontier of S is the subset F of points that are not certainly worse than any other point in
S.
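As a concrete check on how the criterion and the frontier definition operate on a finite data set, here is a minimal Python sketch (ours, not Farrell’s; the function names and the toy numbers are invented) that flags “certainly worse” performances and extracts the frontier of a small set S with a single input and a single output.

```python
import numpy as np

def certainly_worse(p, q, r):
    """True if performance p is certainly worse than q, where the first r
    components of each vector are inputs and the remaining ones are outputs."""
    xp, yp = p[:r], p[r:]
    xq, yq = q[:r], q[r:]
    cond_a = np.all(xp >= xq) and np.all(yp <= yq)   # (a): nowhere better
    cond_b = np.all(xp > xq) or np.all(yp < yq)      # (b): all inputs larger or all outputs smaller
    return cond_a and cond_b

def efficiency_frontier(S, r):
    """The subset of S that is not certainly worse than any other point of S."""
    return [p for i, p in enumerate(S)
            if not any(certainly_worse(p, q, r) for j, q in enumerate(S) if j != i)]

# toy data: four units, r = 1 input (cost) and one output
S = [np.array(v, float) for v in [(10, 5), (8, 5), (12, 4), (9, 7)]]
print(efficiency_frontier(S, r=1))   # only (8, 5) and (9, 7) survive
```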
The criteria do not tell us how much worse (x,y) is i.e. they do not provide a measure of the relative efficiency of
(x, y) & (x′, y′)—except in one very specialized comparison that is at the heart of Farrell’s approach. Suppose that
(x, y) is proportionally worse than some different (x′, y′) [and (x′, y′) is proportionally better than (x, y)] in the sense that
x′ = cx for c ≤ 1
y′ = dy for d ≥ 1.
It is then reasonable to define the efficiency of (x, y) relative to (x′, y′) as the ratio c/d. For, if it were agreed that
the intrinsic efficiency of (x, y) should be given by an index ratio of the type
veff = [v1 y(1) + ... + vs y(s)]/[u1 x(1) + ... + ur x(r)],
it is immediate that
[veff for (x, y)]/[veff for (x′, y′)] = c/d,
whatever the values of the weight vectors u & v.
[veff might be sensible if the y’s and x’s were volumetric and the v’s & u’s were values & costs/prices per unit volume respectively.]
In this way, we could compare the efficiency of a generic unit g in D with that of another unit—provided one of the
two was proportionally worse than the other. The problem is that this necessary condition cannot be expected to hold
for any pair of units in D, even approximately, since the n points in D, in typical applications, can be expected to be
well-distributed in their r + s dimensional space, when the ratio (r + s)/n (an index of their sparseness in the region
that they span) is not small. [For 43PFs, we can expect (r + s)/n to be of the order of 20/43.]
Farrell’s econometric approach creatively enlarges D to a continuum, C say, in which new points are constructed
by specified procedures starting with the discrete “scattergram” that is D. These points are to be regarded as nonexistent but, at least hypothetically, feasible performances: they are sometimes referred to as the “technology set”.
There will be a (perhaps empty) subset, Cg say, of points (cx, dy) of C, other than g = (x, y), that are proportionally
better than g. When Cg is not empty, the ratio c/d can therefore be calculated for each of its points. The technical
efficiency (teff for short) of g can then be straightforwardly defined as the minimum value of c/d for points in Cg i.e.
the generic unit g is to be compared with any performance in Cg that gives this minimum. For the possibility that
Cg is empty [i.e. there is no (x′, y′) in C proportionally better than g = (x, y)] the teff of g is quite properly taken
to be unity or 100%: in this case, g is on the frontier F and is called a frontier unit.
That, when Cg is not empty, the minimising point or points lie on the efficiency frontier F of the continuum C may
be seen by reductio ad absurdum, with just one condition on the construction of C—that, if a point is in C, so are
all points certainly worse than it [this condition is satisfied if the construction of C uses procedure (i) below]:
Suppose a minimising point in Cg , (cx, dy) = (x′, y′), say, were not in F . Then (x′, y′) would be certainly
worse than some point (x″, y″) in C, with x′ ≥ x″, y′ ≤ y″ and either x′ > x″ or y′ < y″. If x′ > x″,
there exists some c′ < c such that x′ > c′x > x″, whence (c′x, dy) is certainly worse than (x″, y″) and is
therefore in C, and certainly better than (x′, y′) and therefore in Cg . [An analogous consequence for a point
(cx, d′y) with d′ > d would follow if y′ < y″.] The inequality c′/d < c/d [or c/d′ < c/d] then contradicts the
initial supposition.
A simpler version of this argument shows that, when Cg is empty, g itself has to be on the efficiency frontier F .
2.2. CONSTRUCTION OF THE SET C OF FEASIBLE PERFORMANCES AND ITS EFFICIENCY FRONTIER
F.
The feasible points are taken to include the n points in D [which do exist!]. Starting with them, the continuum C is
progressively constructed by three procedures, (i), (ii), & (iii), for each of which there is
some justification:
[for most of what Farrell does]
(i) WORSENING—If (x, y) is feasible, so is any point worse (a fortiori certainly worse) than (x, y). Worsening
creates the feasible set
{(x′, y′) : x′ ≥ x & y′ ≤ y}.
[Justification rests on the truism that inefficiency can always manage to achieve smaller outputs with larger inputs!]
(ii) RESCALING—If (x, y) is feasible, so is (cx, cy) for any scale factor c.
[Farrell justifies this by the idea of “constant returns to scale” (CRS) —an econometric concept attractive to economists concerned with
industrial production, where it is usually invoked for the case of multiple inputs & a single output for which the idea of a “technical”
production function makes sense. The idea of CRS is less obviously relevant to cases with multiple outputs involving complex sociological
interactions and inefficiencies (our main concern). But, if CRS is deployed in such cases, there are at least two distinct ways of doing so:
• An actual unit g = (x, y) would display CRS if it were to change all its inputs & outputs by the same scale factor c.
• An efficiency frontier F satisfies the CRS condition if (x′, y′) & (cx′, dy′) both in F implies c = d.
Applied (in the interests of efficiency assessment) to the generic unit g with performance vector (x, y), the concept creates, by hypothetical
resizing (up or down), a set {(cx, cy); c > 0} of role models (good or bad!) for other units. For 43PFs, it would claim that the very
existence of a police force with performance vector (x, y) means that the feasibility (potential existence) of a police force with performance
vector (cx, cy) (for any c > 0) should be admitted for the purpose of assessing efficiency. This is a subtle idea and not one to be cynically
dismissed as a cheap extension of the data set D. Farrell refers to CRS as an “assumption” but (with 43PFs in mind) we see it here more
as a purposive & perhaps reasonable device for creating hypothetical units to be used in efficiency assessment. The “assumption” would
be better termed an acceptance, for the purposes of efficiency assessment, of the frontier F (constructed with CRS-motivated rescaling)
as if it were a realistic approximation to a true limiting efficiency frontier satisfying the econometrically attractive property of constant
returns to scale.
Application of the CRS concept also has implications for the choice of input & output variables, which we consider in Section 2.9 as far
as outputs are concerned. ]
(iii) MIXING—If (x′, y′) & (x″, y″) are feasible, so is the affine combination
(ax′ + (1 − a)x″, ay′ + (1 − a)y″) for 0 < a < 1
on the straight line between the two points.
[One justification of mixing involves CRS: scale (x′, y′) down to (ax′, ay′), and (x″, y″) down to ((1 − a)x″, (1 − a)y″), and suppose that
these two performances can be added without any interaction. Another is to think of mixing as simply a perhaps reasonable interpolation
between two actual units. Whatever the justification, mixing ensures that the set of feasible performance vectors has the mathematically
convenient property of convexity i.e. the set C includes its own affine combinations.]
The construction of C is a purely conceptual stage in the definition of the frontier F . Only F matters as far as the
definition of the efficiency measure teff is concerned: F could have been constructed more directly by the rescaling,
mixing & worsening of the frontier units alone—if only we had been able to identify them in advance.
2.3. RESTRICTION TO A SINGLE INPUT.
The generalities of the last section call for simple illustration. Simplicity will be aided by immediate & continuing
restriction to what we have in mind for application of our alternative approach—problems with a relatively uncontentious single input. The data D will now be the (s+1)-dimensional performance vectors (xi , yi (1), ..., yi (s)), i =
1, ..., n, generically g = (x, y).
[For the 43PFs assessment, a single input x could be provided by the newly available measure of total costs called “Resource Accounting
& Budgeting”—the only real arguments being whether to include the cost of pensions (an appreciable element of police force expenditure
not controllable in the interests of efficiency of the current year) and the London Allowance. Aggregation of the three components of x
(see Section 1) does not rule out posterior analysis & interpretation of any relationship uncovered between an efficiency measure and the
way that x is constituted from these components. ]
2.4. ONE OUTPUT.
We will start with the simplest case of all, that Farrell does not deign to mention: just one input & one output!
[For
43PFS, can it be realistic to be concerned with such a simple case? This question can be answered affirmatively if it can be supposed that
an aggregation y of estimated values of all the potentially separable outputs of a unit’s activities could be devised—by a combination of
judgement & political will. This paper will argue that this supposition may be a necessity in applications to organizations of the kind
represented by 43PFs. The argument can only be decided by careful inspection of techniques that, as if by magic, appear to dispense
with the need for value judgement. If it were possible to agree easily on a single variable that managed to evaluate the various outputs
of police force activity, there would have been no need for this meeting of the Official Statistics Section.]
The obvious estimate of efficiency for single input & output is the intrinsic measure veff = y/x. Even though it
may be thought of as the amount of value (in y) per unit of cost (in x), veff does not explicitly depend on the CRS
concept.
How does veff relate to the teff measure produced by the Farrell approach when the continuum C is produced by
a combination of worsening & rescaling? [The low dimensionality of the case (r = 1, s = 1) makes mixing redundant once you
allow rescaling.] Fig.1 is almost self-explanatory. [The axes are unconventionally labelled for a reason that will appear]. The
continuum C can in this case be created by rescaling the unit m that has the largest value of veff and then worsening
the rescaled points, and Cg is the subset {(cx, dy) : c ≤ 1, d ≥ 1} of C. The rescaling ray through unit m is the
efficiency frontier F. All the feasible points in the interval ff′ give the minimum value teff of c/d, which is also
X/x & y/Y. [Note that the low dimensionality also renders vacuous the “proportional” in the definition of Cg.]
We then have teff = X/x = y × min{xi/yi}/x = veff/max{veffi}.
[The case for thinking about the data in its unreduced, two-dimensional form is that we thereby retain an awareness of size of unit, which
would be lost if we were to start with the ratio y/x. Nothing essential is lost by doing so. Realistic applications will take us into many
more than three dimensions: one extra dimension is neither here nor there.]
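In this simplest case the calculation is elementary. As a minimal sketch (with invented costs and outputs) of the relation teff = veff/max{veffi} under the CRS construction:

```python
costs   = [100.0, 150.0,  80.0, 120.0]   # single input x for each unit (invented)
outputs = [ 55.0,  90.0,  50.0,  60.0]   # single output y for each unit (invented)

veff = [y / x for x, y in zip(costs, outputs)]   # intrinsic measure y/x
best = max(veff)                                 # slope of the CRS frontier ray

for i, v in enumerate(veff):
    print(f"unit {i}: veff = {v:.3f}, teff = {v / best:.2%}")
```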
Farrell takes seriously the possibility that the CRS motivation may not be reasonable as far as the shape of the
efficiency frontier is concerned. He claims that it is “more difficult to relax the assumption” than to deal with the
case of multiple outputs. [If only that were so (in a non-technical sense)—some may think!]
Figure 1: A hypothetical scattergram for n = 4, r = 1, s = 1: estimation of teff for the generic unit g with input x
& output y, using worsening & rescaling to create comparison performances.
Figure 2: The case of Fig.1 without rescaling.
Fig.2 shows how Fig.1 changes when the comparison continuum C is created without using CRS-motivated rescaling
but only worsening & mixing (the latter no longer redundant) and with the n units augmented to include the origin O
regarded as a hypothetical unit with zero input & output. [This use of the origin is a bit of covert rescaling.] The superficial
change from Fig.1 to Fig.2 is that F is now the least-efficient convex frontier with the property of decreasing returns
to scale (DRS), that is consistent with the data D. We therefore agree with Farrell when he says that it is “quite
simple to allow for diseconomies of scale” (i.e. DRS), at least in this particular case. [The frontier line can be thought of
as what you would get if the y-axis were a flexible string and were pulled up tightly against the frontier points. It is clear that y/x can
never increase with x along such frontier lines. The feasibility of the units on the vertical line joining the unit with the most output to
the “point at infinity” on the x-axis is based not on mixing but on worsening.] The less superficial change is that teff (defined
as the minimum of c/d in Cg) is now given, at the point f, by the value x/X, which is less than the value y/Y at f′.
In his Section 2.4, Farrell talks about these two values as distinct options: one corresponding to input minimization
holding output constant, the other to output maximization holding input constant. He appears to favour the first,
which agrees with our definition of teff as the minimum of a continuum of values of c/d on the frontier.
2.5. TWO OUTPUTS.
The alternative geometry for the case (r = 1, s = 2) will take us to three dimensions. So it is a good thing that
three dimensions can portray, with a little imagination, all but one of the essential features of the quite undrawable
geometry of the construction procedure for the typical application, in which s+1 will exceed the artistic limit of
three.
Figure 3: Estimation of the technical efficiency of the generic unit g with input x and outputs y(1) & y(2), using
worsening, mixing, and rescaling.
[The “simple case” of Farrell was (r = 2, s = 1) which lies outside our restriction to the single input case. It is therefore of passing
interest only that the full geometry of (r = 2, s = 1) is different from that of (r = 1, s = 2): there is a fundamental asymmetry between
the two cases that makes a difference. The asymmetry arises because worsening involves either decreasing outputs or increasing inputs:
one cannot simply exchange r with s, and input with output. The asymmetry involves the definition of Farrell’s “points at infinity” and
their analogues for (r = 1, s = 2): the difference may have been overlooked by Farrell because his sole application was to a problem
with s = 1—agricultural output of the 48 states in 1957 America with 4 input variables. It should, however, have been elucidated in the
general algebraic treatment for (r > 1, s > 1).]
Fig.3 shows the typical final step in the construction of F for two outputs, whose stages will first be described in
simple mechanical terms without explicit reference to mixing or worsening:
(i) Rays are drawn from the origin to infinity through each of the n points representing the performances of the n
units. These rays are the continua of feasible points created by rescaling. Fig.3 depicts the case where there are at
least two frontier units i & j. Three extra infinite rays are drawn: the x-axis itself [corresponding to a Farrell “point at
infinity”] and two side-rays in the planes y(2) = 0 & y(1) = 0. These side-rays are, respectively, the ray from the
origin through the point (x1, y1(1), 0), and the ray from the origin through (x2, 0, y2(2))—where (x1, y1(1), y1(2)) is
the performance of the unit with the largest value of y(1)/x, and (x2, y2(1), y2(2)) is the performance of the unit
with the largest value of y(2)/x. [They exemplify the “points at infinity” that Farrell would have needed for the case s > 1.]
(ii) Imagine that the three plane quadrants enclosed by the three pairs of axes —the horizontal one between the
y(1) & y(2) axes, and the side ones between the x & y(1) axes and between the x & y(2) axes—constitute a continuous
sheet of shrink-wrappable plastic. Keeping it fixed to the x-axis and extending to infinity, shrink-wrap the sheet so
that it is a tight fit to the “frontier rays”—defined as those of the n+3 drawn in (i) (apart from the x-axis) that stop
the plastic shrinking any further. The surface defined by the shrunken plastic is Farrell’s estimate of the efficiency
frontier under the CRS “assumption”. The frontier units are those that lie on the frontier rays: all the other units
lie above the frontier surface.
(iii) Fig.3 shows the typical case in which the vertical line from the generic unit g = (x, y(1), y(2)) to the horizontal
outputs plane (moving down to the technical efficiency frontier while keeping the two outputs constant) meets that
frontier at the feasible point f in the facet between the rays of two frontier units i & j. The point f′ of the same
facet is the point with the same input and the same ratio of outputs as g. (The atypical case is where a “side-ray”
is one of the rays defining the facet containing the points f & f 0 .)
The teff of the generic unit g will then be
teff = X/x = y(1)/Y(1) = y(2)/Y(2).
The construction (i)-(iii) implicitly uses mixing to fill in the facets between the rays for the n units, and worsening to
fill in the “side flaps” that correspond to the two side-rays. The picture does not generalise easily to the case s > 2
because one cannot readily visualise a shrink-wrapping in more than three dimensions! Fortunately, explicit use of
mixing & worsening gives a construction that extends to the case s > 2 as easily as any excursion into hyperspace
can.
Here, therefore, are the steps of the alternative construction for the case s = 2 that has the same outcome as the
shrink-wrapping. The alternative construction (i*)-(iv*) obviates the need for the three extra rays in (i). [We leave
the reader to “see” how well it serves in more than three dimensions. If it does no more than provide a vague picture, even that may be
useful in understanding the algebraic logic that is expressed in definitions & arguments.]
(i*) Use rescaling to get n infinite rays from the origin through the n units.
(ii*) Use mixing of all the points in these n rays to make a “solid” cone Cn whose vertex is the origin
[the so-called
“convex hull of affine combinations”].
(iii*) Apply worsening to each point of Cn to complete the construction of the continuum C.
(iv*) For a generic unit g = (x, y) not on the efficiency frontier F of C, draw Cg , the subset of C of points (cx, dy)
proportionally better than g. [The “proportionally” now matters as far as y(1) & y(2) are concerned.] The teff of g is the
minimum of c/d. [It is the enlargement from Cn to C by the worsening procedure that creates the “side flaps” of the construction
(i)-(iii), that may or may not be involved in determining the teff of some units.]
2.6. MORE ON NON-CONSTANT RETURNS TO SCALE: DRS, IRS, & VRS.
Dispensing with CRS-motivated rescaling in the construction of C for the case (r = 1, s = 1) led to a frontier F that
satisfied the DRS condition. Before considering the same step for the case (r = 1, s = 2), we need to clarify what
is meant by decreasing returns for scale for that case. For the general case of multiple outputs, we can state two
distinct deployments of the DRS concept:
(a) An actual unit g = (x, y) would display DRS if it were to change its performance to (cx, dy) with c > 1 & c/d > 1.
(b) An efficiency frontier satisfies the DRS condition if the ratio c/d is necessarily greater than or equal to 1, when
(x′, y′) & (cx′, dy′) with c > 1 are both in F. [This deployment of the concept allows CRS as a limiting case of DRS].
It was easy to see graphically what happens in the case (r = 1, s = 1) when rescaling was excluded in the construction
of C but mixing with the origin was allowed. It is still possible to see, in much the same way, what emerges when the
same exclusion is applied to the construction (i*)-(iii*) for the case (r = 1, s = 2): in the figure that would replace
Fig.3, the n rays from the origin in (i*) would terminate at the points representing the n units. It might then be
“seen” that the frontier F satisfies the DRS condition. But, if not, the following reductio ad absurdum argument
may be preferred, especially as it applies to the problems of interest with s > 2 :
Suppose (x′, y′) & (cx′, dy′) were in F (and therefore in C) with c > 1 & c/d < 1, contradicting DRS for F. Mixing of
(cx′, dy′) and the origin (0, 0) with weights a = 1/c & 1 − a = 1 − 1/c, respectively, would give a feasible point (x′, dy′/c) also
in C. But, with d/c > 1, the latter would be proportionally better than (x′, y′), which contradicts the initial supposition.
Hence F must satisfy the DRS condition.
For fixed g = (x, y) and (cx, dy) in F , DRS means c/d is a non-decreasing function of c. It follows that teff for
g = (x, y), which is the minimum of c/d in the intersection of Cg & F , is given by the point with the smallest value
of c which is on the line of constant y through g.
We note that the econometric concept of increasing returns to scale (IRS) is incompatible with an efficiency frontier
of a convex set such as C—convex because of the mixing in its construction. Only a technique different from Farrell’s
can incorporate that concept. Farrell proposes, as “the only practical method”, that the units be divided into groups
of “roughly equal output” and that the CRS-based method be applied to each group separately (Farrell & Fieldhouse,
1962).
For completeness, we mention the somewhat ambiguously termed variable returns to scale (VRS), a “model” introduced by Banker et al (1984). It is simply what you get if the origin (zero outputs for zero input!) is excluded
from the mixing used in the construction of C for DRS. Efficiencies calculated with VRS may be considered quite
unacceptable: for example, unit j in Fig.8 would have a VRS efficiency of 100%.
[Would you accept VRS efficiencies for 43PFs? Levitt & Joyce (1987) seem to have no qualms in presenting VRS/Banker efficiencies in
their study of a single output for 38 police authorities in England & Wales.]
What, incidentally, is to be made of two general comments in Banker et al (1984)?:
“the concepts and definitions of theoretical economics as formulated for applications to private sector market behavior may
not always be best suited for management science (and related) applications in the not-for-profit sectors.”
“economics concepts such as returns to scale, etc., have no unambiguous meaning until the efficiency frontier is attained.
Thus, by virtue of this comment alone, most of the statistical-econometric studies on this topic are put in serious question.”
Thinking of 43PFs, the question of whether or not it is “quite simple to allow for diseconomies of scale” (Farrell’s
Section 2.4) may be irrelevant to the question of whether we should do so or not. It is far from obvious that, without
further information, the definition of efficiency should be changed to accommodate apparent decreasing or increasing
returns to scale. In some applications, such features of the data ought to be treated as possible consequences of
inefficiency (when the size of units is under administrative control)— inefficiency in large units when the frontier
shows evidence of pronounced DRS, inefficiency in small units when it shows IRS. [This question will resurface when we
look at some econometric thinking about 43PFs in the paper of Drake & Simper (2000).]
2.7. A SIMPLER PAPER-THIN PICTURE.
It is easy to forget that teff does not and cannot give a complete ordering or ranking of the units in D: in general, it
does not even give a partial ordering. What Farrell/DEA does is to place each unit in a two-dimensional continuum
of feasible units constructed by hypothetically realistic procedures from the performances of other units—the frontier
of the continuum being pulled this way or that by the performances of a handful of units.
As far as the determination of the teff of the generic unit g = (x, y) is concerned, the essential features of Sections
2.1-2.6 can be given a paper-thin representation in only two dimensions. In Fig.4, f is the unique frontier performance
that is both in Qg and on the frontier FCRS constructed with re-scaling, mixing & worsening: in the absence of
slacks, f is an affine mixture of the actual performances of s frontier units in D, as also are the points h & i. Point h
is uniquely determined as the one that is on the frontier FVRS (constructed for VRS without re-scaling and without
mixing with the origin) but not on FCRS or FDRS, and that has the smallest value of c. Point i by comparison is on
both FVRS & FDRS but not FCRS. [In the absence of slacks, the segment fi is the intersection with Qg of the facet
that is the affine convex hull of s+1 frontier units in both FDRS & FVRS.]
Figure 4: Frontier lines for CRS, DRS, & VRS efficiencies in the quadrant, Qg say, of performances proportionally
worse or better than g = (x, y): the variables on the axes are the scalar multipliers c & d.
The CRS-motivated teff is AQ/AR. For the g shown, the DRS teff is AQ/AR and the VRS teff is AS/AR: they
would be equal if g were to the right of f. The lettering A, Q, R, S, V, V′ facilitates reference to the paper by Drake
& Simper (2000) to be considered in Section 5.2. In what appears to be a comment on an equivalent picture, these
authors claim that:
“All economic organizations which use resources to produce outputs are prone to output ranges which display first increasing
then constant and finally decreasing returns to scale.”
The comment can be accepted if it is intended to refer to an underlying non-convexity of the sort given serious
consideration by Farrell & Fieldhouse (1962), but not if it goes no deeper than the fact that any VRS boundary
drawn by the construction method that gives Fig.4 will automatically have the property referred to, whatever the
units in D are trying to tell us. This point will be seen to have relevance when we look at Drake & Simper’s
suggestions for the reorganization of the police forces in England & Wales.
2.8. DOING WITHOUT SIZE: A DECEPTIVELY SIMPLE PICTURE?
Although we have argued in favour of keeping size of unit in the picture, there is a commonly presented picture that
has pedagogical merit when CRS can be invoked with conviction. In this case, it is both legitimate & helpful to
reduce the geometry of Section 2.5 from three dimensions to the plane of the outputs per unit input p(1) = y(1)/x
and p(2) = y(2)/x, as in Fig.5 (where the axes are really p(1) & p(2)). In Fig.5, the efficiency frontier surface F in
three dimensions shrinks to a frontier line, and teff is determined by the fraction of the distance of the unit to that
line (from the origin). The two segments of the frontier line parallel to the axes correspond to the two side-flaps in
the full three-dimensional representation, equivalent to the “slack variables” feature of the simplex method.
The fact that CRS has to be invoked with conviction to support this picture is easily overlooked. The points in Figs.
5 & 9 correspond to rays through the origin of Fig.2, and this has to be kept in mind when we are tempted to think
of a unit, processed by CRS-motivated DEA, as a mixture of such & such “peer” units.
2.9. CONSTRAINTS ON OUTPUT VARIABLE TYPE:
PROBLEMS OF SIGN, ORIGIN, & DEPENDENCE ON SIZE.
When cost, in the customary financial or finance-equivalent sense, is the single input, the idea of technical efficiency
and its concept of constant returns to scale (CRS) imposes some conventional constraints on the type of output
variables that are used in DEA.
The first constraint is a relatively weak one, already incorporated in our definition of the worsening procedure:
outputs, whether volumetric or not, are taken to be positive in the sense of “the more the better”—other things
being equal, outputs are expected to increase with input cost. So, for DEA as currently formulated, we would not
take y(1) = “number of burglars apprehended” and y(2) = “ number of burglaries for which no-one was apprehended”
. [Note that y(2) can be expected to be positively correlated with input cost across police forces, if only because cost is highly correlated
with population.] It is tempting to deal with this problem by subtracting y(2) from a number N larger than all the
values of y(2) in D, so that the output is positive in both senses of the word. However, the origin plays a crucial role
in DEA: the choice of N would strongly influence the teffs then calculated, as is easily seen from Fig.5 by moving
the origin a long way down the y(2) axis. Thanassoulis et al (1987) have suggested another “solution”: simply use
the reciprocal of any output that goes in the wrong direction. Something similar is suggested in Box 3 of Spottiswoode
(2000):
“When it comes to efficiency measurement, all of the indicators will have to work in the same direction. This will mean that, in practice,
some indicators will have to be ’inverted’.”
[In Section 4, we will introduce a less technical approach to efficiency measurement in which both y(1) & y(2) will be admissible, by the
assignment of positive value-weight to y(1) and negative value-weight to y(2). The origin, just a point on an aggregate value scale, still
plays a role but one that is neutral with respect to the direction in which value is accumulated as additional outputs are brought into the
numerator. In Section 2.11, an extension of DEA is proposed that also admits negative (i.e. “smaller the better”) outputs: however, this
generalization is designed to throw light on the character of DEA rather than to be considered seriously in the analysis of problems such
as 43PFs.]
Figure 5: An illustration of the case (r = 1, s = 2) of reduced data, based on indifference to size.
A stronger constraint on the type of output variables stems from the size dependence in CRS-motivated rescaling
(or from the mixing with the origin for DRS) : it is that (x, y) & (cx, cy) have to be considered equally efficient.
This excludes outputs of the type y(3) = “percentage of burglaries for which someone was apprehended”: if y(3)
were not excluded, it would be difficult to maintain that halving all the variables (including cost) would not change
our measure of efficiency or that percentages over 100 were sensible. The constraint also plays havoc with the idea
that N − y(2) might be used to deal with “unapprehended” burglaries. [ When a single input variable is volumetric, like
financial input in 43PFs or like total acreage in Farrell’s application, then it makes sense in DEA to use volumetric output variables.
Conventionally, these are positively correlated to input. But positive correlation is not enough: use of the reciprocal of a negatively
correlated volumetric output violates any compatibility with the CRS concept. It is noteworthy that the public conception of police
efficiency is, more often than not, based on some simple proportion such as “detection rate (per crime)”—with no cost input at all. The
Section 2.11 generalization would accommodate such outputs. Note that y(3) is derived from two volumetric outputs one “positive” and
one “negative”—the number of burglaries for which someone was apprehended, and the total number of burglaries. Efficiency assessment
may be easier if these two elements are kept separate. ]
2.10. TECHNICAL EFFICIENCY AS A FLATTERING UPPER BOUND.
There is a well-known alternative derivation of Farrell’s CRS-motivated teff measure that has somewhat schizophrenic
consequences: there seem to be two competing justifications for the measure. The alternative does not need to create
any feasible points at all!
Suppose we adopt the view, introduced in Section 2.1, that we should try to use an intrinsic measure of the form
veff = v1 p(1) + ... + vs p(s)        (1)
where p(j) = y(j)/x and v = (v1 , ..., vs ) are intended to be socially agreed value-weights for volumetric outputs
y. [Note that, without explicit reference to the CRS concept, veff is a function of the reduced data p.] Suppose also that it
proves impossible to agree on v. Then each unit, if it has a spokesman [the Chief Constable would serve for 43PFs] might
legitimately ask to be assessed not by the undecided v but by a vector w that puts the unit’s own performance in
the “best possible light”. Suppose that it is then agreed to interpret “best possible light” as the maximization with
respect to w ≥ 0 of the interactive measure defined as the ratio of
weff = w1 p(1) + ... + ws p(s)
to the maximum of weff over the n units in D. [Note that no feasible units are needed.] It is straightforward Cartesian
geometry in the s-dimensional space of p(1), ..., p(s) [the case s = 2 is widely portrayed in the DEA literature] to show that
the measure thus defined is none other than teff itself. The interpretation of teff here is therefore as an upper
bound to the “true” veff—where “true” refers to the undecided & unknown vector v. How useful would such upper
bounds be, if the veff approach were to be preferred? The question clearly merits further study in problems such as
43 PFs.
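Computationally, this “best possible light” maximization can be carried out as a small linear program: scale w so that the largest weff among the n units is at most 1 and maximize weff for the unit under assessment; the maximized value is then the CRS teff of Section 2.10. A sketch using scipy.optimize.linprog (the reduced data P are invented, and best_light_teff is just an illustrative name):

```python
import numpy as np
from scipy.optimize import linprog

def best_light_teff(P, g):
    """CRS teff of unit g as the maximum of weff = w.p_g subject to
    w.p_i <= 1 for every unit i and w >= 0, where P is the (n, s) array
    of reduced data p_i = y_i / x_i."""
    n, s = P.shape
    res = linprog(c=-P[g],                      # linprog minimizes, so negate
                  A_ub=P, b_ub=np.ones(n),      # w.p_i <= 1 for all i
                  bounds=[(0, None)] * s,       # w >= 0
                  method="highs")
    return -res.fun                             # maximized weff, i.e. teff of g

# invented reduced data for n = 5 units and s = 2 outputs per unit of cost
P = np.array([[0.8, 0.4],
              [0.5, 0.9],
              [0.6, 0.6],
              [0.3, 0.3],
              [0.9, 0.1]])
print([round(best_light_teff(P, g), 3) for g in range(5)])
```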
That this flattering use of veff will not reproduce the generally different, DRS-motivated teff is obvious, given that
use of DRS rather than CRS affects only the continuum C, which does not come into the derivation of the veff-based
ratio. Nor would it do so if the latter were defined using as denominator the maximum of weff over C rather than
over the n units, since the maximum is determined by one of the units.
Farrell argued against any use of a measure such as the veff of Section 2.1. In Farrell’s industrial framework, u1 , ..., ur
would be input prices and v1 , ..., vs would be output prices, and veff would be the basis of a “price efficiency”
comparison with the efficiency frontier. Farrell’s argument is two-fold:
(i) prices may be unstable (from firm to firm, or from time period to time period) compared with the estimate of the
production function (when s = 1), or the production “portfolios” (when s > 1), that the efficiency frontier ought to
be estimating;
(ii) price efficiency is “much more sensitive to the introduction of new firms (units) than is technical efficiency”.
The force of (i) must be a matter of judgement in the particular problem, where it refers to the weight vectors
u & v. For (ii) [thinking of 43PFS], the greater sensitivity or discriminatory power of the index based on veff, in which
weights are fixed rather than being amelioratively adaptive as is teff, may just be an honest reflection of interpretable
differences between units. Moreover, underlying Farrell’s argument there appears to be a degree of confidence in the
efficiency frontier (constructed so as to flatter the unit performances) as a realistic approximation to the upper limits
of efficiency—a confidence that may or may not be justified.
2.11. GENERALIZING DEA TO “NEGATIVE” OUTPUTS & NON-VOLUMETRIC OUTPUTS.
The alternative derivation of CRS-motivated teff in Section 2.10 suggests the generalization in which a number of the
volumetric outputs are “negative” in the sense of requiring negative value-weights. For the y(1) & y(2) of Section 2.9,
Fig.6 gives the simple picture for n = 7 and s = 2. The “negative” y(2) affects the worsening step in the construction
of C: the feasible set created by worsening (x, y(1), y(2)) is now
{(x′, y′(1), y′(2)) : x′ ≥ x, y′(1) ≤ y(1), y′(2) ≥ y(2)}.
It may be verified that, in the reduced-data space, the feasible set Cp and frontier Fp are as shown in Fig.6. The
broken lines have constant w1 p(1) + w2 p(2), where w2 is non-positive. The maximization of Section 2.10 again yields
Og/Oh—the ratio of the distance to unit g to the distance to the frontier along the ray from the origin through the unit
g. It is therefore reasonable to call it teff too (see Section 2.8). Its numerical determination is given by adaptation
of the algorithm that serves for the case of all “positive” outputs:
max(Og/Oh) = max{w1 pg(1) + w2 pg(2)},
subject to  w1 pi(1) + w2 pi(2) ≤ 1, i = 1, ..., n,
            w1 ≥ 0,
            w2 ≤ 0.        (2)
The method extends to the case s > 2 with up to s-1 “negative” outputs.
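Algorithm (2) is itself a small linear program; the only change from the all-“positive” case is the sign restriction on the weight of the “negative” output. A self-contained sketch using scipy.optimize.linprog (the data and the helper name are invented):

```python
import numpy as np
from scipy.optimize import linprog

def teff_negative_output(P, g):
    """Algorithm (2): maximize w1*p_g(1) + w2*p_g(2) subject to
    w1*p_i(1) + w2*p_i(2) <= 1 for all i, with w1 >= 0 and w2 <= 0."""
    n = P.shape[0]
    res = linprog(c=-P[g], A_ub=P, b_ub=np.ones(n),
                  bounds=[(0, None), (None, 0)],   # w1 >= 0, w2 <= 0
                  method="highs")
    return -res.fun

# invented reduced data: column 0 a "positive" output, column 1 a "negative" one
P = np.array([[0.8, 0.5], [0.5, 0.2], [0.6, 0.6], [0.4, 0.1]])
print([round(teff_negative_output(P, g), 3) for g in range(4)])
```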
This route to questionable generalizations of DEA is no longer available when some outputs are non-volumetric
and when it is no longer possible to use the reduced-data representation. The generalizations in Cooper et al
(2000) are expounded in the rather specialized framework of linear programming algorithms (next Section). When
understanding rather than computational instruction is wanted, it is better to go back to the basic ideas of Sections
2.1 & 2.2 and specify the geometry of C & F for any generalization based on appropriate use of rescaling, mixing &
worsening, prior to the final stage of input minimization. For example, with the three outputs of Section 2.9:
• CRS rescaling would apply only to y(1) & y(2) to create rays of feasible sets not from the origin but from
points on the y(3) axis: (0, 0, y(3)) for generic unit g.
• mixing would then create the convex hull Cn of all the points on these n rays
• worsening would then enlarge Cn by reductions of y(1) & y(2) but increases in y(3).
Although the algebra of linear programming is then required to turn these steps into an input minimization in C,
going back to basics provides a reminder, necessary in any application, of what it is we are doing to the actual unit
data when any measure of efficiency is developed in this way.
2.12. LINEAR PROGRAMMING.
Farrell was told by one of the discussants of his 1957 paper that there was a new technique—the more than 10-year-old simplex method of linear programming—that would solve any problems in calculating teffs, even for applications
in which the number of variables, r + s, became very large.
At the same time, the simplex method automatically patched up (by its allowance of “slacks” in variables) the lacuna
about “points at infinity” (noted in our Section 2.5).
Figure 6: Illustration of a reduced-data space with “negative” y(2).
For r = 1, the Simplex connection is made as follows:
From Section 2.5, we know that, for both CRS- & DRS-motivated construction of C, we get the teff of g = (x, y) when d = 1 i.e. by
“input minimization” at the point (cx, y) in C with minimum c (when teff= c). The conditions for (cx, y) to be in C are cx = Σi ci xi ,
y ≤ Σi ci yi , ci ≥ 0, i = 1, ..., n, and, only when rescaling is excluded for the DRS case, the extra condition Σi ci ≤ 1. (If at all wanted,
the VRS case requires the further condition Σi ci = 1.)
The minimization of c with respect to variation of the ci is no problem for the Simplex algorithm : it is the dual of the maximization
derivation of teff via weff in Section 2.10. Note that, for the DRS case, 1 − Σi ci is the necessary complementary mixing weight for the
origin (0,0). For CRS (without the extra condition for DRS) it is easily verified that the algorithm takes the reduced-data form:
min c        (3)
subject to  p/c ≤ Σi ai pi,
            ai ≥ 0, i = 1, ..., n,
            Σi ai = 1.
For the generalization of Section 2.11, the dual of the algorithm (2) is given by using ≤ for “positive” outputs and ≥ for “negative”
outputs, in the second line of (3) . [Cooper et al (2000) claim that “Farrell efficiency” (i.e. teff) should yield historical priority to the
“CCR efficiency” of Charnes et al (1978). The claim is examined in Appendix 4.]
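For completeness, the input-minimizing (“envelopment”) form set out above can also be handed to an off-the-shelf LP solver. The sketch below (our own wrapper, with invented data) minimizes c subject to the conditions cx = Σi ci xi, y ≤ Σi ci yi, ci ≥ 0 stated above, adding Σi ci ≤ 1 for DRS or Σi ci = 1 for VRS:

```python
import numpy as np
from scipy.optimize import linprog

def envelopment_teff(x, Y, g, returns="CRS"):
    """Input-minimizing teff of unit g for a single input x (length n) and
    outputs Y (n, s), following the Section 2.12 formulation.  Decision
    variables are (c, c_1, ..., c_n); minimize c subject to
    c*x_g = sum_i c_i x_i,  y_g <= sum_i c_i y_i,  c_i >= 0,
    plus sum_i c_i <= 1 for DRS or sum_i c_i = 1 for VRS."""
    n, s = Y.shape
    obj = np.r_[1.0, np.zeros(n)]                    # minimize c
    A_eq = [np.r_[x[g], -x]]                         # c*x_g - sum_i c_i x_i = 0
    b_eq = [0.0]
    A_ub = np.c_[np.zeros(s), -Y.T]                  # -sum_i c_i y_i(j) <= -y_g(j)
    b_ub = -Y[g].copy()
    if returns == "DRS":
        A_ub = np.vstack([A_ub, np.r_[0.0, np.ones(n)]])   # sum_i c_i <= 1
        b_ub = np.r_[b_ub, 1.0]
    if returns == "VRS":
        A_eq.append(np.r_[0.0, np.ones(n)]); b_eq.append(1.0)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]                                  # the minimized c, i.e. teff

# invented data: n = 4 units, single cost input, s = 2 outputs
x = np.array([100.0, 150.0, 80.0, 120.0])
Y = np.array([[55.0, 20.0], [90.0, 10.0], [50.0, 30.0], [60.0, 25.0]])
print([round(envelopment_teff(x, Y, g, "CRS"), 3) for g in range(4)])
```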
2.13. SENSITIVITY TO... OR TAKEOVER BY... OUTLIERS
The reduced-data representation of the case (r = 1, s = 2) can be used to illustrate another feature of the non-intrinsic character of teff. This is the sensitivity of a unit’s teff to outliers among other units—a sensitivity that
becomes increasingly significant as, for example, the number of outputs s increases to the level represented by the
20/43 of s/n that the 43PFs problem may require. In Fig.7 (where the axes are really p(1) & p(2)) unit i would
have had a teff of 100 % if the outlying unit j (with very large y(1) and negligible y(2)) had not been included. How
can we be sure that j deserves a teff of 100% (on grounds of “priority” perhaps) and that it is not “cherry-picking”?
[The question can be put with greater realism when there are, say, 20 output variables among which there may be one or two that can
be produced in quantity at relatively little cost.]
2.14. ADDING OR DISAGGREGATING AN OUTPUT ALWAYS INCREASES “EFFICIENCY”.
Nunamaker (1985), working with the case of a single output & multiple inputs, established that adding an extra input
variable, or disaggregating an existing input into two component inputs, could not decrease (and would typically
increase) the CRS-based teff of any unit. His indicative proof of this theorem, in a footnote to his paper, can be
confirmed by the following precise & general argument for the logically equivalent “dual” case (r = 1, s > 1), using
the alternative definition of technical efficiency of Section 2.10:
From Section 2.10, teff = max{fs (w1 , ..., ws ) : wj ≥ 0, j = 1, ..., s} = max{fs } (for short), where fs = fs (w1 , ..., ws ) =
weff/(maximum value of weff in D). For the addition of an (s + 1)th output with weight w(s+1) , fs is just f(s+1) with
w(s+1) = 0. Hence max{f(s+1) } ≥ max{fs } , and teff cannot decrease. For the disaggregation of y(1) into y(1a) & y(1b),
with y(1a) + y(1b) = y(1) and associated weights w1a & w1b , fs is just f(s+1) with w1a = w1b = w1 , and the same nice
property of a maximum can be applied again.
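The monotonicity is easy to confirm numerically. A small sketch (invented data, our own helper) computes the multiplier-form CRS teff of every unit before and after an extra output column is appended:

```python
import numpy as np
from scipy.optimize import linprog

def teff(P, g):
    """Multiplier-form CRS teff (Section 2.10) from reduced data P of shape (n, s)."""
    n, s = P.shape
    res = linprog(c=-P[g], A_ub=P, b_ub=np.ones(n),
                  bounds=[(0, None)] * s, method="highs")
    return -res.fun

rng = np.random.default_rng(0)
P2 = rng.uniform(0.1, 1.0, size=(10, 2))                    # invented reduced data, 2 outputs
P3 = np.hstack([P2, rng.uniform(0.1, 1.0, size=(10, 1))])   # append a third output

for g in range(10):
    assert teff(P3, g) >= teff(P2, g) - 1e-9   # adding an output never lowers teff
print("teff never decreased when the extra output was added")
```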
This gives theoretical underpinning to the experience of DEA users, only too well aware that, to quote Thanassoulis
et al (1987):
“the larger the number of inputs and outputs in relation to the number of units being assessed, the less discriminatory
the method appears to be. . . . Thus the number of inputs and outputs included in a DEA assessment should be as small as
possible, subject to their reflecting adequately the function performed by the units being assessed [my italics]”.
In this, low discrimination means not enough units with relatively low teffs.
[It seems that DEA practitioners do think there have to be limits to the permissiveness of DEA that allows units to be seen in the “best
possible light”. Moreover, the requirement of adequate reflection of some function or activity has to be more demanding than its mere
representation in some output or other: in 43PFs, for example, the omission of any output dependent on some particular activity of
consequence would be noted and perhaps dysfunctionally exploited by police forces in their allocation of resources eg. prosecution for
fraud.]
Figure 7: Unit i might be justifiably sensitive about the influence of unit j.
2.15. WEIGHT CONSTRAINTS IN DEA.
The first indication of technical pathology in DEA came to light before the baby was christened—from the seconder
of the vote of thanks for Farrell’s 1957 paper. Translated to the single input problem, Chris Winsten’s example goes
like this:
Consider any production process in which a particular output, say y(1), is necessarily so closely related to input x that the
ratio y(1)/x is practically constant. Then all n units will be on or very close to the efficiency frontier, and will be awarded
a teff of 100%, whatever performance they display on the other s − 1 outputs.
In reply, Farrell acknowledged that such 100% efficiencies would be “unduly charitable” and said it would be “necessary to bring in extra information and define a more stringent measure”. But, if the general concept of technical
efficiency is sound, why did Farrell not stick to his guns, and compare each unit with the formally feasible units
generated by mixing with the origin—the reference set of proportionally worse performances? I happen to think that
Farrell was right to feel uneasy about Winsten’s example, but wrong to think that the difficulty could be resolved
by “extra information” (that the formally feasible reference set is not feasible in any realistic sense?), rather than
by facing up to the question of weighting raised by Winsten’s example. In our “dual” version of that example, a teff
of 100% is equivalent to giving zero weight to all the outputs except y(1). Defences of such 100% values—that they
are useful upper bounds to unknown true efficiencies based on veff (Section 2.10) or, equivalently, that they are the
“efficiencies” that present units in the “best possible light”—at least have an honest clarity, but also disguise the
potential implications for the underlying weights.
The problem is squarely faced in the police force study of Thanassoulis (1995), where it is dealt with in an openly
subjective but suggestive fashion. The author recognizes that a police force (unit) “may be at a part of the efficient
boundary characterised by unacceptable marginal rates of substitution [weights] between outputs, say valuing one
or more outputs excessively while giving negligible value to other outputs”, and finds that the weights were “often
counter-intuitive” [my italics]. The remedy proposed is simple: impose restraints on the weights! In his study of 1991
Audit Commission data for the 41 police forces excluding “The Met” and “City of London”, Thanassoulis used only
three broad outputs
• y(1) = number of violent crime clear-ups
• y(2) = number of burglary clear-ups
• y(3) = number of “other” crime clear-ups
on the grounds that “the bulk of police effort is applied to investigation” and that these three crime categories
were “sufficient to convey an overview of the seriousness and complexity of crime found”. The categories clearly
provide comprehensive cover of clear-ups, but their small number had to be justified on the grounds that “retention
of numerous crime categories would overcomplicate the analysis”.
Thanassoulis boldly imposed two inequalities on the output weights:
w1 /10 ≥ w2 ≥ 2w3
(in the notation of our Section 2.10), that are then easily built into the primal linear programming algorithm. Section
4 will take up the arguments for extending such inequalities to specify a set of extremal choices (2^(s−1) in number) for
the weights v1 , ..., vs in veff, and for dispensing with the questionable use of DEA to make the “best possible light”
choice of constrained weights in the associated efficiency comparisons. The question is taken further in Allen et al
(1997).
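As a rough illustration of how such weight restraints enter the calculation (a sketch only, with invented numbers, and not a reconstruction of Thanassoulis's own analysis), the two inequalities can simply be appended as extra rows of the constraint matrix in the single-input "best possible light" linear programme:

    import numpy as np
    from scipy.optimize import linprog

    def restrained_teff(y, x, unit):
        p = y / x[:, None]                        # p_i(j) = y_i(j)/x_i, three outputs
        A_ub = np.vstack([p,
                          [-0.1, 1.0, 0.0],       # w2 - w1/10 <= 0, i.e. w1/10 >= w2
                          [0.0, -1.0, 2.0]])      # 2*w3 - w2 <= 0, i.e. w2 >= 2*w3
        b_ub = np.concatenate([np.ones(len(x)), [0.0, 0.0]])
        res = linprog(-p[unit], A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
        return -res.fun                           # restrained "efficiency" of the unit

    x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # invented costs
    y = np.array([[8, 3, 1], [9, 5, 4], [4, 6, 2], [12, 2, 9], [7, 7, 3]], dtype=float)
    print([round(restrained_teff(y, x, i), 3) for i in range(len(x))])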
2.16. PRIORITIES, ENVIRONMENTALS, OR INEFFICIENCIES?
For reasons already stated, our account of Farrell’s work has dealt mainly with the case of a single input & multiple
outputs. This is, in a sense, the dual of the case that Farrell was mainly concerned with, and that was illustrated with
his sole application—to the single agricultural output of 48 American states, expressed in unambiguously financial
terms. If we contrast the latter single-output application with a dual case application, such as 43PFs, we may well
ask whether the restriction to comparisons in Qg (Section 2.7)—in which the output profile (the ratios of outputs to
each other) is the same as the profile of g—has the same relevance. However, one argument that has been adduced
in favour of the restriction is that (i) ratios of outputs, such as y(1)/y(2) with y(1) the number of burglars & y(2)
the number of street robbers apprehended, do reflect the priorities of a police force, (ii) such priorities should be
respected, and (iii) respectful efficiency comparison should use Farrell/DEA teffs.
Dyson et al (1990) followed Charnes et al (1978) in presenting the priority argument as justification for rejecting any
attempt to use a common set of weights. They pose the problem of efficiency assessment of schools with “achievements
at music and sport amongst the outputs”, observing that some schools may “legitimately value achievements in sport
and music differently to other schools”. Levitt & Joyce (1987) in their study of 38 police forces in England & Wales
have a DEA “model” with just two outputs: the number of recorded crimes “against the person” cleared up, and
the number “against property” cleared up. They claim that by making this distinction “we are able to allow for the
possibility that forces may have different priorities towards the solution of one sort of crime as opposed to the other.”
Whatever validity there is in the priority argument, it certainly goes beyond the thinking of Farrell—in which the
“technical” in teff refers to the idea of a somewhat mechanical production function that converts inputs into a
dependent output, and to the associated idea that, if rescaling is thereby justified for any one output, then it is
automatically justified for multiple outputs. Assuming that the “priorities” reflect specific environmental pressures,
they must result from allocations of sub-costs within the total cost represented by x: they are thus arguably more
related to the concept of “allocative efficiency” of Farrell than to the “technical efficiency” estimated as teff. The
argument deserves further analysis. In the application of DEA with the appreciable number of output variables
(15 ≤ s ≤ 25?) that are needed to cover (and comprehensively motivate) police force activity in the 43PFs problem,
the smallness of the number of forces, 43, has the following consequences. The teff of an assessed force will typically
be determined by the inverse ratio of its cost to the lowest hypothetical cost in a set of pseudo-forces conceived
as “feasible”. There is a fairly high probability that the lowest cost is that of the assessed force itself (when it
is on the efficiency frontier). For other than such forces, the lowest cost is found (for the CRS case) by, roughly
speaking, rescaling, mixing & possibly worsening the specific performances of s or s − 1 (20 or 19?) of the forces on
the efficiency frontier. Many if not all of these will have output vectors y very different from that of the assessed
force. If the priority argument is taken seriously, their priorities (determined by specific environmentals) will be
very different too, which makes the associated value of teff at least questionable. The teff does not come from a
comparison constructed from forces with even approximately the same profile. So we can ask whether mixing should
be used at all, to create the pseudo-comparison at the heart of DEA. Also how can we know that it is not the
pressures towards inefficiency (to be found in all organisations) that are favouring the particular “priorities” of the
force under assessment? Nunamaker (1985) put the issue very clearly, pointing out that the other edge of the sword
that allows units to present their performance in the “best possible light” is an edge that allows units to engage in
“creative accounting, political lobbying, alteration of input/output mix, etc.”
and that
“provides...incentives for [unit] managers to act in a dysfunctional and socially unacceptable manner.”
These comments & quotations are not intended to weaken the case for rewarding some allocation of activity that is
honestly responsive to environmental pressures. But they do suggest that care is needed in how the reward is to be
engineered, if it cannot be adequately accommodated by the technique of stratification into groups of environmentally
“most similar forces”.
A further question about “priorities” is suggested by comparison of units i & j in Fig.5, which have equal teffs of
about 85%. Can unit i realistically escape censure on the grounds that it is not really less efficient—but is merely
exercising its priority for output y(2) over output y(1)? If such grounds are accepted, why could i not adjust its
“priorities” without change in cost x, so that it has the outputs associated with the feasible unit j (having, let us
suppose, the same cost)?—output y(2) would be unchanged but y(1) would be increased ten-fold.
In one sense, talk of “priorities” is something of a tautological red herring. Unless there is an external criterion, it
merely says that a unit likes its profile to be such & such, and misleads if carelessly interpreted as implying that the
unit is being directly compared with other units having even approximately the same profile (=priorities!), rather
than with feasible units constructed from units with quite different profiles & environmentals.
Another, perhaps more significant, sense in which talk of “priorities” may be misleading is that for CRS-, DRS-, &
VRS- motivated teffs, teff is (as we have preferred to define it) the “input-minimization” efficiency for fixed outputs,
where the minimization is over all the feasible units in C. In other words, the restriction to the set Cg , in which the
priority concept has been raised, is superfluous. Of course, the problem of justification is thereby simply transferred
to making the case for “input-minimization”.
2.17. EVEN THE SIMPLEST PROBLEM CAN BE PROBLEMATIC.
The last section asked whether an ill-formulated untestable hypothesis (about “priorities”) should influence the choice
of efficiency assessment technique. Fig.8 illustrates a particular type of data for r = s = 1 that may provoke another
hypothesis (this time perhaps testable)—the hypothesis of a “fixed cost” (a special case of IRS). CRS-motivated
or DRS-motivated Farrell/DEA gives units i & j very low teffs. If, using VRS, we allow for a “fixed cost”, we are
encouraged to see these units in a different light. But if not, what are the principles that favour some hypotheses
(eg. “priorities”) but not others? Is there not a case for sticking with a measure of efficiency that does not, as a
matter of principle, bother about what is inside the “black box” connecting y with x (and z)? Section 4 will try
to specify such a measure, after Section 3 has looked at another technique that may also be trying to say too much
about what is going on in the black box—under the mantle of a statistical sophistication that contrasts with the
mechanical determinism of DEA.
2.18. CAN FARRELL/DEA ALLOW FOR ENVIRONMENTALS?
Farrell’s term for environmentals was “quasi-factors”—as if they were factors of production, differing from other
factors only in that they do not come with a price-tag. In the wider DEA literature, such environmentals are known
as “uncontrollable” or “non-discretionary” inputs (Charnes & Cooper, 1985). Farrell suggested two ways of dealing
with them: the first was to treat them like any other necessary input in the definition of technical efficiency (which
does not involve prices); the second was to divide the units into groups homogeneous in the quasi-factors and make
a separate efficiency assessment within each group. [Annex C of Spottiswoode (2000) reports the advice of economists &
econometric specialists that, for 43PFs, “environmental factors affecting police outputs need to be taken into account—and in modelling
terms probably treated as an input”. The difficulties of the “taking into account” are likely to be compounded by the very large number
of environmentals that can be claimed to influence the 43PFs outputs (see Section 1).]
Farrell gave no application of the use of quasi-inputs, but it seems that he saw them as volumetric inputs positively
related to outputs. However, environmentals can also be negatively related to outputs.
[For 43PFs, examples of volumetric environmentals drawn from the “Police Funding Formula” documentation (Home Office, 2000a)
include:
• number of people living in terraced housing,
• area (hectares) covered by the police force.
For outputs like "number of crimes detected", the first of these is taken to be "positive", while the second is taken to be "negative". It
is clear that such judgements are fraught with uncertainties for any technique that tries to make use of them.]
Dyson et al (1990) appear ready to treat environmentals as outputs:
“ A key aspect of DEA is incorporating environmental factors into the model as either inputs or outputs. Resources available
to units are classed as inputs whilst activity levels or performance measures are represented by outputs. One approach to
incorporating environmental factors is to consider whether they are effectively additional resources to the unit in which
case they can be incorporated as inputs, or whether they are resource users in which case they may be better included as
outputs.”
So parental education level would be an input in a study of schools, whereas competition level would be a “resource
user” and therefore an output in a study of businesses.
Figure 8: A problem with the simplest case?
When environmentals are treated as inputs, there appear to be two ways in which they are incorporated into the
associated linear programming algorithms of DEA. To examine their logic (but not their practicality) it is enough to
suppose we have just one output and one volumetric environmental input z in addition to x: so that, in the general
framework, there is a two-dimensional input vector x = (x, z).
The first way leaves DEA to operate in its usual way for multiple inputs: the objection to this is that teff is based
on a comparison of a generic unit g = (x, y, z) with a feasible point (cx, y, cz) in Cg in which the “uncontrollable”
input z has been changed by the same factor c as for the (hypothetically) controllable input x. What is presumably
wanted instead is to determine the point (cx, y, z) on an appropriately defined efficiency frontier, which would allow
efficiency of g given z to be determined as c. So the second way must go back to basics, construct an appropriate
set C of feasible performances and its efficiency frontier F. All we will do here is exhibit the 3-stage construction
that leads to the method of Charnes & Cooper (1985):
• rescaling for z as well as x & y
[the existence of the generic unit g = (x, y, z) is taken to imply the feasibility of performances
(cx, cy, cz), given the supposed volumetric character of z.]
• affine mixing
[questionable, since “long distance” mixing means we are not just making a local interpolation.]
• worsening of x & y for fixed z
[reasonable.]
It may be verified that, with C and F thus constructed in the space of (x, y, z), the feasible set Cp, say, and the efficiency frontier Fp, say, in the reduced-data space of the hybrid ratios
p(1) = y/x, p(2) = z/x
have the geometry exemplified in Fig.6, even though the reduced variables now have a different interpretation. The
partial isomorphism with the theory for Fig.6 then gives us our understanding of the Charnes & Cooper (1985)
method and what it does. Let (cx, y, z) be the point in Fp with the same values of y & z as g. The method is either
the linear programming algorithm (Norman & Stoker, 1991, Appendix A7) that gives c as
c = max{w1 pg(1) + w2 pg(2)},
subject to w1 ≥ 0, w2 ≤ 0, and w1 pi(1) + w2 pi(2) ≤ 1 for i = 1, ..., n,
or its dual (Section 2.12). The generalization to s outputs and t environmentals is mathematically straightforward.
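To make the displayed programme concrete, here is a minimal sketch (Python, invented data, one output and one volumetric environmental); it differs from the ordinary single-input LP only in that the weight attached to p(2) = z/x is constrained to be non-positive.

    import numpy as np
    from scipy.optimize import linprog

    x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # invented costs
    y = np.array([8.0, 9.0, 4.0, 12.0, 7.0])      # invented outputs
    z = np.array([3.0, 5.0, 6.0, 2.0, 7.0])       # invented volumetric environmental
    p = np.column_stack([y / x, z / x])           # hybrid ratios p(1) = y/x, p(2) = z/x

    def cc_efficiency(g):
        # c = max w1*p_g(1) + w2*p_g(2) subject to w1 >= 0, w2 <= 0 and
        # w1*p_i(1) + w2*p_i(2) <= 1 for i = 1, ..., n
        res = linprog(-p[g], A_ub=p, b_ub=np.ones(len(x)),
                      bounds=[(0, None), (None, 0)])
        return -res.fun

    print([round(cc_efficiency(g), 3) for g in range(len(x))])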
The fact that the Charnes & Cooper method gives an “efficiency” c that, with the data in D other than the (x, y, z)
of g treated as fixed, is an automatically determined function of (x, y, z) should give food for thought to those who
recommend this way of dealing with environmentals. From the analogue of Fig.6 that here applies, we can see that
c = Og/Oh is a decreasing function of z for fixed (x, y) (until the extending ray Og meets “slack”). All this whether
z is “positive” or “negative”! Should DEA be required to act as some oracular goddess that can dictate the shape
of reality—not a deus but a dea ex machina?
[Spottiswoode (2000) quotes more econometric advice—that “environmental variables should be included in the analysis—either in the
models or by completing separate DEA analyses on sub-samples of forces which share similar operating conditions. However this latter
approach will result in a high proportion of forces being efficient” . Here the term “models” should be interpreted as choices of particular
linear programming algorithms.]
Relative to such questions of logic, it may be of only secondary interest that the method will award 100% “efficiencies”
to a large proportion of the n units, when an already sizeable value of s has to be augmented by a realistically large
value of t. Note also that the problem remains of how DEA should allow for non-volumetric environmentals that
cannot be put in volumetric form, such as the “density of population” used in devising PFF (Appendix 2).
2.19. MISCELLANEA
Among the recommendations of Spottiswoode (2000) is the following:
The Home Office, in consultation with its policing stakeholders, should review the appropriate input and outcome
measures, outcome weight ranges, and SFA and DEA models with a view to first using them in mid-2001 using audited
BVPI 2000/01 data. The task of specifying, building and testing the models should be contracted out to independent
experts.
Elsewhere in the report the task is extended from "building & testing" to "validating".
When it was announced that US President Coolidge had just died, some wit asked “How can they tell?”. It is
salutary to try to answer the more difficult question of how to test & validate a DEA “model” that is not much
more than an algorithm to produce a string of putative efficiencies (that in any case should not be ranked). The
production of a “consistent” string of numbers from SFA does not help since these two techniques are doing very
similar things with the data, from almost dual viewpoints.
DEA may have greater value when spelled EDA (the Exploratory Data Analysis of Mosteller et al, 1977). Especially
if only “mixing” is used in the creation of feasible performances, DEA can be seen as a bold, if not rash, solution to
the problem of missing or inadequate data.
Proponents of DEA might look to the arch deconstructor of rationality, Michel Foucault (1973, p.xx), for ideological
support:
“Order is, at one and the same time, that which is given in things as their inner law. . . and also that which has no
existence except in the grid created by a glance, an examination, a language; and it is only in the blank spaces of this grid
that order manifests itself as though already there, waiting in silence for the moment of its expression.”
3. STOCHASTIC FRONTIER REGRESSION (SFR).
3.1. INTRODUCTION.
The statement of the general problem in Section 1 was slanted towards the idea of an intrinsic, rather than heavily
interactive, measure of efficiency of an individual unit. However, a measure such as Farrell/DEA’s teff, that depends
on the existence of other units for its definition, cannot be dismissed just because it is interactive. Even an intrinsic
measure needs a scale on which it can itself be assessed and employed as a comparative incentive to increases in
efficiency. For an intrinsic measure, the scale is provided by the set of n individual measures based on D alone—
without any feasible-point infilling!
Although the basis of teff is its construction of a hypothetical efficiency frontier, by some creative accounting as it
were, the Farrell/DEA method does not venture outside the range of the data D in any significant sense. We now
consider a technique, stochastic frontier regression (SFR)—otherwise known as “stochastic frontier analysis” (SFA)—
that does go outside D in a conceptually significant and imaginative way and that therefore raises the evergreen
question of realism.
Devised for the single output case (Aigner et al, 1977 ), SFR is based on the concern that uncontrollable variation
in output is interpreted as inefficiency by deterministic techniques like DEA. The problem faced by any statistical
method that tries to meet this concern is how to separate the two contributions to the deviation of each unit from
the supposed frontier. It is the delicacy (a lack of robustness to assumptions) of the method devised by SFR to do
this that poses a significant reality challenge—especially when the method is translated to the single input/multiple
output case.
3.2. SFR MODELLING FOR A SINGLE OUTPUT/PRODUCTION.
Although our only concern is with the single input/cost case, it is necessary to review the single output/production
techniques that have now been adopted for the single input case. For the single output case of the industrial sort
considered by Farrell, SFR may choose to take the output y of the generic unit g = (x, y) to be randomly generated
by a two-component deviation from a production function Y = f (x):
y = Y UV
(4)
where U & V are independent random variables. Here U , in the interval (0, 1], represents the efficiency of g, whereas
V , distributed around the value unity, represents the uncontrollable random variation in y. A statistical estimate of
U is the SFR efficiency seff, say. The production function f(x) is commonly assumed to have a reality in the shape
of a Cobb-Douglas function, transformable to the logarithmic form
log Y = a + b1 log x(1) + ... + br log x(r)
(5)
Equation (5) satisfies the CRS condition if and only if b1 + ... + br = 1.
From (4),
log y = log f (x) + v − u
(6)
where v = log V and u = log(1/U ) ≥ 0. The LimDep software (Greene, 1995) takes v to have a normal distribution
with zero mean, and, among other options, gives the user the choice of an exponential, half-normal, or truncated-at-zero normal for u. Assuming all x-values in D are non-zero, the parameters in (5) and in the distributions of u
& v are estimated by maximum likelihood. Conditional on v − u = e, the theoretical expectation of the distribution
of u in (6) is a function of e and the parameters: the efficiency U of g may then be estimated as the negative
exponential of the maximum likelihood estimate of this function (Jondrow et al, 1982).
[Such SFR models make the implicit assumption that there are no significant errors in the inputs x—in the sense of significant deviations
from “true” values X that might be required in (6) to allow u to represent the “true” efficiency of g. Without the assumption, the
method runs into the identifiability problem known as "functional relationship" or "errors in [all] variables" in statistical theory. There
is, paradoxically, a deterministic version of SFR that omits the V from (4), attributing all variation to inefficiency.]
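For the single-output case just described, the fitting can be sketched in a few lines (Python with scipy; simulated data, invented parameter names, and not the LimDep implementation). The log-likelihood uses the standard normal/half-normal convolution density, and the final step is the Jondrow et al (1982) conditional mean of u referred to above.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def neg_loglik(theta, logx, logy):
        a, b, log_su, log_sv = theta
        su, sv = np.exp(log_su), np.exp(log_sv)
        sigma, lam = np.hypot(su, sv), su / sv
        e = logy - (a + b * logx)                    # composed error e = v - u
        # half-normal frontier density: f(e) = (2/sigma) phi(e/sigma) Phi(-e*lam/sigma)
        return -np.sum(np.log(2 / sigma) + norm.logpdf(e / sigma)
                       + norm.logcdf(-e * lam / sigma))

    def seff(theta, logx, logy):
        a, b, log_su, log_sv = theta
        su, sv = np.exp(log_su), np.exp(log_sv)
        sigma, lam = np.hypot(su, sv), su / sv
        e = logy - (a + b * logx)
        s_star, zz = su * sv / sigma, e * lam / sigma
        Eu = s_star * (norm.pdf(zz) / norm.sf(zz) - zz)   # Jondrow et al conditional mean of u
        return np.exp(-Eu)                                # estimated efficiency U of each unit

    rng = np.random.default_rng(0)                        # simulate a Cobb-Douglas frontier
    logx = rng.normal(2.0, 0.5, 50)
    logy = 1.0 + 0.8 * logx + rng.normal(0, 0.1, 50) - np.abs(rng.normal(0, 0.3, 50))
    fit = minimize(neg_loglik, x0=[0.5, 1.0, np.log(0.2), np.log(0.2)],
                   args=(logx, logy), method="Nelder-Mead")
    print(np.round(seff(fit.x, logx, logy), 2))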
3.3. SFR MODELLING FOR A SINGLE INPUT (“COST”).
The SFR literature (eg. Bauer, 1990) suggests that the single input case can be treated as the nearly symmetric
“dual” of the single output case of the last section—the formal symmetry would be exact but for the recognition
that inefficiency increases cost. This would give X = f (y), x = (X/U )V , with the analogous specialization:
log x = log f (y) + v + u
(7)
where f (y) is now an s-dimensional efficiency frontier surface rather than a production function. Equation (7) is the
reduction to a single input of equations (4.1) & (4.2) of Bauer (1990). Bauer indicates that (7) is applicable only
if the output vector y is exogenously determined. The suggestion ignores the likely presence in many applications
of variation in y that would, without special assumptions, vitiate any simple functional relationship X = f (y), and
that could not be written out of a realistic account by a declaration of exogeneity. [For 43PFs, can it be maintained that
any realistic error structure can be confined to x and does not involve y?]
A more realistic alternative in many problems would be to take the V out of the equation x = (X/U )V and put
“error” into y. Then (7) would have to be replaced by s + 1 equations: if the logarithmic form were maintained
throughout, these would be:
log x = log f(Y) + u    (8)
log y(j) = log Y(j) + v(j), j = 1, ..., s.    (9)
The Cobb-Douglas production function might then be put to use as a cost function:
log f (Y) = a + b1 log Y (1) + ... + bs log Y (s)
(10)
with b1 + ... + bs = 1 for CRS. Alternative modelling of the error in y, such as
y(j) = Y(j) + v(j), j = 1, ..., s    (11)
may also be considered (though this would not be realistic for 43PFs). The cost x itself may well be modelled as a
linear form divided by the efficiency term U:
x = [a + b1 Y (1) + ... + bs Y (s)]/U
(12)
with a = 0 for CRS-compatibility. The possibilities are almost unlimited. The main criterion (within this approach)
must be whether the models make sense with the data in which there are unknown and perhaps unmodellable
inefficiencies, for which the choice of the function f would be critical. For example, would it make sense to apply
(12) with a non-zero intercept a in the case of the data for s = 1 shown in Fig.8, or would it be sensible to omit
the CRS condition b1 + ... + bs = 1 from (10) (see the final paragraph of Section 2.6)—and continue to treat U as a
measure of efficiency?
Such questions may be secondary to the "errors in variables" problem, as recognized by Schmidt (1985) in requiring that a (single) output be exogenous when a "cost frontier" is to be estimated (as the dual of his "production function"). Appendix 1 explores the problem in just the simplest case: s = 1 & b1 = 1 (CRS-compatibility) with even simpler
distributional assumptions than those made for a single output. The problem here may have been foreseen in the
following comment on the current state of SFR (Cooper et al, 2000, p.264):
“There are shortcomings and research challenges that remain to be met. One such challenge is . . . to include multiple
outputs as well as multiple inputs.”
For the applications of SFR in Section 5, we have therefore been content to demonstrate the questionably relevant
statistical package LimDep, designed for the manifestly different dual case, but actually recommended by Spottiswoode (2000) as a check on the performance of DEA. This package replaces the complexity of (8)-(12) by the
deceptively simpler options
log x = log f(y) + v + u    (13)
x = f(y) + v + u    (14)
When log f (y) in (13) or f (y) in (14) are linear in their parameters (with an intercept), the relationship fitted by
LimDep is usually close to a translation, parallel to the x-axis, of the Ordinary Least Squares (OLS) line fitted with x
as dependent variable. An alternative to SFR, as described so far, is the efficiency frontier defined by the “corrected”
OLS (COLS) line—shifted up & parallel to the x-axis so that all but one residual are negative (Greene, 1980). All
this work becomes more questionable if no attention is paid to the reasonableness of the fitted frontier function f (y)
as a platform on which efficiency is defined: consider, for example, its application to the simplest of data sets in the
shape of Fig.8.
Those who maintain with Spottiswoode (2000) that SFR can be used as a “check” on DEA should note that the
common use of a Cobb-Douglas f(y) in (13) violates the convexity condition on the “technology space” in s + 1
dimensions defined by worsening the expectation of the frontier surface x = f (y). From the start, there can be a
built-in conflict between the “models” for DEA & SFR. However the expectation that there are some similarities
between DEA & SFR leading to a degree of correlation between teff & seff is a reasonable one—even though such
correlation should not be taken as validation of their logics. [The situation is not as dire as the dispute in physics that provoked
the question “What do Abraham Lincoln & Einstein have in common?” and the answer “Both have beards, except Einstein.”]
Although Cubbin & Tzanidakis (1998) did things I would not wish to emulate (eg. reducing to three a very large
number of relevant outputs by purely statistical technique), their application of SFR managed to raise a concern
diametrically opposite to the one that motivated the invention of SFR:
“If, as is the case in the water industry, the overall error appears to be almost symmetrically distributed the logic of the
stochastic frontier approach would imply a very small range for w [our u] and hence very low levels of inefficiency. This
may be difficult for a regulator to accept [my italics].”
3.4. SIMULATION TO THE RESCUE?
Thanassoulis (1993) analyses some artificial data with n = 15, r = 1, s = 3 in an attempt to compare the relative
merits of DEA & RA (RA is simple regression analysis that dispenses with the complexity of SFR). He claims
that, although they are based on hypothetical data, his findings are “a consequence of the underlying nature of
two methods and they are therefore generalizable”. The data were generated by taking 15 error-free output vectors
(whence Y = y), using a numerical specification of equation (15) to give seven error-free x-values, and another eight
x-values with added values of inefficiencies u (no errors of type v are involved). RA then fits the plane surface
x = b1 y(1) + b2 y(2) + b3 y(3)
(15)
by ordinary least squares, and defines efficiency as the ratio of fitted to observed input x. Unsurprisingly, it is found
that:
(a) DEA gives teffs that are equal to the “true” efficiencies (except for two units where “slack” plays a role), simply
because for most units the relevant frontier facet lies on the “true” efficiency frontier;
(b) RA (influenced by the values of u added to eight of the units, and with its fitted line shifted up the x-axis
from the “true” frontier) does poorly, giving the other seven units efficiencies exceeding 100%. The ranking of these
efficiencies is, however, less objectionable.
So DEA is judged good for estimation, while RA is judged to be not so bad for ranking. It is these findings that the
paper presents as generalizable.
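The flavour of that experiment is easy to reproduce. The following rough sketch (Python, with invented numbers rather than Thanassoulis's data) generates 15 error-free output vectors, derives costs from a known plane of the form (15), multiplies eight of the costs by inefficiency factors, and computes the RA efficiencies as fitted/observed cost; the error-free units duly tend to come out above 100%, as in finding (b).

    import numpy as np

    rng = np.random.default_rng(2)
    Y = rng.uniform(10, 100, (15, 3))              # error-free outputs, so Y = y
    b_true = np.array([2.0, 1.0, 3.0])             # invented coefficients of the plane (15)
    x = Y @ b_true                                 # the first seven costs are error-free
    x[7:] *= rng.uniform(1.1, 1.5, 8)              # inefficiency added to the other eight

    b_ols, *_ = np.linalg.lstsq(Y, x, rcond=None)  # ordinary least squares fit of (15)
    ra_eff = 100 * (Y @ b_ols) / x                 # RA "efficiency": fitted/observed cost, in %
    print(np.round(ra_eff, 1))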
Thanassoulis justifies his avoidance of the complexity of SFR by quotation of Schmidt’s paper. An alternative
quotation (Schmidt, 1985, p.306) reflects the problem with such avoidance:
“In my opinion the only serious intrinsic problem with stochastic frontiers is that the separation of noise & inefficiency
ultimately hinges on strong (and arbitrary) distributional assumptions. This is not easy to defend. However, in defense of
stochastic frontier models, it is clear that this problem is not avoided by assuming the frontier to be deterministic. Assuming
statistical noise not to exist is itself a strong distributional assumption, and one that is empirically false in data sets that I
have analyzed.”
Generalities aside, one might test the realism of SFR by asking how the method would react to the simplest of data
sets illustrated in Section 2.17.
3.5. CAN SFR DEAL WITH ENVIRONMENTALS?
If the problem of formulation for multiple outputs could be resolved, it would be tempting to deal with environmentals by incorporating z into f (Y) or f (y) to adjust for the undoubted influence of environmentals in most
applications. However, we still have the problem touched on at the end of Section 2.8: if efficiency is correlated with
environmentals (eg. “resident population” or size), as is most likely, then adjustment for z may prematurely remove
from consideration an important component of what the study is about. Such adjustment would also have to face
the problem of “over-adjustment” (the analogue of the problem of “over-fitting” in multiple regression) and of then
justifying the finally decided adjustment in terms of some realistic model.
3.6. “STATE OF THE ART” ?
Spottiswoode (2000) relied on “advice from leading experts in the fields of economics and econometrics having
practical experience in efficiency measurement”, summarised in a quotation from one such source:
“The use of both DEA and a parametric frontier technique such as . . . SFA undoubtedly represents the ’state of the art’ in
terms of relative efficiency analysis and would represent the optimal approach to efficiency analysis across police forces.”
A paper from the same source (Drake & Simper, 1999) was said to be “encouraging” in the final paragraph of the
technical annexes of Spottiswoode (2000), along with yet another quotation:
“both DFA and DEA produced very similar efficiency rankings suggesting that both are viable methodologies for the relative
efficiency analysis in public sector services such as the police.”
Here “DFA” is the authors’ Distribution Free Analysis version of SFR—a complex procedure applied to 5 years’ data
of 43PFs with 4 inputs but only 3 outputs. The procedure will be reviewed in Section 5.2.
4. THE THIRD WAY (TTW).
4.1. INTRODUCTION
“The Third Way” (TTW), here proposed as an alternative to DEA & SFR rather than a complement, has historical
precedents in welfare economics, that may have been blighted by technical problems in reconciling valuation theory
and other economic theory. The reluctance to engage in the thorny issue of multiperson preferences may have been
reinforced by the famous “Impossibility Theorem” of Kenneth Arrow (1951)—the apparently unattractive finding
that, roughly speaking, dictatorship was the only option for social choice satisfying some simple axioms. However, one
form of dictatorship is the benevolent exercise of political will & judgement by a democratically elected government.
Given the manifest technical difficulties with less overtly political techniques such as DEA & SFR, should the
possibility of benevolently dictated valuation be left unexplored, especially in problems like 43PFs? The idea that we
need to introduce some sort of exogenous valuation, in order to resolve matters, surfaces here and there throughout
the efficiency literature, even among exponents of DEA such as Thanassoulis quoted in Section 2.16. From a less
committed viewpoint, Lewis (1986), echoing Nunamaker (1985), put the issue clearly in comments on DEA:
“the efficiency criterion used regards all the variables as having equal importance. This may mean that a DMU which
dominates in the production of a relatively unimportant output is assessed as efficient at the expense of a DMU which is,
in fact, more efficient at producing more valuable outputs”.
Introduced in Sections 2.1, 2.9 & 2.10 as the undoctored use of veff, TTW has a technical simplicity that will be unattractive to those who believe that complex problems must have complex, preferably technically advanced, solutions. For
example, Cubbin & Tzanidakis (1998) would dismiss veff as a “simple ratio analysis” compared with “sophisticated
mathematical & statistical modelling”.
[Until quite recently, “sophisticated” meant “deprived of original simplicity”. King Lear’s regretful adage that “Striving to better, oft we
mar what’s well” is clearly far too complacent for a problem such as 43PFs, but it could hold good in the distinctly un-Shakespearean
form: “Striving to refine a method or model with more bells & whistles can be counterproductive, and it may be more realistic to move
in the direction of greater simplicity.”]
Spottiswoode (2000) claims that:
“All techniques for measuring comparative police efficiency would work best when there are a limited number of input and
outcome variables relative to the number of forces being measures.”
This claim is a response to the non-discriminatory feature of DEA (see Section 2.14) and to the problems of validating
speculative statistical modelling in SFR (see Section 3). The claim should not influence the implementation of TTW,
since TTW is designed to accommodate a good number of necessary explicit outputs [see Section 1 (ii)].
4.2. A GOOD START?
Despite its limitations, we develop TTW from another look at the police study of Section 2.15. Thanassoulis used
DEA with four inputs —numbers of violent crimes, burglaries, other crimes and officers. With these, teff is based
on weighted combinations of clear-up rates, each with respect to some weighted combination, determined by DEA,
of the four inputs. Of these, only the number of officers can be considered a cost, and, if the analysis of this limited
database were to be repeated, it might be more logical to use a single input x—the total cost of a police force referred
to in Section 2.3.
The second easy change would be to add three “negative outputs” that might have been available in the Audit
Commission data-base:
• y(4) = number of violent crimes in 1990 not cleared up by the end of 1991
• y(5) = number of burglaries in 1990 not cleared up by the end of 1991
• y(6) = number of “other” crimes in 1990 not cleared up by the end of 1991.
A third change involves the important issue of the quality of the data records: it would be not to count crimes that
are solved simply by being “taken into consideration” with some other cleared-up offence. Since 1993, the Audit
Commission has done just this, and counts only crimes “detected by primary means”, for which the police have had
to carry out a crime-solving investigation (cf. the wondrous Nottingham detection rate of the 1960s).
Then, assigning a value of v1 = 100 to the clear-up of a single violent crime, the following intervals for v2, ..., v6 in veff might be agreed by a panel of “the great and the good” authorised to do so by a benevolent dictatorship:
6 ≤ v2 ≤ 10
2 ≤ v3 ≤ 3
−20 ≤ v4 ≤ −10
−2 ≤ v5 ≤ −1
−1 ≤ v6 ≤ −1/2
These inequalities would be roughly consistent with Thanassoulis’s bold weight-restraints, but they go further in
putting a negative value on crimes that were recorded in the previous year but not cleared-up by the end of the
assessment year. The final calculations would rely on the linearity (and therefore monotonicity) of veff as a function
of each of v2 , ..., v6 . For each of the 32 choices of end-points of the 5 intervals, one would calculate veff for each police
force and the corresponding percentage efficiencies (100× veff/max{veff}).
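A minimal sketch of that calculation (Python; the figures here are invented, not Audit Commission data) runs over the 2^5 = 32 vertices of the box of value-weights, with v1 fixed at 100:

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    n = 41                                         # forces in the 1991 data set
    x = rng.uniform(50, 200, n)                    # invented total costs
    y = rng.uniform(0, 100, (n, 6))                # invented outputs y(1), ..., y(6)

    intervals = [(6, 10), (2, 3), (-20, -10), (-2, -1), (-1, -0.5)]   # for v2, ..., v6
    for vertex in itertools.product(*intervals):
        v = np.array([100.0, *vertex])             # pivotal value-weight v1 = 100
        veff = y @ v / x                           # undoctored veff for every force
        pct = 100 * veff / veff.max()              # percentage efficiencies at this vertex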
No attempt will be made to refine the discussion of this simple example: it serves merely to introduce the general
specification of TTW (Mark One!).
4.3. GENERAL SPECIFICATION OF TTW.
At the general level, there are four distinct stages:
(i) problem description, and collation & screening of relevant data;
(ii) definition/specification of input cost & outputs;
(iii) valuation of outputs, and calculation of undoctored veffs;
(iv) stratification and/or adjustment of veffs for environmentals.
For (i), little can be said at the general level, except to parrot the call for “Best practice!” (now widely heard in the
land).
For (ii), TTW is intended for problems in which the reduction of all costs to a single bottom-line figure (the input
x) is a relatively straightforward operation when the accuracy aimed at is realistically set, to within 5% say. If there
is unresolvable argument about whether any substantial element should be included, it may be best to allow two
efficiency measures to be developed—one for its inclusion and one for its exclusion. For outputs, however, TTW is
intentionally designed for problems in which there is, typically, a large number of contending outputs—all required
to appear explicitly in the openly published formula that generates the efficiencies. In its criticism & rejection of
the TTW approach (see Section 4.6), Spottiswoode (2000) ignores one crucial advantage of TTW—that it does not
break down when the number of outputs is increased (to avoid the problems, already alluded to, that are associated
with excessive aggregation of outputs that call for separate consideration and appearance in the efficiency formula
eg. successful or unsuccessful, but all very costly, prosecutions for fraud).
For (iii), I have already suggested that TTW should use undoctored veffs externally adjusted for environmentals if
necessary (as if each were the yield of an agricultural plot adjusted for weather and soil fertility, or the outcome
of a clinical trial adjusted for the case history of patients). The output variables y(1), ..., y(s) in the numerator
v1 y(1) + ... + vs y(s) of veff should be such that, by choice of the value-weights v = v1, ..., vs, the numerator represents,
for each unit, the total value of the bulk of the activity of the unit. The outputs should be jointly comprehensive so
that the bulk of the activity is covered, and volumetric so that the coefficient vj is then the value per unit volume
of the jth output. In many problems, it should be possible to define outputs that are disjoint (eg. the different
categories of crime for 43PFs) to facilitate the assignment of value-weights. Only the ratios of the value-weights need
to be specified, which may be helpful in problems where consensus about v allows only intervals to be agreed. For
this, TTW might take a pivotal output whose value-weight might be set equal to unity or, more conveniently, 100.
An interval should then be established for each of the other value-weights, within which all parties can locate their
preferences. This would give an (s − 1)-dimensional box for the latter, whose 2^(s−1) vertices would deliver a set of
undoctored veffs for the n units. For (iv), there are difficult but not, it is to be hoped, insurmountable technical
problems. Their analysis calls for a Section of their own.
4.4. ALLOWANCE FOR ENVIRONMENTALS.
There is just one idea that, in effect, declares “No problem!”. That is the idea that input costs have been predetermined
for each unit, at levels that take proper account of environmentals, and in the sense that veff is then an acceptable,
already adjusted, measure of efficiency. The validity of the idea could only be assessed by close & rather subjective
inspection of the method of predetermination—and, even if it were possible, its justification would depend on the
particularities of the problem. However, it is important to recognise that such questionable justification could
be maintained even if the veff values were found to be statistically significantly related to environmentals—the
justification would be on the untestable grounds that the unknown efficiencies of the units must themselves be
correlated with environmentals! [For 43PFs, the claim has been made that the Police Funding Formula (PFF), that determines the
input cost of police forces, is a realization of the “No problem!” idea. The question is whether the PFF, in its determination of the police
force input x (see Section 2.3), effectively does the job of allowing for environmentals—in the sense that, once they have been locked into
x, we can throw away the key and forget about them. One may have the intuition that such a technical achievement would require a
knowledge & understanding of the 43PFs problem that goes well beyond what is currently available. Appendix 2 looks in detail at the
PFF and finds good reason to dismiss the idea. The easiest technique to think about is TTW’s veff. For this, there is no way of logically
refuting the claim that, once a police force has been given its PFF-determined x, it should be able to attain, by routine activity, the same
value of veff as any other police force of equal efficiency whatever its environmentals. (This claim would be making a heavy assumption
about the relationship of the value-weights v in veff to the activity requirements of different outputs.) The only way of weighing the
claim is to look at the detailed formulation of the PFF (which we do in Appendix 2).]
The possibility of confounding of efficiencies with environmentals is a realistic one, and has to be taken into account
in any method of allowance for environmentals (see Section 3.5). To the extent that it is possible, take for a mental
picture the 1 + s + t dimensional space of input, outputs and environmentals. It is instructive in considering the logic
of allowance for environmentals to imagine that there is no shortage of units—that n is effectively infinite. It would
then be possible to use the environmentals z to define thin strata, and look at things firstly within individual strata
and then between strata. Within a stratum, there is no question of correlation between efficiency and environmentals
since the latter are essentially constant, and any measure of efficiency will be determined by the variations of (x, y)
for the fixed z. Between strata, the variation of the stratum means of (x, y) or its logarithmic transforms must
reflect either a necessary “technical” dependence on environmentals at constant efficiency level, or some relationship
between efficiency and environmentals. Such insight may clarify logic, but it does not resolve the problem of how to
do the calculations when, as in 43PFs, we have only 43 units spread over at least 21 dimensions (1+10+10 = 21 is
a barely realistic minimum). What will help is commitment to the TTW principle of taking veff as an undoctored
measure and only then bringing in the environmentals: for 43PFs the dimensionality might then go down to 10,
which is still a large enough number. The realism that may then be attainable will depend strongly on the particular
problem. There are two general ideas that may help out: unit subdivision and cross-validation. Farrell’s units were
states of the U.S.A., for which the data may have been available or obtainable at the much smaller county level. The
units for 43PFs occupy, on average, 1/43rd the area of England & Wales, and the data must have been recorded at
the level of the smaller police force divisions or the even smaller basic command units. Variations within units may
be more informative about the technical dependence of performance on environmentals than on efficiency, given that
efficiency may be fairly homogeneous in the centrally organized body that is a police force. In any case, the extra
data would be relevant in some way. When it comes to fitting the equations, the technique of cross-validatory choice
(Stone, 1974) of an allowance formula should be considered as a model-free device for controlling over-adjustment.
Posterior adjustment of veff by z makes statistical sense because of the intrinsic character of veff: each unit’s veff
has its own z to be considered as influential for that very unit alone. The same cannot be said for teff, whose heavily
interactive character allows each unit’s teff to be, perhaps strongly, influenced by the environmentals of other units.
For DEA, there are additional grounds for questioning any affirmative answer to our initial question. They concern
the nature of what is being claimed. For CRS-based teff, for example, it would have to be maintained that, if they
are equally efficient, police forces with the same output profile (i.e. the ratios of the y(j) to each other) would be
able to attain the same value of teff. Would that be realistic?
4.5. NEGOTIABLE VALUE-WEIGHTS.
A crucial question remains to be explored: how are the value-weights v to be arrived at? A benevolent dictatorship
will want to retain the goodwill of the individuals who work in the n units being assessed, even though its main
concern is to influence the units to maximise the societal value of their outputs. We are, after all, concerned with
non-profit-making organizations that depend greatly on the organizational skills of managers responsive to pressures
and demands beyond the ken of any supervisory quango representing the interests of society at large.
We are faced with a dilemma. On the one hand, it would be unreasonable for purely societal value-weights v to be
imposed that took little or no account of the relative internal costs of generating different outputs. On the other
hand, weights that simply reflected best estimates (in some sense) of the existing internal costs would be failing to
give any incentive for units to meet the “market pressures” represented by the supposedly widely agreed societal
values of different outputs.
What is needed is some negotiable, even experimental, compromise between these two extremes. Appendix 3 presents
a possible way in which this compromise might be reached: it offers a one-parameter family of value-weights in which
the parameter allows for a negotiable “fine-tuning”. The proposal does not override the necessity of environmental
allowance along the lines of Section 4.4.
4.6. SOME OBJECTIONS TO TTW.
At a critical juncture in its tussle with the 43PFs problem, the report of the Treasury’s Public Services Productivity
Panel (Spottiswoode, 2000) maintains that there are two key objections to using a “simple” or “very simple” efficiency
index:
“First, it allows for no variation in the weight that could be assigned to each outcome for each authority and its force. But
the relative importance of each outcome could legitimately vary from force to force, reflecting local circumstances and local
police plans. An approach that lets the outcome weights vary is preferable.”
This objection clearly relates to the priorities/environmentals question (Section 2.16) and will not be further discussed
here: readers will no doubt make their own assessment of the balance of advantage (if any) and disadvantage (if any)
in allowing locally influenceable weights—a very different matter from non-local allowance for locally influential but
uncontrollable environmentals. Quoting again:
“Second, such an index would implicitly assume a ’linear’ or straight-line relationship between inputs and outcomes; that
is, if a force doubled the inputs it would get double the outcomes. This is unlikely to hold in practice. An approach that
allowed for non-linear relationships between inputs and outcomes would be preferable.”
It is quite right to say that veff, having the form of aggregated value per unit cost, would remain constant if its single
input cost and its output volumes were doubled. But that is far from constituting an assumption about what might
happen if a police force did such-and-such. If “in practice”, it were found that veff decreased with x (representing
the “size” of the force), there are two options open to TTW: either to accept the finding as an indication that larger forces are more inefficient than smaller ones (and perhaps go on to suggest the subdivision of the organizational structure of large forces), or to make an allowance for size treated as an environmental variable over which the forces have no control (see Section 4.4). Compared with these above-board options, can it really be preferable to build some
econometrically motivated, non-linear relationship of questionable technicality into the method (DEA or SFR!) that
then automatically calculates some “efficiency” measure?
4.7. NEGLECTED ADVICE.
Annex D of Spottiswoode (2000), entitled “Summary of advice on DEA & SFA from economics and econometric
specialists”, has four paragraphs on what it got from the National Economic Research Associates. NERA sees DEA
& “regression” as useful initial “top down” analyses and accepts the idea of a “cross-check” between the two (noting
a particular difficulty in the interaction between regression and the Police Funding Formula). If this approach is to
be followed, it suggests using DEA for exploring the database but then making a “bottom up” analysis involving
detailed assessments of costs & activities— “a huge undertaking”:
“If agreement can be reached on specific weights for all outputs then a more straightforward form of ratio analysis than
DEA may be feasible. However, any such analysis should make allowance for variations in operating circumstances.”
A less demanding approach would be to examine the alternative of incentive mechanisms based on a clear definition
of outputs with appropriate weights and, feasibly, on a
“mechanism that would avoid the need to undertake the relative efficiency assessment, perhaps by relating senior
management pay to increases in a weighted output index over time.”
The Spottiswoode report does not justify its inadequate response to this annexed advice. [The four paragraphs here
summarized were a straight transcript from NERA (1999), with only changes in authorial voice eg. from “we” to “NERA”. The rest of
the NERA report can be recommended for its good sense about the problem here faced. It reveals that the brief given to NERA by the
Treasury Panel was narrowly focused on DEA & SFR, but the good sense referred to was not restricted to the consideration of these two
related techniques. ]
5. TWO APPLICATIONS OF DEA, SFR, & TTW.
5.1. THE 21 SPANISH HIGH COURTS.
The study of Pedraja-Chaparro & Salinas-Jimenez (1996, P-C/S-J for short) follows closely that of Kittelsen &
Førsund (1992), who applied DEA to the 107 district courts of Norway with two inputs (the numbers of judges & office staff) and seven outputs (the numbers of cases in categories that cover, and do not merely represent or act as proxies for, the whole activity of the courts). The Norwegian study does not give the data for re-analysis, but two
revealing quotations from it are worth reproducing here, before we move to Spain for published data. Justifying the
use of DEA as a second-best option:
“ Faced with detailed, if not comprehensive, data on the quantities of services produced but no information on the relative
values of the different services, the problem is to find methods that can utilize the information available in the data but
does not demand more information.”
Justifying the reduction to seven categories of an initial list of 19 categories:
“ One of the characteristics of the DEA method is that as the dimensionality of the problem is increased the number of
efficient units increases as well. Use of the 19 product subdivisions . . . would generate fully efficient scores for almost all the
courts. Apart from being an unreasonable result, such an analysis would hardly give any interesting information.”
The 21 Spanish High Courts studied by P-C/S-J were those in the Administrative Litigation Division, engaged in
relatively homogeneous cases. For their main study, P-C/S-J used two inputs & two outputs but no environmentals.
Here we replace the two inputs (staff numbers) by their total financial cost x:
Inputs
• x = total cost of x(1) judges & x(2) office staff
Outputs
• y(1) = no. of cases resolved by full process
• y(2) = no. of other cases
P-C/S-J had some initial reservations about their analysis:
“The selection of the outputs may be criticized for not taking into account two major points: first, the heterogeneity in
each of the two outputs, which might explain the different efficiency scores . . . second, the important consequences for the
DEA results of an incorrect model specification (selection of variables).”
In defense, they plead both non-availability of more detailed data and, referring to Nunamaker’s Theorem, the fact
that, with only 21 units, efficiency scores would be very sensitive to any increase in the number of variables. A
Spanish bull of a dilemma with two fine horns!
Figure 9: The CRS-motivated reduced-data scattergram of 21 Spanish high courts.
Plots of p(1) = y(1)/x and p(2) = y(2)/x against x show no evidence of size dependence. Fig.9 shows that the CRS
frontier in the (p(1) , p(2)) scattergram is determined by courts 17 & 19, and that these two courts are the so-called
100% efficient peers for the other 19 courts.
Table 1 gives the results of the calculation of eight different percentage “efficiencies”:
• teffa = P-C/S-J’s CRS teff based on x(1) & x(2),
• teff = the CRS teff based on x,
• Uh = U based on (13) with Cobb-Douglas f (y), v normal, & u half-normal,
• Ue = ditto but with u exponential,
• %veff = 100 × veff/(max{veff} in D) for four choices of value-weight ratio.
Note that teff ≤ teffa, uniformly, as Nunamaker’s theorem dictates. That Uh & Ue can differ appreciably reflects
the sensitivity of SFR to distributional assumptions.
Court   y(1)   y(2)  x(1)  x(2)       x  teffa  teff  Uh  Ue  8/1  4/1  2/1  1/1
    1    281     77     4     9   40603     32    28  47  51   27   27   28   21
    2    897    440     7    19   79661     52    49  69  78   45   46   49   40
    3   3699   2692    28    75  367693     57    48  65  74   41   43   48   41
    4    582    171     5    10   48929     60    48  70  79   46   47   48   37
    5    617    152     5    10   50548     63    48  71  80   47   47   48   36
    6    617    207     3    10   40380     65    63  81  87   59   60   62   49
    7   1343    458    14    21  132916     67    41  63  72   39   40   41   32
    8    679     88     4    10   45292     68    57  78  86   57   56   56   40
    9   2889   1204    21    43  225875     72    54  74  83   50   52   54   43
   10   9634   4674    54   143  662387     74    63  80  87   57   59   63   51
   11    675    453     5    10   50341     77    63  77  85   54   57   63   53
   12   1138    357     8    15   78030     79    59  79  86   56   57   59   46
   13   1498    443    10    19  100215     82    60  80  87   58   59   60   46
   14    757    514     5    10   48928     87    73  83  88   63   66   73   62
   15   1164    333     5    13   54682     92    85  89  92   82   83   85   65
   16    821    390     4     9   40615     99    88  89  92   80   82   88   71
   17    663   1256     4    10   45668    100   100  82  88   67   78   99  100
   18    885    901     6    11   57495    100    82  83  88   65   70   81   74
   19   2091    370     6    21   79655    100   100  92  94  100  100  100   74
   20   2249   1994     9    34  123828    100    92  87  91   75   81   92   82
   21    332   1741    21    46  227047    100    80  88  91   75   76   80   64
Table 1: Data for y(1), y(2), x(1), x(2) & x from P-C/S-J’s Tables 5 & 8, and eight percentage “efficiencies”: two for
DEA, two for SFR, and four for TTW.
If the disagreement between the DEA and SFR measures here were viewed as relatively unimportant in the overall
comparison of forces, consistency does not mean validity: the two sets of measures have other crucial assumptions
in common. That both Uh & Ue exhibit appreciable shrinkage compared with teff (smaller teff’s are raised, larger
teff’s are lowered) reflects the moderating influence of SFR’s statistical approach in its difficult judgement of how
much of the residual is random error and how much is inefficiency.
The %veffs for v1/v2 = 2/1 (giving “full process” cases twice the value of cases that do not go through the whole
legal process) are very close to the teffs—which is not surprising given that the frontier line between units 17 & 19
has the equation
1.95 p(1) + p(2) = constant.
It may be noted that although veff is monotone in each of v1 & v2, %veff is not monotone in the ratio v1/v2. The
changes in %veff as v1/v2 is varied reflect both the interpretability of this measure and its to-be-expected sensitivity
to such variation.
5.2. THE 43 POLICE FORCES OF ENGLAND & WALES.
The case for efficiency assessment of the 43 police forces of England & Wales has been put by Spottiswoode (2000)
as follows:
“There is a plethora of indicators and information about police outputs and outcomes. But, to date, it has not been possible to draw this
information together to build a comprehensive or systematic measure of relative police efficiency in meeting their ultimate objectives of
promoting safety and reducing crime, disorder and the fear of crime.”
The Audit Commission (1999) has been rather more explicit:
“Police response to emergencies is improving, and the proportion of crimes detected is increasing. However, some of these improvements,
such as the increases in detection rates, are principally the result of falling levels of crime—the number of crimes being solved is not
generally increasing. . . . the public will welcome the fact that the chance of any particular crime being cleared up has increased.
However, people may question why increases in spending have not resulted in the absolute number of crimes being cleared up. . . . there
are significant variations in performance between police forces. These variations cannot simply be explained by differences in workload
or in the circumstances forces face. While overall those forces recording the highest levels of crime also have the lowest detection rates,
a few forces with the highest recorded rates of burglaries and violent crimes have among the highest detection rates. And the clear up
rate for burglaries in the best performing metropolitan force is almost twice that for the worst.”
The Spottiswoode report is torn between (i) recognition of the multi-faceted character of the outputs (which we take
to include “outcomes”) needed to “reflect the key outcomes that the police are expected to achieve” and (ii) the
inability of DEA to cope with more than a limited number of inputs & outputs:
“In the case of the police in England and Wales, the number of forces (43) is relatively small, and discipline is required on the number of
input and outcome variables to be used. This technical constraint would be overcome if efficiency measurement is undertaken at Basic
Command Unit (BCU) level or by using time-series data. However, even if BCU level data is used, a limited number of variables should
be preferred, as this would facilitate the task of establishing and tracking relationships between variables, and limit the scope for data
and measurement error.”
The report is optimistic that a balance can be struck, with about eight outputs (“best value performance indicators”
is the favoured term). It is
“. . . critical that the selected outcome measures capture the essence of police outcomes and thus, implicitly, the many dimensions to
policing. This does not mean that there has to be a multitude of outcome measures. The focus of the outcome measures should be on
what the police are being expected to achieve for the money they have. This is different from trying to model everything that forces do
on a day-to-day basis.”
Are these the words of good sense freely arrived at—or of rationalization forced by an unwisely favoured technique?
What does it mean to “capture the essence”?
For this study, it has not yet been possible to carry out a realistic analysis of 43PFs that starts, as it must, with a
carefully selected, comprehensive set of volumetric output variables and relevant environmentals. [There is no problem
with the cost input x: H.M.Treasury’s concern with public expenditure has ensured that.] So this section will be used to prepare
the ground for such analysis by looking at two papers that—ignoring the interesting work of Thanassoulis (1995)
on which we have commented in Sections 2.15 & 4.2—claim to be “the first to examine the relative efficiency of
the English & Welsh police forces”. Unlike the Spanish courts study, the papers do not include the basic data (212
records & 1505 numbers): so we are unable to reduce the data to a single-input database for re-analysis. However,
there are features of the authors’ increasingly arcane analysis that will serve to test our obvious prejudices.
For both studies, Drake & Simper (1999, 2000) have taken 5 years’ data with 4 inputs, but only 3 outputs and no
environmentals:
Inputs
employment costs
premises-related expenses
transport-related costs
capital & other costs.
Outputs
clear-up rate
no. of traffic offences dealt with
no. of breathalyser tests
5.2.1. THE FIRST STUDY
The first paper (Drake & Simper, 2000) does not state whether “clear-up rate” is a proportion or the volumetric
number of clear-ups. The former would tend to bias the findings against units with large values of the inputs, which
are all volumetric. The paper also does not state whether “Taken Into Consideration” (TIC) cases are included,
a notorious source of inflatable statistics: the highly critical views of Walker (1992) about the use of clear-ups are
quoted and disregarded. The other two outputs might also be questioned on the grounds that they are easily &
cheaply manipulable by police forces so that their use may give a distorted efficiency picture. Moreover the small
number of outputs would leave many forces with the sense that their full range of activities was not being given
due weight in any derived efficiency measure. Neither these questions, nor those associated with environmentals, are
dealt with, either before or after the straightforward calculation of
• “overall technical efficiency” OE (CRS-based teff) = AQ/AR in Fig.4,
• “pure technical efficiency” PTE (VRS-based teff) = AS/AR in Fig.4,
• “scale efficiency” SE = OE/PTE = AQ/AS in Fig.4.
These three quantities, described as highly informative, are the basis of some suggestions for enhancing the efficiency
of English & Welsh policing. It is claimed that they “may enable us to shed some light on the optimal size and
structure of police forces”. Despite the modesty of tone, this is a bold claim. What is it in the results obtained from
such limited data, by a method with acknowledged defects, that can justify such a claim?
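For concreteness, the three quantities can be computed with the same envelopment programme as in the sketch of Section 5.1: the VRS programme simply adds a convexity constraint on the intensity weights. The sketch below is again illustrative only, and is written for a single input (whereas Drake & Simper use four); crs_teff is the function sketched earlier.

```python
# Illustrative sketch: VRS (BCC) input-oriented scores and the derived
# "scale efficiency" SE = OE/PTE.  Single-input version only.
import numpy as np
from scipy.optimize import linprog

def vrs_teff(x, Y):
    """x: (n,) costs; Y: (n, s) outputs.  The CRS programme plus sum(lam) = 1."""
    n, s = Y.shape
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                         # variables (theta, lam)
        A_ub = np.vstack([np.r_[-x[o], x], np.c_[np.zeros((s, 1)), -Y.T]])
        b_ub = np.r_[0.0, -Y[o]]
        A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)        # convexity: sum(lam) = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.x[0]
    return scores

# OE = crs_teff(x, Y); PTE = vrs_teff(x, Y); SE = OE / PTE
```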
The authors focus comment on percentages that constitute a 2×2 table of 5-year average efficiencies, with a derived
third column:
Police force    CRS(OE)    VRS(PTE)    CRS/VRS(SE)
Surrey            62%        69%          89%
“The Met”         58%       100%          58%
Surrey’s CRS of 62%—the lowest 5-year average among the 35 “non-metropolitan” forces (excluding City, “The Met”,
and six other forces)—suggests “failure to utilise resources effectively”. But Surrey’s SE of 89% suggests that it is
“not too far removed from the constant returns region of operation” (a remark that would merit further analysis).
This contrasts with the performance of “The Met”. Its CRS of 58% is lower than that of Surrey and the lowest of
all 43 forces, but it seems “it would be inappropriate to label the Metropolitan as a highly inefficient police force”:
the VRS of 100% (necessarily 100% in each of the five years) suggest that “given the scale of the Metropolitan’s
operations, it is a highly efficient police force with no obvious inefficiencies in resource utilisation” (the latter phrase
may be thought to point misleadingly to allocative efficiency). But the Metropolitan Commissioner of Police (the
Met’s chief constable) must read on before congratulating his force, for the paper will tell him that his SE of 58%
confirms (really an extant implication of the VRS value of 100%) that
“all of the observed overall inefficiency is associated with scale effects. Given that the Metropolitan is the largest force in the
country, this result strongly suggests that there are diseconomies of scale at work in respect of large police force operations.
As in other large organisations, this is probably attributable to the extra bureaucracy and layers of management structure
which tend to accompany large scale.”
In this welter of econometrically motivated comment, it is salutary to note that the Met would get a VRS(P T E)
of 100% whatever it did, as long as it (by far the largest force in the country) managed to turn in the largest score
on any one of the three outputs of this study. It may also be noted that the inference about overall inefficiency
being attributable to diseconomies of scale, in the passage just quoted, is from a sample of size one: the VRS values
of all but one of the other six large “metropolitan” forces are less than the average of the 35 generally smaller
non-metropolitan forces. [The exception is Greater Manchester which looks as if it has, like the Met, hit the jackpot with a VRS
value of 100% in all five years—did it repeatedly hit the maximum for one particular output?].
The paper then uncovers another source of speculation—one that Section 2.7 has already noted. The results show
“clear evidence of an inverted U shaped relationship in respect of scale efficiency [CRS/VRS].” It is an instructive
exercise to see, with Fig.4 as a guide, that this feature is almost inevitably present whatever the data—as a necessary
consequence of the construction of nested & contiguous convex bodies of feasible performances for CRS & VRS. So
we need not read much into the comment on the second feature, that
“[although it is] a very common finding in economic studies of industrial production, it is a particularly interesting result
to find that the same economic production relationship appears to hold good in public sector services such as policing.”
The rest of the paper is of interest only for its use of unnecessarily advanced statistical techniques to analyse the 212
(5×43 - 3) force/year combinations (3 were unavailable) as if they were from 212 independent police forces.
5.2.2. THE SECOND STUDY
Spottiswoode (2000) dedicates the last two paragraphs of its Technical Annex C to the second study (Drake & Simper,
1999), describing it as “more encouraging” than other studies to the expressed hope that, “properly specified”, DEA
& SFA “should produce broadly similar results”. The stress here on consistency of the two techniques may be
diverting attention from the more important question of what it is they actually do with the data. There is not
the space here to give a detailed critique of the study, which would, in any case, be better done by a first-rank
econometrician. One feature does however strike this statistician as rather odd.
The method (an adaptation of the SFR of Section 3.3 to accommodate multiple inputs and to take account of year-to-year changes) requires “prices” w1, ..., w4 and redefined inputs so that the “inputs” listed in Section 5.2 are the
“cost shares” (for those prices) of the redefined inputs. The latter are taken to be labour & (3 times repeated) total
population in the police force area i.e. only two different inputs. The method then adopts & adapts some standard
econometric cost function theory to posit an input-allocatively-optimal “trans-log” total cost function of the 8
variables y(1), y(2), y(3), w1 , w2 , w3 , w4 and year, which any observed total cost (the sum of the original “inputs”) is
then taken to exceed by a unit- & year-dependent inefficiency factor. Unlike fully stochastic SFR, the method makes
no distributional assumptions about the make-up of this factor. Because of the (unjustified?) multiplicity of
prices, there are as many as 37 parameters in the fitting of this function to the 212 observations of total (yearly) cost
from the 43 police forces.
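For orientation only, and not as the authors' exact specification (which adds time terms and the associated cost-share equations), a generic translog total cost function in outputs y and input prices w takes the form

$$\ln C = \alpha_0 + \sum_i \alpha_i \ln y_i + \sum_j \beta_j \ln w_j
+ \tfrac{1}{2}\sum_{i,k} \gamma_{ik}\,\ln y_i \ln y_k
+ \tfrac{1}{2}\sum_{j,l} \delta_{jl}\,\ln w_j \ln w_l
+ \sum_{i,j} \rho_{ij}\,\ln y_i \ln w_j ,$$

with symmetry of the second-order coefficients and linear homogeneity in the prices imposed; observed total cost is then modelled as this frontier cost inflated by the unit- & year-dependent inefficiency factor.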
Even if the problem of multiple inputs were by-passed by reducing them to a single cost, there would remain the significant
problem that the PFF funding formula (Appendix 2) determines a large proportion of that cost as a function of both
outputs & environmentals. How could the manifest interaction of this determination with the empirical fitting of a
cost function be resolved?
ACKNOWLEDGEMENTS
The following individuals and organizations have helped either knowingly or unwittingly:
• Dr Juanita Roche (ex HM Treasury) for initial impetus
• R.A.S. for continuing motivation
• Hillingdon Council for free travel
• staff of the Home Office RDS Economics & Resource Analysis Unit for healthy argument, computational
assistance & generous access to literature
• Ina Dau & Richard Chandler for divine help with emacs & LaTeX.
APPENDIX 1
AN ILLUSTRATIVE SFR MODEL FOR SINGLE INPUT & OUTPUT.
Consider the model, SFRM say, in which data for r = s = 1 are generated, independently for each unit, as follows:
For 0 ≤ a < 1, 0 < b ≤ 1, c > 0, unobserved “true” input, X, and output, Y = X/c, are associated with the generic unit
g = (x, y), where x & y are independently randomly related to X & Y by x = X/U and y = Y V , where U is randomly
uniformly distributed in the interval (a, 1), and V is randomly uniformly distributed, with expectation unity, in (1 − b, 1 + b).
This model is not proposed as a serious competitor for the attention of stochastic frontier analysts: it serves only
to reveal problems that arise even in grossly simplified models—problems that might be obscured, beyond ease of
understanding, by the necessary technicality of apparently more realistic models. No attempt will be made here to
explore these problems beyond the point at which a reader’s curiosity might be satisfied.
The parameters of the model can be taken to be a, b, c, Y1 , ..., Yn (equivalently X1 , ..., Xn ). Their number, n + 3,
increases with n, so that consistent & unbiased estimation cannot be expected as a matter of course. The data D has
non-zero likelihood only when the parameters satisfy the inequalities
a xi/c < Yi < xi/c   &   yi/(1 + b) < Yi < yi/(1 − b),   i = 1, ..., n        (16)
in which case the likelihood L is given (up to proportionality) by the nth power of d = c/[b(1 − a)]. [The per-unit
contribution is the product of the density of x given X, namely (X/x²)/(1 − a), and that of y given Y, namely 1/(2bY);
this product is c/[2b(1 − a)x²], so that the Yi cancel.] Maximization of
d subject to (16) gives unique maximum likelihood estimates (MLEs) of the parameters a, b & c. However there will,
in general, be an interval of MLEs for each of the parameters Y1 , ..., Yn (or X1 , ..., Xn ). The latter include X̂ for
the generic unit, for which seff then has an interval of values Û = X̂/x as its non-uniquely determined MLE. The
feature of non-uniqueness appears even in the case where a & b can be (unrealistically) fixed by superior knowledge.
For the simplest case of a = 0 and b = 1, we maximise c subject to yi/2 < Yi < xi/c, i = 1, ..., n, for some Y1, ..., Yn. The
maximum is ĉ = 2xm /ym where xm /ym is the minimum of the ratio x/y in D (for the unit with index m) and ĉ is
then the MLE of c. The associated MLE of Ym is ym /2 and that of Xm is then ĉym /2 = xm . But, for i not equal to
m, there is an interval (yi /2, xi /ĉ) of values of Yi that maximize the likelihood. Unit m is the one with the largest
y/x and the one with the largest teff of 100% (see Section 2.4). Compatibly, for this unit, Û = 1 (100%) also. For i
not equal to m, however, Û has an interval of values ranging from the value of teff for unit i up to 100%. Inspection
of Fig.10 clarifies the picture. In Fig.10, the estimated “true” efficiency frontier is the ray through the origin, ORW:
this line has twice the slope of the line OQ through unit m. The non-unique MLE of the “true” input X & output
Y of g is the sub-interval RW. The non-unique MLE of the efficiency U increases from the value teff at R to 100% at
W. For unit m, this interval shrinks to a single point at which the efficiency U is 100%. The unit g can be regarded
as generated either from X = x × teff, Y = y/2 at R with an efficiency of teff and a doubling of Y (i.e. v = 2) or
from X = x, Ŷ = x/ĉ at W with an efficiency of 100% and a v of y/Ŷ = 2×teff. (The latter can be seen in the
trigonometry of Fig.10: teff = (xm /ym )/(x/y) = P Q/P S = P R/P T = OP/(OP + T W ) by “similar triangles”.)
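A small simulation makes the interval of MLEs concrete. The sketch below (in Python; the numbers and names are illustrative, and it is not part of any existing SFR software) generates data from the special case a = 0, b = 1 and exhibits, for each unit, the interval of maximum-likelihood values of the efficiency U.

```python
# Illustrative simulation of model SFRM with a = 0, b = 1.
import numpy as np

rng = np.random.default_rng(0)
n, c_true = 10, 2.0
X = rng.uniform(50.0, 500.0, n)        # unobserved "true" inputs
Y = X / c_true                         # "true" outputs on the frontier Y = X/c
U = rng.uniform(0.0, 1.0, n)           # a = 0: U ~ Uniform(0, 1)
V = rng.uniform(0.0, 2.0, n)           # b = 1: V ~ Uniform(0, 2), mean 1
x, y = X / U, Y * V                    # observed inputs & outputs

c_hat = 2.0 * np.min(x / y)            # MLE of c = 2 * x_m/y_m, m = argmin x/y
m = np.argmin(x / y)
teff = (x[m] / y[m]) / (x / y)         # Farrell/DEA teff of each unit

# For i != m, any Y_i in (y_i/2, x_i/c_hat) maximizes the likelihood, so the
# MLE of the efficiency U_i = c_hat * Y_i / x_i is any value in (teff_i, 1].
U_low = c_hat * (y / 2.0) / x          # lower end of the interval (equals teff)
U_high = np.ones(n)                    # upper end: 100% efficiency
print(np.column_stack([teff, U_low, U_high]))
```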
The non-uniqueness is an artefact of the use of uniform distributions in SFRM. However, an analogous near-flatness
(in key directions) of the summit of the likelihood function will be a necessary consequence of the general structure
of any model with two analogously competing components of random variation U & V , and n parameters Y1 , ..., Yn
about which no assumptions are made.
Figure 10: The non-uniqueness of the maximum likelihood estimation of the SFR efficiency, seff, for generic unit
g = (x, y) using a specialization of model SFRM.
Figure 11: Police force areas of England & Wales
APPENDIX 2
THE POLICE FUNDING FORMULA (PFF)
How the English & Welsh pay for their 43 police forces would merit a Palme d’Or for complexity—in a country that
is far from backward in finding complex ways of achieving its administrative objectives. The funding mechanism can
only be appreciated when no attempt is made to ascertain the logic of having several channels of funding. There are
four major players in the distribution game:
• Councils and Police Authorities (PAs)
• Department for the Environment, Transport & the Regions (DETR)
• Home Office
• H.M.Treasury.
A useful Association of Police Authorities document (APA, 1999) warns its readers that the arrangements are
“complex” (twice) and that the funding involves a “complicated formula” (four times). The purport of these warnings
may be to congratulate clever mandarins rather than encourage readers to think that such complexity may be
unnecessary.
For each spending year, the Treasury decides on the “Total Standard Spending” (TSS) that central government
will “support”. This sum includes a “Council Tax for Standard Spending” (CTSS) slice (15% of TSS) that is a
notional aggregate contribution from the “rates”. It also includes a 15% slice from the business rates (NNDR by
devious acronym). The Home Office weighs in with its 50% contribution—the “Specific Police Grant” (SPG). The
remaining 20% is made up by the “Revenue Support Grant” (RSG) channeled through the offices of DETR. The
balance sheet here is simply:
TSS = SPG + (CTSS + NNDR + RSG) = SPG + SSA        (17)
where SSA is the thereby defined “Standard Spending Assessment”.
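[In round numbers: for every £100 of TSS, SPG accounts for £50 and SSA for the remaining £50, the latter made up of £15 of CTSS, £15 of NNDR & £20 of RSG.]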
How is TSS divided among the 43 police authorities? CTSS is allocated strictly by the numbers of households in
Council Tax property bands—NNDR strictly by the number of residents in the police force area. The Home Office
uses its PFF formula to divide its SPG, and gives DETR mandarins the task of allocating RSG so that SSA is also
effectively divided by the same formula.
[The pecking order appears to be: HMT → HO → DETR → PAs ↔ Councils, in which only the Home Office and Police Authorities
have significant knowledge of, and responsibility for, what police do: the Home Office has direct responsibility for the usage of £1.8B
by “The Met”. The foreword by the Chief Secretary to the Treasury in Spottiswoode (2000) makes clear HMT’s determination to get
involved in the nuts & bolts of police machinery. Where is the oversight of HMT’s competence to do this? Among the nuggets in the
government paper “Adding It Up” (Cabinet Office, 2000) is the advice that “to perform their challenge role effectively central departments
should undertake a review of their analytical capability”.]
THE FORMULA.
Three documents—APA(2000) & Home Office(2000a, 2000b)—give a good overall view of the make-up of this Home
Office formula. But its final form, although stated with great precision, could have been designed to leave most
interested parties in the dark.
Applied to any particular funding sum, the formula first puts it into 10 “pots” (components) in portions determined either by necessity
or by need (assessed by a broadly-based “activity analysis” for the country as a whole):
1. 29.8% for crime management
2. 8.6% for call management
3. 7.5% for public order management & public reassurance
4. 7.0% for traffic management
5. 2.6% for community policing
6. 17.5% for patrol
7. 14.5% for pensions
8. 10.0% for police establishment
9. 2.0% for additional security
10. 0.5% for sparsity.
The bulk of the 26.5% in pots 7-9 is distributed by “necessity” with no environmental input, but environmentals are deeply involved in
the share-out of the other pots to the 43 forces. The APA(1999) document puts an informative gloss on the statistical mechanics of this
share-out:
“The relative workload demands on individual police forces are . . . estimated using a range of indicators so that these
individual pots of funding can be distributed to each police authority in proportion to these demands. . . . Information from
a number of forces was pooled to produce a model of police workload based on the characteristics of force areas. These
are mainly socio-demographic factors—for example, looking at all forces across the country, the number of incidents [the
workload measure for the pot 1] is found to be statistically associated with density of population, long term unemployment
and other population factors. Therefore, for any particular force, the predicted workload can be estimated on the basis of
these characteristics or indicators, which then determines the distribution of funding under each of the components.”
The story from the statistical coalface for pot 1 (Home Office, 2000a) involves (i) an unspecified list of variables that ’might have an
impact on the number of incidents’, (ii) principal components analysis, (iii) elimination of variables with either low correlation with the
number of incidents or high correlation with others, (iv) multiple linear regression of the number of incidents on the selected variables,
and (v) use of the fitted regression to divide the pot. More a minefield than a coalface, perhaps, but—justified “to avoid the introduction
of perverse incentives by relying directly on workload data collected by police forces”—the regression technique also fills in missing data
and allows dubious data to be rejected without excluding any force from a share of a pot.
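A minimal sketch of a pot-1 style share-out is given below (an assumed illustration in Python: the indicator matrix, the OLS fit and the proportional split stand in for the Home Office's actual variable-selection, principal-components and data-cleaning steps, which are not reproduced here).

```python
# Illustrative sketch of a regression-based share-out of one funding "pot".
import numpy as np

def pot_shares(indicators, workload, pot):
    """indicators: (n_forces, k) socio-demographic variables;
    workload: (n_forces,) e.g. numbers of incidents;
    pot: the sum of money in the pot.  Returns each force's share of the pot."""
    X = np.column_stack([np.ones(len(workload)), indicators])   # add intercept
    beta, *_ = np.linalg.lstsq(X, workload, rcond=None)          # OLS fit across forces
    predicted = X @ beta                                          # predicted workloads
    # (assumes all predicted workloads are positive)
    return pot * predicted / predicted.sum()                      # proportional split
```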
It is tempting to go on describing the details of PFF if only to point out that its precise but extended specification in the Home Office
(2000b) document, accurate to the pound, cannot be much help to Chief Constables wanting to know how their own environmentals
influence the share they get (a case for ONS oversight?) However, readers may already have concluded that, as constituted, PFF does
not realize the “No problem!” aspiration of Section 4.4.
APPENDIX 3
NEGOTIABLE VALUE-WEIGHTS
Stage 1: Establish societal weights sj , j = 1, ..., s for the s outputs.
Stage 2: Obtain from each unit, its own best estimate of how its input cost, generically x, should be notionally
divided as
x = x[1] + ... + x[s]
to represent the internal costs of generating the output volumes y(1), ..., y(s) respectively.
[The estimation may not be easy when the same internal costable activity serves to generate more than one output.]
Stage 3: Calculate cost weights cj , j = 1, ..., s, as the medians, or more generally as trimmed means (Mosteller &
Tukey, 1977), of the n ratios yi (j)/xi [j], i = 1, ..., n.
Stage 4: Calculate negotiable value-weights
vj (N ) = sj + N cj , j = 1, ..., s,
where N is a negotiable positive number, for a range of values of N, together with the associated efficiencies adjusted
for environmentals.
Stage 5: Negotiate the value of N with police authorities & chief constables.
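A sketch of Stages 2-4 follows (an assumed implementation; the function and argument names are illustrative, and Stage 5's negotiation over N obviously lies outside any code).

```python
# Illustrative sketch of Stages 2-4 of the negotiable value-weight algorithm.
import numpy as np
from scipy.stats import trim_mean

def negotiable_value_weights(s_weights, y, x_split, N_values, trim=0.1):
    """s_weights: (s,) societal weights s_j (Stage 1);
    y: (n, s) output volumes y_i(j);
    x_split: (n, s) notional divisions x_i[j] of each unit's cost (Stage 2);
    N_values: candidate values of N.  Returns {N: value-weights v_j(N)}."""
    c = trim_mean(y / x_split, trim, axis=0)            # Stage 3: cost weights c_j
    return {N: s_weights + N * c for N in N_values}     # Stage 4: v_j(N) = s_j + N*c_j
```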
APPENDIX 4
HISTORICAL NOTE
We read in Cooper et al (2000) that:
“No empirical applications [of the ideas of Koopmans (1951)] were reported before the appearance of the 1957 article
by M.J.Farrell . . . This showed how these methods could be applied to data in order to arrive at relative efficiency
evaluations . . . in contrast with Koopmans and Pareto who conceptualized matters in terms of theoretically known efficient
responses without much (if any) attention to how inefficiency, or more precisely, technical inefficiency could be identified.
Koopmans, for instance, assumed that producers would respond optimally to prices which he referred to as efficiency prices.
. . . the identification of inefficiencies seems to have been first brought into view in [Debreu (1951)]. Even though their
models took the form of linear programming problems, both Debreu and Farrell formulated their models in the tradition
of ’activity analysis’. Little attention had been paid to computational implementation in the activity analysis literature.
Farrell therefore undertook a massive and onerous series of matrix inversions in his first efforts. The alternative of linear
programming algorithms was called to Farrell’s attention by A.J.Hoffman who served as commentator in this same issue
of the Journal of the Royal Statistical Society. Indeed, the activity analysis approach had already been identified with linear
programming and reformulated and extended in [Charnes & Cooper, 1957]. See also Chapter IX in [Charnes & Cooper,
1961]. The modern version of DEA originated in two articles [Charnes, Cooper & Rhodes, 1978, 1981].”
On another page, Farrell’s 1957 paper is seen as defective because it failed to deal effectively with “slacks”:
“[Farrell] did not fully satisfy the conditions for Pareto-Koopmans efficiency but stopped short, instead, with . . . ’weak
efficiency’ (also called ’Farrell efficiency’) because non-zero slack, when present in any input or output, can be used to
effect additional improvements without worsening any other input or output. Farrell, we might note, was aware of this
shortcoming in his approach which he tried to treat by introducing new (unobserved) ’points at infinity’ but was unable to
give his concept implementable form. In any case, this was all accomplished by Charnes, Cooper & Rhodes [21 years later]
in a mathematical formulation . . . as ’CCR-efficiency’.”
These quotations from Cooper et al (2000) suggest that Farrell was trying, rather unsuccessfully, merely to
apply already well-formulated ideas, and that we should refrain from describing his 1957 paper as “seminal”. What
the quotations do not emphasize is that Farrell made “one small step” (Neil Armstrong) for econometrics with
the “necessary parallels” (Farrell) that he modestly drew with the Pareto-Koopmans-Debreu theory. That theory
was concerned with the overall efficiency of an “economic system” (Debreu) or a “managed enterprise” (Charnes
& Cooper, 1961), broken down into interacting “activities”(Koopmans) or “production units” (Debreu). Farrell’s
innovation was to see that the same technique that gave Debreu’s efficiency coefficient for the whole system could
be applied to the individual units. The issue of his computational competence in dealing with “slack” is a relatively
minor one: in any case, the paper of Farrell & Fieldhouse (1962) showed that Farrell had by then responded with
more than necessary competence to Hoffman’s 1957 advice about the help to be found in linear programming.
REFERENCES
Bracketed numbers refer to sections not pages.
Aigner,D., Lovell,C.A.K., & Schmidt,P. (1977), “Formulation and estimation of stochastic frontier production function models”, J.Econometrics, 6, 21-37, [3.1].
Allen,R., Athanassopoulos,A., Dyson,R.G., & Thanassoulis,E. (1997), “Weights restrictions and value judgements
in data envelopment analysis: evolution, development and future directions”, Annals of Operations Research, 73,
13-34, [2.15] .
APA (1999), Pounding the Beat: A guide to police finance in England and Wales, Association of Police Authorities,
London, [Appendix 2].
Arrow,K.J.,1951, Social Choice and Individual Values, Cowles Commission Monograph No.12, Wiley & Sons, New
York, [4.1].
Audit Commission (1999), Local Authority performance indicators 1997/98: Police and Fire Services, Audit Commission, London, [2.15, 4.2, 5.2].
Banker,R.D., Charnes,A., & Cooper,W.W. (1984), “Some models for estimating technical and scale inefficiencies in
Data Envelopment Analysis”, Management Science, 30, 1078-92, [2.6].
Bauer,P.W. (1990), “Recent developments in the econometric estimation of frontiers”, J.Econometrics, 46, 39-56,
[3.3].
Cabinet Office (2000), Adding It Up, www.cabinet-office.gov.uk/innovation, London, [Appendix 2].
Charnes,A. & Cooper,W.W. (1957), “On the theory and computation of delegation-type models: K-efficiency, functional efficiency and goals”, Proceedings of the Sixth International Meeting of the Institute of Management Science,
Pergamon Press, London, [Appendix 4].
Charnes,A. & Cooper,W.W. (1961), Management Models and Industrial Applications of Linear Programming, Wiley
& Sons, New York, [Appendix 4].
Charnes,A., Cooper,W.W., & Rhodes,E. (1978), “Measuring the efficiency of decision making units”, European
Journal of Operational Research, 2, 429-44, [2.12, 2.16, Appendix 4].
Charnes,A., Cooper,W.W., & Rhodes,E. (1981), “Evaluating programs and managerial efficiency: An application of
Data Envelopment Analysis to program follow through”, Management Science, 27, 668-97, [Appendix 4].
Charnes,A. & Cooper,W.W. (1985), “Preface to topics in Data Envelopment Analysis”, Annals of Operations Research, 2, 59-94, [2.18].
Cooper,W.W., Seiford,L.M., & Tone,K. (2000), Data Envelopment Analysis: A Comprehensive Text with Models,
Applications, and References, Kluwer, Boston, [1, 2.11, 2.12, 3.3, Appendix 4].
Cubbin,J. & Tzanidakis,G. (1998), “Regression versus data envelopment analysis for efficiency measurement: an
application to the England and Wales regulated water industry”, Utilities Policy, 7, 75-85, [3.3, 4.1].
Debreu,G.(1951), “The coefficient of resource utilization”, Econometrica, 19, 273-92, [Appendix 4].
DETR (1999), “Performance indicators for 2000/2001”, Section 11.1, Department of the Environment, Transport
and the Regions, London, www.local-regions.detr.gov.uk/bestvalue/bvindex.htm , [1].
Drake,L. & Simper,R. (1999), “X-efficiency and scale economies in policing: A comparative study using the distribution free approach and DEA”, Economic Research Paper No. 99/7, Department of Economics, Loughborough
University [3.6, 5.2, 5.2.2].
Drake,L. & Simper,R. (2000), “Productivity estimation and the size-efficiency relationship in English and Welsh
police forces: an application of Data Envelopment Analysis and Multiple Discriminant Analysis”, International
Review of Law & Economics, 20, 53-73, [2.6, 2.7, 5.2, 5.2.1].
Dyson,R.G., Thanassoulis,E. & Boussofiane,A. (1990), “Data Envelopment Analysis”, in Operational Research Tutorial
Papers, L.C.Hendry & R.Eglese (editors), pp.13-28, Operational Research Society, U.K., [2.16, 2.18].
Farrell,M.J.(1957), “The measurement of productive efficiency (with discussion)”. J.Roy.Statist.Soc.A, 120, 253-90,
[Abstract;2.1, 2.2, 2.4-2.6, 2.9, 2.10, 2.12, 2.15, 2.16, 2.18, 3.1, 4.4, Appendix 4].
Farrell,M.J. & Fieldhouse,M. (1962), “Estimating efficient production functions under increasing returns to scale”,
J.Royal Statistical Soc. A, 125, 252-67, [2.6, 2.7, Appendix 4].
Foucault, M., (1973), The Order of Things: An Archaeology of the Human Sciences, Vintage Books, New York.
Greene,W.H.(1980),”Maximum likelihood estimation of econometric frontier functions”, J.Econometrics, 13, 27-56,
[3.3].
Greene,W.H.(1995), LimDep Version 7.0 User’s Manual, Econometric Software, Inc., Castle Hill, [3.2].
Home Office (2000a), “Police Funding Formula 2000/2001, AFWG(00)1”, RDS Economics & Resource Analysis Unit,
[2.18, 4.4, Appendix 2].
Home Office (2000b), “The Police Grant Report (England and Wales) 2000/01”, www.homeoffice.gov.uk/ppd/pru/pgr2001.htm, [Appendix 2] .
Jarvie,I.C. (1985), “Philosophy of the social sciences”, in The Social Science Encyclopedia, eds. Adam & Jessica
Kuper, Routledge & Kegan Paul, London.
Jondrow,J., Lovell,C.A.K., Materov,I.S., & Schmidt,P. (1982), “On the estimation of technical efficiency in the
stochastic frontier production function model”, J.Econometrics, 19, 233-8, [3.2].
Kittelsen,S.A.C. & Førsund,F.R. (1992), “Efficiency analysis of Norwegian district courts”, J.Productivity Anal., 3,
277-306, [5.1].
Koopmans,T.C.(ed.)(1951), Activity Analysis of Production and Allocation, Cowles Commission Monograph 13, Wiley
& Sons, New York, [Appendix 4].
Kopp,R.J. & Mullahy,J.(1990), “Moment-based estimation and testing of stochastic frontier models” , J.Econometrics,
46, 165-83.
Kumbhakar,S.C. & Lovell,C.A.K. (1999), Stochastic Frontier Analysis, Cambridge University Press, [1].
Levitt,M.S. & Joyce,M.A.S. (1987), The Growth and Efficiency of Public Spending, Cambridge University Press, [2.6,
2.16].
Lewis, Sue (1986), “Measuring output and performance: Data Envelopment Analysis”, H.M.Treasury’s Public Expenditure Survey Committee: Development Sub-Committee, [4.1].
Mosteller,F. & Tukey,J.W. (1977), Data Analysis and Regression, Addison-Wesley, Reading, Mass., [Appendix 3].
NERA (1999), “Peer review of a possible approach to better measure police efficiency: A report for the Public
Services Productivity Panel”, National Economic Research Associates, London, www.nera.com.
Norman,M. & Stoker,B. (1991), Data Envelopment Analysis: The Assessment of Performance, Wiley, Chichester,
[1].
Nunamaker,T.R. (1985), “Using Data Envelopment Analysis to measure the efficiency of non-profit organizations: a
critical evaluation”, Managerial and Decision Economics, 6, 50-8, [2.14, 2.16].
Pedraja-Chaparro,F. & Salinas-Jimenez,J. (1996), “An assessment of the efficiency of Spanish Courts using DEA”,
Applied Economics, 28, 1391-403, [5.1].
Pedraja-Chaparro,F., Salinas-Jimenez,J., & Smith,P. (1999), “On the quality of the data envelopment analysis
model”, J.Operational Res.Soc., 50, 636-44, [1].
Schmidt,P. (1985), “Frontier production functions ”, Econometric Reviews, 4, 289-355, [1, 3.3, 3.4] .
Spottiswoode,C. (2000), “Improving police performance: A new approach to measuring police efficiency”, www.hmtreasury.gov.uk/pspp/studies.html, Public Services Productivity Panel, H.M. Treasury, London, [1, 2.9, 2.18, 3.3,
3.6, 4.1, 4.3, 4.6, 5.2, Appendix 2].
Stone, M. (1974), “Cross-validatory choice and assessment of statistical predictions”, J.Roy.Statist.Soc.B, 36, 111-47,
[4.4].
Thanassoulis,E., Dyson,R.G., & Foster,M.J., (1987), “Relative efficiency assessments using Data Envelopment Analysis: An application to data on rates departments”, J.Operational Res.Soc., 38, 397-411, [2.9, 2.14].
Thanassoulis,E. (1993), “A comparison of Regression Analysis and Data Envelopment Analysis as alternative methods
for performance assessments”, J.Operational Res. Soc., 44, 1129-44, [3.4].
Thanassoulis,E. (1995), “Assessing police forces in England and Wales using Data Envelopment Analysis”, European
J. Operational Res., 87, 641-57, [2.15, 4.2, 5.2].
Walker, Monica A., (1992), “Do we need a clear-up rate?”, Policing and Society, 2, 293-306, [5.2.1].
INDEX
Numbers refer to sections.
Arrow’s Impossibility Theorem, 4.1
benevolent dictatorship, 4.1
DEA
• see Farrell,M.J.
• allowance for environmentals, 2.18
• as a flattering technique, 2.10
• as EDA, 2.19
• as oracular goddess, 2.18
• does not rank units, 2.7
• lack of discrimination, 2.14
• questionable generalization, 2.11
• sensitivity to (takeover by?) outliers, 2.13
• testing & validating (is it possible?), 2.19
• weight constraints, 2.15
efficiency
• frontier F , 2.1
• stochastic (seff or U ), 2.1, 5.1
• value (veff), 2.4, 2.10, 3.2, 5.1
• technical/Farrell/DEA (teff), 2.1, 2.4, 2.10, 5.1
environmentals, 1, 2.16
• as “negative” outputs, 2.18
• and Farrell/DEA, 2.18
• and stochastic frontier regression, 3.5
• and “the third way”, 4.4
Farrell,M.J.
• approach of (modified), 2
• priority question, Appendix 4
feasible
• performance or point, 2.1
• construction of feasible set C, 2.2
Foucault’s blank spaces, 2.19
frontier, 2.1, 2.2
• unit, 2.1
input
• minimization, 2.4
• volumetric, 1
linear programming, 2.12, 2.18
mixing, 2.2
Nunamaker’s Theorem, 2.14
output
• “positive”, 2.9
• “negative”, 2.9, 2.10
• non-volumetric, 2.10
• volumetric, 1, 2.9
output maximization, 2.4
points at infinity, 2.5, 2.12, Appendix 4
police
• comments of Audit Commission, 5.2
• funding formula, 2.18, 4.4, Appendix 2
• inputs, outputs & environmentals, 1
• priorities, 2.16
• study of Thanassoulis, 2.15, 4.2
• studies of Drake & Simper, 5.2, 5.2.1, 5.2.2
rescaling, 2.2
returns to scale
• constant (CRS), 2.2, 2.4, 2.6, 2.9, 2.10, 2.12
• decreasing (DRS), 2.4, 2.6, 2.9, 2.10
• increasing (IRS), 2.6
• variable (VRS), 2.6, 2.7
• comments of Banker, Charnes & Cooper, 2.6
• comments of Drake & Simper, 2.7, 5.2.1
“state of the art”, 3.6
Spanish courts re-analysis, 5.1
stochastic frontier regression, 3
• as a consistency “check” on DEA, 3.3, 5.2.2, 4.7
• errors in all variables, 3.3
• exogenicity condition, 3.3
• simulation of Thanassoulis, 3.4
technical efficiency as flattering upper bound, 2.10
“the third way”, 4, 5.1
• allowance for environmentals, 4.4
• concurrence with NERA recommendation, 4.7
• cross-validation, 4.4
• Mark One algorithm, Appendix 3
• objections, 4.6
• unit subdivision, 4.4
value judgement, 2.4
• freedom from, 1
value weights, 2.10, 4.3
worse
• certainly, 2.1
• proportionally, 2.1
• worsening, 2.2