QUESTIONS OF EFFICIENCY ASSESSMENT IN PUBLIC SERVICES:
IS THE STATE OF THE ART REALLY “STATE OF THE ART”?
ABSTRACT: We revisit and reformulate the seminal 1957 Series A paper of M.J.Farrell, and look at how it was
“generalised” as “data envelopment analysis” (DEA). The closely related technique of “stochastic frontier regression”
(SFR) is also reviewed, and both techniques are subjected to some reality tests for the specific problem of police force
efficiency assessment. The possibility or necessity of a less technical approach based on explicit value judgements is
considered: we call this “the third way” (TTW). The methods are pedagogically compared on a small set of data
from 21 Spanish law courts. The ground is prepared for their comparative & realistic application to the massive
amount of data recorded by the 43 police forces of England & Wales.
PREFACE
[These notes were prepared by Mervyn Stone as background material for the meeting of the Official Statistics Section of the Royal
Statistical Society on July 19th MM.]
The topic of this meeting falls between academic study & political action, and is therefore inevitably subject to
governmental influences of one sort or another. However, if government is to get the sort of advice that will be in
the public interest, it is vital that the topic be explored openly (in the best traditions of this Society) without fear
of political disfavour and without the complementary tendency to “hype” any particular viewpoint for the benefit
of administrators who do not have the (increasingly rare) academic luxury of time to analyse apparent complexity.
For instance, these notes will not include either an “Executive Summary” or an appeal to authority of the sort that
graces the Spottiswoode (2000) report. They aim rather to pose questions that may induce total scepticism about
the approach recommended in that report, and preferably a disposition to favour some (if not “the”) “third way”.
The problem with appeal to authority lies in finding where “authority” lies: the Spottiswoode report says
“There was generally consistent advice from leading experts in the fields of economics and econometrics having practical
experience in efficiency measurement”
The report does not make clear that, even among micro-economists, opinions about the practical usefulness of
the techniques thus recommended are as inconsistent as can be—with some leading micro-economists being quite
dismissive about them. These notes will not end with a set of “conclusions” suggestive of some claim to authority:
it is better for the reader to sample the text than to rely on pruned & potted comments that could not accurately
convey a sense of the whole matter, just as reading a recipe is usually no substitute for tasting the dish. Readers
with the least time to spare might read Section 1 and then go straight to the Index or References to follow up items
that tempt their palate. Those with more time, but not enough to read the whole paper, should omit Sections 2.11,
2.12, and Section 3’s exposition of the stochastic frontier approach (since econometricians are very divided about the
latter’s value especially for the case of multiple outputs). For those for whom the topic is a necessary but rather dull
exercise, the informal status of these notes permits an occasional levity going beyond what would be acceptable in
any of the Society’s journals.
1. A GENERAL FRAMEWORK FOR THE GENERAL PROBLEM.
Given:
• A cross-sectional data-base from n operationally independent units, all engaged in the same range of productive
activities over the same time period [eg. the 21 High Courts of the Spanish Administrative Litigation Division in 1991, or
the 43 police forces of England & Wales in 1999. There will not be time or space to consider the use of potentially more informative
longitudinal (“panel”) data.]
• Measurements of r + s + t variables on each unit:
– inputs x = (x(1), ..., x(r))
– outputs y = (y(1), ..., y(s))
– environmentals z = (z(1), ..., z(t))
– data D = {(xi , yi , zi ); i = 1, ...n}
– (x,y,z) is the performance vector of a generic unit g in D
The problem is:
To generate from D an acceptably realistic assessment of the efficiency of an individual unit, with respect
to the totality of internal (undocumented) activities that produce y from x in an environment characterised
by z. The “holy grail” is the production of a single measure of efficiency for each unit, but a realistic
assessment may require more than one measure supplemented with a number of quality assurance indices.
No attempt will be made here to treat this problem in full generality [whatever that means] for the good reason that
an abstract general approach is unlikely to be sufficiently responsive to the special features of a practical realization.
There are however some relevant general considerations:
(i) Inputs are controllable variables that are “costs” for a unit in the generation of the outputs eg. salaries, capital
costs & depreciation. Most of this paper will be concerned with the case where multiple inputs can be individually
costed and aggregated into a single cost input x: in which case, “inefficiency” necessarily includes the “allocative
inefficiency” from non-optimal allocation of x to its individual components.
(ii) Outputs should be comprehensive and include everything that can be given a value, positive or negative, in the
outcomes of the activity associated with the inputs. [Most formulations of DEA consider only outputs that have positive value,
or that can be apparently transformed into positivity.] The degree of subdivision of outputs into different categories has
to be related to the extent to which there are associated differences in value. Such subdivision—so that outputs
are separately identifiable and weighted in the efficiency formula [for those techniques for which there is a formula to permit
this!]—may be necessary for the informed cooperation and motivation of units in improving efficiency. We will
distinguish between volumetric and non-volumetric outputs: a volumetric output measures the amount or quantity
of a particular output, whereas a non-volumetric output typically provides a numerical measure of quality.
(iii) It is desirable that any efficiency measure should be derived by a widely comprehensible method, and that, in
any particular application, its acceptability should be determinable by the intrinsic character & properties of the
measure (i.e. from the form of its functional dependence on x, y & z)— without appeal to other criteria such as the
number of Ph.D.s written about the method or the number of claims that “insight” may be gained into the problem
by its application.
(iv) There are broadly two types of efficiency measure. An intrinsic measure is one that could be calculated from
the generic unit g’s performance (x,y,z) alone, independently of the performances of the other n − 1 units in D. An
interactive measure is one whose determination is influenced by the other units’ performances, effectively positioning
the generic unit in the geometry of the n performance vectors in D.
(v) There are difficult questions in how to take account of environmentals z such as geography, social mix, unemployment level or other socio-economic factors. One approach is simply to ignore environmentals, while recognizing
that the efficiency measures may be influenced by them. A second approach is to stratify the n units into smaller
groups in each of which the environmental variables considered important do not vary very much. Another is to
try to allow for the influence of environment by adjustment of the input or output measures themselves, before
constructing any measure of efficiency. Yet another method is to adjust only an undoctored measure of efficiency
itself. Before considering such questions, we need to sort out the theoretical & practical difficulties that arise even
when environmentals are ignored. So we will initially exclude environmentals and take performance to be assessable
on the basis of inputs and outputs alone. The performance vector of our generic unit g is then simply (x, y), and we
abuse notation with the identification g = (x, y) .
Apart from such general & somewhat imprecise considerations, it can be argued that there is little to guide construction of efficiency measures apart from an intuitive sense of what might be judged appropriate for particular
realisations of the problem. It will therefore be necessary to use the example of the 43 police forces of England &
Wales (43PFs, for short) to test the realism of our thinking at every stage of the argument. However, the problem
with intuition is that it is a very personal matter. Those who have developed and applied the currently favoured
techniques for dealing with the above general problem [techniques described by some as “state of the art”] may well have been
motivated by well-formed intuitions. Since the conclusions of this paper will probably clash with these intuitions,
the issue had better be put as a question for an uncommitted intuition... one that touches deeply the philosophical
battle in the social sciences between empirical method and value-based approaches i.e. whether facts can be separated
from values (Jarvie, 1985):
Can one really believe that there are self-defining indicators of efficiency—functions of D alone and
determinable without reference to context —that can be straightforwardly extracted or “measured” by
some almost mechanical technique not significantly influenced either by prices of inputs when r > 1, or
by value judgements on outputs when s > 1?
[ Intuition does not work in a vacuum. For the 43 PFs in Appendix 2 Fig.11, it is helpful to know that the following inputs, outputs
& environmentals have been considered for entry into D. The outputs & first-listed environmentals are those selected by Spottiswoode
(2000) from a much wider field.
Inputs:
• staff costs
• operating costs
• consumption of capital costs.
Outputs:
• recorded crimes
• percentage of recorded crime detected
• domestic burglaries
• violent crimes
• theft of & from motor vehicles
• number of offenders dealt with for supplying Class A drugs
• public disorder incidents
• road traffic collisions involving death or serious injury
• level of crime (British Crime Survey)
• fear of crime (ditto)
• feelings of public safety (ditto)
Most of these outputs appear, among many more, in the listing & discussion of “best value performance indicators” in the consultation
document DETR (1999).
Environmentals:
• number of young men
• stock of goods available to be stolen
• changes in consumer expenditure.
These three environmentals were given as examples. The complexity of the environmentals problem is indicated by the fact that the
following ones have already been used in the Police Funding Formula (PFF; see Appendix 2) to determine the money thought appropriate
for police forces with different environmentals.
PFF environmentals:
• resident population
• daytime population
• population in terraced housing
• population in Class A residential neighbourhoods
• “striving” areas
• population in one-parent families
• households with only one adult
• households in rented accommodation
• population at a density of more than one per room
• population density
• sparsity of population
• length of built-up roads
• length of motorways.]
There is a burgeoning literature about techniques, such as Data Envelopment Analysis (DEA) & Stochastic Frontier
Regression (SFR) (eg. Schmidt, 1985; Norman & Stoker, 1991; Cooper, 2000; Khumbakar & Lovell, 1999). These
two approaches aim or claim to contribute in a major way to the solution of the above problem in its diverse
manifestations. DEA does more: it induces in its users a sense of freedom from the subjectivity of value judgement
& arbitrary specification that is well-expressed in the following quotation from Cooper et al (2000):
“In addition to avoiding a need for a priori choices of weights, DEA does not require specifying the form of the relation
between inputs and outputs in, perhaps, an arbitrary manner and, even more important, it does not require these relations
to be the same for each [unit].”
Not all supporters of DEA are as committed. The paper of Pedraja-Chaparro et al confirms many of the well-documented concerns about the practical usefulness of DEA. However, its attempt to understand, by simulation of
data sets from a known model, how they might be dealt with is of limited interest for multiple output efficiency
studies. (Their model is for multiple error-free inputs and a single output, so it does not treat the problems raised
in Section 4 for SFR.)
The literature on DEA is rooted in the seminal econometric work of Farrell (1957) which will now be considered in
slightly amended form.
2. FARRELL’S ECONOMETRIC APPROACH.
2.1. GENERAL DESCRIPTION.
At the heart of the problem is the difficulty, in the applications we have in mind, of comparing any two performances
(x, y) & (x′, y′). How can we (pessimistically) judge whether or not (x, y) is worse than (x′, y′)? For “positive”
outputs (“the more the better”), there are two closely related criteria (expressions of “Pareto optimality”) that
no-one is likely to dispute:
(x, y) is certainly worse than (x′, y′) if (a) x ≥ x′ & y ≤ y′ and (b) x > x′ or y < y′
[where x ≥ x′ means x(j) ≥ x′(j), j = 1, ..., r, y ≤ y′ means y(j) ≤ y′(j), j = 1, ..., s, and likewise for the strict
inequalities x > x′ and y < y′.]
If condition (b) is not imposed, we have a weaker criterion—that of worse without the certainty.
The condition x > x′ or y < y′ means that we are not prepared to say that (x, y) is certainly worse than (x′, y′) if
only some of the inputs of (x′, y′) are increased and only some of the outputs of (x′, y′) are decreased.
Weak though it may be, the first criterion is immediately useful in defining a superior subset of any given set S of
performance vectors (points for short):
The efficiency frontier of S is the subset F of points that are not certainly worse than any other point in
S.
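As a concrete check on how the criterion and the frontier definition operate on a finite data set, here is a minimal Python sketch (ours, not Farrell’s; the function names and the toy numbers are invented) that flags “certainly worse” performances and extracts the frontier of a small set S with a single input and a single output.

```python
import numpy as np

def certainly_worse(p, q, r):
    """True if performance p is certainly worse than q, where the first r
    components of each vector are inputs and the remaining ones are outputs."""
    xp, yp = p[:r], p[r:]
    xq, yq = q[:r], q[r:]
    cond_a = np.all(xp >= xq) and np.all(yp <= yq)   # (a): nowhere better
    cond_b = np.all(xp > xq) or np.all(yp < yq)      # (b): all inputs larger or all outputs smaller
    return cond_a and cond_b

def efficiency_frontier(S, r):
    """The subset of S that is not certainly worse than any other point of S."""
    return [p for i, p in enumerate(S)
            if not any(certainly_worse(p, q, r) for j, q in enumerate(S) if j != i)]

# toy data: four units, r = 1 input (cost) and one output
S = [np.array(v, float) for v in [(10, 5), (8, 5), (12, 4), (9, 7)]]
print(efficiency_frontier(S, r=1))   # only (8, 5) and (9, 7) survive
```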
The criteria do not tell us how much worse (x,y) is i.e. they do not provide a measure of the relative efficiency of
(x, y) & (x′, y′)—except in one very specialized comparison that is at the heart of Farrell’s approach. Suppose that
(x, y) is proportionally worse than some different (x′, y′) [and (x′, y′) is proportionally better than (x, y)] in the sense that
x′ = cx for c ≤ 1
y′ = dy for d ≥ 1.
It is then reasonable to define the efficiency of (x, y) relative to (x′, y′) as the ratio c/d. For, if it were agreed that
the intrinsic efficiency of (x, y) should be given by an index ratio of the type
veff = [v1 y(1) + ... + vs y(s)]/[u1 x(1) + ... + ur x(r)],
it is immediate that
[veff for (x, y)]/[veff for (x′, y′)] = c/d,
whatever the values of the weight vectors u & v.
[veff might be sensible if the y’s and x’s were volumetric and the v’s & u’s were values & costs/prices per unit volume respectively.]
In this way, we could compare the efficiency of a generic unit g in D with that of another unit—provided one of the
two was proportionally worse than the other. The problem is that this necessary condition cannot be expected to hold
for any pair of units in D, even approximately, since the n points in D, in typical applications, can be expected to be
well-distributed in their r + s dimensional space, when the ratio (r + s)/n (an index of their sparseness in the region
that they span) is not small. [For 43PFs, we can expect (r + s)/n to be of the order of 20/43.]
Farrell’s econometric approach creatively enlarges D to a continuum, C say, in which new points are constructed
by specified procedures starting with the discrete “scattergram” that is D. These points are to be regarded as nonexistent but, at least hypothetically, feasible performances: they are sometimes referred to as the “technology set”.
There will be a (perhaps empty) subset, Cg say, of points (cx, dy) of C, other than g = (x, y), that are proportionally
better than g. When Cg is not empty, the ratio c/d can therefore be calculated for each of its points. The technical
efficiency (teff for short) of g can then be straightforwardly defined as the minimum value of c/d for points in Cg i.e.
the generic unit g is to be compared with any performance in Cg that gives this minimum. For the possibility that
Cg is empty [i.e. there is no (x′, y′) in C proportionally better than g = (x, y)] the teff of g is quite properly taken
to be unity or 100%: in this case, g is on the frontier F and is called a frontier unit.
That, when Cg is not empty, the minimising point or points lie on the efficiency frontier F of the continuum C may
be seen by reductio ad absurdum, with just one condition on the construction of C—that, if a point is in C, so are
all points certainly worse than it [this condition is satisfied if the construction of C uses procedure (i) below]:
Suppose a minimising point in Cg , (cx, dy) = (x′, y′), say, were not in F . Then (x′, y′) would be certainly
worse than some point (x″, y″) in C, with x′ ≥ x″, y′ ≤ y″ and either x′ > x″ or y′ < y″. If x′ > x″,
there exists some c′ < c such that x′ > c′x > x″, whence (c′x, dy) is certainly worse than (x″, y″) and is
therefore in C, and certainly better than (x′, y′) and therefore in Cg . [An analogous consequence for a point
(cx, d′y) with d′ > d would follow if y′ < y″.] The inequality c′/d < c/d [or c/d′ < c/d] then contradicts the
initial supposition.
A simpler version of this argument shows that, when Cg is empty, g itself has to be on the efficiency frontier F .
2.2. CONSTRUCTION OF THE SET C OF FEASIBLE PERFORMANCES AND ITS EFFICIENCY FRONTIER
F.
The feasible points are taken to include the n points in D [which do exist!]. Starting with them, the continuum C is
progressively constructed by three procedures, (i), (ii), & (iii), for each of which there is
some justification:
[for most of what Farrell does]
(i) WORSENING—If (x, y) is feasible, so is any point worse (a fortiori certainly worse) than (x, y). Worsening
creates the feasible set
{(x′, y′) : x′ ≥ x & y′ ≤ y}.
[Justification rests on the truism that inefficiency can always manage to achieve smaller outputs with larger inputs!]
(ii) RESCALING—If (x, y) is feasible, so is (cx, cy) for any scale factor c.
[Farrell justifies this by the idea of “constant returns to scale” (CRS) —an econometric concept attractive to economists concerned with
industrial production, where it is usually invoked for the case of multiple inputs & a single output for which the idea of a “technical”
production function makes sense. The idea of CRS is less obviously relevant to cases with multiple outputs involving complex sociological
interactions and inefficiencies (our main concern). But, if CRS is deployed in such cases, there are at least two distinct ways of doing so:
• An actual unit g = (x, y) would display CRS if it were to change all its inputs & outputs by the same scale factor c.
• An efficiency frontier F satisfies the CRS condition if (x′, y′) & (cx′, dy′) both in F implies c = d.
Applied (in the interests of efficiency assessment) to the generic unit g with performance vector (x, y), the concept creates, by hypothetical
resizing (up or down), a set {(cx, cy); c > 0} of role models (good or bad!) for other units. For 43PFs, it would claim that the very
existence of a police force with performance vector (x, y) means that the feasibility (potential existence) of a police force with performance
vector (cx, cy) (for any c > 0) should be admitted for the purpose of assessing efficiency. This is a subtle idea and not one to be cynically
dismissed as a cheap extension of the data set D. Farrell refers to CRS as an “assumption” but (with 43PFs in mind) we see it here more
as a purposive & perhaps reasonable device for creating hypothetical units to be used in efficiency assessment. The “assumption” would
be better termed an acceptance, for the purposes of efficiency assessment, of the frontier F (constructed with CRS-motivated rescaling)
as if it were a realistic approximation to a true limiting efficiency frontier satisfying the econometrically attractive property of constant
returns to scale.
Application of the CRS concept also has implications for the choice of input & output variables, which we consider in Section 2.9 as far
as outputs are concerned. ]
(iii) MIXING—If (x′, y′) & (x″, y″) are feasible, so is the affine combination
(ax′ + (1 − a)x″, ay′ + (1 − a)y″) for 0 < a < 1
on the straight line between the two points.
[One justification of mixing involves CRS: scale (x′, y′) down to (ax′, ay′), and (x″, y″) down to ((1 − a)x″, (1 − a)y″), and suppose that
these two performances can be added without any interaction. Another is to think of mixing as simply a perhaps reasonable interpolation
between two actual units. Whatever the justification, mixing ensures that the set of feasible performance vectors has the mathematically
convenient property of convexity i.e. the set C includes its own affine combinations.]
The construction of C is a purely conceptual stage in the definition of the frontier F . Only F matters as far as the
definition of the efficiency measure teff is concerned: F could have been constructed more directly by the rescaling,
mixing & worsening of the frontier units alone—if only we had been able to identify them in advance.
2.3. RESTRICTION TO A SINGLE INPUT.
The generalities of the last section call for simple illustration. Simplicity will be aided by immediate & continuing
restriction to what we have in mind for application of our alternative approach—problems with a relatively uncontentious single input. The data D will now be the (s+1)-dimensional performance vectors (xi , yi (1), ..., yi (s)), i =
1, ..., n, generically g = (x, y).
[For the 43PFs assessment, a single input x could be provided by the newly available measure of total costs called “Resource Accounting
& Budgeting”—the only real arguments being whether to include the cost of pensions (an appreciable element of police force expenditure
not controllable in the interests of efficiency of the current year) and the London Allowance. Aggregation of the three components of x
(see Section 1) does not rule out posterior analysis & interpretation of any relationship uncovered between an efficiency measure and the
way that x is constituted from these components. ]
2.4. ONE OUTPUT.
We will start with the simplest case of all, that Farrell does not deign to mention: just one input & one output!
[For
43PFS, can it be realistic to be concerned with such a simple case? This question can be answered affirmatively if it can be supposed that
an aggregation y of estimated values of all the potentially separable outputs of a unit’s activities could be devised—by a combination of
judgement & political will. This paper will argue that this supposition may be a necessity in applications to organizations of the kind
represented by 43PFs. The argument can only be decided by careful inspection of techniques that, as if by magic, appear to dispense
with the need for value judgement. If it were possible to agree easily on a single variable that managed to evaluate the various outputs
of police force activity, there would have been no need for this meeting of the Official Statistics Section.]
The obvious estimate of efficiency for single input & output is the intrinsic measure veff = y/x. Even though it
may be thought of as the amount of value (in y) per unit of cost (in x), veff does not explicitly depend on the CRS
concept.
How does veff relate to the teff measure produced by the Farrell approach when the continuum C is produced by
a combination of worsening & rescaling? [The low dimensionality of the case (r = 1, s = 1) makes mixing redundant once you
allow rescaling.] Fig.1 is almost self-explanatory. [The axes are unconventionally labelled for a reason that will appear]. The
continuum C can in this case be created by rescaling the unit m that has the largest value of veff and then worsening
the rescaled points, and Cg is the subset {(cx, dy) : c ≤ 1, d ≥ 1} of C. The rescaling ray through unit m is the
efficiency frontier F. All the feasible points in the interval ff′ give the minimum value teff of c/d, which is also
X/x & y/Y. [Note that the low dimensionality also renders vacuous the “proportional” in the definition of Cg.]
We then have teff = X/x = y × min{xi/yi}/x = veff/max{veffi}.
[The case for thinking about the data in its unreduced, two-dimensional form is that we thereby retain an awareness of size of unit, which
would be lost if we were to start with the ratio y/x. Nothing essential is lost by doing so. Realistic applications will take us into many
more than three dimensions: one extra dimension is neither here nor there.]
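In this simplest case the calculation is elementary. As a minimal sketch (with invented costs and outputs) of the relation teff = veff/max{veffi} under the CRS construction:

```python
costs   = [100.0, 150.0,  80.0, 120.0]   # single input x for each unit (invented)
outputs = [ 55.0,  90.0,  50.0,  60.0]   # single output y for each unit (invented)

veff = [y / x for x, y in zip(costs, outputs)]   # intrinsic measure y/x
best = max(veff)                                 # slope of the CRS frontier ray

for i, v in enumerate(veff):
    print(f"unit {i}: veff = {v:.3f}, teff = {v / best:.2%}")
```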
Farrell takes seriously the possibility that the CRS motivation may not be reasonable as far as the shape of the
efficiency frontier is concerned. He claims that it is “more difficult to relax the assumption” than to deal with the
case of multiple outputs. [If only that were so (in a non-technical sense)—some may think!]
Figure 1: A hypothetical scattergram for n = 4, r = 1, s = 1: estimation of teff for the generic unit g with input x
& output y, using worsening & rescaling to create comparison performances.
Figure 2: The case of Fig.1 without rescaling.
Fig.2 shows how Fig.1 changes when the comparison continuum C is created without using CRS-motivated rescaling
but only worsening & mixing (the latter no longer redundant) and with the n units augmented to include the origin O
regarded as a hypothetical unit with zero input & output. [This use of the origin is a bit of covert rescaling.] The superficial
change from Fig.1 to Fig.2 is that F is now the least-efficient convex frontier with the property of decreasing returns
to scale (DRS), that is consistent with the data D. We therefore agree with Farrell when he says that it is “quite
simple to allow for diseconomies of scale” (i.e. DRS), at least in this particular case. [The frontier line can be thought of
as what you would get if the y-axis were a flexible string and were pulled up tightly against the frontier points. It is clear that y/x can
never increase with x along such frontier lines. The feasibility of the units on the vertical line joining the unit with the most output to
the “point at infinity” on the x-axis is based not on mixing but on worsening.] The less superficial change is that teff (defined
as the minimum of c/d in Cg) is now given, at the point f, by the value x/X, which is less than the value y/Y at f′.
In his Section 2.4, Farrell talks about these two values as distinct options: one corresponding to input minimization
holding output constant, the other to output maximization holding input constant. He appears to favour the first,
which agrees with our definition of teff as the minimum of a continuum of values of c/d on the frontier.
2.5. TWO OUTPUTS.
The alternative geometry for the case (r = 1, s = 2) will take us to three dimensions. So it is a good thing that
three dimensions can portray, with a little imagination, all but one of the essential features of the quite undrawable
geometry of the construction procedure for the typical application, in which s+1 will exceed the artistic limit of
three.
Figure 3: Estimation of the technical efficiency of the generic unit g with input x and outputs y(1) & y(2), using
worsening, mixing, and rescaling.
[The “simple case” of Farrell was (r = 2, s = 1) which lies outside our restriction to the single input case. It is therefore of passing
interest only that the full geometry of (r = 2, s = 1) is different from that of (r = 1, s = 2): there is a fundamental asymmetry between
the two cases that makes a difference. The asymmetry arises because worsening involves either decreasing outputs or increasing inputs:
one cannot simply exchange r with s, and input with output. The asymmetry involves the definition of Farrell’s “points at infinity” and
their analogues for (r = 1, s = 2): the difference may have been overlooked by Farrell because his sole application was to a problem
with s = 1—agricultural output of the 48 states in 1957 America with 4 input variables. It should, however, have been elucidated in the
general algebraic treatment for (r > 1, s > 1).]
Fig.3 shows the typical final step in the construction of F for two outputs, whose stages will first be described in
simple mechanical terms without explicit reference to mixing or worsening:
(i) Rays are drawn from the origin to infinity through each of the n points representing the performances of the n
units. These rays are the continua of feasible points created by rescaling. Fig.3 depicts the case where there are at
least two frontier units i & j. Three extra infinite rays are drawn: the x-axis itself [corresponding to a Farrell “point at
infinity”] and two side-rays in the planes y(2) = 0 & y(1) = 0. These side-rays are, respectively, the ray from the
origin through the point (x1, y1(1), 0), and the ray from the origin through (x2, 0, y2(2))—where (x1, y1(1), y1(2)) is
the performance of the unit with the largest value of y(1)/x, and (x2, y2(1), y2(2)) is the performance of the unit
with the largest value of y(2)/x. [They exemplify the “points at infinity” that Farrell would have needed for the case s > 1.]
(ii) Imagine that the three plane quadrants enclosed by the three pairs of axes —the horizontal one between the
y(1) & y(2) axes, and the side ones between the x & y(1) axes and between the x & y(2) axes—constitute a continuous
sheet of shrink-wrappable plastic. Keeping it fixed to the x-axis and extending to infinity, shrink-wrap the sheet so
that it is a tight fit to the “frontier rays”—defined as those of the n+3 drawn in (i) (apart from the x-axis) that stop
the plastic shrinking any further. The surface defined by the shrunken plastic is Farrell’s estimate of the efficiency
frontier under the CRS “assumption”. The frontier units are those that lie on the frontier rays: all the other units
lie above the frontier surface.
(iii) Fig.3 shows the typical case in which the vertical line from the generic unit g = (x, y(1), y(2)) to the horizontal
outputs plane (moving down to the technical efficiency frontier while keeping the two outputs constant) meets that
frontier at the feasible point f in the facet between the rays of two frontier units i & j. The point f′ of the same
facet is the point with the same input and the same ratio of outputs as g. (The atypical case is where a “side-ray”
is one of the rays defining the facet containing the points f & f 0 .)
The teff of the generic unit g will then be
teff = X/x = y(1)/Y(1) = y(2)/Y(2).
The construction (i)-(iii) implicitly uses mixing to fill in the facets between the rays for the n units, and worsening to
fill in the “side flaps” that correspond to the two side-rays. The picture does not generalise easily to the case s > 2
because one cannot readily visualise a shrink-wrapping in more than three dimensions! Fortunately, explicit use of
mixing & worsening gives a construction that extends to the case s > 2 as easily as any excursion into hyperspace
can.
Here, therefore, are the steps of the alternative construction for the case s = 2 that has the same outcome as the
shrink-wrapping. The alternative construction (i*)-(iv*) obviates the need for the three extra rays in (i). [We leave
the reader to “see” how well it serves in more than three dimensions. If it does no more than provide a vague picture, even that may be
useful in understanding the algebraic logic that is expressed in definitions & arguments.]
(i*) Use rescaling to get n infinite rays from the origin through the n units.
(ii*) Use mixing of all the points in these n rays to make a “solid” cone Cn whose vertex is the origin
[the so-called
“convex hull of affine combinations”].
(iii*) Apply worsening to each point of Cn to complete the construction of the continuum C.
(iv*) For a generic unit g = (x, y) not on the efficiency frontier F of C, draw Cg , the subset of C of points (cx, dy)
proportionally better than g. [The “proportionally” now matters as far as y(1) & y(2) are concerned.] The teff of g is the
minimum of c/d. [It is the enlargement from Cn to C by the worsening procedure that creates the “side flaps” of the construction
(i)-(iii), that may or may not be involved in determining the teff of some units.]
2.6. MORE ON NON-CONSTANT RETURNS TO SCALE: DRS, IRS, & VRS.
Dispensing with CRS-motivated rescaling in the construction of C for the case (r = 1, s = 1) led to a frontier F that
satisfied the DRS condition. Before considering the same step for the case (r = 1, s = 2), we need to clarify what
is meant by decreasing returns for scale for that case. For the general case of multiple outputs, we can state two
distinct deployments of the DRS concept:
(a) An actual unit g = (x, y) would display DRS if it were to change its performance to (cx, dy) with c > 1 & c/d > 1.
(b) An efficiency frontier satisfies the DRS condition if the ratio c/d is necessarily greater than or equal to 1, when
(x′, y′) & (cx′, dy′) with c > 1 are both in F. [This deployment of the concept allows CRS as a limiting case of DRS].
It was easy to see graphically what happens in the case (r = 1, s = 1) when rescaling was excluded in the construction
of C but mixing with the origin was allowed. It is still possible to see, in much the same way, what emerges when the
same exclusion is applied to the construction (i*)-(iii*) for the case (r = 1, s = 2): in the figure that would replace
Fig.3, the n rays from the origin in (i*) would terminate at the points representing the n units. It might then be
“seen” that the frontier F satisfies the DRS condition. But, if not, the following reductio ad absurdum argument
may be preferred, especially as it applies to the problems of interest with s > 2 :
Suppose (x′, y′) & (cx′, dy′) were in F (and therefore in C) with c > 1 & c/d < 1, contradicting DRS for F. Mixing of
(cx′, dy′) and the origin (0, 0) with weights a = 1/c & 1 − a = 1 − 1/c, respectively, would give a feasible point (x′, dy′/c) also
in C. But, with d/c > 1, the latter would be proportionally better than (x′, y′), which contradicts the initial supposition.
Hence F must satisfy the DRS condition.
For fixed g = (x, y) and (cx, dy) in F , DRS means c/d is a non-decreasing function of c. It follows that teff for
g = (x, y), which is the minimum of c/d in the intersection of Cg & F , is given by the point with the smallest value
of c which is on the line of constant y through g.
We note that the econometric concept of increasing returns to scale (IRS) is incompatible with an efficiency frontier
of a convex set such as C—convex because of the mixing in its construction. Only a technique different from Farrell’s
can incorporate that concept. Farrell proposes, as “the only practical method”, that the units be divided into groups
of “roughly equal output” and that the CRS-based method be applied to each group separately (Farrell & Fieldhouse,
1962).
For completeness, we mention the somewhat ambiguously termed variable returns to scale (VRS), a “model” introduced by Banker et al (1984). It is simply what you get if the origin (zero outputs for zero input!) is excluded
from the mixing used in the construction of C for DRS. Efficiencies calculated with VRS may be considered quite
unacceptable: for example, unit j in Fig.8 would have a VRS efficiency of 100%.
[Would you accept VRS efficiencies for 43PFs? Levitt & Joyce (1987) seem to have no qualms in presenting VRS/Banker efficiencies in
their study of a single output for 38 police authorities in England & Wales.]
What, incidentally, is to be made of two general comments in Banker et al (1984)?:
“the concepts and definitions of theoretical economics as formulated for applications to private sector market behavior may
not always be best suited for management science (and related) applications in the not-for-profit sectors.”
“economics concepts such as returns to scale, etc., have no unambiguous meaning until the efficiency frontier is attained.
Thus, by virtue of this comment alone, most of the statistical-econometric studies on this topic are put in serious question.”
Thinking of 43PFs, the question of whether or not it is “quite simple to allow for diseconomies of scale” (Farrell’s
Section 2.4) may be irrelevant to the question of whether we should do so or not. It is far from obvious that, without
further information, the definition of efficiency should be changed to accommodate apparent decreasing or increasing
returns to scale. In some applications, such features of the data ought to be treated as possible consequences of
inefficiency (when the size of units is under administrative control)— inefficiency in large units when the frontier
shows evidence of pronounced DRS, inefficiency in small units when it shows IRS. [This question will resurface when we
look at some econometric thinking about 43PFs in the paper of Drake & Simper (2000).]
2.7. A SIMPLER PAPER-THIN PICTURE.
It is easy to forget that teff does not and cannot give a complete ordering or ranking of the units in D: in general, it
does not even give a partial ordering. What Farrell/DEA does is to place each unit in a two-dimensional continuum
of feasible units constructed by hypothetically realistic procedures from the performances of other units—the frontier
of the continuum being pulled this way or that by the performances of a handful of units.
As far as the determination of the teff of the generic unit g = (x, y) is concerned, the essential features of Sections
2.1-2.6 can be given a paper-thin representation in only two dimensions. In Fig.4, f is the unique frontier performance
that is both in Qg and on the frontier FCRS constructed with re-scaling, mixing & worsening: in the absence of
slacks, f is an affine mixture of the actual performances of s frontier units in D, as also are the points h & i. Point h
is uniquely determined as the one that is on the frontier FVRS (constructed for VRS without re-scaling and without
mixing with the origin) but not on FCRS or FDRS, and that has the smallest value of c. Point i by comparison is on
both FVRS & FDRS but not FCRS. [In the absence of slacks, the segment fi is the intersection with Qg of the facet
that is the affine convex hull of s+1 frontier units in both FDRS & FVRS.]
Figure 4: Frontier lines for CRS, DRS, & VRS efficiencies in the quadrant, Qg say, of performances proportionally
worse or better than g = (x, y): the variables on the axes are the scalar multipliers c & d.
The CRS-motivated teff is AQ/AR. For the g shown, the DRS teff is AQ/AR and the VRS teff is AS/AR: they
would be equal if g were to the right of f. The lettering A, Q, R, S, V, V′ facilitates reference to the paper by Drake
& Simper (2000) to be considered in Section 5.2. In what appears to be a comment on an equivalent picture, these
authors claim that:
“All economic organizations which use resources to produce outputs are prone to output ranges which display first increasing
then constant and finally decreasing returns to scale.”
The comment can be accepted if it is intended to refer to an underlying non-convexity of the sort given serious
consideration by Farrell & Fieldhouse (1962), but not if it goes no deeper than the fact that any VRS boundary
drawn by the construction method that gives Fig.4 will automatically have the property referred to, whatever the
units in D are trying to tell us. This point will be seen to have relevance when we look at Drake & Simper’s
suggestions for the reorganization of the police forces in England & Wales.
2.8. DOING WITHOUT SIZE: A DECEPTIVELY SIMPLE PICTURE?
Although we have argued in favour of keeping size of unit in the picture, there is a commonly presented picture that
has pedagogical merit when CRS can be invoked with conviction. In this case, it is both legitimate & helpful to
reduce the geometry of Section 2.5 from three dimensions to the plane of the outputs per unit input p(1) = y(1)/x
and p(2) = y(2)/x, as in Fig.5 (where the axes are really p(1) & p(2)). In Fig.5, the efficiency frontier surface F in
three dimensions shrinks to a frontier line, and teff is determined by the fraction of the distance of the unit to that
line (from the origin). The two segments of the frontier line parallel to the axes correspond to the two side-flaps in
the full three-dimensional representation, equivalent to the “slack variables” feature of the simplex method.
The fact that CRS has to be invoked with conviction to support this picture is easily overlooked. The points in Figs.
5 & 9 correspond to rays through the origin of Fig.2, and this has to be kept in mind when we are tempted to think
of a unit, processed by CRS-motivated DEA, as a mixture of such & such “peer” units.
2.9. CONSTRAINTS ON OUTPUT VARIABLE TYPE:
PROBLEMS OF SIGN, ORIGIN, & DEPENDENCE ON SIZE.
When cost, in the customary financial or finance-equivalent sense, is the single input, the idea of technical efficiency
and its concept of constant returns to scale (CRS) imposes some conventional constraints on the type of output
variables that are used in DEA.
The first constraint is a relatively weak one, already incorporated in our definition of the worsening procedure:
outputs, whether volumetric or not, are taken to be positive in the sense of “the more the better”—other things
being equal, outputs are expected to increase with input cost. So, for DEA as currently formulated, we would not
take y(1) = “number of burglars apprehended” and y(2) = “ number of burglaries for which no-one was apprehended”
. [Note that y(2) can be expected to be positively correlated with input cost across police forces, if only because cost is highly correlated
with population.] It is tempting to deal with this problem by subtracting y(2) from a number N larger than all the
values of y(2) in D, so that the output is positive in both senses of the word. However, the origin plays a crucial role
in DEA: the choice of N would strongly influence the teffs then calculated, as is easily seen from Fig.5 by moving
the origin a long way down the y(2) axis. Thanassoulis et al (1987) have suggested another “solution”: simply use
the reciprocal of any output that goes in the wrong direction. Something similar is suggested in Box 3 of Spottiswoode
(2000):
“When it comes to efficiency measurement, all of the indicators will have to work in the same direction. This will mean that, in practice,
some indicators will have to be ’inverted’.”
[In Section 4, we will introduce a less technical approach to efficiency measurement in which both y(1) & y(2) will be admissible, by the
assignment of positive value-weight to y(1) and negative value-weight to y(2). The origin, just a point on an aggregate value scale, still
plays a role but one that is neutral with respect to the direction in which value is accumulated as additional outputs are brought into the
numerator. In Section 2.11, an extension of DEA is proposed that also admits negative (i.e. “smaller the better”) outputs: however, this
generalization is designed to throw light on the character of DEA rather than to be considered seriously in the analysis of problems such
as 43PFs.]
Figure 5: An illustration of the case (r = 1, s = 2) of reduced data, based on indifference to size.
A stronger constraint on the type of output variables stems from the size dependence in CRS-motivated rescaling
(or from the mixing with the origin for DRS) : it is that (x, y) & (cx, cy) have to be considered equally efficient.
This excludes outputs of the type y(3) = “percentage of burglaries for which someone was apprehended”: if y(3)
were not excluded, it would be difficult to maintain that halving all the variables (including cost) would not change
our measure of efficiency or that percentages over 100 were sensible. The constraint also plays havoc with the idea
that N − y(2) might be used to deal with “unapprehended” burglaries. [ When a single input variable is volumetric, like
financial input in 43PFs or like total acreage in Farrell’s application, then it makes sense in DEA to use volumetric output variables.
Conventionally, these are positively correlated to input. But positive correlation is not enough: use of the reciprocal of a negatively
correlated volumetric output violates any compatibility with the CRS concept. It is noteworthy that the public conception of police
efficiency is, more often than not, based on some simple proportion such as “detection rate (per crime)”—with no cost input at all. The
Section 2.11 generalization would accommodate such outputs. Note that y(3) is derived from two volumetric outputs one “positive” and
one “negative”—the number of burglaries for which someone was apprehended, and the total number of burglaries. Efficiency assessment
may be easier if these two elements are kept separate. ]
2.10. TECHNICAL EFFICIENCY AS A FLATTERING UPPER BOUND.
There is a well-known alternative derivation of Farrell’s CRS-motivated teff measure that has somewhat schizophrenic
consequences: there seem to be two competing justifications for the measure. The alternative does not need to create
any feasible points at all!
Suppose we adopt the view, introduced in Section 2.1, that we should try to use an intrinsic measure of the form
veff = v1 p(1) + ... + vs p(s)        (1)
where p(j) = y(j)/x and v = (v1 , ..., vs ) are intended to be socially agreed value-weights for volumetric outputs
y. [Note that, without explicit reference to the CRS concept, veff is a function of the reduced data p.] Suppose also that it
proves impossible to agree on v. Then each unit, if it has a spokesman [the Chief Constable would serve for 43PFs] might
legitimately ask to be assessed not by the undecided v but by a vector w that puts the unit’s own performance in
the “best possible light”. Suppose that it is then agreed to interpret “best possible light” as the maximization with
respect to w ≥ 0 of the interactive measure defined as the ratio of
weff = w1 p(1) + ... + ws p(s)
to the maximum of weff over the n units in D. [Note that no feasible units are needed.] It is straightforward Cartesian
geometry in the s-dimensional space of p(1), ..., p(s) [the case s = 2 is widely portrayed in the DEA literature] to show that
the measure thus defined is none other than teff itself. The interpretation of teff here is therefore as an upper
bound to the “true” veff—where “true” refers to the undecided & unknown vector v. How useful would such upper
bounds be, if the veff approach were to be preferred? The question clearly merits further study in problems such as
43 PFs.
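Computationally, this “best possible light” maximization can be carried out as a small linear program: scale w so that the largest weff among the n units is at most 1 and maximize weff for the unit under assessment; the maximized value is then the CRS teff of Section 2.10. A sketch using scipy.optimize.linprog (the reduced data P are invented, and best_light_teff is just an illustrative name):

```python
import numpy as np
from scipy.optimize import linprog

def best_light_teff(P, g):
    """CRS teff of unit g as the maximum of weff = w.p_g subject to
    w.p_i <= 1 for every unit i and w >= 0, where P is the (n, s) array
    of reduced data p_i = y_i / x_i."""
    n, s = P.shape
    res = linprog(c=-P[g],                      # linprog minimizes, so negate
                  A_ub=P, b_ub=np.ones(n),      # w.p_i <= 1 for all i
                  bounds=[(0, None)] * s,       # w >= 0
                  method="highs")
    return -res.fun                             # maximized weff, i.e. teff of g

# invented reduced data for n = 5 units and s = 2 outputs per unit of cost
P = np.array([[0.8, 0.4],
              [0.5, 0.9],
              [0.6, 0.6],
              [0.3, 0.3],
              [0.9, 0.1]])
print([round(best_light_teff(P, g), 3) for g in range(5)])
```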
That this flattering use of veff will not reproduce the generally different, DRS-motivated teff is obvious, given that
use of DRS rather than CRS affects only the continuum C, which does not come into the derivation of the veff-based
ratio. Nor would it do so if the latter were defined using as denominator the maximum of weff over C rather than
over the n units, since the maximum is determined by one of the units.
Farrell argued against any use of a measure such as the veff of Section 2.1. In Farrell’s industrial framework, u1 , ..., ur
would be input prices and v1 , ..., vs would be output prices, and veff would be the basis of a “price efficiency”
comparison with the efficiency frontier. Farrell’s argument is two-fold:
(i) prices may be unstable (from firm to firm, or from time period to time period) compared with the estimate of the
production function (when s = 1), or the production “portfolios” (when s > 1), that the efficiency frontier ought to
be estimating;
(ii) price efficiency is “much more sensitive to the introduction of new firms (units) than is technical efficiency”.
The force of (i) must be a matter of judgement in the particular problem, where it refers to the weight vectors
u & v. For (ii) [thinking of 43PFS], the greater sensitivity or discriminatory power of the index based on veff, in which
weights are fixed rather than being amelioratively adaptive as is teff, may just be an honest reflection of interpretable
differences between units. Moreover, underlying Farrell’s argument there appears to be a degree of confidence in the
efficiency frontier (constructed so as to flatter the unit performances) as a realistic approximation to the upper limits
of efficiency—a confidence that may or may not be justified.
2.11. GENERALIZING DEA TO “NEGATIVE” OUTPUTS & NON-VOLUMETRIC OUTPUTS.
The alternative derivation of CRS-motivated teff in Section 2.10 suggests the generalization in which a number of the
volumetric outputs are “negative” in the sense of requiring negative value-weights. For the y(1) & y(2) of Section 2.9,
Fig.6 gives the simple picture for n = 7 and s = 2. The “negative” y(2) affects the worsening step in the construction
of C: the feasible set created by worsening (x, y(1), y(2)) is now
{(x′, y′(1), y′(2)) : x′ ≥ x, y′(1) ≤ y(1), y′(2) ≥ y(2)}.
It may be verified that, in the reduced-data space, the feasible set Cp and frontier Fp are as shown in Fig.6. The
broken lines have constant w1 p(1) + w2 p(2), where w2 is non-positive. The maximization of Section 2.10 again yields
Og/Oh—the ratio of the distance to unit g to the distance to the frontier along the ray from the origin through the unit
g. It is therefore reasonable to call it teff too (see Section 2.8). Its numerical determination is given by adaptation
of the algorithm that serves for the case of all “positive” outputs:
max(Og/Oh) = max{w1 pg(1) + w2 pg(2)},
subject to  w1 pi(1) + w2 pi(2) ≤ 1, i = 1, ..., n,
            w1 ≥ 0,
            w2 ≤ 0.        (2)
The method extends to the case s > 2 with up to s-1 “negative” outputs.
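Algorithm (2) is itself a small linear program; the only change from the all-“positive” case is the sign restriction on the weight of the “negative” output. A self-contained sketch using scipy.optimize.linprog (the data and the helper name are invented):

```python
import numpy as np
from scipy.optimize import linprog

def teff_negative_output(P, g):
    """Algorithm (2): maximize w1*p_g(1) + w2*p_g(2) subject to
    w1*p_i(1) + w2*p_i(2) <= 1 for all i, with w1 >= 0 and w2 <= 0."""
    n = P.shape[0]
    res = linprog(c=-P[g], A_ub=P, b_ub=np.ones(n),
                  bounds=[(0, None), (None, 0)],   # w1 >= 0, w2 <= 0
                  method="highs")
    return -res.fun

# invented reduced data: column 0 a "positive" output, column 1 a "negative" one
P = np.array([[0.8, 0.5], [0.5, 0.2], [0.6, 0.6], [0.4, 0.1]])
print([round(teff_negative_output(P, g), 3) for g in range(4)])
```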
This route to questionable generalizations of DEA is no longer available when some outputs are non-volumetric
and when it is no longer possible to use the reduced-data representation. The generalizations in Cooper et al
(2000) are expounded in the rather specialized framework of linear programming algorithms (next Section). When
understanding rather than computational instruction is wanted, it is better to go back to the basic ideas of Sections
2.1 & 2.2 and specify the geometry of C & F for any generalization based on appropriate use of rescaling, mixing &
worsening, prior to the final stage of input minimization. For example, with the three outputs of Section 2.9:
• CRS rescaling would apply only to y(1) & y(2) to create rays of feasible sets not from the origin but from
points on the y(3) axis: (0, 0, y(3)) for generic unit g.
• mixing would then create the convex hull Cn of all the points on these n rays
• worsening would then enlarge Cn by reductions of y(1) & y(2) but increases in y(3).
Although the algebra of linear programming is then required to turn these steps into an input minimization in C,
going back to basics provides a reminder, necessary in any application, of what it is we are doing to the actual unit
data when any measure of efficiency is developed in this way.
2.12. LINEAR PROGRAMMING.
Farrell was told by one of the discussants of his 1957 paper that there was a new technique—the more than 10-year-old simplex method of linear programming—that would solve any problems in calculating teffs, even for applications
in which the number of variables, r + s, became very large.
At the same time, the simplex method automatically patched up (by its allowance of “slacks” in variables) the lacuna
about “points at infinity” (noted in our Section 2.5).
Figure 6: Illustration of a reduced-data space with “negative” y(2).
For r = 1, the Simplex connection is made as follows:
From Section 2.5, we know that, for both CRS- & DRS-motivated construction of C, we get the teff of g = (x, y) when d = 1 i.e. by
“input minimization” at the point (cx, y) in C with minimum c (when teff= c). The conditions for (cx, y) to be in C are cx = Σi ci xi ,
y ≤ Σi ci yi , ci ≥ 0, i = 1, ..., n, and, only when rescaling is excluded for the DRS case, the extra condition Σi ci ≤ 1. (If at all wanted,
the VRS case requires the further condition Σi ci = 1.)
The minimization of c with respect to variation of the ci is no problem for the Simplex algorithm : it is the dual of the maximization
derivation of teff via weff in Section 2.10. Note that, for the DRS case, 1 − Σi ci is the necessary complementary mixing weight for the
origin (0,0). For CRS (without the extra condition for DRS) it is easily verified that the algorithm takes the reduced-data form:
min c        (3)
subject to  p/c ≤ Σi ai pi,
            ai ≥ 0, i = 1, ..., n,
            Σi ai = 1.
For the generalization of Section 2.11, the dual of the algorithm (2) is given by using ≤ for “positive” outputs and ≥ for “negative”
outputs, in the second line of (3) . [Cooper et al (2000) claim that “Farrell efficiency” (i.e. teff) should yield historical priority to the
“CCR efficiency” of Charnes et al (1978). The claim is examined in Appendix 4.]
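For completeness, the input-minimizing (“envelopment”) form set out above can also be handed to an off-the-shelf LP solver. The sketch below (our own wrapper, with invented data) minimizes c subject to the conditions cx = Σi ci xi, y ≤ Σi ci yi, ci ≥ 0 stated above, adding Σi ci ≤ 1 for DRS or Σi ci = 1 for VRS:

```python
import numpy as np
from scipy.optimize import linprog

def envelopment_teff(x, Y, g, returns="CRS"):
    """Input-minimizing teff of unit g for a single input x (length n) and
    outputs Y (n, s), following the Section 2.12 formulation.  Decision
    variables are (c, c_1, ..., c_n); minimize c subject to
    c*x_g = sum_i c_i x_i,  y_g <= sum_i c_i y_i,  c_i >= 0,
    plus sum_i c_i <= 1 for DRS or sum_i c_i = 1 for VRS."""
    n, s = Y.shape
    obj = np.r_[1.0, np.zeros(n)]                    # minimize c
    A_eq = [np.r_[x[g], -x]]                         # c*x_g - sum_i c_i x_i = 0
    b_eq = [0.0]
    A_ub = np.c_[np.zeros(s), -Y.T]                  # -sum_i c_i y_i(j) <= -y_g(j)
    b_ub = -Y[g].copy()
    if returns == "DRS":
        A_ub = np.vstack([A_ub, np.r_[0.0, np.ones(n)]])   # sum_i c_i <= 1
        b_ub = np.r_[b_ub, 1.0]
    if returns == "VRS":
        A_eq.append(np.r_[0.0, np.ones(n)]); b_eq.append(1.0)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]                                  # the minimized c, i.e. teff

# invented data: n = 4 units, single cost input, s = 2 outputs
x = np.array([100.0, 150.0, 80.0, 120.0])
Y = np.array([[55.0, 20.0], [90.0, 10.0], [50.0, 30.0], [60.0, 25.0]])
print([round(envelopment_teff(x, Y, g, "CRS"), 3) for g in range(4)])
```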
2.13. SENSITIVITY TO... OR TAKEOVER BY... OUTLIERS
The reduced-data representation of the case (r = 1, s = 2) can be used to illustrate another feature of the non-intrinsic character of teff. This is the sensitivity of a unit’s teff to outliers among other units—a sensitivity that
becomes increasingly significant as, for example, the number of outputs s increases to the level represented by the
20/43 of s/n that the 43PFs problem may require. In Fig.7 (where the axes are really p(1) & p(2)) unit i would
have had a teff of 100 % if the outlying unit j (with very large y(1) and negligible y(2)) had not been included. How
can we be sure that j deserves a teff of 100% (on grounds of “priority” perhaps) and that it is not “cherry-picking”?
[The question can be put with greater realism when there are, say, 20 output variables among which there may be one or two that can
be produced in quantity at relatively little cost.]
2.14. ADDING OR DISAGGREGATING AN OUTPUT ALWAYS INCREASES “EFFICIENCY”.
Nunamaker (1985), working with the case of a single output & multiple inputs, established that adding an extra input
variable, or disaggregating an existing input into two component inputs, could not decrease (and would typically
increase) the CRS-based teff of any unit. His indicative proof of this theorem, in a footnote to his paper, can be
confirmed by the following precise & general argument for the logically equivalent “dual” case (r = 1, s > 1), using
the alternative definition of technical efficiency of Section 2.10:
From Section 2.10, teff = max{fs (w1 , ..., ws ) : wj ≥ 0, j = 1, ..., s} = max{fs } (for short), where fs = fs (w1 , ..., ws ) =
weff/(maximum value of weff in D). For the addition of an (s + 1)th output with weight w(s+1) , fs is just f(s+1) with
w(s+1) = 0. Hence max{f(s+1) } ≥ max{fs } , and teff cannot decrease. For the disaggregation of y(1) into y(1a) & y(1b),
with y(1a) + y(1b) = y(1) and associated weights w1a & w1b , fs is just f(s+1) with w1a = w1b = w1 , and the same nice
property of a maximum can be applied again.
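The monotonicity is easy to confirm numerically. A small sketch (invented data, our own helper) computes the multiplier-form CRS teff of every unit before and after an extra output column is appended:

```python
import numpy as np
from scipy.optimize import linprog

def teff(P, g):
    """Multiplier-form CRS teff (Section 2.10) from reduced data P of shape (n, s)."""
    n, s = P.shape
    res = linprog(c=-P[g], A_ub=P, b_ub=np.ones(n),
                  bounds=[(0, None)] * s, method="highs")
    return -res.fun

rng = np.random.default_rng(0)
P2 = rng.uniform(0.1, 1.0, size=(10, 2))                    # invented reduced data, 2 outputs
P3 = np.hstack([P2, rng.uniform(0.1, 1.0, size=(10, 1))])   # append a third output

for g in range(10):
    assert teff(P3, g) >= teff(P2, g) - 1e-9   # adding an output never lowers teff
print("teff never decreased when the extra output was added")
```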
This gives theoretical underpinning to the experience of DEA users, only too well aware that, to quote Thanassoulis
et al (1987):
“the larger the number of inputs and outputs in relation to the number of units being assessed, the less discriminatory
the method appears to be. . . . Thus the number of inputs and outputs included in a DEA assessment should be as small as
possible, subject to their reflecting adequately the function performed by the units being assessed [my italics]”.
In this, low discrimination means not enough units with relatively low teffs.
[It seems that DEA practitioners do think there have to be limits to the permissiveness of DEA that allows units to be seen in the “best
possible light”. Moreover, the requirement of adequate reflection of some function or activity has to be more demanding than its mere
representation in some output or other: in 43PFs, for example, the omission of any output dependent on some particular activity of
consequence would be noted and perhaps dysfunctionally exploited by police forces in their allocation of resources eg. prosecution for
fraud.]
Figure 7: Unit i might be justifiably sensitive about the influence of unit j.
2.15. WEIGHT CONSTRAINTS IN DEA.
The first indication of technical pathology in DEA came to light before the baby was christened—from the seconder
of the vote of thanks for Farrell’s 1957 paper. Translated to the single input problem, Chris Winsten’s example goes
like this:
Consider any production process in which a particular output, say y(1), is necessarily so closely related to input x that the
ratio y(1)/x is practically constant. Then all n units will be on or very close to the efficiency frontier, and will be awarded
a teff of 100%, whatever performance they display on the other s − 1 outputs.
In reply, Farrell acknowledged that such 100% efficiencies would be “unduly charitable” and said it would be “necessary to bring in extra information and define a more stringent measure”. But, if the general concept of technical
efficiency is sound, why did Farrell not stick to his guns, and compare each unit with the formally feasible units
generated by mixing with the origin—the reference set of proportionally worse performances? I happen to think that
Farrell was right to feel uneasy about Winsten’s example, but wrong to think that the difficulty could be resolved
by “extra information” (that the formally feasible reference set is not feasible in any realistic sense?), rather than
by facing up to the question of weighting raised by Winsten’s example. In our “dual” version of that example, a teff
of 100% is equivalent to giving zero weight to all the outputs except y(1). Defences of such 100% values—that they
are useful upper bounds to unknown true efficiencies based on veff (Section 2.10) or, equivalently, that they are the
“efficiencies” that present units in the “best possible light”—at least have an honest clarity, but also disguise the
potential implications for the underlying weights.
The problem is squarely faced in the police force study of Thanassoulis (1995), where it is dealt with in an openly
subjective but suggestive fashion. The author recognizes that a police force (unit) “may be at a part of the efficient
boundary characterised by unacceptable marginal rates of substitution [weights] between outputs, say valuing one
or more outputs excessively while giving negligible value to other outputs”, and finds that the weights were “often
counter-intuitive” [my italics]. The remedy proposed is simple: impose restraints on the weights! In his study of 1991
Audit Commission data for the 41 police forces excluding “The Met” and “City of London”, Thanassoulis used only
three broad outputs
• y(1) = number of violent crime clear-ups
• y(2) = number of burglary clear-ups
• y(3) = number of “other” crime clear-ups
on the grounds that “the bulk of police effort is applied to investigation” and that these three crime categories
were “sufficient to convey an overview of the seriousness and complexity of crime found”. The categories clearly
provide comprehensive cover of clear-ups, but their small number had to be justified on the grounds that “retention
of numerous crime categories would overcomplicate the analysis”.
Thanassoulis boldly imposed two inequalities on the output weights:
w1 /10 ≥ w2 ≥ 2w3
(in the notation of our Section 2.10), that are then easily built into the primal linear programming algorithm. Section
4 will take up the arguments for extending such inequalities to specify a set of extremal choices (2^(s−1) in number) for
the weights v1 , ..., vs in veff, and for dispensing with the questionable use of DEA to make the “best possible light”
choice of constrained weights in the associated efficiency comparisons. The question is taken further in Allen et al
(1997).
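As a rough illustration of how such weight restraints enter the calculation (a sketch only, with invented numbers, and not a reconstruction of Thanassoulis's own analysis), the two inequalities can simply be appended as extra rows of the constraint matrix in the single-input "best possible light" linear programme:

    import numpy as np
    from scipy.optimize import linprog

    def restrained_teff(y, x, unit):
        p = y / x[:, None]                        # p_i(j) = y_i(j)/x_i, three outputs
        A_ub = np.vstack([p,
                          [-0.1, 1.0, 0.0],       # w2 - w1/10 <= 0, i.e. w1/10 >= w2
                          [0.0, -1.0, 2.0]])      # 2*w3 - w2 <= 0, i.e. w2 >= 2*w3
        b_ub = np.concatenate([np.ones(len(x)), [0.0, 0.0]])
        res = linprog(-p[unit], A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
        return -res.fun                           # restrained "efficiency" of the unit

    x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # invented costs
    y = np.array([[8, 3, 1], [9, 5, 4], [4, 6, 2], [12, 2, 9], [7, 7, 3]], dtype=float)
    print([round(restrained_teff(y, x, i), 3) for i in range(len(x))])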
2.16. PRIORITIES, ENVIRONMENTALS, OR INEFFICIENCIES?
For reasons already stated, our account of Farrell’s work has dealt mainly with the case of a single input & multiple
outputs. This is, in a sense, the dual of the case that Farrell was mainly concerned with, and that was illustrated with
his sole application—to the single agricultural output of 48 American states, expressed in unambiguously financial
terms. If we contrast the latter single-output application with a dual case application, such as 43PFs, we may well
ask whether the restriction to comparisons in Qg (Section 2.7)—in which the output profile (the ratios of outputs to
each other) is the same as the profile of g—has the same relevance. However, one argument that has been adduced
in favour of the restriction is that (i) ratios of outputs, such as y(1)/y(2) with y(1) the number of burglars & y(2)
the number of street robbers apprehended, do reflect the priorities of a police force, (ii) such priorities should be
respected, and (iii) respectful efficiency comparison should use Farrell/DEA teffs.
Dyson et al (1990) followed Charnes et al (1978) in presenting the priority argument as justification for rejecting any
attempt to use a common set of weights. They pose the problem of efficiency assessment of schools with “achievements
at music and sport amongst the outputs”, observing that some schools may “legitimately value achievements in sport
and music differently to other schools”. Levitt & Joyce (1987) in their study of 38 police forces in England & Wales
have a DEA “model” with just two outputs: the number of recorded crimes “against the person” cleared up, and
the number “against property” cleared up. They claim that by making this distinction “we are able to allow for the
possibility that forces may have different priorities towards the solution of one sort of crime as opposed to the other.”
Whatever validity there is in the priority argument, it certainly goes beyond the thinking of Farrell—in which the
“technical” in teff refers to the idea of a somewhat mechanical production function that converts inputs into a
dependent output, and to the associated idea that, if rescaling is thereby justified for any one output, then it is
automatically justified for multiple outputs. Assuming that the “priorities” reflect specific environmental pressures,
they must result from allocations of sub-costs within the total cost represented by x: they are thus arguably more
related to the concept of “allocative efficiency” of Farrell than to the “technical efficiency” estimated as teff. The
argument deserves further analysis. In the application of DEA with the appreciable number of output variables
(15 ≤ s ≤ 25?) that are needed to cover (and comprehensively motivate) police force activity in the 43PFs problem,
the smallness of the number of forces, 43, has the following consequences. The teff of an assessed force will typically
be determined by the inverse ratio of its cost to the lowest hypothetical cost in a set of pseudo-forces conceived
as “feasible”. There is a fairly high probability that the lowest cost is that of the assessed force itself (when it
is on the efficiency frontier). For other than such forces, the lowest cost is found (for the CRS case) by, roughly
speaking, rescaling, mixing & possibly worsening the specific performances of s or s − 1 (20 or 19?) of the forces on
the efficiency frontier. Many if not all of these will have output vectors y very different from that of the assessed
force. If the priority argument is taken seriously, their priorities (determined by specific environmentals) will be
very different too, which makes the associated value of teff at least questionable. The teff does not come from a
comparison constructed from forces with even approximately the same profile. So we can ask whether mixing should
be used at all, to create the pseudo-comparison at the heart of DEA. Also how can we know that it is not the
pressures towards inefficiency (to be found in all organisations) that are favouring the particular “priorities” of the
force under assessment? Nunamaker (1985) put the issue very clearly, pointing out that the other edge of the sword
that allows units to present their performance in the “best possible light” is an edge that allows units to engage in
“creative accounting, political lobbying, alteration of input/output mix, etc.”
and that
“provides...incentives for [unit] managers to act in a dysfunctional and socially unacceptable manner.”
These comments & quotations are not intended to weaken the case for rewarding some allocation of activity that is
honestly responsive to environmental pressures. But they do suggest that care is needed in how the reward is to be
engineered, if it cannot be adequately accommodated by the technique of stratification into groups of environmentally
“most similar forces”.
A further question about “priorities” is suggested by comparison of units i & j in Fig.5, which have equal teffs of
about 85%. Can unit i realistically escape censure on the grounds that it is not really less efficient—but is merely
exercising its priority for output y(2) over output y(1)? If such grounds are accepted, why could i not adjust its
“priorities” without change in cost x, so that it has the outputs associated with the feasible unit j (having, let us
suppose, the same cost)?—output y(2) would be unchanged but y(1) would be increased ten-fold.
In one sense, talk of “priorities” is something of a tautological red herring. Unless there is an external criterion, it
merely says that a unit likes its profile to be such & such, and misleads if carelessly interpreted as implying that the
unit is being directly compared with other units having even approximately the same profile (=priorities!), rather
than with feasible units constructed from units with quite different profiles & environmentals.
Another, perhaps more significant, sense in which talk of “priorities” may be misleading is that for CRS-, DRS-, &
VRS- motivated teffs, teff is (as we have preferred to define it) the “input-minimization” efficiency for fixed outputs,
where the minimization is over all the feasible units in C. In other words, the restriction to the set Cg , in which the
priority concept has been raised, is superfluous. Of course, the problem of justification is thereby simply transferred
to making the case for “input-minimization”.
2.17. EVEN THE SIMPLEST PROBLEM CAN BE PROBLEMATIC.
The last section asked whether an ill-formulated untestable hypothesis (about “priorities”) should influence the choice
of efficiency assessment technique. Fig.8 illustrates a particular type of data for r = s = 1 that may provoke another
hypothesis (this time perhaps testable)—the hypothesis of a “fixed cost” (a special case of IRS). CRS-motivated
or DRS-motivated Farrell/DEA gives units i & j very low teffs. If, using VRS, we allow for a “fixed cost”, we are
encouraged to see these units in a different light. But if not, what are the principles that favour some hypotheses
(eg. “priorities”) but not others? Is there not a case for sticking with a measure of efficiency that does not, as a
matter of principle, bother about what is inside the “black box” connecting y with x (and z)? Section 4 will try
to specify such a measure, after Section 3 has looked at another technique that may also be trying to say too much
about what is going on in the black box—under the mantle of a statistical sophistication that contrasts with the
mechanical determinism of DEA.
2.18. CAN FARRELL/DEA ALLOW FOR ENVIRONMENTALS?
Farrell’s term for environmentals was “quasi-factors”—as if they were factors of production, differing from other
factors only in that they do not come with a price-tag. In the wider DEA literature, such environmentals are known
as “uncontrollable” or “non-discretionary” inputs (Charnes & Cooper, 1985). Farrell suggested two ways of dealing
with them: the first was to treat them like any other necessary input in the definition of technical efficiency (which
does not involve prices); the second was to divide the units into groups homogeneous in the quasi-factors and make
a separate efficiency assessment within each group. [Annex C of Spottiswoode (2000) reports the advice of economists &
econometric specialists that, for 43PFs, “environmental factors affecting police outputs need to be taken into account—and in modelling
terms probably treated as an input”. The difficulties of the “taking into account” are likely to be compounded by the very large number
of environmentals that can be claimed to influence the 43PFs outputs (see Section 1).]
Farrell gave no application of the use of quasi-inputs, but it seems that he saw them as volumetric inputs positively
related to outputs. However, environmentals can also be negatively related to outputs.
[For 43PFs, examples of volumetric environmentals drawn from the “Police Funding Formula” documentation (Home Office, 2000a)
include:
• number of people living in terraced housing,
• area (hectares) covered by the police force.
For outputs like "number of crimes detected", the first of these is taken to be "positive", while the second is taken to be "negative". It
is clear that such judgements are fraught with uncertainties for any technique that tries to make use of them.]
Dyson et al (1990) appear ready to treat environmentals as outputs:
“ A key aspect of DEA is incorporating environmental factors into the model as either inputs or outputs. Resources available
to units are classed as inputs whilst activity levels or performance measures are represented by outputs. One approach to
incorporating environmental factors is to consider whether they are effectively additional resources to the unit in which
case they can be incorporated as inputs, or whether they are resource users in which case they may be better included as
outputs.”
So parental education level would be an input in a study of schools, whereas competition level would be a “resource
user” and therefore an output in a study of businesses.
Figure 8: A problem with the simplest case?
When environmentals are treated as inputs, there appear to be two ways in which they are incorporated into the
associated linear programming algorithms of DEA. To examine their logic (but not their practicality) it is enough to
suppose we have just one output and one volumetric environmental input z in addition to x: so that, in the general
framework, there is a two-dimensional input vector x = (x, z).
The first way leaves DEA to operate in its usual way for multiple inputs: the objection to this is that teff is based
on a comparison of a generic unit g = (x, y, z) with a feasible point (cx, y, cz) in Cg in which the “uncontrollable”
input z has been changed by the same factor c as for the (hypothetically) controllable input x. What is presumably
wanted instead is to determine the point (cx, y, z) on an appropriately defined efficiency frontier, which would allow
efficiency of g given z to be determined as c. So the second way must go back to basics, construct an appropriate
set C of feasible performances and its efficiency frontier F. All we will do here is exhibit the 3-stage construction
that leads to the method of Charnes & Cooper (1985):
• rescaling for z as well as x & y
[the existence of the generic unit g = (x, y, z) is taken to imply the feasibility of performances
(cx, cy, cz), given the supposed volumetric character of z.]
• affine mixing
[questionable, since “long distance” mixing means we are not just making a local interpolation.]
• worsening of x & y for fixed z
[reasonable.]
It may be verified that, with C and F thus constructed in the space of (x, y, z), the feasible set Cp, say, and the efficiency frontier Fp, say, in the reduced-data space of the hybrid ratios
p(1) = y/x, p(2) = z/x
have the geometry exemplified in Fig.6, even though the reduced variables now have a different interpretation. The
partial isomorphism with the theory for Fig.6 then gives us our understanding of the Charnes & Cooper (1985)
method and what it does. Let (cx, y, z) be the point in Fp with the same values of y & z as g. The method is either
the linear programming algorithm (Norman & Stoker, 1991, Appendix A7) that gives c as
c = max{w1 pg(1) + w2 pg(2)},
subject to w1 ≥ 0, w2 ≤ 0, and w1 pi(1) + w2 pi(2) ≤ 1 for i = 1, ..., n,
or its dual (Section 2.12). The generalization to s outputs and t environmentals is mathematically straightforward.
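To make the displayed programme concrete, here is a minimal sketch (Python, invented data, one output and one volumetric environmental); it differs from the ordinary single-input LP only in that the weight attached to p(2) = z/x is constrained to be non-positive.

    import numpy as np
    from scipy.optimize import linprog

    x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # invented costs
    y = np.array([8.0, 9.0, 4.0, 12.0, 7.0])      # invented outputs
    z = np.array([3.0, 5.0, 6.0, 2.0, 7.0])       # invented volumetric environmental
    p = np.column_stack([y / x, z / x])           # hybrid ratios p(1) = y/x, p(2) = z/x

    def cc_efficiency(g):
        # c = max w1*p_g(1) + w2*p_g(2) subject to w1 >= 0, w2 <= 0 and
        # w1*p_i(1) + w2*p_i(2) <= 1 for i = 1, ..., n
        res = linprog(-p[g], A_ub=p, b_ub=np.ones(len(x)),
                      bounds=[(0, None), (None, 0)])
        return -res.fun

    print([round(cc_efficiency(g), 3) for g in range(len(x))])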
The fact that the Charnes & Cooper method gives an “efficiency” c that, with the data in D other than the (x, y, z)
of g treated as fixed, is an automatically determined function of (x, y, z) should give food for thought to those who
recommend this way of dealing with environmentals. From the analogue of Fig.6 that here applies, we can see that
c = Og/Oh is a decreasing function of z for fixed (x, y) (until the extending ray Og meets “slack”). All this whether
z is “positive” or “negative”! Should DEA be required to act as some oracular goddess that can dictate the shape
of reality—not a deus but a dea ex machina?
[Spottiswoode (2000) quotes more econometric advice—that “environmental variables should be included in the analysis—either in the
models or by completing separate DEA analyses on sub-samples of forces which share similar operating conditions. However this latter
approach will result in a high proportion of forces being efficient” . Here the term “models” should be interpreted as choices of particular
linear programming algorithms.]
Relative to such questions of logic, it may be of only secondary interest that the method will award 100% “efficiencies”
to a large proportion of the n units, when an already sizeable value of s has to be augmented by a realistically large
value of t. Note also that the problem remains of how DEA should allow for non-volumetric environmentals that
cannot be put in volumetric form, such as the “density of population” used in devising PFF (Appendix 2).
2.19. MISCELLANEA
Among the recommendations of Spottiswoode (2000) is the following:
The Home Office, in consultation with its policing stakeholders, should review the appropriate input and outcome
measures, outcome weight ranges, and SFA and DEA models with a view to first using them in mid-2001 using audited
BVPI 2000/01 data. The task of specifying, building and testing the models should be contracted out to independent
experts.
Elsewhere in the report the task is extended from "building & testing" to "validating".
When it was announced that US President Coolidge had just died, some wit asked “How can they tell?”. It is
salutary to try to answer the more difficult question of how to test & validate a DEA “model” that is not much
more than an algorithm to produce a string of putative efficiencies (that in any case should not be ranked). The
production of a “consistent” string of numbers from SFA does not help since these two techniques are doing very
similar things with the data, from almost dual viewpoints.
DEA may have greater value when spelled EDA (the Exploratory Data Analysis of Mosteller et al, 1977). Especially
if only “mixing” is used in the creation of feasible performances, DEA can be seen as a bold, if not rash, solution to
the problem of missing or inadequate data.
Proponents of DEA might look to the arch deconstructor of rationality, Michel Foucault (1973, p.xx), for ideological
support:
“Order is, at one and the same time, that which is given in things as their inner law. . . and also that which has no
existence except in the grid created by a glance, an examination, a language; and it is only in the blank spaces of this grid
that order manifests itself as though already there, waiting in silence for the moment of its expression.”
3. STOCHASTIC FRONTIER REGRESSION (SFR).
3.1. INTRODUCTION.
The statement of the general problem in Section 1 was slanted towards the idea of an intrinsic, rather than heavily
interactive, measure of efficiency of an individual unit. However, a measure such as Farrell/DEA’s teff, that depends
on the existence of other units for its definition, cannot be dismissed just because it is interactive. Even an intrinsic
measure needs a scale on which it can itself be assessed and employed as a comparative incentive to increases in
efficiency. For an intrinsic measure, the scale is provided by the set of n individual measures based on D alone—
without any feasible-point infilling!
Although the basis of teff is its construction of a hypothetical efficiency frontier, by some creative accounting as it
were, the Farrell/DEA method does not venture outside the range of the data D in any significant sense. We now
consider a technique, stochastic frontier regression (SFR)—otherwise known as “stochastic frontier analysis” (SFA)—
that does go outside D in a conceptually significant and imaginative way and that therefore raises the evergreen
question of realism.
Devised for the single output case (Aigner et al, 1977 ), SFR is based on the concern that uncontrollable variation
in output is interpreted as inefficiency by deterministic techniques like DEA. The problem faced by any statistical
method that tries to meet this concern is how to separate the two contributions to the deviation of each unit from
the supposed frontier. It is the delicacy (a lack of robustness to assumptions) of the method devised by SFR to do
this that poses a significant reality challenge—especially when the method is translated to the single input/multiple
output case.
3.2. SFR MODELLING FOR A SINGLE OUTPUT/PRODUCTION.
Although our only concern is with the single input/cost case, it is necessary to review the single output/production
techniques that have now been adopted for the single input case. For the single output case of the industrial sort
considered by Farrell, SFR may choose to take the output y of the generic unit g = (x, y) to be randomly generated
by a two-component deviation from a production function Y = f (x):
y = Y UV
(4)
where U & V are independent random variables. Here U , in the interval (0, 1], represents the efficiency of g, whereas
V , distributed around the value unity, represents the uncontrollable random variation in y. A statistical estimate of
U is the SFR efficiency seff, say. The production function f(x) is commonly assumed to have a reality in the shape
of a Cobb-Douglas function, transformable to the logarithmic form
log Y = a + b1 log x(1) + ... + br log x(r)
(5)
Equation (5) satisfies the CRS condition if and only if b1 + ... + br = 1.
From (4),
log y = log f (x) + v − u
(6)
where v = log V and u = log(1/U ) ≥ 0. The LimDep software (Greene, 1995) takes v to have a normal distribution
with zero mean, and, among other options, gives the user the choice of an exponential, half-normal, or truncated-at-zero normal for u. Assuming all x-values in D are non-zero, the parameters in (5) and in the distributions of u
& v are estimated by maximum likelihood. Conditional on v − u = e, the theoretical expectation of the distribution
of u in (6) is a function of e and the parameters: the efficiency U of g may then be estimated as the negative
exponential of the maximum likelihood estimate of this function (Jondrow et al, 1982).
[Such SFR models make the implicit assumption that there are no significant errors in the inputs x—in the sense of significant deviations
from “true” values X that might be required in (6) to allow u to represent the “true” efficiency of g. Without the assumption, the
method runs into the identifiability problem known as "functional relationship" or "errors in [all] variables" in statistical theory. There
is, paradoxically, a deterministic version of SFR that omits the V from (4), attributing all variation to inefficiency.]
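For the single-output case just described, the fitting can be sketched in a few lines (Python with scipy; simulated data, invented parameter names, and not the LimDep implementation). The log-likelihood uses the standard normal/half-normal convolution density, and the final step is the Jondrow et al (1982) conditional mean of u referred to above.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def neg_loglik(theta, logx, logy):
        a, b, log_su, log_sv = theta
        su, sv = np.exp(log_su), np.exp(log_sv)
        sigma, lam = np.hypot(su, sv), su / sv
        e = logy - (a + b * logx)                    # composed error e = v - u
        # half-normal frontier density: f(e) = (2/sigma) phi(e/sigma) Phi(-e*lam/sigma)
        return -np.sum(np.log(2 / sigma) + norm.logpdf(e / sigma)
                       + norm.logcdf(-e * lam / sigma))

    def seff(theta, logx, logy):
        a, b, log_su, log_sv = theta
        su, sv = np.exp(log_su), np.exp(log_sv)
        sigma, lam = np.hypot(su, sv), su / sv
        e = logy - (a + b * logx)
        s_star, zz = su * sv / sigma, e * lam / sigma
        Eu = s_star * (norm.pdf(zz) / norm.sf(zz) - zz)   # Jondrow et al conditional mean of u
        return np.exp(-Eu)                                # estimated efficiency U of each unit

    rng = np.random.default_rng(0)                        # simulate a Cobb-Douglas frontier
    logx = rng.normal(2.0, 0.5, 50)
    logy = 1.0 + 0.8 * logx + rng.normal(0, 0.1, 50) - np.abs(rng.normal(0, 0.3, 50))
    fit = minimize(neg_loglik, x0=[0.5, 1.0, np.log(0.2), np.log(0.2)],
                   args=(logx, logy), method="Nelder-Mead")
    print(np.round(seff(fit.x, logx, logy), 2))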
3.3. SFR MODELLING FOR A SINGLE INPUT (“COST”).
The SFR literature (eg. Bauer, 1990) suggests that the single input case can be treated as the nearly symmetric
“dual” of the single output case of the last section—the formal symmetry would be exact but for the recognition
that inefficiency increases cost. This would give X = f (y), x = (X/U )V , with the analogous specialization:
log x = log f (y) + v + u
(7)
where f (y) is now an s-dimensional efficiency frontier surface rather than a production function. Equation (7) is the
reduction to a single input of equations (4.1) & (4.2) of Bauer (1990). Bauer indicates that (7) is applicable only
if the output vector y is exogenously determined. The suggestion ignores the likely presence in many applications
of variation in y that would, without special assumptions, vitiate any simple functional relationship X = f (y), and
that could not be written out of a realistic account by a declaration of exogeneity. [For 43PFs, can it be maintained that
any realistic error structure can be confined to x and does not involve y?]
A more realistic alternative in many problems would be to take the V out of the equation x = (X/U )V and put
“error” into y. Then (7) would have to be replaced by s + 1 equations: if the logarithmic form were maintained
throughout, these would be:
log x = log f(Y) + u    (8)
log y(j) = log Y(j) + v(j), j = 1, ..., s.    (9)
The Cobb-Douglas production function might then be put to use as a cost function:
log f (Y) = a + b1 log Y (1) + ... + bs log Y (s)
(10)
with b1 + ... + bs = 1 for CRS. Alternative modelling of the error in y, such as
y(j) = Y(j) + v(j), j = 1, ..., s    (11)
may also be considered (though this would not be realistic for 43PFs). The cost x itself may well be modelled as a
linear form divided by the efficiency term U:
x = [a + b1 Y (1) + ... + bs Y (s)]/U
(12)
with a = 0 for CRS-compatibility. The possibilities are almost unlimited. The main criterion (within this approach)
must be whether the models make sense with the data in which there are unknown and perhaps unmodellable
inefficiencies, for which the choice of the function f would be critical. For example, would it make sense to apply
(12) with a non-zero intercept a in the case of the data for s = 1 shown in Fig.8, or would it be sensible to omit
the CRS condition b1 + ... + bs = 1 from (10) (see the final paragraph of Section 2.6)—and continue to treat U as a
measure of efficiency?
Such questions may be secondary to the "errors in variables" problem, as recognized by Schmidt (1985) in requiring that a (single) output be exogenous when a "cost frontier" is to be estimated (as the dual of his "production function"). Appendix 1 explores the problem in just the simplest case: s = 1 & b1 = 1 (CRS-compatibility) with even simpler
distributional assumptions than those made for a single output. The problem here may have been foreseen in the
following comment on the current state of SFR (Cooper et al, 2000, p.264):
“There are shortcomings and research challenges that remain to be met. One such challenge is . . . to include multiple
outputs as well as multiple inputs.”
For the applications of SFR in Section 5, we have therefore been content to demonstrate the questionably relevant
statistical package LimDep, designed for the manifestly different dual case, but actually recommended by Spottiswoode (2000) as a check on the performance of DEA. This package replaces the complexity of (8)-(12) by the
deceptively simpler options
log x = log f(y) + v + u    (13)
x = f(y) + v + u    (14)
When log f (y) in (13) or f (y) in (14) are linear in their parameters (with an intercept), the relationship fitted by
LimDep is usually close to a translation, parallel to the x-axis, of the Ordinary Least Squares (OLS) line fitted with x
as dependent variable. An alternative to SFR, as described so far, is the efficiency frontier defined by the “corrected”
OLS (COLS) line—shifted up & parallel to the x-axis so that all but one residual are negative (Greene, 1980). All
this work becomes more questionable if no attention is paid to the reasonableness of the fitted frontier function f (y)
as a platform on which efficiency is defined: consider, for example, its application to the simplest of data sets in the
shape of Fig.8.
Those who maintain with Spottiswoode (2000) that SFR can be used as a “check” on DEA should note that the
common use of a Cobb-Douglas f(y) in (13) violates the convexity condition on the “technology space” in s + 1
dimensions defined by worsening the expectation of the frontier surface x = f (y). From the start, there can be a
built-in conflict between the “models” for DEA & SFR. However the expectation that there are some similarities
between DEA & SFR leading to a degree of correlation between teff & seff is a reasonable one—even though such
correlation should not be taken as validation of their logics. [The situation is not as dire as the dispute in physics that provoked
the question “What do Abraham Lincoln & Einstein have in common?” and the answer “Both have beards, except Einstein.”]
Although Cubbin & Tzanidakis (1998) did things I would not wish to emulate (eg. reducing to three a very large
number of relevant outputs by purely statistical technique), their application of SFR managed to raise a concern
diametrically opposite to the one that motivated the invention of SFR:
“If, as is the case in the water industry, the overall error appears to be almost symmetrically distributed the logic of the
stochastic frontier approach would imply a very small range for w [our u] and hence very low levels of inefficiency. This
may be difficult for a regulator to accept [my italics].”
3.4. SIMULATION TO THE RESCUE?
Thanassoulis (1993) analyses some artificial data with n = 15, r = 1, s = 3 in an attempt to compare the relative
merits of DEA & RA (RA is simple regression analysis that dispenses with the complexity of SFR). He claims
that, although they are based on hypothetical data, his findings are “a consequence of the underlying nature of
two methods and they are therefore generalizable”. The data were generated by taking 15 error-free output vectors
(whence Y = y), using a numerical specification of equation (15) to give seven error-free x-values, and another eight
x-values with added values of inefficiencies u (no errors of type v are involved). RA then fits the plane surface
x = b1 y(1) + b2 y(2) + b3 y(3)
(15)
by ordinary least squares, and defines efficiency as the ratio of fitted to observed input x. Unsurprisingly, it is found
that:
(a) DEA gives teffs that are equal to the “true” efficiencies (except for two units where “slack” plays a role), simply
because for most units the relevant frontier facet lies on the “true” efficiency frontier;
(b) RA (influenced by the values of u added to eight of the units, and with its fitted line shifted up the x-axis
from the “true” frontier) does poorly, giving the other seven units efficiencies exceeding 100%. The ranking of these
efficiencies is, however, less objectionable.
So DEA is judged good for estimation, while RA is judged to be not so bad for ranking. It is these findings that the
paper presents as generalizable.
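The flavour of that experiment is easy to reproduce. The following rough sketch (Python, with invented numbers rather than Thanassoulis's data) generates 15 error-free output vectors, derives costs from a known plane of the form (15), multiplies eight of the costs by inefficiency factors, and computes the RA efficiencies as fitted/observed cost; the error-free units duly tend to come out above 100%, as in finding (b).

    import numpy as np

    rng = np.random.default_rng(2)
    Y = rng.uniform(10, 100, (15, 3))              # error-free outputs, so Y = y
    b_true = np.array([2.0, 1.0, 3.0])             # invented coefficients of the plane (15)
    x = Y @ b_true                                 # the first seven costs are error-free
    x[7:] *= rng.uniform(1.1, 1.5, 8)              # inefficiency added to the other eight

    b_ols, *_ = np.linalg.lstsq(Y, x, rcond=None)  # ordinary least squares fit of (15)
    ra_eff = 100 * (Y @ b_ols) / x                 # RA "efficiency": fitted/observed cost, in %
    print(np.round(ra_eff, 1))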
Thanassoulis justifies his avoidance of the complexity of SFR by quotation of Schmidt’s paper. An alternative
quotation (Schmidt, 1985, p.306) reflects the problem with such avoidance:
“In my opinion the only serious intrinsic problem with stochastic frontiers is that the separation of noise & inefficiency
ultimately hinges on strong (and arbitrary) distributional assumptions. This is not easy to defend. However, in defense of
stochastic frontier models, it is clear that this problem is not avoided by assuming the frontier to be deterministic. Assuming
statistical noise not to exist is itself a strong distributional assumption, and one that is empirically false in data sets that I
have analyzed.”
Generalities aside, one might test the realism of SFR by asking how the method would react to the simplest of data
sets illustrated in Section 2.17.
3.5. CAN SFR DEAL WITH ENVIRONMENTALS?
If the problem of formulation for multiple outputs could be resolved, it would be tempting to deal with environmentals by incorporating z into f (Y) or f (y) to adjust for the undoubted influence of environmentals in most
applications. However, we still have the problem touched on at the end of Section 2.8: if efficiency is correlated with
environmentals (eg. “resident population” or size), as is most likely, then adjustment for z may prematurely remove
from consideration an important component of what the study is about. Such adjustment would also have to face
the problem of “over-adjustment” (the analogue of the problem of “over-fitting” in multiple regression) and of then
justifying the finally decided adjustment in terms of some realistic model.
3.6. “STATE OF THE ART” ?
Spottiswoode (2000) relied on “advice from leading experts in the fields of economics and econometrics having
practical experience in efficiency measurement”, summarised in a quotation from one such source:
“The use of both DEA and a parametric frontier technique such as . . . SFA undoubtedly represents the ’state of the art’ in
terms of relative efficiency analysis and would represent the optimal approach to efficiency analysis across police forces.”
A paper from the same source (Drake & Simper, 1999) was said to be “encouraging” in the final paragraph of the
technical annexes of Spottiswoode (2000), along with yet another quotation:
“both DFA and DEA produced very similar efficiency rankings suggesting that both are viable methodologies for the relative
efficiency analysis in public sector services such as the police.”
Here “DFA” is the authors’ Distribution Free Analysis version of SFR—a complex procedure applied to 5 years’ data
of 43PFs with 4 inputs but only 3 outputs. The procedure will be reviewed in Section 5.2.
4. THE THIRD WAY (TTW).
4.1. INTRODUCTION
“The Third Way” (TTW), here proposed as an alternative to DEA & SFR rather than a complement, has historical
precedents in welfare economics, that may have been blighted by technical problems in reconciling valuation theory
and other economic theory. The reluctance to engage in the thorny issue of multiperson preferences may have been
reinforced by the famous “Impossibility Theorem” of Kenneth Arrow (1951)—the apparently unattractive finding
that, roughly speaking, dictatorship was the only option for social choice satisfying some simple axioms. However, one
form of dictatorship is the benevolent exercise of political will & judgement by a democratically elected government.
Given the manifest technical difficulties with less overtly political techniques such as DEA & SFR, should the
possibility of benevolently dictated valuation be left unexplored, especially in problems like 43PFs? The idea that we
need to introduce some sort of exogenous valuation, in order to resolve matters, surfaces here and there throughout
the efficiency literature, even among exponents of DEA such as Thanassoulis quoted in Section 2.16. From a less
committed viewpoint, Lewis (1986), echoing Nunamaker (1985), put the issue clearly in comments on DEA:
“the efficiency criterion used regards all the variables as having equal importance. This may mean that a DMU which
dominates in the production of a relatively unimportant output is assessed as efficient at the expense of a DMU which is,
in fact, more efficient at producing more valuable outputs”.
Introduced in Sections 2.1, 2.9 & 2.10 as the undoctored use of veff, TTW has a technical simplicity that will be unattractive to those who believe that complex problems must have complex, preferably technically advanced, solutions. For
example, Cubbin & Tzanidakis (1998) would dismiss veff as a “simple ratio analysis” compared with “sophisticated
mathematical & statistical modelling”.
[Until quite recently, “sophisticated” meant “deprived of original simplicity”. King Lear’s regretful adage that “Striving to better, oft we
mar what’s well” is clearly far too complacent for a problem such as 43PFs, but it could hold good in the distinctly un-Shakespearean
form: “Striving to refine a method or model with more bells & whistles can be counterproductive, and it may be more realistic to move
in the direction of greater simplicity.”]
Spottiswoode (2000) claims that:
“All techniques for measuring comparative police efficiency would work best when there are a limited number of input and
outcome variables relative to the number of forces being measures.”
This claim is a response to the non-discriminatory feature of DEA (see Section 2.14) and to the problems of validating
speculative statistical modelling in SFR (see Section 3). The claim should not influence the implementation of TTW,
since TTW is designed to accommodate a good number of necessary explicit outputs [see Section 1 (ii)].
4.2. A GOOD START?
Despite its limitations, we develop TTW from another look at the police study of Section 2.15. Thanassoulis used
DEA with four inputs —numbers of violent crimes, burglaries, other crimes and officers. With these, teff is based
on weighted combinations of clear-up rates, each with respect to some weighted combination, determined by DEA,
of the four inputs. Of these, only the number of officers can be considered a cost, and, if the analysis of this limited
database were to be repeated, it might be more logical to use a single input x—the total cost of a police force referred
to in Section 2.3.
The second easy change would be to add three “negative outputs” that might have been available in the Audit
Commission data-base:
• y(4) = number of violent crimes in 1990 not cleared up by the end of 1991
• y(5) = number of burglaries in 1990 not cleared up by the end of 1991
• y(6) = number of “other” crimes in 1990 not cleared up by the end of 1991.
A third change involves the important issue of the quality of the data records: it would be not to count crimes that
are solved simply by being “taken into consideration” with some other cleared-up offence. Since 1993, the Audit
Commission has done just this, and counts only crimes “detected by primary means”, for which the police have had
to carry out a crime-solving investigation (cf. the wondrous Nottingham detection rate of the 1960s).
Then, assigning a value of v1 = 100 to the clear-up of a single violent crime, the following intervals for v2, ..., v6 in veff might be agreed by a panel of “the great and the good” authorised to do so by a benevolent dictatorship:
6 ≤ v2 ≤ 10
2 ≤ v3 ≤ 3
−20 ≤ v4 ≤ −10
−2 ≤ v5 ≤ −1
−1 ≤ v6 ≤ −1/2
These inequalities would be roughly consistent with Thanassoulis’s bold weight-restraints, but they go further in
putting a negative value on crimes that were recorded in the previous year but not cleared-up by the end of the
assessment year. The final calculations would rely on the linearity (and therefore monotonicity) of veff as a function
of each of v2 , ..., v6 . For each of the 32 choices of end-points of the 5 intervals, one would calculate veff for each police
force and the corresponding percentage efficiencies (100× veff/max{veff}).
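A minimal sketch of that calculation (Python; the figures here are invented, not Audit Commission data) runs over the 2^5 = 32 vertices of the box of value-weights, with v1 fixed at 100:

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    n = 41                                         # forces in the 1991 data set
    x = rng.uniform(50, 200, n)                    # invented total costs
    y = rng.uniform(0, 100, (n, 6))                # invented outputs y(1), ..., y(6)

    intervals = [(6, 10), (2, 3), (-20, -10), (-2, -1), (-1, -0.5)]   # for v2, ..., v6
    for vertex in itertools.product(*intervals):
        v = np.array([100.0, *vertex])             # pivotal value-weight v1 = 100
        veff = y @ v / x                           # undoctored veff for every force
        pct = 100 * veff / veff.max()              # percentage efficiencies at this vertex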
No attempt will be made to refine the discussion of this simple example: it serves merely to introduce the general
specification of TTW (Mark One!).
4.3. GENERAL SPECIFICATION OF TTW.
At the general level, there are four distinct stages:
(i) problem description, and collation & screening of relevant data;
(ii) definition/specification of input cost & outputs;
(iii) valuation of outputs, and calculation of undoctored veffs;
(iv) stratification and/or adjustment of veffs for environmentals.
For (i), little can be said at the general level, except to parrot the call for “Best practice!” (now widely heard in the
land).
For (ii), TTW is intended for problems in which the reduction of all costs to a single bottom-line figure (the input
x) is a relatively straightforward operation when the accuracy aimed at is realistically set, to within 5% say. If there
is unresolvable argument about whether any substantial element should be included, it may be best to allow two
efficiency measures to be developed—one for its inclusion and one for its exclusion. For outputs, however, TTW is
intentionally designed for problems in which there is, typically, a large number of contending outputs—all required
to appear explicitly in the openly published formula that generates the efficiencies. In its criticism & rejection of
the TTW approach (see Section 4.6), Spottiswoode (2000) ignores one crucial advantage of TTW—that it does not
break down when the number of outputs is increased (to avoid the problems, already alluded to, that are associated
with excessive aggregation of outputs that call for separate consideration and appearance in the efficiency formula
eg. successful or unsuccessful, but all very costly, prosecutions for fraud).
For (iii), I have already suggested that TTW should use undoctored veffs externally adjusted for environmentals if
necessary (as if each were the yield of an agricultural plot adjusted for weather and soil fertility, or the outcome
of a clinical trial adjusted for the case history of patients). The output variables y(1), ..., y(s) in the numerator
v1 y(1) + ... + vs y(s) of veff should be such that, by choice of the value-weights v = v1, ..., vs, the numerator represents,
for each unit, the total value of the bulk of the activity of the unit. The outputs should be jointly comprehensive so
that the bulk of the activity is covered, and volumetric so that the coefficient vj is then the value per unit volume
of the jth output. In many problems, it should be possible to define outputs that are disjoint (eg. the different
categories of crime for 43PFs) to facilitate the assignment of value-weights. Only the ratios of the value-weights need
to be specified, which may be helpful in problems where consensus about v allows only intervals to be agreed. For
this, TTW might take a pivotal output whose value-weight might be set equal to unity or, more conveniently, 100.
An interval should then be established for each of the other value-weights, within which all parties can locate their
preferences. This would give an (s − 1)-dimensional box for the latter, whose 2^(s−1) vertices would deliver a set of
undoctored veffs for the n units. For (iv), there are difficult but not, it is to be hoped, insurmountable technical
problems. Their analysis calls for a Section of their own.
4.4. ALLOWANCE FOR ENVIRONMENTALS.
There is just one idea that, in effect, declares “No problem!”. That is the idea that input costs have been predetermined
for each unit, at levels that take proper account of environmentals, and in the sense that veff is then an acceptable,
already adjusted, measure of efficiency. The validity of the idea could only be assessed by close & rather subjective
inspection of the method of predetermination—and, even if it were possible, its justification would depend on the
particularities of the problem. However, it is important to recognise that such questionable justification could
be maintained even if the veff values were found to be statistically significantly related to environmentals—the
justification would be on the untestable grounds that the unknown efficiencies of the units must themselves be
correlated with environmentals! [For 43PFs, the claim has been made that the Police Funding Formula (PFF), that determines the
input cost of police forces, is a realization of the “No problem!” idea. The question is whether the PFF, in its determination of the police
force input x (see Section 2.3), effectively does the job of allowing for environmentals—in the sense that, once they have been locked into
x, we can throw away the key and forget about them. One may have the intuition that such a technical achievement would require a
knowledge & understanding of the 43PFs problem that goes well beyond what is currently available. Appendix 2 looks in detail at the
PFF and finds good reason to dismiss the idea. The easiest technique to think about is TTW’s veff. For this, there is no way of logically
refuting the claim that, once a police force has been given its PFF-determined x, it should be able to attain, by routine activity, the same
value of veff as any other police force of equal efficiency whatever its environmentals. (This claim would be making a heavy assumption
about the relationship of the value-weights v in veff to the activity requirements of different outputs.) The only way of weighing the
claim is to look at the detailed formulation of the PFF (which we do in Appendix 2).]
The possibility of confounding of efficiencies with environmentals is a realistic one, and has to be taken into account
in any method of allowance for environmentals (see Section 3.5). To the extent that it is possible, take for a mental
picture the 1 + s + t dimensional space of input, outputs and environmentals. It is instructive in considering the logic
of allowance for environmentals to imagine that there is no shortage of units—that n is effectively infinite. It would
then be possible to use the environmentals z to define thin strata, and look at things firstly within individual strata
and then between strata. Within a stratum, there is no question of correlation between efficiency and environmentals
since the latter are essentially constant, and any measure of efficiency will be determined by the variations of (x, y)
for the fixed z. Between strata, the variation of the stratum means of (x, y) or its logarithmic transforms must
reflect either a necessary “technical” dependence on environmentals at constant efficiency level, or some relationship
between efficiency and environmentals. Such insight may clarify logic, but it does not resolve the problem of how to
do the calculations when, as in 43PFs, we have only 43 units spread over at least 21 dimensions (1+10+10 = 21 is
a barely realistic minimum). What will help is commitment to the TTW principle of taking veff as an undoctored
measure and only then bringing in the environmentals: for 43PFs the dimensionality might then go down to 10,
which is still a large enough number. The realism that may then be attainable will depend strongly on the particular
problem. There are two general ideas that may help out: unit subdivision and cross-validation. Farrell’s units were
states of the U.S.A., for which the data may have been available or obtainable at the much smaller county level. The
units for 43PFs occupy, on average, 1/43rd the area of England & Wales, and the data must have been recorded at
the level of the smaller police force divisions or the even smaller basic command units. Variations within units may
be more informative about the technical dependence of performance on environmentals than on efficiency, given that
efficiency may be fairly homogeneous in the centrally organized body that is a police force. In any case, the extra
data would be relevant in some way. When it comes to fitting the equations, the technique of cross-validatory choice
(Stone, 1974) of an allowance formula should be considered as a model-free device for controlling over-adjustment.
Posterior adjustment of veff by z makes statistical sense because of the intrinsic character of veff: each unit’s veff
has its own z to be considered as influential for that very unit alone. The same cannot be said for teff, whose heavily
interactive character allows each unit’s teff to be, perhaps strongly, influenced by the environmentals of other units.
For DEA, there are additional grounds for questioning any affirmative answer to our initial question. They concern
the nature of what is being claimed. For CRS-based teff, for example, it would have to be maintained that, if they
are equally efficient, police forces with the same output profile (i.e. the ratios of the y(j) to each other) would be
able to attain the same value of teff. Would that be realistic?
4.5. NEGOTIABLE VALUE-WEIGHTS.
A crucial question remains to be explored: how are the value-weights v to be arrived at? A benevolent dictatorship
will want to retain the goodwill of the individuals who work in the n units being assessed, even though its main
concern is to influence the units to maximise the societal value of their outputs. We are, after all, concerned with
non-profit-making organizations that depend greatly on the organizational skills of managers responsive to pressures
and demands beyond the ken of any supervisory quango representing the interests of society at large.
We are faced with a dilemma. On the one hand, it would be unreasonable for purely societal value-weights v to be
imposed that took little or no account of the relative internal costs of generating different outputs. On the other
hand, weights that simply reflected best estimates (in some sense) of the existing internal costs would be failing to
give any incentive for units to meet the “market pressures” represented by the supposedly widely agreed societal
values of different outputs.
What is needed is some negotiable, even experimental, compromise between these two extremes. Appendix 3 presents
a possible way in which this compromise might be reached: it offers a one-parameter family of value-weights in which
the parameter allows for a negotiable “fine-tuning”. The proposal does not override the necessity of environmental
allowance along the lines of Section 4.4.
4.6. SOME OBJECTIONS TO TTW.
At a critical juncture in its tussle with the 43PFs problem, the report of the Treasury’s Public Services Productivity
Panel (Spottiswoode, 2000) maintains that there are two key objections to using a “simple” or “very simple” efficiency
index:
“First, it allows for no variation in the weight that could be assigned to each outcome for each authority and its force. But
the relative importance of each outcome could legitimately vary from force to force, reflecting local circumstances and local
police plans. An approach that lets the outcome weights vary is preferable.”
This objection clearly relates to the priorities/environmentals question (Section 2.16) and will not be further discussed
here: readers will no doubt make their own assessment of the balance of advantage (if any) and disadvantage (if any)
in allowing locally influenceable weights—a very different matter from non-local allowance for locally influential but
uncontrollable environmentals. Quoting again:
“Second, such an index would implicitly assume a ’linear’ or straight-line relationship between inputs and outcomes; that
is, if a force doubled the inputs it would get double the outcomes. This is unlikely to hold in practice. An approach that
allowed for non-linear relationships between inputs and outcomes would be preferable.”
It is quite right to say that veff, having the form of aggregated value per unit cost, would remain constant if its single
input cost and its output volumes were doubled. But that is far from constituting an assumption about what might
happen if a police force did such-and-such. If “in practice”, it were found that veff decreased with x (representing
the “size” of the force), there are two options open to TTW: either to accept the finding as an indication that larger forces are more inefficient than smaller ones (and perhaps go on to suggest the subdivision of the organizational structure of large forces), or to make an allowance for size treated as an environmental variable over which the forces have no control (see Section 4.4). Compared with these above-board options, can it really be preferable to build some
econometrically motivated, non-linear relationship of questionable technicality into the method (DEA or SFR!) that
then automatically calculates some “efficiency” measure?
4.7. NEGLECTED ADVICE.
Annex D of Spottiswoode (2000), entitled “Summary of advice on DEA & SFA from economics and econometric
specialists”, has four paragraphs on what it got from the National Economic Research Associates. NERA sees DEA
& “regression” as useful initial “top down” analyses and accepts the idea of a “cross-check” between the two (noting
a particular difficulty in the interaction between regression and the Police Funding Formula). If this approach is to
be followed, it suggests using DEA for exploring the database but then making a “bottom up” analysis involving
detailed assessments of costs & activities— “a huge undertaking”:
“If agreement can be reached on specific weights for all outputs then a more straightforward form of ratio analysis than
DEA may be feasible. However, any such analysis should make allowance for variations in operating circumstances.”
A less demanding approach would be to examine the alternative of incentive mechanisms based on a clear definition
of outputs with appropriate weights and, feasibly, on a
“mechanism that would avoid the need to undertake the relative efficiency assessment, perhaps by relating senior
management pay to increases in a weighted output index over time.”
The Spottiswoode report does not justify its inadequate response to this annexed advice. [The four paragraphs here
summarized were a straight transcript from NERA (1999), with only changes in authorial voice eg. from “we” to “NERA”. The rest of
the NERA report can be recommended for its good sense about the problem here faced. It reveals that the brief given to NERA by the
Treasury Panel was narrowly focused on DEA & SFR, but the good sense referred to was not restricted to the consideration of these two
related techniques. ]
5. TWO APPLICATIONS OF DEA, SFR, & TTW.
5.1. THE 21 SPANISH HIGH COURTS.
The study of Pedraja-Chaparro & Salinas-Jimenez (1996, P-C/S-J for short) follows closely that of Kittelsen &
Førsund (1992), who applied DEA to the 107 district courts of Norway with two inputs (the numbers of judges & office staff) and seven outputs (the numbers of cases in categories that cover, and do not merely represent or act as proxies for, the whole activity of the courts). The Norwegian study does not give the data for re-analysis, but two
revealing quotations from it are worth reproducing here, before we move to Spain for published data. Justifying the
use of DEA as a second-best option:
“ Faced with detailed, if not comprehensive, data on the quantities of services produced but no information on the relative
values of the different services, the problem is to find methods that can utilize the information available in the data but
does not demand more information.”
Justifying the reduction to seven categories of an initial list of 19 categories:
“ One of the characteristics of the DEA method is that as the dimensionality of the problem is increased the number of
efficient units increases as well. Use of the 19 product subdivisions . . . would generate fully efficient scores for almost all the
courts. Apart from being an unreasonable result, such an analysis would hardly give any interesting information.”
The 21 Spanish High Courts studied by P-C/S-J were those in the Administrative Litigation Division, engaged in
relatively homogeneous cases. For their main study, P-C/S-J used two inputs & two outputs but no environmentals.
Here we replace the two inputs (staff numbers) by their total financial cost x:
Inputs
• x = total cost of x(1) judges & x(2) office staff
Outputs
• y(1) = no. of cases resolved by full process
• y(2) = no. of other cases
P-C/S-J had some initial reservations about their analysis:
“The selection of the outputs may be criticized for not taking into account two major points: first, the heterogeneity in
each of the two outputs, which might explain the different efficiency scores . . . second, the important consequences for the
DEA results of an incorrect model specification (selection of variables).”
In defense, they plead both non-availability of more detailed data and, referring to Nunamaker’s Theorem, the fact
that, with only 21 units, efficiency scores would be very sensitive to any increase in the number of variables. A
Spanish bull of a dilemma with two fine horns!
Figure 9: The CRS-motivated reduced-data scattergram of 21 Spanish high courts.
Plots of p(1) = y(1)/x and p(2) = y(2)/x against x show no evidence of size dependence. Fig.9 shows that the CRS
frontier in the (p(1) , p(2)) scattergram is determined by courts 17 & 19, and that these two courts are the so-called
100% efficient peers for the other 19 courts.
Table 1 gives the results of the calculation of eight different percentage “efficiencies”:
• teffa = P-C/S-J’s CRS teff based on x(1) & x(2),
• teff = the CRS teff based on x,
• Uh = U based on (13) with Cobb-Douglas f (y), v normal, & u half-normal,
• Ue = ditto but with u exponential,
• %veff = 100 × veff/(max{veff} in D) for four choices of value-weight ratio.
Note that teff ≤ teffa, uniformly, as Nunamaker’s theorem dictates. That Uh & Ue can differ appreciably reflects
the sensitivity of SFR to distributional assumptions.
Court   y(1)   y(2)  x(1)  x(2)       x  teffa  teff  Uh  Ue  8/1  4/1  2/1  1/1
    1    281     77     4     9   40603     32    28  47  51   27   27   28   21
    2    897    440     7    19   79661     52    49  69  78   45   46   49   40
    3   3699   2692    28    75  367693     57    48  65  74   41   43   48   41
    4    582    171     5    10   48929     60    48  70  79   46   47   48   37
    5    617    152     5    10   50548     63    48  71  80   47   47   48   36
    6    617    207     3    10   40380     65    63  81  87   59   60   62   49
    7   1343    458    14    21  132916     67    41  63  72   39   40   41   32
    8    679     88     4    10   45292     68    57  78  86   57   56   56   40
    9   2889   1204    21    43  225875     72    54  74  83   50   52   54   43
   10   9634   4674    54   143  662387     74    63  80  87   57   59   63   51
   11    675    453     5    10   50341     77    63  77  85   54   57   63   53
   12   1138    357     8    15   78030     79    59  79  86   56   57   59   46
   13   1498    443    10    19  100215     82    60  80  87   58   59   60   46
   14    757    514     5    10   48928     87    73  83  88   63   66   73   62
   15   1164    333     5    13   54682     92    85  89  92   82   83   85   65
   16    821    390     4     9   40615     99    88  89  92   80   82   88   71
   17    663   1256     4    10   45668    100   100  82  88   67   78   99  100
   18    885    901     6    11   57495    100    82  83  88   65   70   81   74
   19   2091    370     6    21   79655    100   100  92  94  100  100  100   74
   20   2249   1994     9    34  123828    100    92  87  91   75   81   92   82
   21    332   1741    21    46  227047    100    80  88  91   75   76   80   64
Table 1: Data for y(1), y(2), x(1), x(2) & x from P-C/S-J’s Tables 5 & 8, and eight percentage “efficiencies”: two for
DEA, two for SFR, and four for TTW.
If the disagreement between the DEA and SFR measures here were viewed as relatively unimportant in the overall
comparison of forces, consistency does not mean validity: the two sets of measures have other crucial assumptions
in common. That both Uh & Ue exhibit appreciable shrinkage compared with teff (smaller teff’s are raised, larger
teff’s are lowered) reflects the moderating influence of SFR’s statistical approach in its difficult judgement of how
much of the residual is random error and how much is inefficiency.
The %veffs for v1/v2 = 2/1 (giving “full process” cases twice the value of cases that do not go through the whole
legal process) are very close to the teffs—which is not surprising given that the frontier line between units 17 & 19
has the equation
1.95 p(1) + p(2) = constant.
It may be noted that although veff is monotone in each of v1 & v2, %veff is not monotone in the ratio v1/v2. The
changes in %veff as v1/v2 is varied reflect both the interpretability of this measure and its to-be-expected sensitivity
to such variation.
5.2. THE 43 POLICE FORCES OF ENGLAND & WALES.
The case for efficiency assessment of the 43 police forces of England & Wales has been put by Spottiswoode (2000)
as follows:
“There is a plethora of indicators and information about police outputs and outcomes. But, to date, it has not been possible to draw this
information together to build a comprehensive or systematic measure of relative police efficiency in meeting their ultimate objectives of
promoting safety and reducing crime, disorder and the fear of crime.”
The Audit Commission (1999) has been rather more explicit:
“Police response to emergencies is improving, and the proportion of crimes detected is increasing. However, some of these improvements,
such as the increases in detection rates, are principally the result of falling levels of crime—the number of crimes being solved is not
generally increasing. . . . the public will welcome the fact that the chance of any particular crime being cleared up has increased.
However, people may question why increases in spending have not resulted in the absolute number of crimes being cleared up. . . . there
are significant variations in performance between police forces. These variations cannot simply be explained by differences in workload
or in the circumstances forces face. While overall those forces recording the highest levels of crime also have the lowest detection rates,
a few forces with the highest recorded rates of burglaries and violent crimes have among the highest detection rates. And the clear up
rate for burglaries in the best performing metropolitan force is almost twice that for the worst.”
The Spottiswoode report is torn between (i) recognition of the multi-faceted character of the outputs (which we take
to include “outcomes”) needed to “reflect the key outcomes that the police are expected to achieve” and (ii) the
inability of DEA to cope with more than a limited number of inputs & outputs:
“In the case of the police in England and Wales, the number of forces (43) is relatively small, and discipline is required on the number of
input and outcome variables to be used. This technical constraint would be overcome if efficiency measurement is undertaken at Basic
Command Unit (BCU) level or by using time-series data. However, even if BCU level data is used, a limited number of variables should
be preferred, as this would facilitate the task of establishing and tracking relationships between variables, and limit the scope for data
and measurement error.”
The report is optimistic that a balance can be struck, with about eight outputs (“best value performance indicators”
is the favoured term). It is
“. . . critical that the selected outcome measures capture the essence of police outcomes and thus, implicitly, the many dimensions to
policing. This does not mean that there has to be a multitude of outcome measures. The focus of the outcome measures should be on
what the police are being expected to achieve for the money they have. This is different from trying to model everything that forces do
on a day-to-day basis.”
Are these the words of good sense freely arrived at—or of rationalization forced by an unwisely favoured technique?
What does it mean to “capture the essence”?
For this study, it has not yet been possible to carry out a realistic analysis of 43PFs that starts, as it must, with a
carefully selected, comprehensive set of volumetric output variables and relevant environmentals. [There is no problem
with the cost input x: H.M.Treasury’s concern with public expenditure has ensured that.] So this section will be used to prepare
the ground for such analysis by looking at two papers that—ignoring the interesting work of Thanassoulis (1995)
on which we have commented in Sections 2.15 & 4.2—claim to be “the first to examine the relative efficiency of
the English & Welsh police forces”. Unlike the Spanish courts study, the papers do not include the basic data (212
records & 1505 numbers): so we are unable to reduce the data to a single-input database for re-analysis. However,
there are features of the authors’ increasingly arcane analysis that will serve to test our obvious prejudices.
For both studies, Drake & Simper (1999, 2000) have taken 5 years’ data with 4 inputs, but only 3 outputs and no
environmentals:
Inputs
employment costs
premises-related expenses
transport-related costs
capital & other costs.
Outputs
clear-up rate
no. of traffic offences dealt with
no. of breathalyser tests
5.2.1. THE FIRST STUDY
The first paper (Drake & Simper, 2000) does not state whether “clear-up rate” is a proportion or the volumetric
number of clear-ups. The former would tend to bias the findings against units with large values of the inputs, which
are all volumetric. The paper also does not state whether “Taken Into Consideration” (TIC) cases are included,
a notorious source of inflatable statistics: the highly critical views of Walker (1992) about the use of clear-ups are
quoted and disregarded. The other two outputs might also be questioned on the grounds that they are easily &
cheaply manipulable by police forces so that their use may give a distorted efficiency picture. Moreover the small
number of outputs would leave many forces with the sense that their full range of activities was not being given
due weight in any derived efficiency measure. Neither these questions, nor those associated with environmentals, are
dealt with, either before or after the straightforward calculation of
• “overall technical efficiency” OE (CRS-based teff) = AQ/AR in Fig.4,
• “pure technical efficiency” PTE (VRS-based teff) = AS/AR in Fig.4,
• “scale efficiency” SE = OE/PTE = AQ/AS in Fig.4.
These three quantities, described as highly informative, are the basis of some suggestions for enhancing the efficiency
of English & Welsh policing. It is claimed that they “may enable us to shed some light on the optimal size and
structure of police forces”. Despite the modesty of tone, this is a bold claim. What is it in the results obtained from
such limited data, by a method with acknowledged defects, that can justify such a claim?
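For concreteness, the three quantities can be computed with the same envelopment programme as in the sketch of Section 5.1: the VRS programme simply adds a convexity constraint on the intensity weights. The sketch below is again illustrative only, and is written for a single input (whereas Drake & Simper use four); crs_teff is the function sketched earlier.

```python
# Illustrative sketch: VRS (BCC) input-oriented scores and the derived
# "scale efficiency" SE = OE/PTE.  Single-input version only.
import numpy as np
from scipy.optimize import linprog

def vrs_teff(x, Y):
    """x: (n,) costs; Y: (n, s) outputs.  The CRS programme plus sum(lam) = 1."""
    n, s = Y.shape
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                         # variables (theta, lam)
        A_ub = np.vstack([np.r_[-x[o], x], np.c_[np.zeros((s, 1)), -Y.T]])
        b_ub = np.r_[0.0, -Y[o]]
        A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)        # convexity: sum(lam) = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.x[0]
    return scores

# OE = crs_teff(x, Y); PTE = vrs_teff(x, Y); SE = OE / PTE
```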
The authors focus comment on percentages that constitute a 2×2 table of 5-year average efficiencies, with a derived
third column:
Police force    CRS(OE)    VRS(PTE)    CRS/VRS(SE)
Surrey            62%        69%          89%
“The Met”         58%       100%          58%
Surrey’s CRS of 62%—the lowest 5-year average among the 35 “non-metropolitan” forces (excluding City, “The Met”,
and six other forces)—suggests “failure to utilise resources effectively”. But Surrey’s SE of 89% suggests that it is
“not too far removed from the constant returns region of operation” (a remark that would merit further analysis).
This contrasts with the performance of “The Met”. Its CRS of 58% is lower than that of Surrey and the lowest of
all 43 forces, but it seems “it would be inappropriate to label the Metropolitan as a highly inefficient police force”:
the VRS of 100% (necessarily 100% in each of the five years) suggest that “given the scale of the Metropolitan’s
operations, it is a highly efficient police force with no obvious inefficiencies in resource utilisation” (the latter phrase
may be thought to point misleadingly to allocative efficiency). But the Metropolitan Commissioner of Police (the
Met’s chief constable) must read on before congratulating his force, for the paper will tell him that his SE of 58%
confirms (really an extant implication of the VRS value of 100%) that
“all of the observed overall inefficiency is associated with scale effects. Given that the Metropolitan is the largest force in the
country, this result strongly suggests that there are diseconomies of scale at work in respect of large police force operations.
As in other large organisations, this is probably attributable to the extra bureaucracy and layers of management structure
which tend to accompany large scale.”
In this welter of econometrically motivated comment, it is salutary to note that the Met would get a VRS(P T E)
of 100% whatever it did, as long as it (by far the largest force in the country) managed to turn in the largest score
on any one of the three outputs of this study. It may also be noted that the inference about overall inefficiency
being attributable to diseconomies of scale, in the passage just quoted, is from a sample of size one: the VRS values
of all but one of the other six large “metropolitan” forces are less than the average of the 35 generally smaller
non-metropolitan forces. [The exception is Greater Manchester which looks as if it has, like the Met, hit the jackpot with a VRS
value of 100% in all five years—did it repeatedly hit the maximum for one particular output?].
The paper then uncovers another source of speculation—one that Section 2.7 has already noted. The results show
“clear evidence of an inverted U shaped relationship in respect of scale efficiency [CRS/VRS].” It is an instructive
exercise to see, with Fig.4 as a guide, that this feature is almost inevitably present whatever the data—as a necessary
consequence of the construction of nested & contiguous convex bodies of feasible performances for CRS & VRS. So
we need not read much into the comment on the second feature, that
“[although it is] a very common finding in economic studies of industrial production, it is a particularly interesting result
to find that the same economic production relationship appears to hold good in public sector services such as policing.”
The rest of the paper is of interest only for its use of unnecessarily advanced statistical techniques to analyse the 212
(5×43 - 3) force/year combinations (3 were unavailable) as if they were from 212 independent police forces.
5.2.2. THE SECOND STUDY
Spottiswoode (2000) dedicates the last two paragraphs of its Technical Annex C to the second study (Drake & Simper,
1999), describing it as “more encouraging” than other studies to the expressed hope that, “properly specified”, DEA
& SFA “should produce broadly similar results”. The stress here on consistency of the two techniques may be
diverting attention from the more important question of what it is they actually do with the data. There is not
the space here to give a detailed critique of the study, which would, in any case, be better done by a first-rank
econometrician. One feature does however strike this statistician as rather odd.
The method (an adaptation of the SFR of Section 3.3 to accommodate multiple inputs and to take account of year-to-year changes) requires “prices” w1, ..., w4 and redefined inputs so that the “inputs” listed in Section 5.2 are the
“cost shares” (for those prices) of the redefined inputs. The latter are taken to be labour & (3 times repeated) total
population in the police force area i.e. only two different inputs. The method then adopts & adapts some standard
econometric cost function theory to posit an input-allocatively-optimal “trans-log” total cost function of the 8
variables y(1), y(2), y(3), w1 , w2 , w3 , w4 and year, which any observed total cost (the sum of the original “inputs”) is
then taken to exceed by a unit- & year-dependent inefficiency factor. Unlike fully stochastic SFR, the method makes
no distributional assumptions about the make-up of this factor. Because of the (unjustified?) multiplicity of
prices, there are as many as 37 parameters in the fitting of this function to the 212 observations of total (yearly) cost
from the 43 police forces.
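For orientation only, and not as the authors' exact specification (which adds time terms and the associated cost-share equations), a generic translog total cost function in outputs y and input prices w takes the form

$$\ln C = \alpha_0 + \sum_i \alpha_i \ln y_i + \sum_j \beta_j \ln w_j
+ \tfrac{1}{2}\sum_{i,k} \gamma_{ik}\,\ln y_i \ln y_k
+ \tfrac{1}{2}\sum_{j,l} \delta_{jl}\,\ln w_j \ln w_l
+ \sum_{i,j} \rho_{ij}\,\ln y_i \ln w_j ,$$

with symmetry of the second-order coefficients and linear homogeneity in the prices imposed; observed total cost is then modelled as this frontier cost inflated by the unit- & year-dependent inefficiency factor.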
Even if the problem of multiple inputs were by-passed by reducing them to a single cost, there would remain the significant
problem that the PFF funding formula (Appendix 2) determines a large proportion of that cost as a function of both
outputs & environmentals. How could the manifest interaction of this determination with the empirical fitting of a
cost function be resolved?
ACKNOWLEDGEMENTS
The following individuals and organizations have helped either knowingly or unwittingly:
• Dr Juanita Roche (ex HM Treasury) for initial impetus
• R.A.S. for continuing motivation
• Hillingdon Council for free travel
• staff of the Home Office RDS Economics & Resource Analysis Unit for healthy argument, computational
assistance & generous access to literature
• Ina Dau & Richard Chandler for divine help with emacs & LaTeX.
APPENDIX 1
AN ILLUSTRATIVE SFR MODEL FOR SINGLE INPUT & OUTPUT.
Consider the model, SFRM say, in which data for r = s = 1 are generated, independently for each unit, as follows:
For 0 ≤ a < 1, 0 < b ≤ 1, c > 0, unobserved “true” input, X, and output, Y = X/c, are associated with the generic unit
g = (x, y), where x & y are independently randomly related to X & Y by x = X/U and y = Y V , where U is randomly
uniformly distributed in the interval (a, 1), and V is randomly uniformly distributed, with expectation unity, in (1 − b, 1 + b).
This model is not proposed as a serious competitor for the attention of stochastic frontier analysts: it serves only
to reveal problems that arise even in grossly simplified models—problems that might be obscured, beyond ease of
understanding, by the necessary technicality of apparently more realistic models. No attempt will be made here to
explore these problems beyond the point at which a reader’s curiosity might be satisfied.
The parameters of the model can be taken to be a, b, c, Y1 , ..., Yn (equivalently X1 , ..., Xn ). Their number, n + 3,
increases with n, so that consistent & unbiased estimation cannot be expected as a matter of course. The data D has
non-zero likelihood only when the parameters satisfy the inequalities
a xi/c < Yi < xi/c   &   yi/(1 + b) < Yi < yi/(1 − b),   i = 1, ..., n        (16)
in which case the likelihood L is given (up to proportionality) by the nth power of d = c/[b(1 − a)]. [The per-unit
contribution is the product of the density of x given X, namely (X/x²)/(1 − a), and that of y given Y, namely 1/(2bY);
this product is c/[2b(1 − a)x²], so that the Yi cancel.] Maximization of
d subject to (16) gives unique maximum likelihood estimates (MLEs) of the parameters a, b & c. However there will,
in general, be an interval of MLEs for each of the parameters Y1 , ..., Yn (or X1 , ..., Xn ). The latter include X̂ for
the generic unit, for which seff then has an interval of values Û = X̂/x as its non-uniquely determined MLE. The
feature of non-uniqueness appears even in the case where a & b can be (unrealistically) fixed by superior knowledge.
For the simplest case of a = 0 and b = 1, we maximise c subject to yi/2 < Yi < xi/c, i = 1, ..., n, for some Y1, ..., Yn. The
maximum is ĉ = 2xm /ym where xm /ym is the minimum of the ratio x/y in D (for the unit with index m) and ĉ is
then the MLE of c. The associated MLE of Ym is ym /2 and that of Xm is then ĉym /2 = xm . But, for i not equal to
m, there is an interval (yi /2, xi /ĉ) of values of Yi that maximize the likelihood. Unit m is the one with the largest
y/x and the one with the largest teff of 100% (see Section 2.4). Compatibly, for this unit, Û = 1 (100%) also. For i
not equal to m, however, Û has an interval of values ranging from the value of teff for unit i up to 100%. Inspection
of Fig.10 clarifies the picture. In Fig.10, the estimated “true” efficiency frontier is the ray through the origin, ORW:
this line has twice the slope of the line OQ through unit m. The non-unique MLE of the “true” input X & output
Y of g is the sub-interval RW. The non-unique MLE of the efficiency U increases from the value teff at R to 100% at
W. For unit m, this interval shrinks to a single point at which the efficiency U is 100%. The unit g can be regarded
as generated either from X = x × teff, Y = y/2 at R with an efficiency of teff and a doubling of Y (i.e. v = 2) or
from X = x, Ŷ = x/ĉ at W with an efficiency of 100% and a v of y/Ŷ = 2×teff. (The latter can be seen in the
trigonometry of Fig.10: teff = (xm /ym )/(x/y) = P Q/P S = P R/P T = OP/(OP + T W ) by “similar triangles”.)
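A small simulation makes the interval of MLEs concrete. The sketch below (in Python; the numbers and names are illustrative, and it is not part of any existing SFR software) generates data from the special case a = 0, b = 1 and exhibits, for each unit, the interval of maximum-likelihood values of the efficiency U.

```python
# Illustrative simulation of model SFRM with a = 0, b = 1.
import numpy as np

rng = np.random.default_rng(0)
n, c_true = 10, 2.0
X = rng.uniform(50.0, 500.0, n)        # unobserved "true" inputs
Y = X / c_true                         # "true" outputs on the frontier Y = X/c
U = rng.uniform(0.0, 1.0, n)           # a = 0: U ~ Uniform(0, 1)
V = rng.uniform(0.0, 2.0, n)           # b = 1: V ~ Uniform(0, 2), mean 1
x, y = X / U, Y * V                    # observed inputs & outputs

c_hat = 2.0 * np.min(x / y)            # MLE of c = 2 * x_m/y_m, m = argmin x/y
m = np.argmin(x / y)
teff = (x[m] / y[m]) / (x / y)         # Farrell/DEA teff of each unit

# For i != m, any Y_i in (y_i/2, x_i/c_hat) maximizes the likelihood, so the
# MLE of the efficiency U_i = c_hat * Y_i / x_i is any value in (teff_i, 1].
U_low = c_hat * (y / 2.0) / x          # lower end of the interval (equals teff)
U_high = np.ones(n)                    # upper end: 100% efficiency
print(np.column_stack([teff, U_low, U_high]))
```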
The non-uniqueness is an artefact of the use of uniform distributions in SFRM. However, an analogous near-flatness
(in key directions) of the summit of the likelihood function will be a necessary consequence of the general structure
of any model with two analogously competing components of random variation U & V , and n parameters Y1 , ..., Yn
about which no assumptions are made.
Figure 10: The non-uniqueness of the maximum likelihood estimation of the SFR efficiency, seff, for generic unit
g = (x, y) using a specialization of model SFRM.
Figure 11: Police force areas of England & Wales
APPENDIX 2
THE POLICE FUNDING FORMULA (PFF)
How the English & Welsh pay for their 43 police forces would merit a Palme d’Or for complexity—in a country that
is far from backward in finding complex ways of achieving its administrative objectives. The funding mechanism can
only be appreciated when no attempt is made to ascertain the logic of having several channels of funding. There are
four major players in the distribution game:
• Councils and Police Authorities (PAs)
• Department for the Environment, Transport & the Regions (DETR)
• Home Office
• H.M.Treasury.
A useful Association of Police Authorities document (APA, 1999) warns its readers that the arrangements are
“complex” (twice) and that the funding involves a “complicated formula” (four times). The purport of these warnings
may be to congratulate clever mandarins rather than encourage readers to think that such complexity may be
unnecessary.
For each spending year, the Treasury decides on the “Total Standard Spending” (TSS) that central government
will “support”. This sum includes a “Council Tax for Standard Spending” (CTSS) slice (15% of TSS) that is a
notional aggregate contribution from the “rates”. It also includes a 15% slice from the business rates (NNDR by
devious acronym). The Home Office weighs in with its 50% contribution—the “Specific Police Grant” (SPG). The
remaining 20% is made up by the “Revenue Support Grant” (RSG) channeled through the offices of DETR. The
balance sheet here is simply:
TSS = SPG + (CTSS + NNDR + RSG) = SPG + SSA        (17)
where SSA is the thereby defined “Standard Spending Assessment”.
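[In round numbers: for every £100 of TSS, SPG accounts for £50 and SSA for the remaining £50, the latter made up of £15 of CTSS, £15 of NNDR & £20 of RSG.]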
How is TSS divided among the 43 police authorities? CTSS is allocated strictly by the numbers of households in
Council Tax property bands—NNDR strictly by the number of residents in the police force area. The Home Office
uses its PFF formula to divide its SPG, and gives DETR mandarins the task of allocating RSG so that SSA is also
effectively divided by the same formula.
[The pecking order appears to be: HMT → HO → DETR → PAs ↔ Councils, in which only the Home Office and Police Authorities
have significant knowledge of, and responsibility for, what police do: the Home Office has direct responsibility for the usage of £1.8B
by “The Met”. The foreword by the Chief Secretary to the Treasury in Spottiswoode (2000) makes clear HMT’s determination to get
involved in the nuts & bolts of police machinery. Where is the oversight of HMT’s competence to do this? Among the nuggets in the
government paper “Adding It Up” (Cabinet Office, 2000) is the advice that “to perform their challenge role effectively central departments
should undertake a review of their analytical capability”.]
THE FORMULA.
Three documents—APA(2000) & Home Office(2000a, 2000b)—give a good overall view of the make-up of this Home
Office formula. But its final form, although stated with great precision, could have been designed to leave most
interested parties in the dark.
Applied to any particular funding sum, the formula first puts it into 10 “pots” (components) in portions determined either by necessity
or by need (assessed by a broadly-based “activity analysis” for the country as a whole):
1. 29.8% for crime management
2. 8.6% for call management
3. 7.5% for public order management & public reassurance
4. 7.0% for traffic management
5. 2.6% for community policing
6. 17.5% for patrol
7. 14.5% for pensions
8. 10.0% for police establishment
9. 2.0% for additional security
10. 0.5% for sparsity.
The bulk of the 26.5% in pots 7-9 is distributed by “necessity” with no environmental input, but environmentals are deeply involved in
the share-out of the other pots to the 43 forces. The APA(1999) document puts an informative gloss on the statistical mechanics of this
share-out:
“The relative workload demands on individual police forces are . . . estimated using a range of indicators so that these
individual pots of funding can be distributed to each police authority in proportion to these demands. . . . Information from
a number of forces was pooled to produce a model of police workload based on the characteristics of force areas. These
are mainly socio-demographic factors—for example, looking at all forces across the country, the number of incidents [the
workload measure for the pot 1] is found to be statistically associated with density of population, long term unemployment
and other population factors. Therefore, for any particular force, the predicted workload can be estimated on the basis of
these characteristics or indicators, which then determines the distribution of funding under each of the components.”
The story from the statistical coalface for pot 1 (Home Office, 2000a) involves (i) an unspecified list of variables that ’might have an
impact on the number of incidents’, (ii) principal components analysis, (iii) elimination of variables with either low correlation with the
number of incidents or high correlation with others, (iv) multiple linear regression of the number of incidents on the selected variables,
and (v) use of the fitted regression to divide the pot. More a minefield than a coalface, perhaps, but—justified “to avoid the introduction
of perverse incentives by relying directly on workload data collected by police forces”—the regression technique also fills in missing data
and allows dubious data to be rejected without excluding any force from a share of a pot.
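A minimal sketch of a pot-1 style share-out is given below (an assumed illustration in Python: the indicator matrix, the OLS fit and the proportional split stand in for the Home Office's actual variable-selection, principal-components and data-cleaning steps, which are not reproduced here).

```python
# Illustrative sketch of a regression-based share-out of one funding "pot".
import numpy as np

def pot_shares(indicators, workload, pot):
    """indicators: (n_forces, k) socio-demographic variables;
    workload: (n_forces,) e.g. numbers of incidents;
    pot: the sum of money in the pot.  Returns each force's share of the pot."""
    X = np.column_stack([np.ones(len(workload)), indicators])   # add intercept
    beta, *_ = np.linalg.lstsq(X, workload, rcond=None)          # OLS fit across forces
    predicted = X @ beta                                          # predicted workloads
    # (assumes all predicted workloads are positive)
    return pot * predicted / predicted.sum()                      # proportional split
```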
It is tempting to go on describing the details of PFF if only to point out that its precise but extended specification in the Home Office
(2000b) document, accurate to the pound, cannot be much help to Chief Constables wanting to know how their own environmentals
influence the share they get (a case for ONS oversight?) However, readers may already have concluded that, as constituted, PFF does
not realize the “No problem!” aspiration of Section 4.4.
APPENDIX 3
NEGOTIABLE VALUE-WEIGHTS
Stage 1: Establish societal weights sj , j = 1, ..., s for the s outputs.
Stage 2: Obtain from each unit, its own best estimate of how its input cost, generically x, should be notionally
divided as
x = x[1] + ... + x[s]
to represent the internal costs of generating the output volumes y(1), ..., y(s) respectively.
[The estimation may not be easy when the same internal costable activity serves to generate more than one output.]
Stage 3: Calculate cost weights cj , j = 1, ..., s, as the medians, or more generally as trimmed means (Mosteller &
Tukey, 1977), of the n ratios yi (j)/xi [j], i = 1, ..., n.
Stage 4: Calculate negotiable value-weights
vj (N ) = sj + N cj , j = 1, ..., s,
where N is a negotiable positive number, for a range of values of N, together with the associated efficiencies adjusted
for environmentals.
Stage 5: Negotiate the value of N with police authorities & chief constables.
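A sketch of Stages 2-4 follows (an assumed implementation; the function and argument names are illustrative, and Stage 5's negotiation over N obviously lies outside any code).

```python
# Illustrative sketch of Stages 2-4 of the negotiable value-weight algorithm.
import numpy as np
from scipy.stats import trim_mean

def negotiable_value_weights(s_weights, y, x_split, N_values, trim=0.1):
    """s_weights: (s,) societal weights s_j (Stage 1);
    y: (n, s) output volumes y_i(j);
    x_split: (n, s) notional divisions x_i[j] of each unit's cost (Stage 2);
    N_values: candidate values of N.  Returns {N: value-weights v_j(N)}."""
    c = trim_mean(y / x_split, trim, axis=0)            # Stage 3: cost weights c_j
    return {N: s_weights + N * c for N in N_values}     # Stage 4: v_j(N) = s_j + N*c_j
```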
APPENDIX 4
HISTORICAL NOTE
We read in Cooper et al (2000) that:
“No empirical applications [of the ideas of Koopmans (1951)] were reported before the appearance of the 1957 article
by M.J.Farrell . . . This showed how these methods could be applied to data in order to arrive at relative efficiency
evaluations . . . in contrast with Koopmans and Pareto who conceptualized matters in terms of theoretically known efficient
responses without much (if any) attention to how inefficiency, or more precisely, technical inefficiency could be identified.
Koopmans, for instance, assumed that producers would respond optimally to prices which he referred to as efficiency prices.
. . . the identification of inefficiencies seems to have been first brought into view in [Debreu (1951)]. Even though their
models took the form of linear programming problems, both Debreu and Farrell formulated their models in the tradition
of ’activity analysis’. Little attention had been paid to computational implementation in the activity analysis literature.
Farrell therefore undertook a massive and onerous series of matrix inversions in his first efforts. The alternative of linear
programming algorithms was called to Farrell’s attention by A.J.Hoffman who served as commentator in this same issue
of the Journal of the Royal Statistical Society. Indeed, the activity analysis approach had already been identified with linear
programming and reformulated and extended in [Charnes & Cooper, 1957]. See also Chapter IX in [Charnes & Cooper,
1961]. The modern version of DEA originated in two articles [Charnes, Cooper & Rhodes, 1978, 1981].”
On another page, Farrell’s 1957 paper is seen as defective because it failed to deal effectively with “slacks”:
“[Farrell] did not fully satisfy the conditions for Pareto-Koopmans efficiency but stopped short, instead, with . . . ’weak
efficiency’ (also called ’Farrell efficiency’) because non-zero slack, when present in any input or output, can be used to
effect additional improvements without worsening any other input or output. Farrell, we might note, was aware of this
shortcoming in his approach which he tried to treat by introducing new (unobserved) ’points at infinity’ but was unable to
give his concept implementable form. In any case, this was all accomplished by Charnes, Cooper & Rhodes [21 years later]
in a mathematical formulation . . . as ’CCR-efficiency’.”
These quotations from Cooper et al (2000) suggest that Farrell was trying, rather unsuccessfully, merely to
apply already well-formulated ideas, and that we should refrain from describing his 1957 paper as “seminal”. What
the quotations do not emphasize is that Farrell made “one small step” (Neil Armstrong) for econometrics with
the “necessary parallels” (Farrell) that he modestly drew with the Pareto-Koopmans-Debreu theory. That theory
was concerned with the overall efficiency of an “economic system” (Debreu) or a “managed enterprise” (Charnes
& Cooper, 1961), broken down into interacting “activities”(Koopmans) or “production units” (Debreu). Farrell’s
innovation was to see that the same technique that gave Debreu’s efficiency coefficient for the whole system could
be applied to the individual units. The issue of his computational competence in dealing with “slack” is a relatively
minor one: in any case, the paper of Farrell & Fieldhouse (1962) showed that Farrell had by then responded with
more than necessary competence to Hoffman’s 1957 advice about the help to be found in linear programming.
REFERENCES
Bracketed numbers refer to sections not pages.
Aigner,D., Lovell,C.A.K., & Schmidt,P. (1977), “Formulation and estimation of stochastic frontier production function models”, J.Econometrics, 6, 21-37, [3.1].
Allen,R., Athanassopoulos,A., Dyson,R.G., & Thanassoulis,E. (1997), “Weights restrictions and value judgements
in data envelopment analysis: evolution, development and future directions”, Annals of Operations Research, 73,
13-34, [2.15] .
APA (1999), Pounding the Beat: A guide to police finance in England and Wales, Association of Police Authorities,
London, [Appendix 2].
Arrow,K.J.,1951, Social Choice and Individual Values, Cowles Commission Monograph No.12, Wiley & Sons, New
York, [4.1].
Audit Commission (1999), Local Authority performance indicators 1997/98: Police and Fire Services, Audit Commission, London, [2.15, 4.2, 5.2].
Banker,R.D., Charnes,A., & Cooper,W.W. (1984), “Some models for estimating technical and scale inefficiencies in
Data Envelopment Analysis”, Management Science, 30, 1078-92, [2.6].
Bauer,P.W. (1990), “Recent developments in the econometric estimation of frontiers”, J.Econometrics, 46, 39-56,
[3.3].
Cabinet Office (2000), Adding It Up, www.cabinet-office.gov.uk/innovation, London, [Appendix 2].
Charnes,A. & Cooper,W.W. (1957), “On the theory and computation of delegation-type models: K-efficiency, functional efficiency and goals”, Proceedings of the Sixth International Meeting of the Institute of Management Science,
Pergamon Press, London, [Appendix 4].
Charnes,A. & Cooper,W.W. (1961), Management Models and Industrial Applications of Linear Programming, Wiley
& Sons, New York, [Appendix 4].
Charnes,A., Cooper,W.W., & Rhodes,E. (1978), “Measuring the efficiency of decision making units”, European
Journal of Operational Research, 2, 429-44, [2.12, 2.16, Appendix 4].
Charnes,A., Cooper,W.W., & Rhodes,E. (1981), “Evaluating programs and managerial efficiency: An application of
Data Envelopment Analysis to program follow through”, Management Science, 27, 668-97, [Appendix 4].
Charnes,A. & Cooper,W.W. (1985), “Preface to topics in Data Envelopment Analysis”, Annals of Operations Research, 2, 59-94, [2.18].
Cooper,W.W., Seiford,L.M., & Tone,K. (2000), Data Envelopment Analysis: A Comprehensive Text with Models,
Applications, and References, Kluwer, Boston, [1, 2.11, 2.12, 3.3, Appendix 4].
Cubbin,J. & Tzanidakis,G. (1998), “Regression versus data envelopment analysis for efficiency measurement: an
application to the England and Wales regulated water industry”, Utilities Policy, 7, 75-85, [3.3, 4.1].
Debreu,G.(1951), “The coefficient of resource utilization”, Econometrica, 19, 273-92, [Appendix 4].
DETR (1999), “Performance indicators for 2000/2001”, Section 11.1, Department of the Environment, Transport
and the Regions, London, www.local-regions.detr.gov.uk/bestvalue/bvindex.htm , [1].
Drake,L. & Simper,R. (1999), “X-efficiency and scale economies in policing: A comparative study using the distribution free approach and DEA”, Economic Research Paper No. 99/7, Department of Economics, Loughborough
University [3.6, 5.2, 5.2.2].
Drake,L. & Simper,R. (2000), “Productivity estimation and the size-efficiency relationship in English and Welsh
police forces: an application of Data Envelopment Analysis and Multiple Discriminant Analysis”, International
Review of Law & Economics, 20, 53-73, [2.6, 2.7, 5.2, 5.2.1].
Dyson,R.G., Thanassoulis,E. & Boussofiane,A. (1990), “Data Envelopment Analysis”, in Operational Research Tutorial
Papers, L.C.Hendry & R.Eglese (editors), pp.13-28, Operational Research Society, U.K., [2.16, 2.18].
Farrell,M.J.(1957), “The measurement of productive efficiency (with discussion)”. J.Roy.Statist.Soc.A, 120, 253-90,
[Abstract;2.1, 2.2, 2.4-2.6, 2.9, 2.10, 2.12, 2.15, 2.16, 2.18, 3.1, 4.4, Appendix 4].
Farrell,M.J. & Fieldhouse,M. (1962), “Estimating efficient production functions under increasing returns to scale”,
J.Royal Statistical Soc. A, 125, 252-67, [2.6, 2.7, Appendix 4].
Foucault, M., (1973), The Order of Things: An Archaeology of the Human Sciences, Vintage Books, New York.
Greene,W.H.(1980),”Maximum likelihood estimation of econometric frontier functions”, J.Econometrics, 13, 27-56,
[3.3].
Greene,W.H.(1995), LimDep Version 7.0 User’s Manual, Econometric Software, Inc., Castle Hill, [3.2].
Home Office (2000a), “Police Funding Formula 2000/2001, AFWG(00)1”, RDS Economics & Resource Analysis Unit,
[2.18, 4.4, Appendix 2].
Home Office (2000b), “The Police Grant Report (England and Wales) 2000/01”, www.homeoffice.gov.uk/ppd/pru/pgr2001.htm, [Appendix 2] .
Jarvie,I.C. (1985), “Philosophy of the social sciences”, in The Social Science Encyclopedia, eds. Adam & Jessica
Kuper, Routledge & Kegan Paul, London.
Jondrow,J., Lovell,C.A.K., Materov,I.S., & Schmidt,P. (1982), “On the estimation of technical efficiency in the
stochastic frontier production function model”, J.Econometrics, 19, 233-8, [3.2].
Kittelsen,S.A.C. & Førsund,F.R. (1992), “Efficiency analysis of Norwegian district courts”, J.Productivity Anal., 3,
277-306, [5.1].
Koopmans,T.C.(ed.)(1951), Activity Analysis of Production and Allocation, Cowles Commission Monograph 13, Wiley
& Sons, New York, [Appendix 4].
Kopp,R.J. & Mullahy,J.(1990), “Moment-based estimation and testing of stochastic frontier models” , J.Econometrics,
46, 165-83.
Kumbhakar,S.C. & Lovell,C.A.K. (1999), Stochastic Frontier Analysis, Cambridge University Press, [1].
Levitt,M.S. & Joyce,M.A.S. (1987), The Growth and Efficiency of Public Spending, Cambridge University Press, [2.6,
2.16].
Lewis, Sue (1986), “Measuring output and performance: Data Envelopment Analysis”, H.M.Treasury’s Public Expenditure Survey Committee: Development Sub-Committee, [4.1].
Mosteller,F. & Tukey,J.W. (1977), Data Analysis and Regression, Addison-Wesley, Reading, Mass., [Appendix 3].
NERA (1999), “Peer review of a possible approach to better measure police efficiency: A report for the Public
Services Productivity Panel”, National Economic Research Associates, London, www.nera.com.
Norman,M. & Stoker,B. (1991), Data Envelopment Analysis: The Assessment of Performance, Wiley, Chichester,
[1].
Nunamaker,T.R. (1985), “Using Data Envelopment Analysis to measure the efficiency of non-profit organizations: a
critical evaluation”, Managerial and Decision Economics, 6, 50-8, [2.14, 2.16].
Pedraja-Chaparro,F. & Salinas-Jimenez,J. (1996), “An assessment of the efficiency of Spanish Courts using DEA”,
Applied Economics, 28, 1391-403, [5.1].
Pedraja-Chaparro,F., Salinas-Jimenez,J., & Smith,P. (1999), “On the quality of the data envelopment analysis
model”, J.Operational Res.Soc., 50, 636-44, [1].
Schmidt,P. (1985), “Frontier production functions ”, Econometric Reviews, 4, 289-355, [1, 3.3, 3.4] .
Spottiswoode,C. (2000), “Improving police performance: A new approach to measuring police efficiency”, www.hmtreasury.gov.uk/pspp/studies.html, Public Services Productivity Panel, H.M. Treasury, London, [1, 2.9, 2.18, 3.3,
3.6, 4.1, 4.3, 4.6, 5.2, Appendix 2].
Stone, M. (1974), “Cross-validatory choice and assessment of statistical predictions”, J.Roy.Statist.Soc.B, 36, 111-47,
[4.4].
Thanassoulis,E., Dyson,R.G., & Foster,M.J., (1987), “Relative efficiency assessments using Data Envelopment Analysis: An application to data on rates departments”, J.Operational Res.Soc., 38, 397-411, [2.9, 2.14].
Thanassoulis,E. (1993), “A comparison of Regression Analysis and Data Envelopment Analysis as alternative methods
for performance assessments”, J.Operational Res. Soc., 44, 1129-44, [3.4].
Thanassoulis,E. (1995), “Assessing police forces in England and Wales using Data Envelopment Analysis”, European
J. Operational Res., 87, 641-57, [2.15, 4.2, 5.2].
Walker, Monica A., (1992), “Do we need a clear-up rate?”, Policing and Society, 2, 293-306, [5.2.1].
INDEX
Numbers refer to sections.
Arrow’s Impossibility Theorem, 4.1
benevolent dictatorship, 4.1
DEA
• see Farrell,M.J.
• allowance for environmentals, 2.18
• as a flattering technique, 2.10
• as EDA, 2.19
• as oracular goddess, 2.18
• does not rank units, 2.7
• lack of discrimination, 2.14
• questionable generalization, 2.11
• sensitivity to (takeover by?) outliers, 2.13
• testing & validating (is it possible?), 2.19
• weight constraints, 2.15
efficiency
• frontier F , 2.1
• stochastic (seff or U ), 2.1, 5.1
• value (veff), 2.4, 2.10, 3.2, 5.1
• technical/Farrell/DEA (teff), 2.1, 2.4, 2.10, 5.1
environmentals, 1, 2.16
• as “negative” outputs, 2.18
• and Farrell/DEA, 2.18
• and stochastic frontier regression, 3.5
• and “the third way”, 4.4
Farrell,M.J.
• approach of (modified), 2
• priority question, Appendix 4
feasible
• performance or point, 2.1
• construction of feasible set C, 2.2
Foucault’s blank spaces, 2.19
frontier, 2.1, 2.2
• unit, 2.1
input
• minimization, 2.4
• volumetric, 1
linear programming, 2.12, 2.18
mixing, 2.2
Nunamaker’s Theorem, 2.14
output
• “positive”, 2.9
• “negative”, 2.9, 2.10
• non-volumetric, 2.10
• volumetric, 1, 2.9
output maximization, 2.4
points at infinity, 2.5, 2.12, Appendix 4
police
• comments of Audit Commission, 5.2
• funding formula, 2.18, 4.4, Appendix 2
• inputs, outputs & environmentals, 1
• priorities, 2.16
• study of Thanassoulis, 2.15, 4.2
• studies of Drake & Simper, 5.2, 5.2.1, 5.2.2
rescaling, 2.2
returns to scale
• constant (CRS), 2.2, 2.4, 2.6, 2.9, 2.10, 2.12
• decreasing (DRS), 2.4, 2.6, 2.9, 2.10
• increasing (IRS), 2.6
• variable (VRS), 2.6, 2.7
• comments of Banker, Charnes & Cooper, 2.6
• comments of Drake & Simper, 2.7, 5.2.1
“state of the art”, 3.6
Spanish courts re-analysis, 5.1
stochastic frontier regression, 3
• as a consistency “check” on DEA, 3.3, 5.2.2, 4.7
• errors in all variables, 3.3
• exogenicity condition, 3.3
• simulation of Thanassoulis, 3.4
technical efficiency as flattering upper bound, 2.10
“the third way”, 4, 5.1
• allowance for environmentals, 4.4
• concurrence with NERA recommendation, 4.7
• cross-validation, 4.4
• Mark One algorithm, Appendix 3
• objections, 4.6
• unit subdivision, 4.4
value judgement, 2.4
• freedom from, 1
value weights, 2.10, 4.3
worse
• certainly, 2.1
• proportionally, 2.1
• worsening, 2.2