Data Envelopment Analysis
Data Envelopment Analysis (DEA) is an increasingly popular management tool. This write-up is an introduction to DEA for those who are unfamiliar with the technique. For a more in-depth discussion, the interested reader is referred to Seiford and Thrall [1990] or the seminal work by Charnes, Cooper, and Rhodes [1978].
DEA is commonly used to evaluate the efficiency of a number of producers. A typical statistical approach is characterized as a central tendency approach: it evaluates producers relative to an average producer. In contrast, DEA compares each producer with only the "best" producers. By the way, in the DEA literature, a producer is usually referred to as a decision making unit, or DMU. DEA is not always the right tool for a problem, but it is appropriate in certain cases. (See Strengths and Limitations of DEA below.)
In DEA, there are a number of producers. The production process for each producer is to take a
set of inputs and produce a set of outputs. Each producer has a varying level of inputs and gives a
varying level of outputs. For instance, consider a set of banks. Each bank has a certain number of
tellers, a certain square footage of space, and a certain number of managers (the inputs). There
are a number of measures of the output of a bank, including number of checks cashed, number of
loan applications processed, and so on (the outputs). DEA attempts to determine which of the
banks are most efficient, and to point out specific inefficiencies of the other banks.
A fundamental assumption behind this method is that if a given producer, A, is capable of
producing Y(A) units of output with X(A) inputs, then other producers should also be able to do
the same if they were to operate efficiently. Similarly, if producer B is capable of producing Y(B)
units of output with X(B) inputs, then other producers should also be capable of the same
production schedule. Producers A, B, and others can then be combined to form a composite
producer with composite inputs and composite outputs. Since this composite producer does not
necessarily exist, it is typically called a virtual producer.
The heart of the analysis lies in finding the "best" virtual producer for each real producer. If the
virtual producer is better than the original producer by either making more output with the same
input or making the same output with less input then the original producer is inefficient. The
subtleties of DEA are introduced in the various ways that producers A and B can be scaled up or
down and combined.
Numerical Example
To illustrate how DEA works, let's take an example of three banks. Each bank has exactly 10 tellers (the only input), and we measure a bank based on two outputs: checks cashed and loan applications processed. The data for these banks are as follows:

Bank A: 10 tellers, 1000 checks, 20 loan applications
Bank B: 10 tellers, 400 checks, 50 loan applications
Bank C: 10 tellers, 200 checks, 150 loan applications
Now, the key to DEA is to determine whether we can create a virtual bank that is better than one
or more of the real banks. Any such dominated bank will be an inefficient bank.
Consider trying to create a virtual bank that is better than Bank A. Such a bank would use no
more inputs than A (10 tellers), and produce at least as much output (1000 checks and 20 loans).
Clearly, no combination of banks B and C can possibly do that. Bank A is therefore deemed to
be efficient. Bank C is in the same situation.
However, consider bank B. If we take half of Bank A and combine it with half of Bank C, then we create a virtual bank that processes 600 checks and 85 loan applications with just 10 tellers. This virtual bank dominates B (we would much rather have the virtual bank than bank B). Bank B is therefore inefficient.
Another way to see this is that we can scale down the inputs to B (the tellers) and still have at least as much output. If we assume that inputs and outputs are linearly scalable, then we estimate that we can get by with about 6.3 tellers. We do that by taking .34 times bank A plus .29 times bank C. The resulting virtual bank uses 6.3 tellers and produces at least as much as bank B does. We say that bank B's efficiency rating is .63. Banks A and C have an efficiency rating of 1.
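This rating can be reproduced with any LP solver. Here is a minimal sketch using Python with scipy (an assumption; the write-up itself is tool-agnostic), with one variable for THETA and one weight per bank:

# Sketch: bank B's DEA efficiency as a small linear program.
from scipy.optimize import linprog

c = [1, 0, 0, 0]                 # minimize theta; variables are [theta, lA, lB, lC]
A_ub = [
    [-10, 10, 10, 10],           # tellers: 10*lA + 10*lB + 10*lC <= 10*theta
    [0, -1000, -400, -200],      # checks:  1000*lA + 400*lB + 200*lC >= 400
    [0, -20, -50, -150],         # loans:   20*lA + 50*lB + 150*lC >= 50
]
b_ub = [0, -400, -50]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(round(res.fun, 4))                  # 0.6301, bank B's efficiency rating
print([round(v, 4) for v in res.x[1:]])   # [0.3425, 0.0, 0.2877]: .34 of A plus .29 of C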
Graphical Example
Single-input, two-output or two-input, single-output problems are easy to analyze graphically. The previous numerical example is now solved graphically. (An assumption of constant returns to scale is made; it is explained in detail later.) Since every bank uses the same 10 tellers, the efficiency of bank B can be analyzed by plotting each bank's two outputs (checks cashed and loan applications processed) on a pair of axes.
If it is assumed that convex combinations of banks are allowed, then the line segment connecting
banks A and C shows the possibilities of virtual outputs that can be formed from these two banks.
Similar segments can be drawn between A and B along with B and C. Since the segment AC lies
beyond the segments AB and BC, this means that a convex combination of A and C will create
the most outputs for a given set of inputs.
This line is called the efficiency frontier. The efficiency frontier defines the maximum
combinations of outputs that can be produced for a given set of inputs.
Since bank B lies below the efficiency frontier, it is inefficient. Its efficiency can be determined by comparing it to a virtual bank formed from bank A and bank C. The virtual bank, called V, is approximately 54% of bank A and 46% of bank C. (This can be determined by an application of the lever law: pull out a ruler and measure the lengths of AV, CV, and AC. The percentage of bank C is then AV/AC and the percentage of bank A is CV/AC.)
The efficiency of bank B is then calculated by finding the fraction of inputs that bank V would need to produce as many outputs as bank B. This is easily calculated by looking at the line from the origin, O, to V. The efficiency of bank B is OB/OV, which is approximately 63%. This figure also shows that banks A and C are efficient, since they lie on the efficiency frontier. In other words, any virtual bank formed for analyzing bank A or bank C coincides with bank A or bank C itself. Therefore, since the efficiency of each is the ratio of two equal distances (OA/OA and OC/OC), banks A and C have efficiency scores equal to 1.0.
The graphical method is useful in this simple two-dimensional example but gets much harder in higher dimensions. The normal method of evaluating the efficiency of bank B is a linear programming formulation of DEA.
Since this problem uses a constant input value of 10 for all of the banks, it avoids the complications caused by allowing different returns to scale. Returns to scale refers to increasing or decreasing efficiency based on size. For example, a manufacturer can achieve certain economies of scale by producing a thousand circuit boards at a time rather than one at a time: it might be only 100 times as hard as producing one at a time. This is an example of increasing returns to scale (IRS).
On the other hand, the manufacturer might find it more than a trillion times as difficult to produce a trillion circuit boards at a time, because of storage problems and limits on the worldwide copper supply. This range of production illustrates decreasing returns to scale (DRS). Combining the two extreme ranges would necessitate variable returns to scale (VRS).
Constant returns to scale (CRS) means that the producers are able to linearly scale the inputs and outputs without increasing or decreasing efficiency. This is a significant assumption. The assumption of CRS may be valid over limited ranges, but its use must be justified. As an aside, CRS tends to lower the efficiency scores while VRS tends to raise them.
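In the linear program introduced in the next section, the only difference between CRS and VRS is a convexity constraint forcing the weights of the virtual producer to sum to 1 (the VRS model of Banker, Charnes and Cooper [1984]). A minimal sketch, again assuming Python with scipy and reusing the bank data from the numerical example:

# Sketch: the same bank-B model under VRS; the only change from the CRS
# sketch above is a convexity constraint on the weights.
from scipy.optimize import linprog

c = [1, 0, 0, 0]                       # [theta, lA, lB, lC], as before
A_ub = [[-10, 10, 10, 10],             # tellers
        [0, -1000, -400, -200],        # checks >= 400
        [0, -20, -50, -150]]           # loans  >= 50
b_ub = [0, -400, -50]
A_eq = [[0, 1, 1, 1]]                  # VRS: lA + lB + lC = 1 (theta gets coefficient 0)
b_eq = [1]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.fun)   # 1.0 here: with identical inputs, every bank is VRS-efficient,
                 # consistent with VRS never scoring below CRS (0.63 for bank B)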
Using Linear Programming
Data Envelopment Analysis is a linear programming procedure for a frontier analysis of inputs
and outputs. DEA assigns a score of 1 to a unit only when comparisons with other relevant units
do not provide evidence of inefficiency in the use of any input or output. DEA assigns an
efficiency score less than one to (relatively) inefficient units. A score less than one means that a
linear combination of other units from the sample could produce the same vector of outputs by
using a smaller vector of inputs. The score reflects the radial distance from the estimated
production frontier to the DMU under consideration.
There are a number of equivalent formulations for DEA. The most direct formulation of the
example above is as follows:
Let X_i be the vector of inputs into DMU i, and let Y_i be the corresponding vector of outputs. Let X_0 be the vector of inputs into the DMU whose efficiency we want to determine, and Y_0 its vector of outputs. So the X's and the Y's are the data. The measure of efficiency for DMU 0 is given by the following linear program:

min THETA
st
SUM_i L_i X_i <= THETA X_0
SUM_i L_i Y_i >= Y_0
L_i >= 0 for all i

where L_i is the weight given to DMU i in its efforts to dominate DMU 0, and THETA is the efficiency of DMU 0. So the L_i's and THETA are the variables. Since DMU 0 appears on the left-hand side of the constraints as well, the optimal THETA cannot possibly be more than 1. When we solve this linear program, we get a number of things:
1. The efficiency of DMU 0 (THETA), with THETA = 1 meaning that the unit is efficient.
2. The unit's ``comparables'' (those DMUs with nonzero L_i).
3. The ``goal'' inputs (the difference between X_0 and THETA X_0).
4. Alternatively, we can keep inputs fixed and get goal outputs (Y_0 / THETA).
DEA assumes that the inputs and outputs have been correctly identified. Usually, as the number of inputs and outputs increases, more DMUs tend to get an efficiency rating of 1, as they become too specialized to be evaluated with respect to other units. On the other hand, if there are too few inputs and outputs, more DMUs tend to be comparable. In any study, it is important to focus on correctly specifying inputs and outputs.
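As a concrete sketch of this envelopment formulation (again assuming Python with numpy and scipy, rather than any tool the original used), the following function builds and solves the LP for an arbitrary DMU. Run on the two-input, three-output data of the three LPs written out below, it reproduces the solutions reported after them:

# Sketch: input-oriented, constant-returns-to-scale envelopment LP for DMU j.
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, j):
    """X is (inputs x DMUs), Y is (outputs x DMUs); returns (theta, weights)."""
    m, n = X.shape                                # m inputs, n DMUs
    s = Y.shape[0]                                # s outputs
    c = np.r_[1.0, np.zeros(n)]                   # variables: [theta, L_1, ..., L_n]
    in_rows = np.hstack([-X[:, [j]], X])          # SUM_i L_i X_i <= theta X_j
    out_rows = np.hstack([np.zeros((s, 1)), -Y])  # SUM_i L_i Y_i >= Y_j
    A_ub = np.vstack([in_rows, out_rows])
    b_ub = np.r_[np.zeros(m), -Y[:, j]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.fun, res.x[1:]

# The data of the three LPs below: two inputs, three outputs.
X = np.array([[5.0, 8.0, 7.0], [14.0, 15.0, 12.0]])
Y = np.array([[9.0, 5.0, 4.0], [4.0, 7.0, 9.0], [16.0, 10.0, 13.0]])
for j in range(3):
    theta, L = dea_efficiency(X, Y, j)
    print(j + 1, round(theta, 6), np.round(L, 6))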
The linear programs for evaluating the 3 DMUs are given below. (The data here are for a second example with two inputs and three outputs: DMU 1 uses inputs (5, 14) to produce outputs (9, 4, 16), DMU 2 uses (8, 15) to produce (5, 7, 10), and DMU 3 uses (7, 12) to produce (4, 9, 13).)

LP for evaluating DMU 1:
min THETA
st
5L1 + 8L2 + 7L3 - 5THETA <= 0
14L1 + 15L2 + 12L3 - 14THETA <= 0
9L1 + 5L2 + 4L3 >= 9
4L1 + 7L2 + 9L3 >= 4
16L1 + 10L2 + 13L3 >= 16
L1, L2, L3 >= 0

LP for evaluating DMU 2:
min THETA
st
5L1 + 8L2 + 7L3 - 8THETA <= 0
14L1 + 15L2 + 12L3 - 15THETA <= 0
9L1 + 5L2 + 4L3 >= 5
4L1 + 7L2 + 9L3 >= 7
16L1 + 10L2 + 13L3 >= 10
L1, L2, L3 >= 0

LP for evaluating DMU 3:
min THETA
st
5L1 + 8L2 + 7L3 - 7THETA <= 0
14L1 + 15L2 + 12L3 - 12THETA <= 0
9L1 + 5L2 + 4L3 >= 4
4L1 + 7L2 + 9L3 >= 9
16L1 + 10L2 + 13L3 >= 13
L1, L2, L3 >= 0
The solution to each of these is as follows (abridged from the Excel Solver reports; the full sensitivity output, with reduced costs, shadow prices and allowable ranges, is not reproduced here):

DMU 1: THETA = 1, with L1 = 1, L2 = 0, L3 = 0.
DMU 2: THETA = 0.773333, with L1 = 0.261538, L2 = 0, L3 = 0.661538.
DMU 3: THETA = 1, with L1 = 0, L2 = 0, L3 = 1.
Note that DMUs 1 and 3 are overall efficient and DMU 2 is inefficient with an efficiency rating
of 0.773333.
Hence the efficient levels of inputs and outputs for DMU 2 are given by:

Efficient levels of Inputs: 0.261538 x (5, 14) + 0.661538 x (7, 12) = (5.94, 11.60)
Efficient levels of Outputs: 0.261538 x (9, 4, 16) + 0.661538 x (4, 9, 13) = (5, 7, 12.78)

Note that the outputs are at least as much as the outputs currently produced by DMU 2, and the inputs are at most 0.773333 times the inputs of DMU 2. This can be used in two different ways: the inefficient DMU should target cutting inputs down to at most the efficient levels. Alternatively, an equivalent statement can be made by finding a set of efficient levels of inputs and outputs by dividing the levels obtained by the efficiency of DMU 2. This focus can then be used to set targets primarily for outputs rather than for reduction of inputs.
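Numerically, the efficient levels are just the weighted combination of DMUs 1 and 3 picked out by the solver. A small numpy sketch (the weights are taken from the solution reported above):

import numpy as np

X = np.array([[5, 8, 7], [14, 15, 12]])             # inputs by DMU
Y = np.array([[9, 5, 4], [4, 7, 9], [16, 10, 13]])  # outputs by DMU
lam = np.array([0.261538, 0.0, 0.661538])           # optimal weights for DMU 2

print(X @ lam)  # efficient inputs  ~ (5.94, 11.60), vs DMU 2's (8, 15)
print(Y @ lam)  # efficient outputs ~ (5.00, 7.00, 12.78), vs DMU 2's (5, 7, 10)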
Alternate Formulation
There is another, probably more common, formulation that provides the same information. We can think of DEA as providing a price on each of the inputs and a value on each of the outputs. The efficiency of a DMU is the ratio of its weighted outputs to its weighted inputs, and this ratio is constrained to be no more than 1 for every DMU. The prices and values have nothing to do with real prices and values: they are an artificial construct. The goal is to find a set of prices and values that puts the target DMU in the best possible light. The goal, then, is to

max (U Y_0) / (V X_0)
st
(U Y_i) / (V X_i) <= 1 for every DMU i
U, V >= 0

Here, the variables are the U's and the V's: they are the vectors of output values and input prices, respectively. This fractional program can be equivalently stated as the following linear programming problem (where Y and X are matrices with columns Y_i and X_i respectively):

max U Y_0
st
V X_0 = 1
U Y - V X <= 0
U, V >= 0
We denote this linear program by (D). Let us compare it with the one introduced earlier, which we denote by (P). To fix ideas, let us write out explicitly these two formulations for DMU 2 in our example.

Formulation (P) for DMU 2:
min THETA
st
-5L1 - 8L2 - 7L3 + 8THETA >= 0
-14L1 - 15L2 - 12L3 + 15THETA >= 0
9L1 + 5L2 + 4L3 >= 5
4L1 + 7L2 + 9L3 >= 7
16L1 + 10L2 + 13L3 >= 10
L1 >= 0, L2 >= 0, L3 >= 0
Formulation (D) for DMU 2:
max 5U1 + 7U2 + 10U3
st
-5V1 - 14V2 + 9U1 + 4U2 + 16U3 <= 0
-8V1 - 15V2 + 5U1 + 7U2 + 10U3 <= 0
-7V1 - 12V2 + 4U1 + 9U2 + 13U3 <= 0
8V1 + 15V2 = 1
V1 >= 0, V2 >= 0, U1 >= 0, U2 >= 0, U3 >= 0
Formulations (P) and (D) are dual linear programs! These two formulations actually give the
same information. You can read the solution to one from the shadow prices of the other. We will
not discuss linear programming duality in this course. You can learn about it in some of the OR
electives.
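The duality claim is easy to check numerically. Here is a sketch that solves formulation (D) for DMU 2 with the same scipy routine used in the earlier sketches (an assumption; any LP solver would do):

# Sketch: formulation (D) for DMU 2. linprog minimizes, so negate the objective.
from scipy.optimize import linprog

c = [0, 0, -5, -7, -10]          # variables: [V1, V2, U1, U2, U3]
A_ub = [
    [-5, -14, 9, 4, 16],         # DMU 1: U.Y_1 - V.X_1 <= 0
    [-8, -15, 5, 7, 10],         # DMU 2: U.Y_2 - V.X_2 <= 0
    [-7, -12, 4, 9, 13],         # DMU 3: U.Y_3 - V.X_3 <= 0
]
b_ub = [0, 0, 0]
A_eq = [[8, 15, 0, 0, 0]]        # normalisation: V.X_2 = 1
b_eq = [1]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 5)
print(-res.fun)                  # ~0.773333: the same efficiency that (P) gives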
Applications
The simple bank example described earlier may not convey the full usefulness of DEA. It is most useful when a comparison is sought against "best practices" where the analyst doesn't want the frequency of poorly run operations to affect the analysis. DEA has been applied in many situations such as: health care (hospitals, doctors), education (schools, universities), banks, manufacturing, benchmarking, management evaluation, fast food restaurants, and retail stores.
The analyzed data sets vary in size. Some analysts work on problems with as few as 15 or 20
DMUs while others are tackling problems with over 10,000 DMUs.
Strengths and Limitations of DEA
As the earlier list of applications suggests, DEA can be a powerful tool when used wisely. A few of the characteristics that make it powerful are:

- DEA can handle multiple-input and multiple-output models.
- It doesn't require an assumption of a functional form relating inputs to outputs.
- DMUs are directly compared against a peer or combination of peers.
- Inputs and outputs can have very different units. For example, X1 could be in units of lives saved and X2 could be in units of dollars without requiring an a priori tradeoff between the two.
The same characteristics that make DEA a powerful tool can also create problems. An analyst should keep these limitations in mind when choosing whether or not to use DEA.

- Since DEA is an extreme point technique, noise (even symmetrical noise with zero mean) such as measurement error can cause significant problems.
- DEA is good at estimating "relative" efficiency of a DMU but it converges very slowly to "absolute" efficiency. In other words, it can tell you how well you are doing compared to your peers but not compared to a "theoretical maximum."
- Since DEA is a nonparametric technique, statistical hypothesis tests are difficult and are the focus of ongoing research.
- Since a standard formulation of DEA creates a separate linear program for each DMU, large problems can be computationally intensive.
References
DEA has become a popular subject since it was first described in 1978. There have been
hundreds of papers and technical reports published along with a few books. Technical articles
about DEA have been published in a wide variety of places making it hard to find a good starting
point. Here are a few suggestions as to starting points in the literature.
1. Charnes, A., W.W. Cooper, and E. Rhodes. "Measuring the efficiency of decision making units." European Journal of Operational Research 2 (1978): 429-44.
2. Banker, R.D., A. Charnes, and W.W. Cooper. "Some models for estimating technical and scale inefficiencies in data envelopment analysis." Management Science 30 (1984): 1078-92.
3. Dyson, R.G. and E. Thanassoulis. "Reducing weight flexibility in data envelopment
analysis." Journal of the Operational Research Society 39 (1988): 563-76.
4. Seiford, L.M. and R.M. Thrall. "Recent developments in DEA: the mathematical
programming approach to frontier analysis." Journal of Econometrics 46 (1990): 7-38.
5. Ali, A.I., W.D. Cook, and L.M. Seiford. "Strict vs. weak ordinal relations for multipliers
in data envelopment analysis." Management Science 37 (1991): 733-8.
6. Andersen, P. and N.C. Petersen. "A procedure for ranking efficient units in data
envelopment analysis." Management Science 39 (1993): 1261-4.
7. Banker, R.D. "Maximum likelihood, consistency and data envelopment analysis: a
statistical foundation." Management Science 39 (1993): 1265-73.
The first paper was the original paper describing DEA; it is the source of the abbreviation CCR for the basic constant returns-to-scale model. The Seiford and Thrall paper is a good overview of the
literature. The other papers all introduce important new concepts. This list of references is
certainly incomplete.
A good source covering the field of productivity analysis is The Measurement of Productive
Efficiency edited by Fried, Lovell, and Schmidt, 1993, from Oxford University Press. There is
also a recent book from Kluwer Publishers, Data Envelopment Analysis: Theory, Methodology,
and Applications by Charnes, Cooper, Lewin, and Seiford.
To stay more current on the topics, some of the most important DEA articles appear in
Management Science, The Journal of Productivity Analysis, The Journal of the Operational
Research Society, and The European Journal of Operational Research. The latter just published a special issue, "Productivity Analysis: Parametric and Non-Parametric Approaches", edited by Lewin and Lovell, which has several important papers.
Volume 15, Issue 1, 2002
Estimation of technical efficiency: a review of some of the stochastic frontier and DEA software
Inés Herrero, Department of Applied Economics, University of Huelva, and
Sean Pascoe, CEMARE, Department of Economics, University of Portsmouth
The level of technical efficiency of a particular firm is characterised by the relationship between
observed production and some ideal or potential production (Greene 1993). The measurement of
firm specific technical efficiency is based upon deviations of observed output from the best
production or efficient production frontier. If a firm's actual production point lies on the frontier
it is perfectly efficient. If it lies below the frontier then it is technically inefficient, with the ratio
of the actual to potential production defining the level of efficiency of the individual firm.
Farrell's (1957) definition of technical efficiency led to the development of methods for
estimating the relative technical efficiencies of firms. The common feature of these estimation
techniques is that information is extracted from extreme observations from a body of data to
determine the best practice production frontier (Lewin and Lovell 1990). From this the relative
measure of technical efficiency for the individual firm can be derived. Despite this similarity the
approaches for estimating technical efficiency can be generally categorised under the distinctly
opposing techniques of parametric and non-parametric methods (Seiford and Thrall 1990).
Stochastic estimations incorporate a measure of random error. This involves the estimation of a stochastic production frontier, where the output of a firm is a function of a set of inputs, inefficiency and random error. An often quoted disadvantage of the technique, however, is that it imposes an explicit functional form and distributional assumption on the data. In contrast, the linear programming technique of data envelopment analysis (DEA) does not impose any assumptions about functional form, and hence is less prone to mis-specification. Further, DEA is a non-parametric approach, so it does not take random error into account and is therefore not subject to the problems of assuming an underlying distribution for the error term. However, since DEA cannot take account of such statistical noise, the efficiency estimates may be biased if the production process is largely characterised by stochastic elements.
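To make the composed-error idea concrete, here is a minimal simulation sketch in Python (illustrative only; the functional form, coefficients and error variances are invented for the example):

# Sketch: a stochastic production frontier y = b0 + b1*x + v - u, with
# symmetric noise v and one-sided inefficiency u >= 0 (half-normal here).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)               # (log) input
v = rng.normal(0.0, 0.2, n)             # random error, mean zero
u = np.abs(rng.normal(0.0, 0.4, n))     # inefficiency, always >= 0
y = 1.0 + 0.5 * x + v - u               # observed (log) output

# Every firm lies below the frontier 1.0 + 0.5*x by u on average, so a plain
# OLS fit recovers the slope but its intercept is pulled down by E[u].
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)                 # ~0.5 and ~1.0 - 0.4*sqrt(2/pi)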
The purpose of this paper is to present a range of methods for estimating technical efficiency, and to review the software that is available to do so.
Economic, allocative and technical efficiency
Technical efficiency is just one component of overall economic efficiency. However, in order to
be economically efficient, a firm must first be technically efficient. Profit maximisation requires
a firm to produce the maximum output given the level of inputs employed (i.e. be technically
efficient), use the right mix of inputs in light of the relative price of each input (i.e. be input
allocative efficient) and produce the right mix of outputs given the set of prices (i.e. be output
allocative efficient) (Kumbhakar and Lovell 2000). These concepts can be illustrated graphically
using a simple example of a two input (x1, x2)-two output (y1, y2) production process (Figure 1).
Efficiency can be considered in terms of the optimal combination of inputs to achieve a given
level of output (an input-orientation), or the optimal output that could be produced given a set of
inputs (an output-orientation).
In Figure 1(a), the firm is producing a given level of output (y1*, y2*) using an input combination
defined by point A. The same level of output could have been produced by radially contracting
the use of both inputs back to point B, which lies on the isoquant associated with the minimum
level of inputs required to produce (y1*, y2*) (i.e. Iso(y1*, y2*)). The input-oriented level of
technical efficiency (TEI(y,x)) is defined by 0B/0A. However, the least-cost combination of
inputs that produces (y1*, y2*) is given by point C (i.e. the point where the marginal rate of
technical substitution is equal to the input price ratio w2/w1). To achieve the same level of cost
(i.e. expenditure on inputs), the inputs would need to be further contracted to point D. The cost
efficiency (CE(y,x,w)) is therefore defined by 0D/0A. The input allocative efficiency
(AEI(y,x,w)) is subsequently given by CE(y,x,w)/TEI(y,x), or 0D/0B in Figure 1(a) (Kumbhakar and Lovell 2000).
The production possibility frontier for a given set of inputs is illustrated in Figure 1(b) (i.e. an
output-orientation). If the inputs employed by the firm were used efficiently, the output of the firm, currently producing at point A, could be expanded radially to point B. Hence, the output-oriented measure of technical efficiency (TEO(y,x)) can be given by 0A/0B. This is only equivalent to the
input-oriented measure of technical efficiency under conditions of constant returns to scale.
While point B is technically efficient, in the sense that it lies on the production possibility
frontier, a higher revenue could be achieved by producing at point C (the point where the
marginal rate of transformation is equal to the price ratio p2/p1). In this case, more of y1 should be
produced and less of y2 in order to maximise revenue. To achieve the same level of revenue as at
point C while maintaining the same input and output combination, output of the firm would need
to be expanded to point D. Hence, the revenue efficiency (RE(y,x,p)) is given by 0A/0D. Output
allocative efficiency (AEO(y,x,p)) is given by RE(y,x,p)/TEO(y,x), or 0B/0D in Figure 1(b) (Kumbhakar and Lovell 2000).
Figure 1: Input (a) and output (b) oriented efficiency measures
Stochastic production frontier software
Stochastic frontiers can be estimated using a range of multi-purpose econometric software which can be adapted for the desired estimation. This software includes well-known statistical packages such as LIMDEP, TSP, Shazam, GAUSS, SAS, etc. The two most commonly used packages for estimating stochastic production frontiers and inefficiency are FRONTIER 4.1 (Coelli 1996a) and LIMDEP (Greene 1995). A recent review of both packages is provided by Sena (1999).
FRONTIER 4.1 is a single purpose package specifically designed for the estimation of stochastic
production frontiers (and nothing else), while LIMDEP is a more general package designed for a
range of non-standard (i.e. non-OLS) econometric estimation. An advantage of the former package (FRONTIER) is that estimates of efficiency are produced as a direct output from the package.
The user is able to specify the distributional assumptions for the estimation of the inefficiency
term in a program control file. In LIMDEP, the package estimates a one-sided distribution, but
the separation of the inefficiency term from the random error component requires additional
programming.
FRONTIER is able to accommodate a wider range of assumptions about the error distribution
term than LIMDEP (Table 1), although it is unable to model exponential distributions. Neither
package can include gamma distributions. Only FRONTIER is able to estimate an inefficiency
model as a one-step process. An inefficiency model can be estimated in a two-stage process
using LIMDEP. However, this may create bias as the distribution of the inefficiency estimates is
pre-determined through the distributional assumptions used in its generation.
Table 1. Distributional assumptions allowed by the software

Distribution                                  LIMDEP   FRONTIER
Time-invariant firm-specific inefficiency
  Half-normal distribution                    Yes      Yes
  Truncated normal distribution               Yes      Yes
  Exponential distribution                    Yes      No
Time-variant firm-specific inefficiency
  Half-normal distribution                    No       Yes
  Truncated normal distribution               No       Yes
One-step inefficiency model                   No       Yes

Source: Sena (1999)
FRONTIER 4.1
The most commonly used package for the estimation of stochastic production frontiers in the literature is FRONTIER 4.1 (Coelli, 1996a). This incorporates maximum likelihood estimation of the parameters. The estimation process consists of three main steps. In the first step, OLS is applied to estimate the production function. This provides unbiased estimators for the beta's (except for the intercept term and the variance estimate). The OLS estimates are used as starting values to estimate the final ML model. First, the value of the likelihood function is evaluated for different values of gamma between 0 and 1, given the values of the beta's derived in the OLS. Finally, an iterative Davidon-Fletcher-Powell algorithm calculates the final parameter estimates, using the values of the beta's from the OLS and the value of gamma from the intermediate step as starting values.
FRONTIER 4.1 has been created specifically for the estimation of production frontiers. As such, it is a relatively easy tool to use for estimating stochastic frontier models. It is flexible: it can be used to estimate both production and cost functions, can estimate both time-varying and time-invariant efficiencies when panel data are available, and can be used when the dependent variable is in either logged or original units.
FRONTIER solves two general models. The error components model can be formulated as

Yit = Xit b + (Vit - Uit)

where Yit is the (logged) output obtained by the i-th firm in the t-th time period; Xit is a (k x 1) vector of (transformations of the) input quantities of the i-th firm in the t-th time period; b is a (k x 1) vector of unknown parameters; the Vit are assumed to be iid N(0, sv^2) random errors; and Uit = Ui exp(-eta(t - T)), where the Ui are assumed to be iid truncations at zero of the N(mu, su^2) distribution.
This is the Battese and Coelli (1992) model. However, some other models can be expressed as special cases of this one and can also be solved using FRONTIER. Setting eta = 0, the time-invariant model of Battese, Coelli and Colby (1989) is obtained. The Battese and Coelli (1988) model results from the previous one in the particular case where balanced panel data are available. If we add mu = 0 to the aforementioned assumptions, the Pitt and Lee (1981) model results. And if we finally set T = 1 in the Pitt and Lee model, we obtain the original cross-sectional data model of Aigner, Lovell and Schmidt (1977).
If eta > 0, the inefficiency term Uit is always decreasing with time, whereas eta < 0 implies that Uit is always increasing with time. This can be one of the main problems when using this model: technical efficiency is forced to be a monotonic function of time.
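A two-line computation makes the monotonicity visible (a sketch; the values of Ui, T and eta are illustrative only):

# Sketch: Battese-Coelli (1992) time decay, Uit = Ui * exp(-eta * (t - T)).
import numpy as np

Ui, T = 0.5, 10
t = np.arange(1, T + 1)
for eta in (0.1, -0.1):
    print(eta, np.round(Ui * np.exp(-eta * (t - T)), 3))
# eta = +0.1: Uit falls toward Ui as t -> T (inefficiency always decreasing);
# eta = -0.1: Uit rises with t (inefficiency always increasing).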
The second model included in the FRONTIER package is the Technical Efficiency (TE) effects model (Battese and Coelli 1995). It can be expressed as Yit = Xit b + (Vit - Uit), where Yit, Xit, b and Vit are as defined earlier and Uit ~ N(mit, su^2), truncated at zero, where mit = Zit d and Zit is a vector of firm-specific variables which may influence the firm's efficiency. FRONTIER also offers the solution of the model of Stevenson (1980), which is a particular case of the previous model obtained when T is equal to 1 (i.e. for cross-sectional data).
There are two approaches to estimating the inefficiency models. These may be estimated with
either a one step or a two step process. For the two-step procedure the production frontier is first
estimated and the technical efficiency of each firm is derived. These are subsequently regressed
against a set of variables, Zit, which are hypothesised to influence the firms' efficiency. A
problem with the two-stage procedure is the inconsistency in the assumptions about the
distribution of the inefficiencies. In the first stage, the inefficiencies are assumed to be
independently and identically distributed (iid) in order to estimate their values. However, in the
second stage, the estimated inefficiencies are assumed to be a function of a number of firm
specific factors, and hence are not identically distributed unless all the coefficients of the factors
are simultaneously equal to zero (Coelli, Rao and Battese, 1998). FRONTIER uses the ideas of
Kumbhakar, Ghosh and McGuckin (1991) and Reifschneider and Stevenson (1991) and
estimates all of the parameters in one step to overcome this inconsistency. The inefficiency
effects are defined as a function of the firm specific factors (as in the two-stage approach) but
they are then incorporated directly into the MLE. This is something that should be taken into
consideration when programming in some of the general statistical packages.
FRONTIER offers a wide variety of tests on the different functional forms of the models that can
be conducted easily by placing restrictions on the models and testing the significance of the
restrictions using the likelihood ratio test. The FRONTIER program is easy to use. A brief
instruction file and a data file have to be created. The executable file and the start-up file can be
downloaded from the Internet free of charge at the CEPA (University of New England) web page:
http://www.uq.edu.au/economics/cepa/frontier.htm.
Other packages
LIMDEP is a multi-purpose econometric package that can be programmed to estimate several
stochastic frontier models using some of the stochastic frontier estimation routines that are
already implemented in the package. The estimation process can be carried out when the
inefficiency variable is assumed to have the half-normal and truncated normal distribution but
also when the exponential distribution is assumed. Hence, when there seems to be strong
evidence to assume an exponential distribution, LIMDEP may be a better option than
FRONTIER. Among other features, LIMDEP offers the possibility of including random effects
for panel data and estimates of one-sided disturbance components. The package includes the
optimisation of non-linear regression with two-step least squares and user defined maximum
likelihood estimation. There is a LIMDEP Web page at http://www.limdep.com/
GAUSS's flexibility allows it to be adapted for the estimation of production frontiers. GAUSS includes some non-linear optimisation routines which can be used for this purpose. Being a very general statistical package, it also has some disadvantages, and its programming structure is not as easy to use as that of some of the other software packages. The GAUSS
program can be ordered at http://www.aptech.com/prod.html. There is also a Web page that
provides answers to questions frequently asked about the program:
http://www.indiana.edu/~statmath/gauss/index.html
The TSP package can be specifically adapted for the estimation of the most basic production frontier models, using user-programmed maximum likelihood functions. The program also
includes non-linear three-stage least squares. The program can be ordered and more information
can be obtained at: http://www.tspintl.com/products/tsp
The statistical program Shazam can also be adapted for the estimation of production frontiers. It
allows the optimisation of non-linear functions either by Maximum Likelihood or least squares
and it includes some hypothesis testing. Information about the program can be found at:
http://shazam.econ.ubc.ca/
DEA software
Most of the general-purpose mathematical optimisation software can be adapted to solve Data
Envelopment Analysis problems. Examples of program code for DEA models have already been
published and are readily adaptable (e.g. Olesen and Petersen (1995) present the GAMS code for
a DEA model that can be adapted to suit most analyses). Emrouznejad (2000) developed a SAS
program for different DEA models, including options such as input/output orientation and CRS/VRS. The Emrouznejad (2000) program and comments can be downloaded from:
http://www.deazone.com/software/index.htm#sasdea.
These general programs offer the possibility of a wide range of applications using non-specialist
DEA software. However, there are several DEA-specific programs that provide a variety of
interesting facilities. Seven of the most common ones are listed below, including the sources
where further information can be found:
1. EMS 1.3 (Efficiency measurement system) by H. Scheel. University of Dortmund. Germany. This
can be downloaded from the net. http://www.wiso.uni-dortmund.de/
2. DEAP 2.1 (Coelli, 1996b). Centre for Efficiency and Productivity Analysis (CEPA). University of
New England, NSW, Australia. It is now free and can also be downloaded from the net, as well as
its manual from the CEPA Web pages http://www.uq.edu.au/economics/cepa/deap.htm
3. Frontier Analyst, version 3. Professional Edition. Banxia Software Ltd. Glasgow. Scotland.
http://www.banxia.com/
4. WARWICK DEA software. Aston Business School. University of Aston. Birmingham, England, UK.
http://www.warwick.ac.uk/~bsrlu/
5. OnFront 2. Economic Measurement and Quality I Lund AB Lund. Sweden.
http://www.emq.se/software.html
6. DEA-Solver Professional 2.0. Saitech Inc. New Jersey. U.S.A. This is the most recent package. Its
manual is the Cooper, Seiford and Tone (2000) book (Data Envelopment Analysis. Kluwer
Academic Publishers). http://www.saitech-inc.com/
7. IDEAS 6.1. Professional version. Consulting Inc., Amherst, MA, USA. http://www.ideas2000.com/
Package Facilities
The key features of the different software packages have been summarised in Table 2.
Table 2. Key features of the packages

[Table 2 is not reproduced here. The features compared are: input/output orientation; Andersen & Petersen super-efficiency; exogenously fixed variables; categorical variables; multi-stage DEA; window analysis; Malmquist index; weight restrictions; CRS/VRS/NIRS; the additive model; and RTS information. The packages compared are EMS, DEAP, BANXIA, WARWICK, OnFront 2, DEA-Solver and IDEAS. Entries marked (*) can be carried out by making some modifications to the program.]
EMS and DEA-solver appear to offer the most options. Of the software examined, only EMS and
DEA-Solver can be used to carry out window analysis when time series data are available,
although if this is the case, DEAP and OnFront can be used for the calculation of Malmquist
indices. OnFront2 also includes many different options, some of them not offered by any other
package, such as the ability to carry out input and output congestion measures and Capacity
Utilization measures. Warwick software can be adapted to compute Malmquist indices but only
by running the DOS version in batch mode to compute efficiencies. These efficiency scores are
needed later in order to compute the indices; similarly it can be adapted to carry out window
analysis.
The additive model can be estimated using EMS and Warwick software, and the Slacks-Based Measure (SBM) model using either EMS or DEA-Solver.
All the programs under study, except DEAP, allow the user to choose between an input or output orientation, to assume either constant or variable returns to scale, and to include non-discretionary (i.e. exogenously fixed) variables, and they offer some scale information. This latter possibility has recently been added in the new version of OnFront under the name of vector analysis.
Banxia lacks some of the options that other programs offer, such as window analysis, but it
presents the advantage of offering many graphs relating efficiency to the different inputs and
outputs. It also allows the introduction and/or elimination of some of the decision-making units
(DMUs). It is a very user-friendly package and very easy to work with even for those who do not
have a wide knowledge of DEA. A thorough user's guide is also provided. One-day workshops
introducing some of the theory of DEA, its practical applications and how to handle the program
take place regularly in London. Details regarding contents and cost information can be found at:
http://www.banxia.com/training/bxfanalyst.html
Regarding the use of categorical variables, only IDEAS and DEA-Solver offer their inclusion as
an option in the different models. The possibility of including weak or strong disposability of
inputs has been added in the new version of OnFront.
EMS, Warwick software and the new version of OnFront are the only packages that compute
Andersen and Petersen super-efficiency scores. EMS and DEA-Solver include the possibility of working with non-convex technologies and of comparing different programs' efficiencies.
EMS and DEAP offer a good on-line guide for users. Banxia and OnFront provide a reference guide to users together with the program. The DEA-Solver program includes only a very brief reference guide in the package; more detailed explanations of the options can be found in the associated reference book (Cooper et al. 2000).
Operating System and PC Requirements
Most of the software packages run under Windows, though some of them, like OnFront and Warwick DEA, offer both DOS and Windows versions. There are some others, like PIONEER (not included in the above table), that run under Unix (for additional information contact barr@seas.smu.edu). DEAP is the only package among those presented here that does not offer a Windows version. It is based on MS-DOS and the input files have to be text files. It also lacks some of the options that other programs offer. In spite of these disadvantages, DEAP presents many desirable technical characteristics. Foremost of these is that DEAP allows a multi-step procedure to be used in solving the DEA models. This can be either a one-step, two-step or multi-stage procedure. This is important, as using a one-step procedure can produce instabilities. Many of the other programs, such as the Warwick software, avoid instabilities by using a two-step procedure, but wrong peers can be obtained for some of the inefficient units. Both problems can be overcome by the use of the DEAP multi-step procedure. DEAP does not require a large-memory PC, and the fact that it is currently free of charge is another good reason for using it.
The operating system used by all programs, with the exception of DEAP, is Windows 95 or later(note) or NT. EMS accepts Excel 97 or text data files. Warwick and OnFront require input in the form of text files (although this restriction can be lessened by using the Windows clipboard). The new version of Banxia offers direct data import from Excel and SPSS, and the reports produced there can be directly exported to RTF (as Word), Excel and HTML files. DEA-Solver is based on Excel 97. Access, Dbase, Excel and text files are all accepted formats for IDEAS.
Requirements regarding the capacity of the computer are closely related to how sophisticated the software is. The most demanding are Banxia and OnFront. Banxia software requires 50 MB of hard disk space and 32 MB of RAM, and its developers strongly recommend using a Pentium PC. OnFront requires at least 16 MB of RAM and 8 MB of disk space. DEAP only requires 4 MB of RAM and can even be run on a 386 PC. However, it is not as user-friendly as the others.
Software Prices
The prices in 2001 (excluding VAT, shipping and handling) for the software under analysis are
presented in Table 3.
Table 3. Price for main DEA software, 2001 (excludes VAT)

Package          Academic price   Commercial price   Additional licence (academic)   Additional licence (commercial)
EMS 1.3          Free             Free               Free                            Free
DEAP 2.1         Free             Free               Free                            Free
BANXIA 3         £195-595         £395-2395          35% discount                    35% discount
WARWICK          £200/500         £200/500           -                               -
OnFront2         $750             $1750              $450                            $900
DEA-Solver 2.0   $800             $1600              $400                            $800
IDEAS            $140/1400        $280/2800          -                               -
The prices are for individual licences, either for academic or business purposes. There are some other prices for campus licences, educational purposes, etc. Some of the software presented here, such as Banxia, Warwick or IDEAS, can be obtained at different prices if the number of DMUs under study is limited. Banxia software offers a wide variety of prices depending on the maximum number of units it can analyse. The lowest level is for 75 units (£195), then 250 units (£295), 500 units (£395), 1500 units (£495) and 2500 units (£595). The same range can be bought for commercial purposes, with prices varying from £395 to £2395. Warwick software offers a limited version for analyses of fewer than 50 units (£200) and a full version for an unlimited number of DMUs (£500). The same applies for the DOS Warwick version (£150/300). IDEAS also offers two versions: the standard version, which allows analyses of up to 500 DMUs, 20 variables and 20 price ratios, and the professional version, for analyses of up to 2,000 DMUs, 25 variables and 120 price ratios. The rest of the programs can be used for any number of units (within the memory capacity of the computer). Banxia has recently launched some extra packages at different prices: a cross-efficiency module (£95), a 500-unit three-month trial licence (£295), support and maintenance for a year (£229), and an upgrade from the previous version, available as a CD with a new manual (£169) or as an electronic copy (£129, with no physical products).
Acknowledgement
This review was undertaken as part of the EU funded project 'Technical efficiency in EU fisheries: implications for monitoring and management through effort controls' (QLK5-CT-1999-01295).
Note
Warwick software also allows for Windows 3.1
References
Aigner, D., C. A. K. Lovell and P. Schmidt (1977). Formulation and estimation of stochastic
frontier production function models. Journal of Econometrics 6: 21-37.
Banxia Software (1998). Banxia Frontier Analyst. User's Guide. Professional Edition. Frontier
Analyst.
Battese, G. E. and T. J. Coelli (1988). Prediction of firm level technical inefficiencies with a generalised frontier production function and panel data, Journal of Econometrics, 38, 387-399.
Battese, G. E., T. J. Coelli and T. C. Colby, (1989) Estimation of Frontier Production Functions
and the Efficiencies of Indian Farms Using Panel Data from ICRISAT's Village Level Studies.
Journal of Quantitative Economics, 5(2): 327-48
Battese, G. E. and T. J. Coelli (1992). Frontier production functions, technical efficiency and panel data: with application to paddy farmers in India. Journal of Productivity Analysis 3: 153-169.
Battese, G. E. and T. J. Coelli (1995). A model for technical inefficiency effects in a stochastic
frontier production function for panel data. Empirical Economics 20: 325-332.
Coelli, T. (1996a). A guide to FRONTIER version 4.1: a computer program for frontier
production function estimation. CEPA Working Paper 96/07, Department of Econometrics,
University of New England, Armidale, Australia.
Coelli, T. J. (1996b). 'A guide to DEAP version 2.1: A Data Envelopment Analysis (Computer)
program'. CEPA Working Papers No.8/96. ISSN 1327-435X. ISBN 1 86389 4969.
Coelli, T. J., D. S. P. Rao and G. E. Battese (1998). An Introduction to Efficiency and
Productivity Analysis, Kluwer Academic Publishers: USA.
Cooper, W. W., Seiford, L. M. and Tone, K. (2000)., Data Envelopment Analysis. Kluwer
Academic Publishers. ISBN 0792386930.
Emrouznejad, A. (2000). An Extension to SAS/OR for Decision System Support, Proceedings of the 25th SAS Users Group International Conference, April 2000, Indianapolis, pp. 1456-1461.
Färe, R. and Grosskopf, S. (1998). 'Reference Guide to OnFront'. Economic Measurement and
Quality I Lund AB Lund. Sweden.
Farrell, M. J. (1957). The measurement of productive efficiency, Journal of the Royal Statistical
Society, 120, 252-90.
Greene, W. (1995). LIMDEP (Version 7): User's Manual and Reference Guide, Econometric
Software Inc., New York.
Greene, W. H. (1993). "Frontier Production Functions", EC-93-20. Stern School of Business,
New York University.
Hollingsworth, B. (1997). A review of Data Envelopment Analysis software. The Economic Journal, 1268-1270.
Hollingsworth, B. (1999). Data Envelopment Analysis and productivity analysis: a review of the options. The Economic Journal, 458-462.
Kumbhakar, S. C. and C. A. K. Lovell (2000). Stochastic Frontier Analysis. Cambridge: Cambridge University Press.
Kumbhakar, S., S. Ghosh and J. McGuckin. (1991). A generalized Production Frontier Approach
for Estimating Determinants of Inefficiency in U.S. Dairy Farms. Journal of Business and
Economic Statistics, 9:279-286.
Lewin, A. Y. and C. A. K. Lovell (1990). Editors Introduction, Journal of Econometrics, 46, 3-5.
Olesen, O. B. and N. C. Petersen (1995). A presentation of GAMS for DEA. Computers & Operations Research 23(4): 323-339.
Pitt, M. M. and L. F. Lee. (1981). The measurement and sources of technical inefficiency in the
Indonesian weaving industry. Journal of Development Economics, 9:43-64.
Reifschneider, D. and R. Stevenson. (1991). Systematic departures from the frontier: A
framework for the analysis of firm inefficiency. International Economic Review, 32 (3):715-723.
Scheel, H. (2000). EMS: Efficiency Measurement System User´s Manual. Version 1.3
Seiford, L. M. and R. M. Thrall (1990) Recent developments in DEA: the mathematical
programming approach to frontier analysis, Journal of Econometrics, 46, 7-38.
Sena, V. (1999). Stochastic frontier estimation: a review of the software options. Journal of
Applied Econometrics, 14: 579-586.
Stevenson, R. E., (1980). Likelihood function for generalized stochastic frontier estimation,
Journal of Econometrics, 13, 57-66.