Data Envelopment Analysis
Data Envelopment Analysis (DEA) is an increasingly popular management tool. This write-up is an introduction to DEA for those who are unfamiliar with the technique. For a more in-depth discussion, the interested reader is referred to Seiford and Thrall [1990] or the seminal work by Charnes, Cooper, and Rhodes [1978].
DEA is commonly used to evaluate the efficiency of a number of producers. A typical statistical approach is characterized as a central tendency approach: it evaluates producers relative to an average producer. In contrast, DEA compares each producer with only the "best" producers. By the way, in the DEA literature, a producer is usually referred to as a decision making unit, or DMU. DEA is not always the right tool for a problem, but it is appropriate in certain cases. (See Strengths and Limitations of DEA below.)
In DEA, there are a number of producers. The production process for each producer is to take a
set of inputs and produce a set of outputs. Each producer has a varying level of inputs and gives a
varying level of outputs. For instance, consider a set of banks. Each bank has a certain number of
tellers, a certain square footage of space, and a certain number of managers (the inputs). There
are a number of measures of the output of a bank, including number of checks cashed, number of
loan applications processed, and so on (the outputs). DEA attempts to determine which of the
banks are most efficient, and to point out specific inefficiencies of the other banks.
A fundamental assumption behind this method is that if a given producer, A, is capable of
producing Y(A) units of output with X(A) inputs, then other producers should also be able to do
the same if they were to operate efficiently. Similarly, if producer B is capable of producing Y(B)
units of output with X(B) inputs, then other producers should also be capable of the same
production schedule. Producers A, B, and others can then be combined to form a composite
producer with composite inputs and composite outputs. Since this composite producer does not
necessarily exist, it is typically called a virtual producer.
The heart of the analysis lies in finding the "best" virtual producer for each real producer. If the
virtual producer is better than the original producer by either making more output with the same
input or making the same output with less input then the original producer is inefficient. The
subtleties of DEA are introduced in the various ways that producers A and B can be scaled up or
down and combined.
Numerical Example
To illustrate how DEA works, let's take an example of three banks. Each bank has exactly 10 tellers (the only input), and we measure a bank based on two outputs: checks cashed and loan applications processed. The data for these banks are as follows:

Bank A: 10 tellers, 1000 checks, 20 loan applications
Bank B: 10 tellers, 400 checks, 50 loan applications
Bank C: 10 tellers, 200 checks, 150 loan applications
Now, the key to DEA is to determine whether we can create a virtual bank that is better than one
or more of the real banks. Any such dominated bank will be an inefficient bank.
Consider trying to create a virtual bank that is better than Bank A. Such a bank would use no
more inputs than A (10 tellers), and produce at least as much output (1000 checks and 20 loans).
Clearly, no combination of banks B and C can possibly do that. Bank A is therefore deemed to
be efficient. Bank C is in the same situation.
However, consider bank B. If we take half of Bank A and combine it with half of Bank C, then we create a virtual bank that processes 600 checks and 85 loan applications with just 10 tellers. This virtual bank dominates B (we would much rather have the virtual bank than bank B). Bank B is therefore inefficient.
Another way to see this is that we can scale down the inputs to B (the tellers) and still have at least as much output. If we assume that inputs and outputs are linearly scalable, then we estimate that we can get by with about 6.3 tellers. We do that by taking .34 times bank A plus .29 times bank C. The resulting virtual bank uses 6.3 tellers and produces at least as much as bank B does. We say that bank B's efficiency rating is .63. Banks A and C have an efficiency rating of 1.
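This rating can be reproduced with any LP solver. Here is a minimal sketch using Python with scipy (an assumption; the write-up itself is tool-agnostic), with one variable for THETA and one weight per bank:

# Sketch: bank B's DEA efficiency as a small linear program.
from scipy.optimize import linprog

c = [1, 0, 0, 0]                 # minimize theta; variables are [theta, lA, lB, lC]
A_ub = [
    [-10, 10, 10, 10],           # tellers: 10*lA + 10*lB + 10*lC <= 10*theta
    [0, -1000, -400, -200],      # checks:  1000*lA + 400*lB + 200*lC >= 400
    [0, -20, -50, -150],         # loans:   20*lA + 50*lB + 150*lC >= 50
]
b_ub = [0, -400, -50]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(round(res.fun, 4))                  # 0.6301, bank B's efficiency rating
print([round(v, 4) for v in res.x[1:]])   # [0.3425, 0.0, 0.2877]: .34 of A plus .29 of C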
Graphical Example
Single-input, two-output or two-input, single-output problems are easy to analyze graphically. The previous numerical example is now solved graphically. (An assumption of constant returns to scale is made; it is explained in detail later.) Since every bank uses the same 10 tellers, the efficiency of bank B can be analyzed by plotting each bank's two outputs (checks cashed and loan applications processed) on a pair of axes.
If it is assumed that convex combinations of banks are allowed, then the line segment connecting
banks A and C shows the possibilities of virtual outputs that can be formed from these two banks.
Similar segments can be drawn between A and B along with B and C. Since the segment AC lies
beyond the segments AB and BC, this means that a convex combination of A and C will create
the most outputs for a given set of inputs.
This line is called the efficiency frontier. The efficiency frontier defines the maximum
combinations of outputs that can be produced for a given set of inputs.
Since bank B lies below the efficiency frontier, it is inefficient. Its efficiency can be determined by comparing it to a virtual bank formed from bank A and bank C. The virtual bank, called V, is approximately 54% of bank A and 46% of bank C. (This can be determined by an application of the lever law: pull out a ruler and measure the lengths of AV, CV, and AC. The percentage of bank C is then AV/AC and the percentage of bank A is CV/AC.)
The efficiency of bank B is then calculated by finding the fraction of inputs that bank V would need to produce as many outputs as bank B. This is easily calculated by looking at the line from the origin, O, to V. The efficiency of bank B is OB/OV, which is approximately 63%. This figure also shows that banks A and C are efficient, since they lie on the efficiency frontier. In other words, any virtual bank formed for analyzing bank A or bank C coincides with bank A or bank C itself. Therefore, since the efficiency of each is the ratio of two equal distances (OA/OA and OC/OC), banks A and C have efficiency scores equal to 1.0.
The graphical method is useful in this simple two-dimensional example but gets much harder in higher dimensions. The normal method of evaluating the efficiency of bank B is a linear programming formulation of DEA.
Since this problem uses a constant input value of 10 for all of the banks, it avoids the complications caused by allowing different returns to scale. Returns to scale refers to increasing or decreasing efficiency based on size. For example, a manufacturer can achieve certain economies of scale by producing a thousand circuit boards at a time rather than one at a time: it might be only 100 times as hard as producing one at a time. This is an example of increasing returns to scale (IRS).
On the other hand, the manufacturer might find it more than a trillion times as difficult to produce a trillion circuit boards at a time, because of storage problems and limits on the worldwide copper supply. This range of production illustrates decreasing returns to scale (DRS). Combining the two extreme ranges would necessitate variable returns to scale (VRS).
Constant returns to scale (CRS) means that the producers are able to linearly scale the inputs and outputs without increasing or decreasing efficiency. This is a significant assumption. The assumption of CRS may be valid over limited ranges, but its use must be justified. As an aside, CRS tends to lower the efficiency scores while VRS tends to raise them.
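In the linear program introduced in the next section, the only difference between CRS and VRS is a convexity constraint forcing the weights of the virtual producer to sum to 1 (the VRS model of Banker, Charnes and Cooper [1984]). A minimal sketch, again assuming Python with scipy and reusing the bank data from the numerical example:

# Sketch: the same bank-B model under VRS; the only change from the CRS
# sketch above is a convexity constraint on the weights.
from scipy.optimize import linprog

c = [1, 0, 0, 0]                       # [theta, lA, lB, lC], as before
A_ub = [[-10, 10, 10, 10],             # tellers
        [0, -1000, -400, -200],        # checks >= 400
        [0, -20, -50, -150]]           # loans  >= 50
b_ub = [0, -400, -50]
A_eq = [[0, 1, 1, 1]]                  # VRS: lA + lB + lC = 1 (theta gets coefficient 0)
b_eq = [1]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.fun)   # 1.0 here: with identical inputs, every bank is VRS-efficient,
                 # consistent with VRS never scoring below CRS (0.63 for bank B)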
Using Linear Programming
Data Envelopment Analysis is a linear programming procedure for a frontier analysis of inputs
and outputs. DEA assigns a score of 1 to a unit only when comparisons with other relevant units
do not provide evidence of inefficiency in the use of any input or output. DEA assigns an
efficiency score less than one to (relatively) inefficient units. A score less than one means that a
linear combination of other units from the sample could produce the same vector of outputs by
using a smaller vector of inputs. The score reflects the radial distance from the estimated
production frontier to the DMU under consideration.
There are a number of equivalent formulations for DEA. The most direct formulation of the
example above is as follows:
Let X_i be the vector of inputs into DMU i, and let Y_i be the corresponding vector of outputs. Let X_0 be the vector of inputs into the DMU whose efficiency we want to determine, and Y_0 its vector of outputs. So the X's and the Y's are the data. The measure of efficiency for DMU 0 is given by the following linear program:

min THETA
st
SUM_i L_i X_i <= THETA X_0
SUM_i L_i Y_i >= Y_0
L_i >= 0 for all i

where L_i is the weight given to DMU i in its efforts to dominate DMU 0, and THETA is the efficiency of DMU 0. So the L_i's and THETA are the variables. Since DMU 0 appears on the left-hand side of the constraints as well, the optimal THETA cannot possibly be more than 1. When we solve this linear program, we get a number of things:
1. The efficiency of DMU 0 (THETA), with THETA = 1 meaning that the unit is efficient.
2. The unit's ``comparables'' (those DMUs with nonzero L_i).
3. The ``goal'' inputs (the difference between X_0 and THETA X_0).
4. Alternatively, we can keep inputs fixed and get goal outputs (Y_0 / THETA).
DEA assumes that the inputs and outputs have been correctly identified. Usually, as the number of inputs and outputs increases, more DMUs tend to get an efficiency rating of 1, as they become too specialized to be evaluated with respect to other units. On the other hand, if there are too few inputs and outputs, more DMUs tend to be comparable. In any study, it is important to focus on correctly specifying inputs and outputs.
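As a concrete sketch of this envelopment formulation (again assuming Python with numpy and scipy, rather than any tool the original used), the following function builds and solves the LP for an arbitrary DMU. Run on the two-input, three-output data of the three LPs written out below, it reproduces the solutions reported after them:

# Sketch: input-oriented, constant-returns-to-scale envelopment LP for DMU j.
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, j):
    """X is (inputs x DMUs), Y is (outputs x DMUs); returns (theta, weights)."""
    m, n = X.shape                                # m inputs, n DMUs
    s = Y.shape[0]                                # s outputs
    c = np.r_[1.0, np.zeros(n)]                   # variables: [theta, L_1, ..., L_n]
    in_rows = np.hstack([-X[:, [j]], X])          # SUM_i L_i X_i <= theta X_j
    out_rows = np.hstack([np.zeros((s, 1)), -Y])  # SUM_i L_i Y_i >= Y_j
    A_ub = np.vstack([in_rows, out_rows])
    b_ub = np.r_[np.zeros(m), -Y[:, j]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.fun, res.x[1:]

# The data of the three LPs below: two inputs, three outputs.
X = np.array([[5.0, 8.0, 7.0], [14.0, 15.0, 12.0]])
Y = np.array([[9.0, 5.0, 4.0], [4.0, 7.0, 9.0], [16.0, 10.0, 13.0]])
for j in range(3):
    theta, L = dea_efficiency(X, Y, j)
    print(j + 1, round(theta, 6), np.round(L, 6))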
The linear programs for evaluating the 3 DMUs are given below. (The data here are for a second example with two inputs and three outputs: DMU 1 uses inputs (5, 14) to produce outputs (9, 4, 16), DMU 2 uses (8, 15) to produce (5, 7, 10), and DMU 3 uses (7, 12) to produce (4, 9, 13).)

LP for evaluating DMU 1:
min THETA
st
5L1 + 8L2 + 7L3 - 5THETA <= 0
14L1 + 15L2 + 12L3 - 14THETA <= 0
9L1 + 5L2 + 4L3 >= 9
4L1 + 7L2 + 9L3 >= 4
16L1 + 10L2 + 13L3 >= 16
L1, L2, L3 >= 0

LP for evaluating DMU 2:
min THETA
st
5L1 + 8L2 + 7L3 - 8THETA <= 0
14L1 + 15L2 + 12L3 - 15THETA <= 0
9L1 + 5L2 + 4L3 >= 5
4L1 + 7L2 + 9L3 >= 7
16L1 + 10L2 + 13L3 >= 10
L1, L2, L3 >= 0

LP for evaluating DMU 3:
min THETA
st
5L1 + 8L2 + 7L3 - 7THETA <= 0
14L1 + 15L2 + 12L3 - 12THETA <= 0
9L1 + 5L2 + 4L3 >= 4
4L1 + 7L2 + 9L3 >= 9
16L1 + 10L2 + 13L3 >= 13
L1, L2, L3 >= 0
The solution to each of these is as follows (abridged from the Excel Solver reports; the full sensitivity output, with reduced costs, shadow prices and allowable ranges, is not reproduced here):

DMU 1: THETA = 1, with L1 = 1, L2 = 0, L3 = 0.
DMU 2: THETA = 0.773333, with L1 = 0.261538, L2 = 0, L3 = 0.661538.
DMU 3: THETA = 1, with L1 = 0, L2 = 0, L3 = 1.
Note that DMUs 1 and 3 are overall efficient and DMU 2 is inefficient with an efficiency rating
of 0.773333.
Hence the efficient levels of inputs and outputs for DMU 2 are given by:

Efficient levels of Inputs: 0.261538 x (5, 14) + 0.661538 x (7, 12) = (5.94, 11.60)
Efficient levels of Outputs: 0.261538 x (9, 4, 16) + 0.661538 x (4, 9, 13) = (5, 7, 12.78)

Note that the outputs are at least as much as the outputs currently produced by DMU 2, and the inputs are at most 0.773333 times the inputs of DMU 2. This can be used in two different ways: the inefficient DMU should target cutting inputs down to at most the efficient levels. Alternatively, an equivalent statement can be made by finding a set of efficient levels of inputs and outputs by dividing the levels obtained by the efficiency of DMU 2. This focus can then be used to set targets primarily for outputs rather than for reduction of inputs.
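Numerically, the efficient levels are just the weighted combination of DMUs 1 and 3 picked out by the solver. A small numpy sketch (the weights are taken from the solution reported above):

import numpy as np

X = np.array([[5, 8, 7], [14, 15, 12]])             # inputs by DMU
Y = np.array([[9, 5, 4], [4, 7, 9], [16, 10, 13]])  # outputs by DMU
lam = np.array([0.261538, 0.0, 0.661538])           # optimal weights for DMU 2

print(X @ lam)  # efficient inputs  ~ (5.94, 11.60), vs DMU 2's (8, 15)
print(Y @ lam)  # efficient outputs ~ (5.00, 7.00, 12.78), vs DMU 2's (5, 7, 10)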
Alternate Formulation
There is another, probably more common, formulation that provides the same information. We can think of DEA as providing a price on each of the inputs and a value on each of the outputs. The efficiency of a DMU is the ratio of its weighted outputs to its weighted inputs, and this ratio is constrained to be no more than 1 for every DMU. The prices and values have nothing to do with real prices and values: they are an artificial construct. The goal is to find a set of prices and values that puts the target DMU in the best possible light. The goal, then, is to

max (U Y_0) / (V X_0)
st
(U Y_i) / (V X_i) <= 1 for every DMU i
U, V >= 0

Here, the variables are the U's and the V's: they are the vectors of output values and input prices, respectively. This fractional program can be equivalently stated as the following linear programming problem (where Y and X are matrices with columns Y_i and X_i respectively):

max U Y_0
st
V X_0 = 1
U Y - V X <= 0
U, V >= 0
We denote this linear program by (D). Let us compare it with the one introduced earlier, which we denote by (P). To fix ideas, let us write out explicitly these two formulations for DMU 2 in our example.

Formulation (P) for DMU 2:
min THETA
st
-5L1 - 8L2 - 7L3 + 8THETA >= 0
-14L1 - 15L2 - 12L3 + 15THETA >= 0
9L1 + 5L2 + 4L3 >= 5
4L1 + 7L2 + 9L3 >= 7
16L1 + 10L2 + 13L3 >= 10
L1 >= 0, L2 >= 0, L3 >= 0
Formulation (D) for DMU 2:
max 5U1 + 7U2 + 10U3
st
-5V1 - 14V2 + 9U1 + 4U2 + 16U3 <= 0
-8V1 - 15V2 + 5U1 + 7U2 + 10U3 <= 0
-7V1 - 12V2 + 4U1 + 9U2 + 13U3 <= 0
8V1 + 15V2 = 1
V1 >= 0, V2 >= 0, U1 >= 0, U2 >= 0, U3 >= 0
Formulations (P) and (D) are dual linear programs! These two formulations actually give the
same information. You can read the solution to one from the shadow prices of the other. We will
not discuss linear programming duality in this course. You can learn about it in some of the OR
electives.
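The duality claim is easy to check numerically. Here is a sketch that solves formulation (D) for DMU 2 with the same scipy routine used in the earlier sketches (an assumption; any LP solver would do):

# Sketch: formulation (D) for DMU 2. linprog minimizes, so negate the objective.
from scipy.optimize import linprog

c = [0, 0, -5, -7, -10]          # variables: [V1, V2, U1, U2, U3]
A_ub = [
    [-5, -14, 9, 4, 16],         # DMU 1: U.Y_1 - V.X_1 <= 0
    [-8, -15, 5, 7, 10],         # DMU 2: U.Y_2 - V.X_2 <= 0
    [-7, -12, 4, 9, 13],         # DMU 3: U.Y_3 - V.X_3 <= 0
]
b_ub = [0, 0, 0]
A_eq = [[8, 15, 0, 0, 0]]        # normalisation: V.X_2 = 1
b_eq = [1]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 5)
print(-res.fun)                  # ~0.773333: the same efficiency that (P) gives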
Applications
The simple bank example described earlier may not convey the full usefulness of DEA. It is most useful when a comparison is sought against "best practices" where the analyst doesn't want the frequency of poorly run operations to affect the analysis. DEA has been applied in many situations such as: health care (hospitals, doctors), education (schools, universities), banks, manufacturing, benchmarking, management evaluation, fast food restaurants, and retail stores.
The analyzed data sets vary in size. Some analysts work on problems with as few as 15 or 20
DMUs while others are tackling problems with over 10,000 DMUs.
Strengths and Limitations of DEA
As the earlier list of applications suggests, DEA can be a powerful tool when used wisely. A few of the characteristics that make it powerful are:

- DEA can handle multiple-input and multiple-output models.
- It doesn't require an assumption of a functional form relating inputs to outputs.
- DMUs are directly compared against a peer or combination of peers.
- Inputs and outputs can have very different units. For example, X1 could be in units of lives saved and X2 could be in units of dollars without requiring an a priori tradeoff between the two.
The same characteristics that make DEA a powerful tool can also create problems. An analyst should keep these limitations in mind when choosing whether or not to use DEA.

- Since DEA is an extreme point technique, noise (even symmetrical noise with zero mean) such as measurement error can cause significant problems.
- DEA is good at estimating "relative" efficiency of a DMU but it converges very slowly to "absolute" efficiency. In other words, it can tell you how well you are doing compared to your peers but not compared to a "theoretical maximum."
- Since DEA is a nonparametric technique, statistical hypothesis tests are difficult and are the focus of ongoing research.
- Since a standard formulation of DEA creates a separate linear program for each DMU, large problems can be computationally intensive.
References
DEA has become a popular subject since it was first described in 1978. There have been
hundreds of papers and technical reports published along with a few books. Technical articles
about DEA have been published in a wide variety of places making it hard to find a good starting
point. Here are a few suggestions as to starting points in the literature.
1. Charnes, A., W.W. Cooper, and E. Rhodes. "Measuring the efficiency of decision making units." European Journal of Operational Research 2 (1978): 429-44.
2. Banker, R.D., A. Charnes, and W.W. Cooper. "Some models for estimating technical and scale inefficiencies in data envelopment analysis." Management Science 30 (1984): 1078-92.
3. Dyson, R.G. and E. Thanassoulis. "Reducing weight flexibility in data envelopment
analysis." Journal of the Operational Research Society 39 (1988): 563-76.
4. Seiford, L.M. and R.M. Thrall. "Recent developments in DEA: the mathematical
programming approach to frontier analysis." Journal of Econometrics 46 (1990): 7-38.
5. Ali, A.I., W.D. Cook, and L.M. Seiford. "Strict vs. weak ordinal relations for multipliers
in data envelopment analysis." Management Science 37 (1991): 733-8.
6. Andersen, P. and N.C. Petersen. "A procedure for ranking efficient units in data
envelopment analysis." Management Science 39 (1993): 1261-4.
7. Banker, R.D. "Maximum likelihood, consistency and data envelopment analysis: a
statistical foundation." Management Science 39 (1993): 1265-73.
The first paper was the original paper describing DEA; it is the source of the abbreviation CCR for the basic constant returns-to-scale model. The Seiford and Thrall paper is a good overview of the
literature. The other papers all introduce important new concepts. This list of references is
certainly incomplete.
A good source covering the field of productivity analysis is The Measurement of Productive
Efficiency edited by Fried, Lovell, and Schmidt, 1993, from Oxford University Press. There is
also a recent book from Kluwer Publishers, Data Envelopment Analysis: Theory, Methodology,
and Applications by Charnes, Cooper, Lewin, and Seiford.
To stay more current on the topics, some of the most important DEA articles appear in
Management Science, The Journal of Productivity Analysis, The Journal of the Operational
Research Society, and The European Journal of Operational Research. The latter just published a special issue, "Productivity Analysis: Parametric and Non-Parametric Approaches", edited by Lewin and Lovell, which has several important papers.
Volume 15, Issue 1, 2002
Estimation of technical efficiency: a review of some of the stochastic frontier and DEA software
Inés Herrero, Department of Applied Economics, University of Huelva, and
Sean Pascoe, CEMARE, Department of Economics, University of Portsmouth
The level of technical efficiency of a particular firm is characterised by the relationship between
observed production and some ideal or potential production (Greene 1993). The measurement of
firm specific technical efficiency is based upon deviations of observed output from the best
production or efficient production frontier. If a firm's actual production point lies on the frontier
it is perfectly efficient. If it lies below the frontier then it is technically inefficient, with the ratio
of the actual to potential production defining the level of efficiency of the individual firm.
Farrell's (1957) definition of technical efficiency led to the development of methods for
estimating the relative technical efficiencies of firms. The common feature of these estimation
techniques is that information is extracted from extreme observations from a body of data to
determine the best practice production frontier (Lewin and Lovell 1990). From this the relative
measure of technical efficiency for the individual firm can be derived. Despite this similarity the
approaches for estimating technical efficiency can be generally categorised under the distinctly
opposing techniques of parametric and non-parametric methods (Seiford and Thrall 1990).
Stochastic estimations incorporate a measure of random error. This involves the estimation of a stochastic production frontier, where the output of a firm is a function of a set of inputs, inefficiency and random error. An often quoted disadvantage of the technique, however, is that it imposes an explicit functional form and distributional assumption on the data. In contrast, the linear programming technique of data envelopment analysis (DEA) does not impose any assumptions about functional form, and hence is less prone to mis-specification. Further, DEA is a non-parametric approach, so it does not take random error into account and is therefore not subject to the problems of assuming an underlying distribution for the error term. However, since DEA cannot take account of such statistical noise, the efficiency estimates may be biased if the production process is largely characterised by stochastic elements.
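To make the composed-error idea concrete, here is a minimal simulation sketch in Python (illustrative only; the functional form, coefficients and error variances are invented for the example):

# Sketch: a stochastic production frontier y = b0 + b1*x + v - u, with
# symmetric noise v and one-sided inefficiency u >= 0 (half-normal here).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)               # (log) input
v = rng.normal(0.0, 0.2, n)             # random error, mean zero
u = np.abs(rng.normal(0.0, 0.4, n))     # inefficiency, always >= 0
y = 1.0 + 0.5 * x + v - u               # observed (log) output

# Every firm lies below the frontier 1.0 + 0.5*x by u on average, so a plain
# OLS fit recovers the slope but its intercept is pulled down by E[u].
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)                 # ~0.5 and ~1.0 - 0.4*sqrt(2/pi)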
The purpose of this paper is to present a range of methods for estimating technical efficiency, and to review the software that is available to do so.
Economic, allocative and technical efficiency
Technical efficiency is just one component of overall economic efficiency. However, in order to
be economically efficient, a firm must first be technically efficient. Profit maximisation requires
a firm to produce the maximum output given the level of inputs employed (i.e. be technically
efficient), use the right mix of inputs in light of the relative price of each input (i.e. be input
allocative efficient) and produce the right mix of outputs given the set of prices (i.e. be output
allocative efficient) (Kumbhakar and Lovell 2000). These concepts can be illustrated graphically
using a simple example of a two input (x1, x2)-two output (y1, y2) production process (Figure 1).
Efficiency can be considered in terms of the optimal combination of inputs to achieve a given
level of output (an input-orientation), or the optimal output that could be produced given a set of
inputs (an output-orientation).
In Figure 1(a), the firm is producing a given level of output (y1*, y2*) using an input combination
defined by point A. The same level of output could have been produced by radially contracting
the use of both inputs back to point B, which lies on the isoquant associated with the minimum
level of inputs required to produce (y1*, y2*) (i.e. Iso(y1*, y2*)). The input-oriented level of
technical efficiency (TEI(y,x)) is defined by 0B/0A. However, the least-cost combination of
inputs that produces (y1*, y2*) is given by point C (i.e. the point where the marginal rate of
technical substitution is equal to the input price ratio w2/w1). To achieve the same level of cost
(i.e. expenditure on inputs), the inputs would need to be further contracted to point D. The cost
efficiency (CE(y,x,w)) is therefore defined by 0D/0A. The input allocative efficiency
(AEI(y,x,w)) is subsequently given by CE(y,x,w)/TEI(y,x), or 0D/0B in Figure 1(a) (Kumbhakar and Lovell 2000).
The production possibility frontier for a given set of inputs is illustrated in Figure 1(b) (i.e. an
output-orientation). If the inputs employed by the firm were used efficiently, the output of the firm, currently producing at point A, could be expanded radially to point B. Hence, the output-oriented measure of technical efficiency (TEO(y,x)) can be given by 0A/0B. This is only equivalent to the
input-oriented measure of technical efficiency under conditions of constant returns to scale.
While point B is technically efficient, in the sense that it lies on the production possibility
frontier, a higher revenue could be achieved by producing at point C (the point where the
marginal rate of transformation is equal to the price ratio p2/p1). In this case, more of y1 should be
produced and less of y2 in order to maximise revenue. To achieve the same level of revenue as at
point C while maintaining the same input and output combination, output of the firm would need
to be expanded to point D. Hence, the revenue efficiency (RE(y,x,p)) is given by 0A/0D. Output
allocative efficiency (AEO(y,x,p)) is given by RE(y,x,p)/TEO(y,x), or 0B/0D in Figure 1(b) (Kumbhakar and Lovell 2000).
Figure 1: Input (a) and output (b) oriented efficiency measures
Stochastic production frontier software
Stochastic frontiers can be estimated using a range of multi-purpose econometric software which can be adapted for the desired estimation. This software includes well-known statistical packages such as LIMDEP, TSP, Shazam, GAUSS, SAS, etc. The two most commonly used packages for estimating stochastic production frontiers and inefficiency are FRONTIER 4.1 (Coelli 1996a) and LIMDEP (Greene 1995). A recent review of both packages is provided by Sena (1999).
FRONTIER 4.1 is a single purpose package specifically designed for the estimation of stochastic
production frontiers (and nothing else), while LIMDEP is a more general package designed for a
range of non-standard (i.e. non-OLS) econometric estimation. An advantage of the former package (FRONTIER) is that estimates of efficiency are produced as a direct output from the package.
The user is able to specify the distributional assumptions for the estimation of the inefficiency
term in a program control file. In LIMDEP, the package estimates a one-sided distribution, but
the separation of the inefficiency term from the random error component requires additional
programming.
FRONTIER is able to accommodate a wider range of assumptions about the error distribution
term than LIMDEP (Table 1), although it is unable to model exponential distributions. Neither
package can include gamma distributions. Only FRONTIER is able to estimate an inefficiency
model as a one-step process. An inefficiency model can be estimated in a two-stage process
using LIMDEP. However, this may create bias as the distribution of the inefficiency estimates is
pre-determined through the distributional assumptions used in its generation.
Table 1. Distributional assumptions allowed by the software

Distribution                                  LIMDEP   FRONTIER
Time-invariant firm-specific inefficiency
  Half-normal distribution                    Yes      Yes
  Truncated normal distribution               Yes      Yes
  Exponential distribution                    Yes      No
Time-variant firm-specific inefficiency
  Half-normal distribution                    No       Yes
  Truncated normal distribution               No       Yes
One-step inefficiency model                   No       Yes

Source: Sena (1999)
FRONTIER 4.1
The most commonly used package for the estimation of stochastic production frontiers in the literature is FRONTIER 4.1 (Coelli, 1996a). This incorporates maximum likelihood estimation of the parameters. The estimation process consists of three main steps. In the first step, OLS is applied to estimate the production function. This provides unbiased estimators for the beta's (except for the intercept term and the variance estimate). The OLS estimates are used as starting values to estimate the final ML model. First, the value of the likelihood function is evaluated for different values of gamma between 0 and 1, given the values of the beta's derived in the OLS. Finally, an iterative Davidon-Fletcher-Powell algorithm calculates the final parameter estimates, using the values of the beta's from the OLS and the value of gamma from the intermediate step as starting values.
FRONTIER 4.1 has been created specifically for the estimation of production frontiers. As such, it is a relatively easy tool to use for estimating stochastic frontier models. It is flexible: it can be used to estimate both production and cost functions, can estimate both time-varying and time-invariant efficiencies when panel data are available, and can be used when the dependent variable is in either logged or original units.
FRONTIER solves two general models. The error components model can be formulated as

Yit = Xit b + (Vit - Uit)

where Yit is the (logged) output obtained by the i-th firm in the t-th time period; Xit is a (k x 1) vector of (transformations of the) input quantities of the i-th firm in the t-th time period; b is a (k x 1) vector of unknown parameters; the Vit are assumed to be iid N(0, sv^2) random errors; and Uit = Ui exp(-eta(t - T)), where the Ui are assumed to be iid truncations at zero of the N(mu, su^2) distribution.
This is the Battese and Coelli (1992) model. However, some other models can be expressed as special cases of this one and can also be solved using FRONTIER. Setting eta = 0, the time-invariant model of Battese, Coelli and Colby (1989) is obtained. The Battese and Coelli (1988) model results from the previous one in the particular case where balanced panel data are available. If we add mu = 0 to the aforementioned assumptions, the Pitt and Lee (1981) model results. And if we finally set T = 1 in the Pitt and Lee model, we obtain the original cross-sectional data model of Aigner, Lovell and Schmidt (1977).
If eta > 0, the inefficiency term Uit is always decreasing with time, whereas eta < 0 implies that Uit is always increasing with time. This can be one of the main problems when using this model: technical efficiency is forced to be a monotonic function of time.
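A two-line computation makes the monotonicity visible (a sketch; the values of Ui, T and eta are illustrative only):

# Sketch: Battese-Coelli (1992) time decay, Uit = Ui * exp(-eta * (t - T)).
import numpy as np

Ui, T = 0.5, 10
t = np.arange(1, T + 1)
for eta in (0.1, -0.1):
    print(eta, np.round(Ui * np.exp(-eta * (t - T)), 3))
# eta = +0.1: Uit falls toward Ui as t -> T (inefficiency always decreasing);
# eta = -0.1: Uit rises with t (inefficiency always increasing).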
The second model included in the FRONTIER package is the Technical Efficiency (TE) effects model (Battese and Coelli 1995). It can be expressed as Yit = Xit b + (Vit - Uit), where Yit, Xit, b and Vit are as defined earlier and Uit ~ N(mit, su^2), truncated at zero, where mit = Zit d and Zit is a vector of firm-specific variables which may influence the firm's efficiency. FRONTIER also offers the solution of the model of Stevenson (1980), which is a particular case of the previous model obtained when T is equal to 1 (i.e. for cross-sectional data).
There are two approaches to estimating the inefficiency models. These may be estimated with
either a one step or a two step process. For the two-step procedure the production frontier is first
estimated and the technical efficiency of each firm is derived. These are subsequently regressed
against a set of variables, Zit, which are hypothesised to influence the firms' efficiency. A
problem with the two-stage procedure is the inconsistency in the assumptions about the
distribution of the inefficiencies. In the first stage, the inefficiencies are assumed to be
independently and identically distributed (iid) in order to estimate their values. However, in the
second stage, the estimated inefficiencies are assumed to be a function of a number of firm
specific factors, and hence are not identically distributed unless all the coefficients of the factors
are simultaneously equal to zero (Coelli, Rao and Battese, 1998). FRONTIER uses the ideas of
Kumbhakar, Ghosh and McGuckin (1991) and Reifschneider and Stevenson (1991) and
estimates all of the parameters in one step to overcome this inconsistency. The inefficiency
effects are defined as a function of the firm specific factors (as in the two-stage approach) but
they are then incorporated directly into the MLE. This is something that should be taken into
consideration when programming in some of the general statistical packages.
FRONTIER offers a wide variety of tests on the different functional forms of the models that can
be conducted easily by placing restrictions on the models and testing the significance of the
restrictions using the likelihood ratio test. The FRONTIER program is easy to use. A brief
instruction file and a data file have to be created. The executable file and the start-up file can be
downloaded from the Internet free of charge at the CEPA (University of New England) web page:
http://www.uq.edu.au/economics/cepa/frontier.htm.
Other packages
LIMDEP is a multi-purpose econometric package that can be programmed to estimate several
stochastic frontier models using some of the stochastic frontier estimation routines that are
already implemented in the package. The estimation process can be carried out when the
inefficiency variable is assumed to have the half-normal and truncated normal distribution but
also when the exponential distribution is assumed. Hence, when there seems to be strong
evidence to assume an exponential distribution, LIMDEP may be a better option than
FRONTIER. Among other features, LIMDEP offers the possibility of including random effects
for panel data and estimates of one-sided disturbance components. The package includes the
optimisation of non-linear regression with two-step least squares and user defined maximum
likelihood estimation. There is a LIMDEP Web page at http://www.limdep.com/
GAUSS's flexibility allows it to be adapted for the estimation of production frontiers. GAUSS includes some non-linear optimisation routines which can be used for this purpose. Being a very general statistical package, it also has some disadvantages, and its programming structure is not as easy to use as that of some of the other software packages. The GAUSS
program can be ordered at http://www.aptech.com/prod.html. There is also a Web page that
provides answers to questions frequently asked about the program:
http://www.indiana.edu/~statmath/gauss/index.html
The TSP package can be specifically adapted for the estimation of the most basic production frontier models, using user-programmed maximum likelihood functions. The program also
includes non-linear three-stage least squares. The program can be ordered and more information
can be obtained at: http://www.tspintl.com/products/tsp
The statistical program Shazam can also be adapted for the estimation of production frontiers. It
allows the optimisation of non-linear functions either by Maximum Likelihood or least squares
and it includes some hypothesis testing. Information about the program can be found at:
http://shazam.econ.ubc.ca/
DEA software
Most of the general-purpose mathematical optimisation software can be adapted to solve Data
Envelopment Analysis problems. Examples of program code for DEA models have already been
published and are readily adaptable (e.g. Olesen and Petersen (1995) present the GAMS code for
a DEA model that can be adapted to suit most analyses). Emrouznejad (2000) developed a SAS
program for different DEA models, including options such as input/output orientation and CRS/VRS. The Emrouznejad (2000) program and comments can be downloaded from:
http://www.deazone.com/software/index.htm#sasdea.
These general programs offer the possibility of a wide range of applications using non-specialist
DEA software. However, there are several DEA-specific programs that provide a variety of
interesting facilities. Seven of the most common ones are listed below, including the sources
where further information can be found:
1. EMS 1.3 (Efficiency measurement system) by H. Scheel. University of Dortmund. Germany. This
can be downloaded from the net. http://www.wiso.uni-dortmund.de/
2. DEAP 2.1 (Coelli, 1996b). Centre for Efficiency and Productivity Analysis (CEPA). University of
New England, NSW, Australia. It is now free and can also be downloaded from the net, as well as
its manual from the CEPA Web pages http://www.uq.edu.au/economics/cepa/deap.htm
3. Frontier Analyst, version 3. Professional Edition. Banxia Software Ltd. Glasgow. Scotland.
http://www.banxia.com/
4. WARWICK DEA software. Aston Business School. University of Aston. Birmingham, England, UK.
http://www.warwick.ac.uk/~bsrlu/
5. OnFront 2. Economic Measurement and Quality I Lund AB Lund. Sweden.
http://www.emq.se/software.html
6. DEA-Solver Professional 2.0. Saitech Inc. New Jersey. U.S.A. This is the most recent package. Its
manual is the Cooper, Seiford and Tone (2000) book (Data Envelopment Analysis. Kluwer
Academic Publishers). http://www.saitech-inc.com/
7. IDEAS 6.1. Professional version. Consulting Inc., Amherst, MA, USA. http://www.ideas2000.com/
Package Facilities
The key features of the different software packages have been summarised in Table 2.
Table 2. Key features of the packages

[Table 2 is not reproduced here. The features compared are: input/output orientation; Andersen & Petersen super-efficiency; exogenously fixed variables; categorical variables; multi-stage DEA; window analysis; Malmquist index; weight restrictions; CRS/VRS/NIRS; the additive model; and RTS information. The packages compared are EMS, DEAP, BANXIA, WARWICK, OnFront 2, DEA-Solver and IDEAS. Entries marked (*) can be carried out by making some modifications to the program.]
EMS and DEA-solver appear to offer the most options. Of the software examined, only EMS and
DEA-Solver can be used to carry out window analysis when time series data are available,
although if this is the case, DEAP and OnFront can be used for the calculation of Malmquist
indices. OnFront2 also includes many different options, some of them not offered by any other
package, such as the ability to carry out input and output congestion measures and Capacity
Utilization measures. Warwick software can be adapted to compute Malmquist indices but only
by running the DOS version in batch mode to compute efficiencies. These efficiency scores are
needed later in order to compute the indices; similarly it can be adapted to carry out window
analysis.
The additive model can be estimated using EMS and Warwick software, and the Slacks-Based Measure (SBM) model using either EMS or DEA-Solver.
All the programs under study, except DEAP, allow the user to choose between an input or output orientation, to assume either constant or variable returns to scale, and to include non-discretionary (i.e. exogenously fixed) variables, and they offer some scale information. This latter possibility has recently been added in the new version of OnFront under the name of vector analysis.
Banxia lacks some of the options that other programs offer, such as window analysis, but it
presents the advantage of offering many graphs relating efficiency to the different inputs and
outputs. It also allows the introduction and/or elimination of some of the decision-making units
(DMUs). It is a very user-friendly package and very easy to work with even for those who do not
have a wide knowledge of DEA. A thorough user's guide is also provided. One-day workshops
introducing some of the theory of DEA, its practical applications and how to handle the program
take place regularly in London. Details regarding contents and cost information can be found at:
http://www.banxia.com/training/bxfanalyst.html
Regarding the use of categorical variables, only IDEAS and DEA-Solver offer their inclusion as
an option in the different models. The possibility of including weak or strong disposability of
inputs has been added in the new version of OnFront.
EMS, Warwick software and the new version of OnFront are the only packages that compute
Andersen and Petersen super-efficiency scores. EMS and DEA-Solver include the possibility of working with non-convex technologies and of comparing different programs' efficiencies.
EMS and DEAP offer a good on-line guide for users. Banxia and OnFront provide a reference guide to users together with the program. The DEA-Solver program includes only a very brief reference guide in the package; more detailed explanations of the options can be found in the associated reference book (Cooper et al. 2000).
Operating System and PC Requirements
Most of the software packages run under Windows, though some of them, like OnFront and Warwick DEA, offer both DOS and Windows versions. There are some others, like PIONEER (not included in the above table), that run under Unix (for additional information contact barr@seas.smu.edu). DEAP is the only package among those presented here that does not offer a Windows version. It is based on MS-DOS and the input files have to be text files. It also lacks some of the options that other programs offer. In spite of these disadvantages, DEAP presents many desirable technical characteristics. Foremost of these is that DEAP allows a multi-step procedure to be used in solving the DEA models. This can be either a one-step, two-step or multi-stage procedure. This is important, as using a one-step procedure can produce instabilities. Many of the other programs, such as the Warwick software, avoid instabilities by using a two-step procedure, but wrong peers can be obtained for some of the inefficient units. Both problems can be overcome by the use of the DEAP multi-step procedure. DEAP does not require a large-memory PC, and the fact that it is currently free of charge is another good reason for using it.
The operating system used by all programs, with the exception of DEAP, is Windows 95 or later(note) or NT. EMS accepts Excel 97 or text data files. Warwick and OnFront require input in the form of text files (although this restriction can be lessened by using the Windows clipboard). The new version of Banxia offers direct data import from Excel and SPSS, and the reports produced there can be directly exported to RTF (as Word), Excel and HTML files. DEA-Solver is based on Excel 97. Access, Dbase, Excel and text files are all accepted formats for IDEAS.
Requirements regarding the capacity of the computer are closely related to how sophisticated the software is. The most demanding are Banxia and OnFront. Banxia software requires 50 MB of hard disk space and 32 MB of RAM, and its developers strongly recommend using a Pentium PC. OnFront requires at least 16 MB of RAM and 8 MB of disk space. DEAP only requires 4 MB of RAM and can even be run on a 386 PC. However, it is not as user-friendly as the others.
Software Prices
The prices in 2001 (excluding VAT, shipping and handling) for the software under analysis are
presented in Table 3.
Table 3. Price for main DEA software, 2001 (excludes VAT)

Package          Academic price   Commercial price   Additional licence (academic)   Additional licence (commercial)
EMS 1.3          Free             Free               Free                            Free
DEAP 2.1         Free             Free               Free                            Free
BANXIA 3         £195-595         £395-2395          35% discount                    35% discount
WARWICK          £200/500         £200/500           -                               -
OnFront2         $750             $1750              $450                            $900
DEA-Solver 2.0   $800             $1600              $400                            $800
IDEAS            $140/1400        $280/2800          -                               -
The prices are for individual licences, either for academic or business purposes. There are some other prices for campus licences, educational purposes, etc. Some of the software presented here, such as Banxia, Warwick or IDEAS, can be obtained at different prices if the number of DMUs under study is limited. Banxia software offers a wide variety of prices depending on the maximum number of units it can analyse. The lowest level is for 75 units (£195), then 250 units (£295), 500 units (£395), 1500 units (£495) and 2500 units (£595). The same range can be bought for commercial purposes, with prices varying from £395 to £2395. Warwick software offers a limited version for analyses of fewer than 50 units (£200) and a full version for an unlimited number of DMUs (£500). The same applies for the DOS Warwick version (£150/300). IDEAS also offers two versions: the standard version, which allows analyses of up to 500 DMUs, 20 variables and 20 price ratios, and the professional version, for analyses of up to 2,000 DMUs, 25 variables and 120 price ratios. The rest of the programs can be used for any number of units (within the memory capacity of the computer). Banxia has recently launched some extra packages at different prices: a cross-efficiency module (£95), a 500-unit three-month trial licence (£295), support and maintenance for a year (£229), and an upgrade from the previous version, available as a CD with a new manual (£169) or as an electronic copy (£129, with no physical products).
Acknowledgement
This review was undertaken as part of the EU funded project 'Technical efficiency in EU fisheries: implications for monitoring and management through effort controls' (QLK5-CT-1999-01295).
Note
Warwick software also allows for Windows 3.1
References
Aigner, D., C. A. K. Lovell and P. Schmidt (1977). Formulation and estimation of stochastic
frontier production function models. Journal of Econometrics 6: 21-37.
Banxia Software (1998). Banxia Frontier Analyst. User's Guide. Professional Edition. Frontier
Analyst.
Battese, G. E. and T. J. Coelli (1988). Prediction of firm level technical inefficiencies with a generalised frontier production function and panel data, Journal of Econometrics, 38, 387-399.
Battese, G. E., T. J. Coelli and T. C. Colby, (1989) Estimation of Frontier Production Functions
and the Efficiencies of Indian Farms Using Panel Data from ICRISAT's Village Level Studies.
Journal of Quantitative Economics, 5(2): 327-48
Battese, G. E. and T. J. Coelli (1992). Frontier production functions, technical efficiency and panel data: with application to paddy farmers in India. Journal of Productivity Analysis 3: 153-169.
Battese, G. E. and T. J. Coelli (1995). A model for technical inefficiency effects in a stochastic
frontier production function for panel data. Empirical Economics 20: 325-332.
Coelli, T. (1996a). A guide to FRONTIER version 4.1: a computer program for frontier
production function estimation. CEPA Working Paper 96/07, Department of Econometrics,
University of New England, Armidale, Australia.
Coelli, T. J. (1996b). 'A guide to DEAP version 2.1: A Data Envelopment Analysis (Computer)
program'. CEPA Working Papers No.8/96. ISSN 1327-435X. ISBN 1 86389 4969.
Coelli, T. J., D. S. P. Rao and G. E. Battese (1998). An Introduction to Efficiency and
Productivity Analysis, Kluwer Academic Publishers: USA.
Cooper, W. W., Seiford, L. M. and Tone, K. (2000)., Data Envelopment Analysis. Kluwer
Academic Publishers. ISBN 0792386930.
Emrouznejad, A. (2000). An Extension to SAS/OR for Decision System Support, Proceedings of the 25th SAS Users Group International Conference, April 2000, Indianapolis, pp. 1456-1461.
Färe, R. and Grosskopf, S. (1998). 'Reference Guide to OnFront'. Economic Measurement and
Quality I Lund AB Lund. Sweden.
Farrell, M. J. (1957). The measurement of productive efficiency, Journal of the Royal Statistical
Society, 120, 252-90.
Greene, W. (1995). LIMDEP (Version 7): User's Manual and Reference Guide, Econometric
Software Inc., New York.
Greene, W. H. (1993). "Frontier Production Functions", EC-93-20. Stern School of Business,
New York University.
Hollingsworth, B. (1997). A review of Data Envelopment Analysis software. The Economic Journal, 1268-1270.
Hollingsworth, B. (1999). Data Envelopment Analysis and productivity analysis: a review of the options. The Economic Journal, 458-462.
Kumbhakar, S. C. and C. A. K. Lovell (2000). Stochastic Frontier Analysis. Cambridge: Cambridge University Press.
Kumbhakar, S., S. Ghosh and J. McGuckin. (1991). A generalized Production Frontier Approach
for Estimating Determinants of Inefficiency in U.S. Dairy Farms. Journal of Business and
Economic Statistics, 9:279-286.
Lewin, A. Y. and C. A. K. Lovell (1990). Editors Introduction, Journal of Econometrics, 46, 3-5.
Olesen, O. B. and N. C. Petersen (1995). A presentation of GAMS for DEA. Computers & Operations Research 23(4): 323-339.
Pitt, M. M. and L. F. Lee. (1981). The measurement and sources of technical inefficiency in the
Indonesian weaving industry. Journal of Development Economics, 9:43-64.
Reifschneider, D. and R. Stevenson. (1991). Systematic departures from the frontier: A
framework for the analysis of firm inefficiency. International Economic Review, 32 (3):715-723.
Scheel, H. (2000). EMS: Efficiency Measurement System User´s Manual. Version 1.3
Seiford, L. M. and R. M. Thrall (1990) Recent developments in DEA: the mathematical
programming approach to frontier analysis, Journal of Econometrics, 46, 7-38.
Sena, V. (1999). Stochastic frontier estimation: a review of the software options. Journal of
Applied Econometrics, 14: 579-586.
Stevenson, R. E., (1980). Likelihood function for generalized stochastic frontier estimation,
Journal of Econometrics, 13, 57-66.