An Efficient Data Envelopment Analysis
with a large data set in Stata
15-16 July, 2010
Boston10 Stata Conference
Choonjoo Lee, Kyoung-Rok Lee
sarang90@kndu.ac.kr, bloom.rampike@gmail.com
Korea National Defense University
Contents
Part I. A Large Data Set in Stata/DEA
Large Data Set in DEA?
Computational Aspects of Large Data Set
The Scope of this Study
Efficiency Matters in Stata/DEA/Linear Programming
Tasks to be covered
Part II. Malmquist Index Analysis with the Panel Data
Basic Concept of Malmquist Index
The User Written Command “malmq”
Part I. A Large Data Set in Stata/DEA
Large Data Set in DEA?
• Graphical illustration of DEA concept
• Limits on the number of variables and observations depend on the type of DEA program (language)
– Statistical Package based DEA Programs
– Spreadsheet based DEA Programs
– Language based DEA Codes
• Performance of Linear Program(LP): Efficiency and Accuracy
– LP is the Critical Component of DEA Program
– Approaches to Solve LP: Simplex, Interior Point Methods(IPMs)
☞ Numerous Variants of the Basic LP Approach
• DEA Report Format(User Interface Design)
– Results(input, output)
– Graphical Display
– Log
Computational Aspects of Large Data Set
• Matrix Size for the Data Set in Matrix Format
– # of rows and columns(variables and observations) allowed by the Program
– The storage limit of the computer memory
  → upgrades in computer technology, the way the data in memory are accessed
• Matrix Density
– # of nonzeros of the matrix
– How many zero elements in the matrix?
• A Computationally Demanding Procedure of DEA due to the LP
– The number of iterations needed to solve a problem can grow exponentially, in the
worst case, as a function of the number of variables and observations
• Numerical Difficulties
– Inaccuracy and inefficiency due to floating-point arithmetic with finite
precision
– Limited numerical precision due to the binary representation of numbers
The Scope of this Study
• Performance of DEA code
– Linear Program/Simplex Method
– Computational Technique
– Illustration
• Panel Data in DEA
– Malmquist Index Analysis
Efficiency Matters in Stata/DEA/LP
• DEA program demands heavy computation
– Computation time heavily depends on the number of
observations(DMUs), variables(inputs, outputs), LP
process, etc.
• Stata uses RAM(memory) to store data
– The memory size matters for the large data set
Efficiency Matters in Stata/DEA/LP
• The performance of Input Oriented DEA models
  Model                             Computation (sec)   Memory   Major Areas Revised
  5-2-2-V1                          ~20                 1G
  5-2-2-V2 (released)               <2                  <300M    Basic feasible solution
  5-5-5-V3                          <1                  <300M    Revised Simplex Method
  365-1-5-V1                        ?                   6G
  365-1-5-V2*                       ~14600              6G       Two-stage LP
  365-1-5-V3* (under development)   20                  <300M    Mata, Tolerance
  ※ Stata SE
Efficiency Matters in Stata/DEA/LP
• Understanding the difference of computation
  Method            Operation                  Pivoting       Pricing   Total
  Tableau Simplex   Multiplication, Division   (m+1)(n-m+1)             m(n-m)+n+1
                    Addition, Subtraction      m(n-m+1)                 m(n-m+1)
  Revised Simplex   Multiplication, Division   (m+1)^2        m(n-m)    m(n-m)+(m+1)^2
                    Addition, Subtraction      m(m+1)         m(n-m)    m(n+1)
– What if the number of observations (n) becomes significantly
larger than the number of variables (m)?
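As a rough way to explore that question, the per-iteration counts in the table can be evaluated for a fixed m and a growing n. The following is a minimal Python sketch; the function names are illustrative, and the formulas are copied from the table rather than measured from any Stata code.

```python
def tableau_ops(m, n):
    """Per-iteration (multiplications/divisions, additions/subtractions)
    for the tableau simplex method, as listed in the table."""
    return (m + 1) * (n - m + 1), m * (n - m + 1)


def revised_ops(m, n):
    """Per-iteration counts for the revised simplex method:
    pivoting plus pricing, as listed in the table."""
    pivot_mul, pivot_add = (m + 1) ** 2, m * (m + 1)
    price_mul, price_add = m * (n - m), m * (n - m)
    return pivot_mul + price_mul, pivot_add + price_add


# Keep m fixed and let n grow (the values of m and n are arbitrary).
for m, n in [(4, 14), (4, 100), (4, 1000)]:
    print(m, n, tableau_ops(m, n), revised_ops(m, n))
```

Under these dense-matrix counts the multiplications differ by n+1 (tableau) versus (m+1)^2 (revised) per iteration. The counts ignore matrix sparsity and implementation details, so they are only one part of the comparison.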
Efficiency Matters in Stata/DEA/LP
• Tableau and Revised Simplex in DEA/LP
– Data
  Store   Employee (input)   Area (input)   Sales (output)   Profit (output)
  A       10                 20             70               6
  B       15                 15             100              3
  C       20                 30             80               5
  D       25                 15             100              2
  E       12                 9              90               8
  Source: Cooper et al. (2006), Table 3-7
Efficiency Matters in Stata/DEA/LP
• Tableau and Revised Simplex in DEA/LP
– For DMU A
  Store   Employee (input)   Area (input)   Sales (output)   Profit (output)
  A       10                 20             70               6
– The Basic DEA Models
  Input oriented, constant returns to scale:
    Min θ
    s.t. θxA - Xλ ≥ 0
         Yλ - yA ≥ 0
         λ ≥ 0
  Input oriented, variable returns to scale:
    Min θ
    s.t. θxA - Xλ ≥ 0
         Yλ - yA ≥ 0
         eλ = 1
         λ ≥ 0
  Output oriented, constant returns to scale:
    Max η
    s.t. xA - Xμ ≥ 0
         ηyA - Yμ ≤ 0
         μ ≥ 0
  Output oriented, variable returns to scale:
    Max η
    s.t. xA - Xμ ≥ 0
         ηyA - Yμ ≤ 0
         eμ = 1
         μ ≥ 0
Efficiency Matters in Stata/DEA/LP
• Program Structure
  DATA → Stata/DEA → RESULT
– DATA: data file of input and output variables
– Stata/DEA
  • DEA options: basic, variants
  • Data conversion: scaling, tolerance
  • DEA loop: RTS, ORT
  • Linear programming: simplex method, basic solution
  • Generating the DEA result report
– RESULT: files of efficiency scores and lambdas
Efficiency Matters in Stata/DEA/LP
• Program Syntax
dea ivars = ovars [if] [in] [, rts(crs | vrs | drs | irs) ort(in | out)
    stage(1 | 2) trace saving(filename)]
– rts(crs | vrs | drs | irs) specifies the returns to scale. The default,
rts(crs), specifies constant returns to scale.
– ort(in | out) specifies the orientation. The default is ort(in),
meaning input-oriented DEA.
– stage(1 | 2) specifies the way to identify all efficiency slacks.
The default is stage(2), meaning two-stage DEA.
– trace saves every sequence displayed in the Results window in the
dea.log file. By default, only the final results are saved in the
dea.log file.
– saving(filename) specifies that the results be saved in
filename.dta.
Efficiency Matters in Stata/DEA/LP
• Develop the Basic Data Bank(input oriented CRS)
– Canonical form
  Min θ
  s.t. 10θ - 10λA -  15λB - 20λC -  25λD - 12λE ≥ 0
       20θ - 20λA -  15λB - 30λC -  15λD -  9λE ≥ 0
             70λA + 100λB + 80λC + 100λD + 90λE ≥ 70
              6λA +   3λB +  5λC +   2λD +  8λE ≥ 6
– Standard form
  Min θ
  s.t. 10θ - 10λA -  15λB - 20λC -  25λD - 12λE - S1- + x1 = 0
       20θ - 20λA -  15λB - 30λC -  15λD -  9λE - S2- + x2 = 0
             70λA + 100λB + 80λC + 100λD + 90λE - S1+ + x3 = 70
              6λA +   3λB +  5λC +   2λD +  8λE - S2+ + x4 = 6
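The four constraint rows for DMU A follow mechanically from the store data. The following is a minimal Python sketch that builds the canonical-form rows for any DMU; the names `STORES`, `INPUTS`, `OUTPUTS` and `canonical_rows` are illustrative and are not part of the dea command.

```python
# Store data from the example: inputs (Employee, Area), outputs (Sales, Profit).
STORES = ["A", "B", "C", "D", "E"]
INPUTS = {"A": (10, 20), "B": (15, 15), "C": (20, 30), "D": (25, 15), "E": (12, 9)}
OUTPUTS = {"A": (70, 6), "B": (100, 3), "C": (80, 5), "D": (100, 2), "E": (90, 8)}


def canonical_rows(dmu):
    """Return (rows, rhs) of the input-oriented CRS model for `dmu`.

    Each row holds the coefficients of [theta, lambda_A, ..., lambda_E]
    in a constraint of the form  row . z >= rhs.
    """
    rows, rhs = [], []
    for i, x in enumerate(INPUTS[dmu]):     # theta*x_o - sum(lambda_j*x_j) >= 0
        rows.append([x] + [-INPUTS[s][i] for s in STORES])
        rhs.append(0)
    for r, y in enumerate(OUTPUTS[dmu]):    # sum(lambda_j*y_j) >= y_o
        rows.append([0] + [OUTPUTS[s][r] for s in STORES])
        rhs.append(y)
    return rows, rhs


rows, rhs = canonical_rows("A")
for row, b in zip(rows, rhs):
    print(row, ">=", b)
```

Calling `canonical_rows` with another store label gives the rows for that DMU, which is the part of the problem that changes inside the DEA loop.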
Efficiency Matters in Stata/DEA/LP
• Model V1: Tableau DEA
  Initial tableau
  Basis  X     θ     λA    λB    λC    λD    λE    S1-   S2-   S1+   S2+   x1    x2    x3    x4    RHS
         1     0     0     0     0     0     0     0     0     0     0     -1    -1    -1    -1    0
  x1     0     10    -10   -15   -20   -25   -12   -1    0     0     0     1     0     0     0     0
  x2     0     20    -20   -15   -30   -15   -9    0     -1    0     0     0     1     0     0     0
  x3     0     0     70    100   80    100   90    0     0     -1    0     0     0     1     0     70
  x4     0     0     6     3     5     2     8     0     0     0     -1    0     0     0     1     6

  Initial tableau with the objective row updated (MRT: minimum ratio test)
  Basis  X     θ     λA    λB    λC    λD    λE    S1-   S2-   S1+   S2+   x1    x2    x3    x4    RHS   MRT
         1     30    46    73    35    62    77    -1    -1    -1    -1    0     0     0     0     76
  x1     0     10    -10   -15   -20   -25   -12   -1    0     0     0     1     0     0     0     0     ×
  x2     0     20    -20   -15   -30   -15   -9    0     -1    0     0     0     1     0     0     0     ×
  x3     0     0     70    100   80    100   90    0     0     -1    0     0     0     1     0     70    70/90
  x4     0     0     6     3     5     2     8     0     0     0     -1    0     0     0     1     6     6/8

  Tableau Ⅰ
  Basis  X       θ       λA      λB      λC      λD      λE      S1-     S2-     S1+     S2+     x1      x2      x3      x4      RHS     MRT
         1       30      -47/4   353/8   -105/8  171/4   0       -1      -1      -1      69/8    0       0       0       -77/8   73/4
  x1     0       10      -1      -21/2   -25/2   -22     0       -1      0       0       -3/2    1       0       0       3/2     9       ×
  x2     0       20      -53/4   -93/8   -195/8  -51/4   0       0       -1      0       -9/8    0       1       0       9/8     27/4    ×
  x3     0       0       5/2     265/4   95/4    155/2   0       0       0       -1      45/4    0       0       1       -45/4   5/2     10/265
  λE     0       0       6/8     3/8     5/8     2/8     1       0       0       0       -1/8    0       0       0       1/8     6/8     1/2
Efficiency Matters in Stata/DEA/LP
• Model V1: Tableau DEA
  Tableau Ⅴ
  Basis  Z     θ     λA     λB       λC       λD        λE    S1-       S2-      S1+         S2+   RHS    MRT
         1     0     0      -11/70   -32/35   -89/70    0     -39/350   1/175    -1/70       0     1
  λA     0     0     1      1/7      6/21     -33/21    0     -6/35     3/35     -1/70       0     1      35/3
  θ      0     1     0      -11/70   -32/35   -267/210  0     -39/350   1/175    -1/70       0     1      175/1
  S2+    0     0     0      41/7     43/21    152/21    0     4/105     -2/105   -159/1855   1     0      ×
  λE     0     0     0      49/8     59/24    182/21    1     1/6       -1/12    -159/2120   0     0      ×

  Tableau Ⅵ
  Basis  Z     θ     λA     λB       λC       λD        λE    S1-       S2-      S1+         S2+   RHS
         1     0     -1/15  -1/6     -14/15   -7/6      0     -1/10     0        -1/75       0     14/15
  S2-    0     0     35/3   5/3      10/3     -55/3     0     -2        1        -1/6        0     35/3
  θ      0     1     -1/15  -1/6     -14/15   -7/6      0     -1/10     0        -1/15       0     14/15
  S2+    0     0     2/9    53/9     19/9     62/9      0     0         0        -4/45       1     2/9
  λE     0     0     35/36  451/72   177/72   257/36    1     0         0        -4/45       0     35/36
– Efficiency score (θ) of DMU A is 14/15
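The reported score can be checked without rerunning the simplex method. By LP duality, a feasible solution of the envelopment model and a feasible solution of the multiplier model (max u·yA subject to v·xA = 1 and u·yj - v·xj ≤ 0 for every DMU j) that share the same objective value are both optimal. The following is a minimal Python sketch of that check. The candidate values λE = 7/9, v = (1/10, 0) and u = (1/75, 0) were worked out by hand for this illustration and are assumptions, not output of the dea command.

```python
from fractions import Fraction as F

X = {"A": (10, 20), "B": (15, 15), "C": (20, 30), "D": (25, 15), "E": (12, 9)}  # inputs
Y = {"A": (70, 6), "B": (100, 3), "C": (80, 5), "D": (100, 2), "E": (90, 8)}    # outputs


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def primal_feasible(o, theta, lam):
    """Envelopment model: theta*x_o - X.lam >= 0, Y.lam - y_o >= 0, lam >= 0."""
    used_x = [sum(l * X[j][i] for j, l in lam.items()) for i in range(2)]
    made_y = [sum(l * Y[j][r] for j, l in lam.items()) for r in range(2)]
    return (all(theta * X[o][i] >= used_x[i] for i in range(2))
            and all(made_y[r] >= Y[o][r] for r in range(2))
            and all(l >= 0 for l in lam.values()))


def dual_feasible(o, v, u):
    """Multiplier model: v.x_o = 1, u.y_j - v.x_j <= 0 for all j, u, v >= 0."""
    return (dot(v, X[o]) == 1
            and all(dot(u, Y[j]) <= dot(v, X[j]) for j in X)
            and all(w >= 0 for w in v + u))


theta, lam = F(14, 15), {"E": F(7, 9)}       # candidate envelopment solution
v, u = (F(1, 10), F(0)), (F(1, 75), F(0))    # candidate multipliers

print(primal_feasible("A", theta, lam), dual_feasible("A", v, u))
print(theta == dot(u, Y["A"]))               # equal objectives: both are optimal
```

Exact fractions are used so that the equality tests are not affected by floating-point rounding.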
Efficiency Matters in Stata/DEA/LP
• Model V3: Revised DEA
  [ c   0 | 0 ]       [ cB  cN | 0 ]       [ 0   cN - cB·B⁻¹·N | cB·B⁻¹·b ]
  [ A   I | b ]   →   [ B   N  | b ]   →   [ I   B⁻¹·N         | B⁻¹·b    ]
Efficiency Matters in Stata/DEA/LP
• Model V3: Revised DEA
         X     θ     λA    λB    λC    λD    λE    S1-   S2-   S1+   S2+   x1    x2    x3    x4    RHS
  c      1     0     0     0     0     0     0     0     0     0     0     -1    -1    -1    -1    0
  x1     0     10    -10   -15   -20   -25   -12   -1    0     0     0     1     0     0     0     0
  x2     0     20    -20   -15   -30   -15   -9    0     -1    0     0     0     1     0     0     0
  x3     0     0     70    100   80    100   90    0     0     -1    0     0     0     1     0     70
  x4     0     0     6     3     5     2     8     0     0     0     -1    0     0     0     1     6
  N: columns θ to S2+ (nonbasic), with costs cN. B: columns x1 to x4 (basis), with costs cB. b: RHS column.
– Step 1: Set up the initial tableau factors.
– Step 2: Find the entering variable.
– Step 3: Find the leaving variable.
– Step 4: Update the tableau (update the basis).
Efficiency Matters in Stata/DEA/LP
• Model V3: Revised DEA
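- 1st step: The initial tableau factors.
  B = I (4×4 identity: columns x1, x2, x3, x4)
  xB = B⁻¹b = (0, 0, 70, 6)
  cB = (-1, -1, -1, -1)
  cB·B⁻¹ = (-1, -1, -1, -1)
- 2nd step: Finding the entering variable
  cN - cB·B⁻¹·N: the variable with the maximum value is selected as the entering variable
  θ     λA    λB    λC    λD    λE    S1-   S2-   S1+   S2+
  30    46    73    35    62    77    -1    -1    -1    -1
  Max = 77 (λE enters)
- 3rd step: Finding the leaving variable
  B⁻¹N = N (since B = I); the λE column is (-12, -9, 90, 8)
  Min{xB/(B⁻¹N)} = {×, ×, 70/90, 6/8} = 6/8 (← x4)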
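The pricing and ratio-test steps can be reproduced in a few lines. The following is a minimal Python sketch for the initial basis only, assuming B = I so that B⁻¹N = N and xB = b. The variable names are illustrative, and the code is not taken from the dea program.

```python
from fractions import Fraction as F

names = ["θ", "λA", "λB", "λC", "λD", "λE", "S1-", "S2-", "S1+", "S2+"]
basis = ["x1", "x2", "x3", "x4"]

# Nonbasic columns N (one list per constraint row) and right-hand side b.
N = [
    [10, -10, -15, -20, -25, -12, -1, 0, 0, 0],
    [20, -20, -15, -30, -15, -9, 0, -1, 0, 0],
    [0, 70, 100, 80, 100, 90, 0, 0, -1, 0],
    [0, 6, 3, 5, 2, 8, 0, 0, 0, -1],
]
b = [0, 0, 70, 6]
c_B = [-1, -1, -1, -1]   # costs of the basic (artificial) variables
c_N = [0] * len(names)   # costs of the nonbasic variables

# Step 2 (pricing): cN - cB * B^-1 * N, with B = I for the initial basis.
reduced = [c_N[j] - sum(c_B[i] * N[i][j] for i in range(4)) for j in range(len(names))]
entering = max(range(len(names)), key=lambda j: reduced[j])

# Step 3 (minimum ratio test): only rows with a positive entry take part.
column = [N[i][entering] for i in range(4)]
ratios = [F(b[i], column[i]) if column[i] > 0 else None for i in range(4)]
leaving = min((i for i in range(4) if ratios[i] is not None), key=lambda i: ratios[i])

print(reduced)
print(names[entering], "enters;", basis[leaving], "leaves")
```

For later iterations B is no longer the identity, so B⁻¹ (or a factorization of B) would have to be carried along and updated; that part is omitted here.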
Efficiency Matters in Stata/DEA/LP
• Model V3: Revised DEA
- 4th step: Update the tableau
  Before the update (basis x1, x2, x3, x4)
         X     θ     λA    λB    λC    λD    λE    S1-   S2-   S1+   S2+   x1    x2    x3    x4    RHS
  c      1     0     0     0     0     0     0     0     0     0     0     -1    -1    -1    -1    0
  x1     0     10    -10   -15   -20   -25   -12   -1    0     0     0     1     0     0     0     0
  x2     0     20    -20   -15   -30   -15   -9    0     -1    0     0     0     1     0     0     0
  x3     0     0     70    100   80    100   90    0     0     -1    0     0     0     1     0     70
  x4     0     0     6     3     5     2     8     0     0     0     -1    0     0     0     1     6

  After the update (λE enters the basis, x4 leaves; the two columns are exchanged)
         X     θ     λA    λB    λC    λD    x4    S1-   S2-   S1+   S2+   x1    x2    x3    λE    RHS
  c      1     0     0     0     0     0     -1    0     0     0     0     -1    -1    -1    0     0
  x1     0     10    -10   -15   -20   -25   0     -1    0     0     0     1     0     0     -12   0
  x2     0     20    -20   -15   -30   -15   0     0     -1    0     0     0     1     0     -9    0
  x3     0     0     70    100   80    100   0     0     0     -1    0     0     0     1     90    70
  λE     0     0     6     3     5     2     1     0     0     0     -1    0     0     0     8     6
  In both tableaux, N is the block of nonbasic columns (costs cN), B the last four columns (costs cB), and b the RHS column.
Tasks to be covered
• Computational Accuracy
– Example: Obtaining Inverse Matrix
• Matrix D
  1   1.341099143    -61.13394928   0.4455321    1.883781314    2.587946653
  0   0              0              0.0588235    0              0
  0   0.116421975    -6.672515869   -0.110761    0.495342732    -0.097138606
  0   -0.172319263   -19.71403694   -0.262333    -0.074690066   1.54739666
  0   -0.046367686   -4.060891628   -0.082268    -0.009800959   0.25169459
  0   0.105886854    4.651313305    0.1136269    -0.015884314   0.037229143
• Inverse matrix D by Stata/Mata “luinv(D)”
  1   162470623.2    -4.022811871   -81235306   487411816.6    81235289.98
  0   -147760451.4   -0.087162294   73880208    -443281245.5   -73880196.74
  0   3410527.559    0.007873073    -1705264    10231581.38    1705263.517
  0   16.99999999    -2.96E-17      -2.77E-08   1.66E-07       2.77E-08
  0   86785601.44    2.18378179     -43392792   260356746.7    43392788.04
  0   31184842.39    0.196004759    -15592418   93554511.28    15592419.02
• Inverse matrix D by Stata/Mata “luinv(D)”: Mata session
. mata
------------------------------ mata (type end to exit) ------------------------------
: st_view(X=., ., ("a1", "a2", "a3", "a4", "a5", "a6"))
: b = luinv(X)
: b
                  1              2              3              4              5
    +-----------------------------------------------------------------------------+
  1 |              1    162470623.2   -4.022811871   -81235305.55    487411816.6  |
  2 |              0   -147760451.4   -.0871622935    73880208.39   -443281245.5  |
  3 |              0    3410527.559    .0078730725   -1705263.586    10231581.38  |
  4 |              0    16.99999999   -2.95716e-17   -2.76977e-08    1.66186e-07  |
  5 |              0    86785601.44     2.18378179   -43392791.54    260356746.7  |
  6 |              0    31184842.39    .1960047586   -15592418.13    93554511.28  |
    +-----------------------------------------------------------------------------+
                  6
    +-----------------+
  1 |    81235289.98  |
  2 |   -73880196.74  |
  3 |    1705263.517  |
  4 |    2.76977e-08  |
  5 |    43392788.04  |
  6 |    15592419.02  |
    +-----------------+
• D*D⁻¹ in Stata/Mata (default tolerance)
  1   5.96E-08      2.36E-08    -3.73E-08   5.96E-08      -7.45E-08
  0   1.000000003   -1.74E-18   -1.63E-09   9.78E-09      1.63E-09
  0   4.66E-10      1           -1.63E-09   -2.98E-08     -3.96E-09
  0   -1.49E-08     1.81E-09    1           0             -7.45E-09
  0   -2.79E-09     2.95E-10    4.66E-10    0.999999989   -1.40E-09
  0   4.66E-09      3.84E-11    -1.28E-09   7.45E-09      1.000000001
→ Should it be the identity matrix?
• D*D⁻¹ in Excel
  [6×6 product matrix. Six entries are equal or close to 1 (1, 0.999999999, 1, 0.999999996, 0.999999996, 1); the remaining entries are 0 or small nonzero values between 2.72414E-17 and 5.96046E-08 in magnitude.]
→ Where does the computational inaccuracy come from?
Tasks to be covered
• Computational Accuracy
– One of the possible reasons: Decimal and Binary numbers
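• Integer: 17 (decimal number), by repeated division by 2
    17 / 2 = 8, remainder 1
     8 / 2 = 4, remainder 0
     4 / 2 = 2, remainder 0
     2 / 2 = 1, remainder 0
     1 / 2 = 0, remainder 1
    = 10001 (binary number), reading the remainders from last to first
• Fractions (first 12 binary digits)
    0.75 (decimal) → 0.11 (binary)
    0.7  (decimal) → 0.101100110011... (binary)
    0.6  (decimal) → 0.100110011001... (binary)
    0.10 (decimal) → 0.000110011001... (binary)
    0.05 (decimal) → 0.000011001100... (binary)
→ How does the computer store a = 0.75, b = 0.7 + 0.05, c = 0.6 + 0.1 + 0.05?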
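The conversions on this slide can be reproduced with exact arithmetic. The following is a minimal Python sketch; the function names are illustrative. It uses repeated division by 2 for the integer and repeated doubling for the fractions, then compares the three ways of writing 0.75 as double-precision floats.

```python
from fractions import Fraction


def int_to_binary(n):
    """Repeated division by 2; the remainders are read from last to first."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits)) or "0"


def frac_to_binary(x, bits=12):
    """First `bits` binary digits of a decimal fraction 0 <= x < 1.

    `x` is passed as a string such as "0.7" so that it is held exactly.
    """
    x = Fraction(x)
    out = []
    for _ in range(bits):
        if x == 0:          # the expansion terminates (e.g. 0.75)
            break
        x *= 2
        bit = int(x >= 1)
        out.append(str(bit))
        x -= bit
    return "0." + "".join(out)


print(int_to_binary(17))
for s in ["0.75", "0.7", "0.6", "0.10", "0.05"]:
    print(s, "->", frac_to_binary(s))

# The same value written three ways, stored as binary floating-point numbers.
a, b, c = 0.75, 0.7 + 0.05, 0.6 + 0.1 + 0.05
print(a == b, a == c)
print(Fraction(0.7) == Fraction(7, 10))   # is 0.7 stored exactly?
```

Only 0.75 has a finite binary expansion among these fractions, so the other values are rounded when stored, and sums of rounded values need not match the directly stored 0.75.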
Tasks to be covered
• Accuracy
– Tolerance
• to set upper or lower limit on the number of iterations.
• to stop an unattended run if the algorithm falls into a cycle
– Preprocessing: Scaling
• to reduce the gap in magnitude between the numbers and get a numerically safe solution.
Ex) Rank(D)
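The rank example suggests that the computed rank of a matrix depends on the tolerance used to decide when a pivot counts as zero. The following is a minimal Python sketch of that effect and of a simple row scaling. It assumes plain Gaussian elimination with partial pivoting, which is not necessarily how Mata computes a rank, and the 2×2 matrix is made up for the illustration.

```python
def rank(matrix, tol):
    """Rank by Gaussian elimination with partial pivoting.

    A pivot whose absolute value is <= tol is treated as zero.
    """
    a = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) <= tol:
            continue                      # no usable pivot in this column
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, n_rows):
            factor = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= factor * a[r][j]
        r += 1
    return r


def scale_rows(matrix):
    """Divide every row by its largest absolute entry (simple row scaling)."""
    scaled = []
    for row in matrix:
        biggest = max(abs(v) for v in row) or 1.0
        scaled.append([v / biggest for v in row])
    return scaled


nearly_singular = [[1.0, 2.0], [1.0, 2.0 + 1e-12]]
print(rank(nearly_singular, tol=1e-9), rank(nearly_singular, tol=1e-15))
print(scale_rows([[2.0, 4.0], [0.5, 0.25]]))
```

A loose tolerance treats the two nearly identical rows as dependent, while a tight one does not, which is why the tolerance has to be chosen with the scale of the data in mind.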
Part II. Malmquist Index Analysis with the Panel Data
Basic Concept of Malmquist Index
• The Malmquist Productivity Index (MPI) measures
productivity change over time and can be decomposed
into changes in efficiency and changes in technology.
The input oriented MPI can be expressed in terms of input oriented
CRS efficiency as Equation 1 and 2 using the observations at time t
and t+1.
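Equations 1 and 2 are not reproduced in this text. A common way to write them is sketched below, as an assumption rather than the slide's own notation: E^s(x^u, y^u) stands for the input-oriented CRS efficiency of the period-u observation measured against the period-s frontier.

```latex
\begin{align}
MPI^{t}   &= \frac{E^{t}(x^{t+1},\, y^{t+1})}{E^{t}(x^{t},\, y^{t})} \tag{1} \\
MPI^{t+1} &= \frac{E^{t+1}(x^{t+1},\, y^{t+1})}{E^{t+1}(x^{t},\, y^{t})} \tag{2}
\end{align}
```

The first index uses the period-t frontier as the reference and the second uses the period-(t+1) frontier.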
Basic Concept of Malmquist Index
The input oriented geometric mean of MPI can be decomposed
using the concept of input oriented technical change and input
oriented efficiency change as given in equation 4.
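The decomposition can be illustrated numerically. The following is a minimal Python sketch, assuming the usual geometric-mean form of the index and writing e_s_u for the CRS efficiency of the period-u observation measured against the period-s frontier (t1 stands for t+1). The function name and the four efficiency scores are made up for the illustration and are not output of malmq.

```python
from math import sqrt


def malmquist(e_t_t, e_t_t1, e_t1_t, e_t1_t1):
    """Geometric-mean MPI with its efficiency-change and technical-change parts.

    e_s_u: efficiency of the period-u observation against the period-s frontier.
    """
    mpi = sqrt((e_t_t1 / e_t_t) * (e_t1_t1 / e_t1_t))
    ec = e_t1_t1 / e_t_t                                 # efficiency change
    tc = sqrt((e_t_t / e_t1_t) * (e_t_t1 / e_t1_t1))     # technical change
    return mpi, ec, tc


# Hypothetical efficiency scores for one DMU.
mpi, ec, tc = malmquist(e_t_t=0.8, e_t_t1=1.0, e_t1_t=0.75, e_t1_t1=0.9)
print(mpi, ec, tc, ec * tc)
```

The product of the two components reproduces the index, which is the content of the decomposition.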
The User written command “malmq”
• Program Syntax
malmq ivars = ovars [if] [in] [, ort(in | out)
    period(varname) trace saving(filename)]
– ort(in | out) specifies the orientation. The default is ort(in),
meaning input-oriented DEA.
– period(varname) identifies the time variable.
– trace saves every sequence displayed in the Results window in the
malmq.log file. By default, only the final results are saved in the
malmq.log file.
– saving(filename) specifies that the results be saved in
filename.dta.
The User written command “malmq”
• Example
– Data
The User written command “malmq”
• Example
– Result
Notes
• The data and code related to the presentation will
be available from the Conference website.
References
• Cooper, W. W., Seiford, L. M., & Tone, K. (2006). Introduction to
Data Envelopment Analysis and Its Uses. Springer
Science+Business Media.
• Ji, Y., & Lee, C. (2010). “Data Envelopment Analysis”, The Stata
Journal, 10(2), pp. 267-280.
• Lee, C., & Ji, Y. (2009). “Data Envelopment Analysis in Stata”,
DC09 Stata Conference.
• Maros, I. (2003). Computational Techniques of the Simplex
Method. Kluwer Academic Publishers.