Efficiency and Productivity Measurement: Bootstrapping DEA Scores

D.S. Prasada Rao
School of Economics
The University of Queensland, Australia

Measures of Reliability for DEA Scores

Because DEA is a non-parametric and non-stochastic approach, DEA efficiency scores have traditionally been treated as non-stochastic. However, there have been attempts to examine how DEA scores respond to changes in the data, mainly to assess the effect of outliers. Simar and Wilson have worked on generating standard errors for DEA scores using the bootstrap technique. A simpler alternative to the bootstrap is the jackknife.

Jackknife Technique

• Run DEA and obtain efficiency scores for each of the M DMUs in the data set.
• Drop one DMU and use the remaining data to compute DEA scores for the remaining M-1 DMUs.
• Repeat until every DMU has been dropped once. At this stage, we have M-1 leave-one-out efficiency scores for each of the M DMUs in the sample.
• Compute the standard deviation of each DMU's efficiency score from its M-1 estimates.
• This is a fairly mechanical procedure, but it gives an indication of the presence of outliers: dropping an outlying DMU may change the scores of the other DMUs significantly.

THE DEA BOOTSTRAP

Monte Carlo simulation experiments are often used to estimate the sampling distributions of econometric estimators. Such experiments typically involve several steps:

1. Specify a data generating process (DGP).
2. Use the DGP to generate data (i.e., simulate).
3. Apply the estimator to the generated data.
4. Repeat from Step 2.

The distribution of the estimates obtained in Step 3 approximates the sampling distribution of the estimator. The bootstrap is a form of Monte Carlo experiment in which the DGP is unknown and is replaced by an estimate obtained from the data.
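The Monte Carlo steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the DGP (iid Uniform(0, true_bound) observations) and the estimator (the sample maximum) are assumptions chosen only because, like a frontier estimate, the sample maximum never lies beyond the true boundary. The names monte_carlo, true_bound, n and replications are hypothetical.

```python
import random
import statistics


def monte_carlo(true_bound=1.0, n=20, replications=2000, seed=0):
    """Approximate the sampling distribution of the sample maximum.

    Assumed DGP (illustration only): n iid draws from Uniform(0, true_bound).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        # Use the DGP to generate one simulated data set.
        sample = [rng.uniform(0.0, true_bound) for _ in range(n)]
        # Apply the estimator to the generated data.
        estimates.append(max(sample))
        # The loop then repeats the generate-and-estimate steps.
    return estimates


estimates = monte_carlo()
# The spread of `estimates` approximates the estimator's sampling distribution.
mc_bias = statistics.mean(estimates) - 1.0   # mean estimate minus true bound
mc_sd = statistics.stdev(estimates)          # Monte Carlo standard deviation
```

In this toy setting the true bound is known, so the bias can be measured directly. In the bootstrap the true DGP is not available, so the same loop is run with an estimated DGP instead.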
Alternative DEA Bootstrap Methods

Methods for conducting a DEA bootstrap have been suggested by
• Ferrier and Hirschberg (1997)
• Lothgren and Tambour (1997)
• Simar and Wilson (1998)

We only discuss the Lothgren-Tambour (LT) method because
• Simar and Wilson (1997) identify theoretical problems with the Ferrier-Hirschberg (FH) method.
• Lothgren (1998) provides evidence that the LT method outperforms the Simar-Wilson (SW) method.
• The LT method is relatively straightforward.

The DGP

Consider input-oriented DEA models in which the output vectors $q_1, \dots, q_I$ are treated as fixed. We need to specify a DGP that allows us to generate data on $x_1, \dots, x_I$.

Let $\rho_i \equiv D_I(x_i, q_i)$ be the input distance for firm i. Then $x_i^* \equiv x_i / \rho_i$ is a technically efficient input combination capable of producing $q_i$. Suppose the process generating the distances for all firms is

$(\rho_1, \dots, \rho_I) \sim \text{iid } F.$

Then a DGP for $x_1, \dots, x_I$ is completely characterised by $x_1^*, \dots, x_I^*$, $q_1, \dots, q_I$ and $F$.

Example

[Figure: inputs per unit of output for $q = 1$, with $x_1/q$ on the horizontal axis and $x_2/q$ on the vertical axis. Firm 2 is observed at $x_2 = \rho_2 x_2^* = (2, 4)$ with $\rho_2 = 2$, so the efficient point is $x_2^* = (1, 2)$.]

Estimating the DGP

Let $\hat{\rho}_i$ denote the DEA estimate of $\rho_i$ (computed as the inverse of the optimised value of the DEA objective function). We estimate $x_1^*, \dots, x_I^*$ by projecting $x_i$ onto the estimated frontier:

$\hat{x}_i^* \equiv x_i / \hat{\rho}_i, \quad i = 1, \dots, I.$

We estimate $F$ using the empirical distribution function (EDF) of the $\hat{\rho}_i$, which places probability mass $1/I$ on each estimate:

$\hat{F}(\rho) = \begin{cases} 1/I & \text{if } \rho = \hat{\rho}_i,\ i = 1, \dots, I \\ 0 & \text{otherwise} \end{cases}$

Example cont.

[Figure: the same axes, $x_1/q$ and $x_2/q$, for $q = 1$. Firm 2 at $x_2 = (2, 4)$ is projected onto the estimated frontier with $\hat{\rho}_2 = 1/0.682$, giving $\hat{x}_2^* = x_2 / \hat{\rho}_2 = (1.364, 2.728)$.]

The Bootstrap Algorithm

To obtain B bootstrap samples:

1. Use the observed data to estimate the input-oriented DEA model, and project the observed data points onto the frontier using $\hat{x}_i^* = x_i / \hat{\rho}_i$. Set b = 1.
2. Draw $\rho_1^b, \dots, \rho_I^b$ independently from $\hat{F}$ and generate the bootstrap sample $x_1^b, \dots, x_I^b$ using $x_i^b = \rho_i^b \hat{x}_i^*$.
3. Use the bootstrap sample to estimate the DEA frontier. Set b = b + 1.
4. Repeat from Step 2 until B bootstrap samples have been generated.
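The resampling part of the algorithm above (the projection onto the estimated frontier and the draws from $\hat{F}$) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the rule that maps a uniform draw u to firm index ceil(I*u) is one assumed way of drawing from the EDF, and re-estimating the DEA frontier on each sample is omitted because it needs a linear-programming solver.

```python
import math
import random


def project_to_frontier(x, rho_hat):
    """Projection: x_hat_star_i = x_i / rho_hat_i for each firm i."""
    return [tuple(v / r for v in xi) for xi, r in zip(x, rho_hat)]


def draw_from_edf(rho_hat, uniforms):
    """Draw from the EDF of rho_hat using uniform draws.

    Assumed rule: a draw u selects the firm with 1-based index ceil(I * u),
    so every estimate has probability 1/I (sampling with replacement).
    """
    n = len(rho_hat)
    return [rho_hat[max(1, math.ceil(n * u)) - 1] for u in uniforms]


def bootstrap_sample(x_star, rho_draws):
    """Bootstrap inputs: x_b_i = rho_b_i * x_hat_star_i."""
    return [tuple(r * v for v in xs) for xs, r in zip(x_star, rho_draws)]


def bootstrap_samples(x, rho_hat, B, seed=0):
    """Generate B bootstrap input samples (DEA re-estimation not included)."""
    rng = random.Random(seed)
    x_star = project_to_frontier(x, rho_hat)
    samples = []
    for _ in range(B):
        uniforms = [rng.random() for _ in x]
        samples.append(bootstrap_sample(x_star, draw_from_edf(rho_hat, uniforms)))
    return samples
```

Each element returned by bootstrap_samples would then be passed, together with the fixed outputs, to whatever DEA routine produced rho_hat in the first place.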
These B bootstrap samples can be used to construct confidence intervals.

Example cont.

In the hospital example, $(\hat{\rho}_1, \hat{\rho}_2, \hat{\rho}_3, \hat{\rho}_4) = (1, 1.45, 1, 1.33)$ and

$\hat{x}_1^* = (1, 2)$
$\hat{x}_2^* = (1.364, 2.728)$
$\hat{x}_3^* = (3, 1.5)$
$\hat{x}_4^* = (1.5, 3.75)$

To illustrate generation of the first bootstrap sample, suppose four drawings from the U(0,1) distribution happen to be 0.46, 0.76, 0.18 and 0.92. Under the EDF each firm's estimate has probability 1/4, so a drawing u can be mapped to firm index $\lceil 4u \rceil$; this selects firms 2, 4, 1 and 4 and implies

$(\rho_1^1, \rho_2^1, \rho_3^1, \rho_4^1) = (1.45, 1.33, 1, 1.33)$

and

$x_1^1 = \rho_1^1 \hat{x}_1^* = (1.45, 2.90)$
$x_2^1 = \rho_2^1 \hat{x}_2^* = (1.81, 3.63)$
$x_3^1 = \rho_3^1 \hat{x}_3^* = (3, 1.5)$
$x_4^1 = \rho_4^1 \hat{x}_4^* = (2, 5)$

We then solve the DEA problem using these data.

Bias and SEs for DEA Scores

Let $\hat{\theta}_i$ be the computed DEA score for firm i in the sample, and let $\hat{\theta}_i^1, \dots, \hat{\theta}_i^B$ be the scores generated by repeating the bootstrap sampling procedure B times. Then the bias and standard error can be estimated as:

$\text{Est. Bias}(\hat{\theta}_i) = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_i^b - \hat{\theta}_i$

$\text{Est. S.D.}(\hat{\theta}_i) = \left\{ \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}_i^b - \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_i^b \right)^2 \right\}^{1/2}$

Some remarks

• Computing the bias and standard errors for DEA scores is computationally intensive, but the idea is quite simple.
• The analytical work involved in proving that the bootstrapped bias and standard errors are consistent is quite difficult. That is where much of the research is focused.
• The model we have looked at simply generates technical efficiency scores by simple random sampling with replacement from the estimated scores; this ignores any firm-specific characteristics that may drive inefficiencies.
• It may be possible to use a second-stage regression, and to bootstrap the residuals from that regression, after taking firm-specific characteristics into account.
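The estimated bias and standard deviation defined for the bootstrapped DEA scores can be computed directly once the B bootstrap scores for a firm are available. The sketch below is illustrative only; the function and argument names (bootstrap_bias, bootstrap_sd, score, boot_scores) are hypothetical and not taken from the slides.

```python
import math


def bootstrap_bias(score, boot_scores):
    """Estimated bias: mean of the bootstrap scores minus the original score."""
    return sum(boot_scores) / len(boot_scores) - score


def bootstrap_sd(boot_scores):
    """Estimated standard deviation of the bootstrap scores (divisor B - 1)."""
    B = len(boot_scores)
    if B < 2:
        raise ValueError("at least two bootstrap scores are required")
    mean = sum(boot_scores) / B
    return math.sqrt(sum((s - mean) ** 2 for s in boot_scores) / (B - 1))
```

Both functions are applied firm by firm: score is the DEA score from the original sample and boot_scores holds that firm's scores across the B bootstrap samples.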