Estimation of Managerial Efficiency in Baseball: A Bayesian Approach

Kwang-shin Choi
Arizona State University
This Version: May, 2011

Abstract

Stochastic frontier models can be used to measure inefficiency in production. In this paper, a stochastic frontier function is used to measure the efficiency of baseball managers. We use stochastic frontier analysis with a Bayesian approach because, on the baseball data set, it yields tighter interval estimates of efficiency than the confidence intervals from the multiple comparison with the best method used in the classical approach. Using the estimation results, this paper tries to answer some interesting questions in baseball that have often been discussed but not yet answered clearly.

1. Introduction

In this article, we estimate managerial efficiency in baseball using a stochastic frontier function with a Bayesian approach. To explain the goal and the organization of the paper, we start by briefly introducing the concept of the stochastic frontier function and the reason for using a Bayesian approach.

Stochastic frontier analysis is widely used in the estimation of firm efficiency and productivity. The method was first introduced by Aigner, Lovell, and Schmidt (1977), who suggest an approach to the estimation of frontier production functions. They define the frontier function for firm $i$ as

$$ y_i = f(x_i; \beta) \tag{1} $$

where $y_i$ is the maximum level of output, $x_i$ is a vector of inputs, and $\beta$ is an unknown parameter vector. They then assume that technical inefficiency exists as a deviation of actual output from the maximum level of output. With technical inefficiency, the frontier production function becomes

$$ y_i = f(x_i; \beta)\,\tau_i \tag{2} $$

where $0 < \tau_i \le 1$ measures firm-specific efficiency, so that values of $\tau_i$ below 1 indicate inefficiency. The stochastic frontier function follows from equation (2) by adding a random shock that is outside the control of the firm:

$$ y_i = f(x_i; \beta)\,\tau_i\,e^{u_i} \tag{3} $$

where $u_i$ is the error term.
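A small numerical sketch may help fix ideas about equations (1)~(3). It assumes a Cobb-Douglas form for the frontier (the form the paper adopts in Section 2); the input values, elasticities, and function names are hypothetical and chosen only for illustration.

```python
import math
import random


def frontier_output(x, beta):
    """Deterministic frontier, equation (1), under an assumed Cobb-Douglas form."""
    return math.prod(xk ** bk for xk, bk in zip(x, beta))


def observed_output(x, beta, tau, u):
    """Equation (3): frontier scaled by efficiency tau in (0, 1] and by the shock exp(u)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return frontier_output(x, beta) * tau * math.exp(u)


rng = random.Random(0)
x, beta = [4.0, 9.0], [0.5, 0.5]  # hypothetical inputs and elasticities

f = frontier_output(x, beta)                          # frontier level, equation (1)
y_det = observed_output(x, beta, tau=0.9, u=0.0)      # equation (2): no random shock
y_sto = observed_output(x, beta, tau=0.9, u=rng.gauss(0.0, 0.1))  # equation (3)
y_lucky = observed_output(x, beta, tau=1.0, u=0.1)    # efficient firm with a good shock

print(f, y_det, y_sto, y_lucky)
```

With a positive shock, a fully efficient unit can sit above the deterministic frontier, which a model without the $e^{u_i}$ term has no way to represent.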
By including the error term, we address the critical problem shared by all deterministic frontier estimation models: any deviation of an observation from the frontier must be attributed to inefficiency, because a deterministic model does not allow for statistical noise or measurement error.

Schmidt and Sickles (1984) show how to resolve some difficulties in the original approach of Aigner, Lovell, and Schmidt (1977) by using panel data. They show that with panel data and a sufficiently large number of time periods T, three difficulties can be avoided:

1. The consistency question in the estimation of technical inefficiency
2. The choice of distributional assumption for technical inefficiency
3. The probable correlation between regressors and inefficiency

A Bayesian approach to stochastic frontier analysis was introduced by Broeck, Koop, Osiewalski and Steel (1994). They show that when T is not large enough, the difficulty arising from the choice of distributional assumption for technical efficiency can be reduced by a Bayesian approach. In this paper, we additionally show that the Bayesian approach produces narrower interval estimates than the classical approach. We compare the interval of estimated technical efficiency from the Bayesian approach with the Multiple Comparison with the Best interval provided by Horrace and Schmidt (2000).

Next, we explain why we choose managerial efficiency in baseball as the target of estimation. The first reason is that the role of a manager on the field is the same as the role of a factory manager, who controls the technical efficiency with which limited inputs are turned into output. The second reason is that there is no statistic that describes the ability of a manager. Technical efficiency gives us a way to compare the ability of managers.
Baseball data are used in this paper for the following reason: baseball is the most quantified of all sports, and baseball data are easy to access. In other sports, outputs are not linked to inputs as clearly as they are in baseball. This strong linkage between the data and the real game makes baseball an ideal target for testing stochastic frontier analysis.

In sports economics, there have been studies that estimate the technical efficiency of a coach or organization in team sports using a stochastic frontier function. Dawson, Dobson and Gerrard (2000) estimate the efficiency of coaches in the English Premier League from 1992 to 1998 using a panel data stochastic frontier model. That paper is limited by the unclear relationship between the output and the input variables. The input variables are player age, league experience, career goals, number of previous clubs, goals in the previous season, and the player's previous division. These inputs are not directly related to the output, winning percentage. Rimler, Song and Yi (2010) estimate technical efficiency in the Atlantic 10 conference in NCAA basketball. They argue that differences in managerial efficiency are trivial and focus more on the contribution of player stats to winning percentage. In baseball, Porter and Scully (1982) use a frontier model to estimate managerial efficiency, and Ruggiero, Hadley and Gustafson (1996) evaluate managerial efficiency using Data Envelopment Analysis. Both papers use non-stochastic models, so they inevitably share the drawbacks of all deterministic frontier models.

This paper is organized as follows. Section 2 describes the model used for estimation. Section 3 explains the data. Section 4 answers some interesting questions in baseball using the estimation results. Some concluding remarks follow in Section 5.

2. Model

The model for estimating managerial efficiency comes from equation (3).
Assume that the production function $f(\cdot\,;\beta)$ is a Cobb-Douglas production function. Taking logs, equation (3) becomes

$$ \ln y_i = \ln x_i'\beta + u_i - z_i \tag{4} $$

where $z_i = -\ln \tau_i \ge 0$ and $\tau_i$ is the efficiency term in equations (2) and (3). We use panel data for the estimation. The panel version of equation (4) is

$$ \ln y_{it} = \ln x_{it}'\beta + u_{it} - z_i, \quad i = 1, \dots, N, \quad t = 1, \dots, T \tag{5} $$

Here $i$ indexes teams and $t$ indexes time periods, where one period is one baseball season. The value $y_{it}$ is the output of team $i$ in period $t$, and $x_{it}$ is a vector of $K$ inputs. The $u_{it}$'s are error terms and are uncorrelated with the regressors. The $z_i$ represents technical inefficiency, which is constant over all periods and positive for all $i$. From equation (5), the model for estimating technical efficiency and the $\beta$'s is

$$ \ln y_{it} = \ln x_{o,it}'\beta_o + \ln x_{d,it}'\beta_d + u_{it} - z_i, \quad i = 1, \dots, N, \quad t = 1, \dots, T \tag{6} $$

Here $y_{it}$ is the ratio of runs scored by team $i$ to runs allowed by team $i$ in season $t$: $y_{it} = \mathrm{Run}_{made,it} / \mathrm{Run}_{allowed,it}$. $x_{o,it}$ is the vector of offensive stats for team $i$ in season $t$: single, double, triple, HR, steal, walk, and K. $x_{d,it}$ is the vector of defensive stats: single allowed, HR allowed, walk allowed, K made, error, and double play. Each value is the season total over a team's regular season games. $\beta_o$ and $\beta_d$ are the coefficients on offensive and defensive stats.

Most previous papers estimating technical efficiency in team sports use winning percentage as $y_{it}$. We use run ratio instead, for two reasons. First, runs scored and runs allowed are directly linked to the offensive and defensive stats used as regressors, and this linkage is more intuitive than the link between winning percentage and the regressors. Second, run ratio is still strongly correlated with winning percentage: the R-squared of a simple regression with winning percentage as the dependent variable and run ratio as the only regressor is 0.871 over the 1969~2010 seasons.
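This suggests that results based on run ratio as the output should differ little from an analysis that uses winning percentage as the output.

For stochastic frontier analysis with a Bayesian approach, we add the following assumptions to equation (6), for $i = 1, \dots, N$:

1. $u_{it} \sim N(0, h^{-1})$ and the $u_{it}$'s are independent of each other
2. $h \sim \mathrm{Gamma}(\underline{s}^{-2}, \underline{\nu})$
3. $u_{it}$ and $z_i$ are independent of one another
4. $z_i \sim \mathrm{Exp}(\mu_z)$
5. $\beta \sim N(\underline{\beta}, \underline{V})$

where $\underline{s}^{-2}$, $\underline{\nu}$, $\underline{\beta}$, and $\underline{V}$ are hyperparameters. Among these assumptions, the most critical is $z_i \sim \mathrm{Exp}(\mu_z)$, because the choice of distribution for technical efficiency has always been the most difficult part of applying stochastic frontier analysis. Broeck et al. (1994), in a study of the most commonly used models, found that the exponential is the least sensitive to changes in prior assumptions, and we follow that paper.

From equation (6) and the additional assumptions, the likelihood function is

$$ p(y \mid \beta, h, z) = \prod_{i=1}^{N} \frac{h^{T/2}}{(2\pi)^{T/2}} \exp\left[-\frac{h}{2}\,(y_i - X_i\beta + z_i\iota_T)'(y_i - X_i\beta + z_i\iota_T)\right] \tag{7} $$

where $y_i$ stacks the $T$ values of $\ln y_{it}$, $X_i = [X_{i1}' \cdots X_{iT}']'$, $X_{it} = [\ln x_{o,it}' \;\; \ln x_{d,it}']$, $\beta = [\beta_o' \;\; \beta_d']'$, and $\iota_T$ is a $T \times 1$ vector of ones.

Using the likelihood function and the distributional assumptions, we can derive the posterior distributions of the parameters. Starting with $\beta$, we multiply the likelihood function by the prior $\beta \sim N(\underline{\beta}, \underline{V})$. The posterior distribution of $\beta$ is then

$$ \beta \mid y, h, z \sim N(\overline{\beta}, \overline{V}) \tag{8} $$

where

$$ \overline{V} = \left(\underline{V}^{-1} + h\sum_{i=1}^{N} X_i'X_i\right)^{-1} $$

and

$$ \overline{\beta} = \overline{V}\left(\underline{V}^{-1}\underline{\beta} + h\sum_{i=1}^{N} X_i'\,[y_i + z_i\iota_T]\right). $$

For the posterior distribution of $h$, we use the likelihood function and $h \sim \mathrm{Gamma}(\underline{s}^{-2}, \underline{\nu})$. The distribution is

$$ h \mid y, \beta, z, \mu_z^{-1} \sim \mathrm{Gamma}(\overline{s}^{-2}, \overline{\nu}) \tag{9} $$

where $\overline{\nu} = TN + \underline{\nu}$ and

$$ \overline{s}^{2} = \frac{\sum_{i=1}^{N} (y_i + z_i\iota_T - X_i\beta)'(y_i + z_i\iota_T - X_i\beta) + \underline{\nu}\,\underline{s}^{2}}{\overline{\nu}}. $$

In the same way, we can find the posterior distribution of $\mu_z^{-1}$ rather than $\mu_z$:

$$ \mu_z^{-1} \mid y, \beta, h, z \sim \mathrm{Gamma}(\overline{\mu}_z, \overline{\nu}_z) \tag{10} $$

where $\overline{\nu}_z = 2N + \underline{\nu}_z$ and

$$ \overline{\mu}_z = \frac{2N + \underline{\nu}_z}{2\sum_{i=1}^{N} z_i + \underline{\nu}_z\,\underline{\mu}_z}, $$

with $\underline{\mu}_z$ and $\underline{\nu}_z$ the prior hyperparameters for $\mu_z^{-1}$.

With these posterior distributions in hand, we can derive the distribution used to generate the technical inefficiencies.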
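Before turning to the inefficiencies themselves, the scalar draws in a Gibbs sweep can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes that Gamma(m, v) denotes a Gamma distribution with mean m and v degrees of freedom (the convention of Koop, Poirier, and Tobias, 2007), it takes the residuals $y_{it} - X_{it}\beta$ as given instead of drawing $\beta$ from (8), and it also includes the truncated normal draw that equation (11) calls for. All function names are hypothetical.

```python
import random
from statistics import NormalDist


def gamma_mean_dof(mean, dof, rng):
    """Gamma draw with the given mean and degrees of freedom (assumed parametrization):
    shape = dof / 2, scale = 2 * mean / dof."""
    return rng.gammavariate(dof / 2.0, 2.0 * mean / dof)


def draw_h(resid_by_team, z, s2_prior, nu_prior, rng):
    """Equation (9). resid_by_team[i][t] = ln y_it - X_it beta; z[i] = inefficiency of team i."""
    n_obs = sum(len(r) for r in resid_by_team)  # equals T * N in a balanced panel
    sse = sum((e + zi) ** 2 for r, zi in zip(resid_by_team, z) for e in r)
    nu_post = n_obs + nu_prior
    s2_post = (sse + nu_prior * s2_prior) / nu_post
    return gamma_mean_dof(1.0 / s2_post, nu_post, rng)


def draw_mu_z_inv(z, mu_z_prior, nu_z_prior, rng):
    """Equation (10): posterior draw of the inverse of the exponential mean."""
    nu_post = 2 * len(z) + nu_z_prior
    mean_post = nu_post / (2.0 * sum(z) + nu_z_prior * mu_z_prior)
    return gamma_mean_dof(mean_post, nu_post, rng)


def draw_z_i(resid_i, h, mu_z_inv, rng):
    """Equation (11): normal with mean Xbar_i beta - ybar_i - 1 / (T h mu_z) and
    variance 1 / (T h), truncated to z_i >= 0, sampled by inverting the CDF."""
    t = len(resid_i)
    mean = -sum(resid_i) / t - mu_z_inv / (t * h)
    sd = (t * h) ** -0.5
    std = NormalDist()
    lower = std.cdf(-mean / sd)                # probability mass below the truncation point
    u = lower + (1.0 - lower) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf inside its open domain
    return max(0.0, mean + sd * std.inv_cdf(u))  # crude guard for extreme tails


rng = random.Random(42)
resid = [[-0.05, 0.02, -0.01], [0.03, -0.04, 0.00]]  # hypothetical residuals for 2 teams
z = [0.02, 0.05]
h = draw_h(resid, z, s2_prior=0.01, nu_prior=2.0, rng=rng)
mu_z_inv = draw_mu_z_inv(z, mu_z_prior=0.1, nu_z_prior=2.0, rng=rng)
z = [draw_z_i(r, h, mu_z_inv, rng) for r in resid]
print(h, mu_z_inv, z)
```

Inverse-CDF sampling is used only because it needs nothing beyond the standard library; a dedicated truncated normal sampler would be more robust when the truncation point lies far in the tail.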
Using Bayes' theorem, we know

$$ p(z \mid y, \beta, h, \mu_z^{-1}) \propto p(y \mid z, \beta, h, \mu_z^{-1})\, p(z \mid \beta, h, \mu_z^{-1}). $$

We already have the likelihood function and the assumption $z_i \sim \mathrm{Exp}(\mu_z)$, so the posterior distribution of $z_i$ follows directly:

$$ p(z_i \mid y_i, X_i, \beta, h, \mu_z^{-1}) \propto f_N\!\left(z_i \,\middle|\, \overline{X}_i\beta - \overline{y}_i - (Th\mu_z)^{-1},\; (Th)^{-1}\right) I(z_i \ge 0) \tag{11} $$

where $f_N(\cdot \mid m, v)$ is the normal density with mean $m$ and variance $v$, $\overline{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\overline{X}_i$ is a row vector containing the average value of each explanatory variable, and $I(z_i \ge 0)$ is the indicator function. With equations (8)~(11), we can estimate the coefficients $\beta$, the parameters $h$ and $\mu_z$, and the technical inefficiencies $z_i$ using Gibbs sampling.

3. Data

For the estimation, the data are taken from Major League Baseball over the 1969~2010 seasons. They are collected from www.baseball-reference.com. The 1972, 1981, 1994 and 1995 seasons are omitted because the number of games was severely reduced in those seasons by labor disputes between the players and the MLB organization. The number of teams has changed over the period because new teams joined over time. Table 1 summarizes the number of teams.

Table 1: The number of teams over the 1969~2010 seasons

Year         Number of seasons          Number of teams   Teams added
1998~2010    13                         30                ARI, TBR
1993~1997    3 (except for 94, 95)      28                FLA, COL
1977~1992    15 (except for 81)         26                SEA, TOR
1969~1976    7 (except for 72)          24

Because the number of teams differs across periods, the estimation procedure must handle an unbalanced panel. The data set contains three kinds of values used in the analysis.

1. Output values - Run ratio. As briefly described in Section 2, run ratio is a fraction whose numerator is the total runs scored by a team in a season and whose denominator is the total runs allowed by that team in the season. Run ratio is used as the output rather than run difference because run difference can be negative, so its logarithm cannot be taken.
2. Input values
2.a. Offensive inputs for run production - Single, Double, Triple, HR, Steal, Walk, and K are used as the values.
Batting average, on base percentage (OBP), slugging percentage (SLG), and OPS (OBP + SLG) are not used, to prevent a multicollinearity problem.
2.b. Defensive inputs for run allowance - Single allowed, HR allowed, Walk allowed, K made, Error, Double play. Due to multicollinearity concerns, ERA and WHIP are omitted from the inputs.

Descriptive statistics of the output and input data are provided in Table 2. The table shows some characteristic changes over time. First, the number of extra base hits is increasing. Comparing 1969~1976 with 2003~2010, there is a large increase in the average number of doubles (38%) and homeruns (41.26%) compared to singles (5.4%). Second, the number of errors has decreased. While the other defensive stat in the table, double plays, is stable over the long period, the number of errors decreased by 29.38%.

Table 2: Descriptive stats of input and output data (yearly values)

                 1969~2010            1969~1976            2003~2010
                 Mean      SD         Mean      SD         Mean      SD
Output
  Run ratio      1.0102    0.1428     1.0131    0.1644     1.010     0.1368
  Run            726.40    88.36      666.70    73.73      758.00    76.30
Input
  Single         1444.74   83.08      1394.59   81.17      1469.46   74.63
  Double         262.62    39.69      214.99    23.79      296.58    27.30
  Triple         33.95     10.28      36.15     10.32      30.46     8.91
  Homerun        147.30    39.83      119.61    30.70      168.96    33.24
  Steal          107.53    41.36      92.99     43.03      92.25     30.48
  Walk           538.53    69.64      544.93    70.34      533.25    68.26
  K              953.30    144.87     855.24    104.51     1074.58   119.18
  Error          122.71    23.45      143.41    20.50      101.27    15.59
  Double play    151.84    18.26      154.78    15.94      152.12    16.76

To check the normality of the data set, we perform a graphical check using QQ plots. In Figure 1, all the variables appear approximately normal.

Figure 1: QQ plot of input and output data (1969~2010)

4. Estimation Results and Discussion

4.1 Comparison of estimation efficiency between classical approach and a Bayesian approach

In classical stochastic frontier analysis, confidence intervals for technical efficiency were rarely provided until Horrace and Schmidt (2000). One advantage of a Bayesian method is the ease of obtaining interval estimates for parameters.
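To illustrate why intervals come cheaply from a Bayesian fit, the sketch below turns a list of Gibbs draws of one team's inefficiency $z_i$ into an interval for its technical efficiency. It rests on assumptions that the paper does not spell out: efficiency is recovered as $\exp(-z_i)$, the interval is an equal-tailed percentile interval, and the function names and draws are hypothetical.

```python
import math


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, using linear interpolation between neighbours."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def efficiency_interval(z_draws, level=0.95):
    """Equal-tailed posterior interval for technical efficiency exp(-z)."""
    eff = sorted(math.exp(-z) for z in z_draws)
    tail = (1.0 - level) / 2.0
    return quantile(eff, tail), quantile(eff, 1.0 - tail)


# Hypothetical posterior draws of z_i for one team (after burn-in)
z_draws = [0.010 + 0.0005 * k for k in range(101)]
lower, upper = efficiency_interval(z_draws)
print(lower, upper, upper < 1.0)  # upper < 1 would place the team below the frontier
```

No extra derivation is needed: the interval is read directly off the sorted draws, which is the practical convenience the text refers to.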
In this section, we compare: (1) The estimated values of technical efficiency (2) The estimated values of 's (3) The estimated interval of technical efficiencies (1) Estimated value of technical efficiencies Comparison of technical efficiencies over different approach of stochastic frontier analysis is presented in Table 3. Classical approach result is based on the method following Battese and Coelli (1995) and a Bayesian approach result is the method from Koop, Poirier, and Tobias (2007) Table 3:Technical efficiency of MLB teams over 2003~2010 Classical approach Tea efficienc m y ATL 0.9936 Bayesian approach rk Tea 2 efficienc m y CHW 0.9867 rk Tea 1 efficienc R Tea efficienc rk m y k m y ATL 0.9977 2 CHW 0.9812 8 CHC 0.9546 25 CLE 0.9621 2 2 CHC 0.9799 2 CLE 0.9818 0 CIN 0.9906 4 1 8 DET 0.9798 1 2 CIN 0.9815 7 DET 0.9607 1 HOU 0.9853 1 KCR 0.9879 5 LAD 0.9797 2 2 3 1 HOU 0.9633 19 KCR 0.9742 0 1 1 LAA 0.9942 1 LAD 0.9612 22 LAA 0.9997 1 MIN 0.9882 9 MIL 0.9440 27 MIN 0.9766 1 2 MIL 0.9765 2 7 NYM 0.9928 3 0 NYY 0.9786 2 NYM 0.9963 3 NYY 0.9654 5 PHI 0.9900 5 OAK 0.9887 1 6 8 PHI 9 0.9784 9 OAK 0.9840 5 PIT 0.9879 1 TEX 0.9892 6 PIT 0.9677 14 TEX 0.9825 6 SEA 0.9781 2 SDP 0.9252 29 SEA 0.9671 1 1 SDP 0.9621 2 9 SFG 0.9857 1 6 TOR 0.9835 4 STL 0.9887 7 1 5 SFG 0.9678 13 TOR 0.9621 7 COL 0.9792 2 0 STL 0.9892 4 COL 0.9505 3 WSN 0.9858 1 FLA 0.9849 3 BAL 0.9803 1 ARI 0.9528 0.9640 2 3 WSN 0.9683 12 FLA 0.9638 0.9786 8 2 1 8 BAL 0.9648 17 ARI 0.9022 0 TBR 2 6 6 9 BOS 1 2 3 0 BOS 0.9315 28 TBR 0.9552 4 2 4 In the table 3, the value of technical efficiencies depends on the method of analysis. But the rank of technical efficiencies is roughly kept over two different methods. Especially, the teams with top 3 technical efficiencies and the bottom 3 technical efficiencies do not change their rank over different method. 
This result shows that when we use results from stochastic frontier analysis, more attention should be paid to the ranking than to the values themselves.

(2) Estimated value of β's

With the same methods used for the results in Table 3, we estimated the values of the β's. The results are provided in Table 4.

Table 4: Estimated value of β's over 2003~2010

                   Classical approach     Bayesian approach
Coefficient        Estimate (sd)          Estimate (sd)
Single             0.8811 (0.0904)        0.7747 (0.0804)
Double             0.1291 (0.0438)        0.1727 (0.0409)
Triple             0.0376 (0.0109)        0.0557 (0.0108)
Homerun            0.2191 (0.0189)        0.2197 (0.0186)
Steal              0.0189 (0.0099)        0.0097 (0.0098)
Walk               0.1944 (0.0281)        0.2053 (0.0272)
K                  -0.0269 (0.0354)       -0.0122 (0.0337)
Single allowed     -1.1633 (0.0781)       -1.0337 (0.0730)
HR allowed         -0.1923 (0.0288)       -0.1825 (0.0277)
Walk allowed       -0.3382 (0.0264)       -0.3401 (0.0263)
K made             0.0138 (0.0439)        0.0620 (0.0403)
Error              -0.1087 (0.0207)       -0.0974 (0.0199)
Double play        0.0686 (0.0310)        0.0586 (0.0298)

The results from the two methods do not differ much: the estimates and their standard deviations are similar. We conclude that the choice of approach makes little difference to the estimation of the coefficients.

(3) The estimated interval of technical efficiencies

Figure 2 compares the technical efficiencies from the two methods using box plots. As found in result (1), the order of the teams' efficiencies does not change across the two approaches. But the box plots are much longer under the classical approach. In the upper plot, the upper bound of every interval touches the technical efficiency level of 1. This means that no team can be identified as less than fully efficient at the 95% confidence level. On the other hand, when the Bayesian approach is used, 23 teams have an upper limit of the 95% interval that is below an efficiency of 1.
We can say that the run ratio of those 23 teams is below the frontier level and that higher output could be achieved through better strategy by the manager on the field. The bottom line of this comparison is that the Bayesian approach gives more precise interval estimates for baseball data over this period, so we have more power to explain the level of technical efficiency when we use it. When we compare efficiency levels using box plots over 1969~1976 and 1969~2010, the conclusion does not change.

Figure 2: Box plot of managerial efficiencies: Classical and Bayesian approach 2003~2010

4.2 The analysis of estimated coefficient: Old time baseball (1969~1976) vs. Modern baseball (2003~2010)

Figure 3: Distribution of coefficient of offensive and defensive stats over 2003~2010

The qualitative implications of the estimated coefficients are consistent with common knowledge in baseball. Figure 3, based on 4500 Gibbs sampling draws, shows the distribution of the coefficients over the 2003~2010 seasons. Among offensive stats, single, homerun, and walk have the largest impact on run ratio. Using the averages in Table 2, a 10% increase in the number of singles (1469.46 × 10% ≈ 147) will increase the run ratio of a team by 7.6% (10% × 0.7662). Similarly, a 10% increase in homeruns (about 17) will increase the run ratio by 2.0%, and a 10% increase in walks (about 53) will result in a 2.1% higher run ratio.

A more interesting interpretation of the coefficients comes from comparing old time baseball (1969~1976) with current baseball (2003~2010). Table 5 gives the number of events corresponding to a 10% increase in each offensive stat and the resulting increase in the run ratio.

Table 5: Comparison of estimated value of β's between 1969~1976 and 2003~2010

            1969~1976                          2003~2010
            # of 10% inc.   Run ratio inc.     # of 10% inc.   Run ratio inc.
Single      139             8.9%               147             7.2%
Double      21              1.0%               30              1.7%
Homerun     12              1.8%               17              2.2%
Walk        54              3.7%               53              2.1%
Steal       54              0.005%             53              0.1%
K           86              -0.7%              107             -0.1%

Table 5 shows several interesting changes over time. First, the impact of extra base hits on the run ratio is larger in modern baseball. This is easily verified by comparing the coefficients of double and homerun, both of which increase in modern baseball. On the other hand, the magnitudes of the coefficients of walk and strikeout are reduced. Drawing more walks has become less important because it requires more patience from batters. A more patient batter has less chance of making an extra base hit, which has a negative effect on the run ratio. Similarly, a higher number of strikeouts is one of the costs batters pay for producing more extra base hits, so more strikeouts can be compensated by more extra base hits in producing a higher run ratio. Overall, the comparisons in Table 5 show that extra base hits are more important in current baseball than in the past. A small conclusion from this analysis is that scouting in modern baseball should concentrate more on aggressive hitters.

How can we apply this result to the management of a baseball organization? We illustrate with recent free agent transactions. After the 2010 season, the Boston Red Sox acquired outfielder Carl Crawford at an annual average salary of $21 million, and the Washington Nationals signed outfielder Jayson Werth at an annual average salary of $18 million. Both are considered to have good defensive skills, so we assume their defensive skills are at the same level. On the depth charts of the Red Sox and Nationals, Carl Crawford is ahead of Darnell McDonald as a left fielder and Jayson Werth is ahead of Jerry Hairston as a right fielder.
The additional offensive production from Crawford and Werth is shown in Table 6, based on 2010 season stats. The difference in salary over the replacement player is also provided.

Table 6: Additional offensive production of Crawford and Werth over replacement player

            Boston Red Sox                         Washington Nationals
            Crawford   McDonald   Difference       Werth    Hairston   Difference
Single      184        86         98               164      105        59
Double      30         18         12               46       13         33
Triple      13         3          10               2        2          0
Homerun     19         9          10               27       10         17
Walk        46         30         16               82       31         51
Steal       47         9          38               13       9          4
K           104        85         19               147      54         93
Salary      21 mil     0.47 mil   20.53 mil        18 mil   2 mil      16 mil

From Table 6, the additional offensive production provided by Crawford increases the run ratio by 5.35% when the coefficient estimates for the 2003~2010 seasons are used. With the same estimates, Werth raises the run ratio by 4.65%. The Boston Red Sox spent $3.84 million per 1% increase in run ratio (20.53 / 5.35), while the Washington Nationals invested $3.44 million per 1% increase (16 / 4.65). We therefore conclude that, of the two biggest contracts for offensive players signed after the 2010 season, the Nationals' investment is more cost efficient.

4.3 Steroid era and Moneyball

In the book "Moneyball: the art of winning an unfair game" (2003) by Lewis, Billy Beane, the general manager of the Oakland Athletics, hires Art Howe as a manager who would understand that he is not the boss, so that the front office can implement its ideas with full control. Beane's idea is that anything that increases the offense's chance of making an out is bad. Offensive strategies such as the sacrifice bunt, the hit and run, and the steal are therefore considered harmful, and the manager is required to take an extremely passive stance in operating the offense. Additionally, Beane has a model in which an extra point of on-base percentage is worth three times an extra point of slugging percentage. Based on this model, the Athletics front office shows an obsession with a player's ability to get on base.
Art Howe managed the Athletics over 1996~2002. Figure 4 shows the distribution of managerial efficiencies over this period.

Figure 4: Box plot of managerial efficiencies: 1996~2002

The Athletics show much better managerial efficiency in this period. This period is called the steroid era in baseball history and has different characteristics from other periods. Table 7 shows the characteristics of the steroid era.

Table 7: Descriptive stats of input and output data (yearly values)

                 Steroid era          1969~1976            2003~2010
                 Mean      SD         Mean      SD         Mean      SD
Output
  Run ratio      1.0121    0.1530     1.0131    0.1644     1.010     0.1368
  Run            791.27    86.15      666.70    73.73      758.00    76.30
Input
  Single         1485.46   81.87      1394.59   81.17      1469.46   74.63
  Double         290.71    26.05      214.99    23.79      296.58    27.30
  Triple         30.92     8.48       36.15     10.32      30.46     8.91
  Homerun        176.72    34.49      119.61    30.70      168.96    33.24
  Steal          106.94    33.57      92.99     43.03      92.25     30.48
  Walk           564.98    75.47      544.93    70.34      533.25    68.26
  K              1055.39   91.87      855.24    104.51     1074.58   119.18
  Error          114.44    17.06      143.41    20.50      101.27    15.59
  Double play    152.91    19.16      154.78    15.94      152.12    16.76

The table shows that doubles and homeruns in the steroid era are far above their 1969~1976 levels, and that homeruns are also higher than in 2003~2010. It is possible that the increase in extra base hits comes from the use of steroids. These differences lead to critical changes in the coefficients of the offensive inputs, provided in Table 8.
Table 8: Comparison of estimated value of β's over steroid era and 2003~2010

                   Steroid era     2003~2010
Coefficient        Estimate        Estimate (sd)
Single             0.9931          0.7747 (0.0804)
Double             0.1274          0.1727 (0.0409)
Triple             0.0323          0.0557 (0.0108)
Homerun            0.1642          0.2197 (0.0186)
Steal              0.0364          0.0097 (0.0098)
Walk               0.3064          0.2053 (0.0272)
K                  -0.0457         -0.0122 (0.0337)
Single allowed     -1.0920         -1.0337 (0.0730)
HR allowed         -0.1711         -0.1825 (0.0277)
Walk allowed       -0.2742         -0.3401 (0.0263)
K made             0.0381          0.0620 (0.0403)
Error              -0.0598         -0.0974 (0.0199)
Double play        0.0314          0.0586 (0.0298)

Due to the higher number of extra base hits, the estimated coefficients of all the extra base hits are smaller in the steroid era. On the other hand, the coefficients of single and walk are markedly higher in the steroid era. Singles and walks are the two most important factors determining on base percentage, which the Athletics' strategy regards as the most important of all offensive stats. In particular, Athletics general manager Beane praised players for their walks and criticized them for swinging at pitches outside the strike zone. Their higher number of walks worked extremely well in the steroid era, which has a very high estimated coefficient for walk.

But Beane, who has worked as general manager since 1996, is criticized for the declining performance of the Athletics after the steroid era. One of the reasons can be found in the reduced managerial efficiency shown in Figure 5.

Figure 5: Box plot of managerial efficiencies: 2003~2010

In Figure 5, the managerial efficiency of the Athletics does not dominate other teams as it does in Figure 4. As Table 8 shows, one possible explanation for this change in performance is the increased coefficient of extra base hits and the decreased impact of walks.
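The shift described above can be made concrete with a short calculation on the Table 8 coefficients. The sketch assumes the log-linear form of equation (6), under which raising one input by 10% while holding the others fixed multiplies the run ratio by $1.1^{\beta}$; the function name is hypothetical, and the linear shortcut 10% × β gives similar values when β is small.

```python
def pct_change_in_run_ratio(beta, pct_increase=10.0):
    """Percentage change in run ratio when one input rises by pct_increase percent,
    other inputs held fixed, under ln(y) = ... + beta * ln(x) + ..."""
    return ((1.0 + pct_increase / 100.0) ** beta - 1.0) * 100.0


# Coefficient estimates copied from Table 8
table8 = {
    "Walk": {"steroid_era": 0.3064, "2003_2010": 0.2053},
    "Double": {"steroid_era": 0.1274, "2003_2010": 0.1727},
    "Homerun": {"steroid_era": 0.1642, "2003_2010": 0.2197},
}

for stat, coefs in table8.items():
    effects = {period: round(pct_change_in_run_ratio(b), 2) for period, b in coefs.items()}
    print(stat, effects)
```

Under these estimates a 10% rise in walks is worth noticeably more in the steroid era than in 2003~2010, while the ordering is reversed for doubles and homeruns, which matches the explanation given for the Athletics.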
4.4 The evolution of managerial efficiency: the case study of Tony LaRussa

This section compares the managerial efficiency of Tony LaRussa, the current manager of the St. Louis Cardinals, across his career. LaRussa started as an MLB manager in 1979 with the White Sox. In his career he has managed three MLB teams, the White Sox, Athletics, and Cardinals, and the length of his service is 35 years. By comparing his managerial efficiency over different periods, we show that the efficiency of the same manager can vary over time. Figure 6 provides the managerial efficiency over the 1979~1985 seasons, when LaRussa managed the White Sox.

Figure 6: Box plot of managerial efficiencies: 1979~1985

In Figure 6, the managerial efficiency of the White Sox is in the lower group, ranked 21st. During the 1986 season, LaRussa was hired by the Athletics. Figure 7 shows his managerial efficiency with the Athletics.

Figure 7: Box plot of managerial efficiencies: 1987~1992

In this period, LaRussa performs very well, and the Athletics rank 5th in managerial efficiency. Before the LaRussa era, the managerial efficiency of the Athletics ranked 19th among teams. During this period, LaRussa and the Athletics made three World Series appearances in 6 years. After 1992, team owner Walter Haas Jr., who had paid even the highest payroll in baseball, departed, and the new owners of the Athletics started to tighten payroll. LaRussa was by then already one of the most acclaimed managers on the field and was hired by the Cardinals. He has managed the Cardinals since 1996, and Figure 8 provides the managerial efficiency over the 1996~2010 seasons.

Figure 8: Box plot of managerial efficiencies: 1996~2010

In this period, the managerial efficiency of the Cardinals is ranked 19th.
Even though LaRussa has earned two playoff berths and won the World Series in 2006 with the Cardinals, his managerial efficiency is not at the top level among the teams. From this longitudinal analysis of LaRussa's managerial efficiency across three teams, we find that the efficiency of a manager can vary over time and is highly affected by the characteristics of the team.

4.5 Estimation of efficiency based on daily data - the case of 2010 season

In this section, we estimate the contribution of the manager to run ratio using 2010 season data. The data are daily game stats from the 2010 season for the 30 MLB teams. The same method is applied, but each team now has 162 observations. To allow the log transformation required by the Cobb-Douglas production function, 1 is added to every data value. Among the stats used in the analysis of yearly data, steal and double play are omitted because they showed little impact on run ratio. Table 9 provides the estimated managerial efficiency in the 2010 season.

Table 9: Estimated managerial efficiency in season 2010

Team   Efficiency   Rank      Team   Efficiency   Rank
ATL    0.9633       15        CHW    0.9953       10
CHC    0.9457       19        CLE    0.9456       20
CIN    0.9545       18        DET    0.9165       27
HOU    0.9982       9         KCR    0.8734       30
LAD    0.9417       21        LAA    0.9928       7
MIL    0.9132       28        MIN    0.9907       11
NYM    0.9999       3         NYY    0.9627       16
PHI    0.9996       6         OAK    0.9997       5
PIT    0.9231       24        TEX    0.9755       13
SDP    0.9999       2         SEA    0.9405       22
SFG    0.9761       12        TOR    0.9660       14
STL    0.9990       8         COL    0.9193       25
WSN    0.9532       17        FLA    0.9999       4
BAL    0.9129       29        ARI    0.9279       23
BOS    0.9173       26        TBR    0.9999       1

The interesting part of this result is that, even though there is no official record of managerial efficiency, it explains the actual contract situation in the 2010 season very well. Managers ranked 21st~30th in the table were fired at a high rate of 70%. Considering that only 15% of the managers ranked 1st~20th were fired, this is a very high rate.
In particular, the managers of the Baltimore Orioles and Kansas City Royals, who showed the two lowest efficiencies, were fired during the season. This clear relationship between the efficiency estimates and the contract situation of managers suggests that efficiency estimation can be used as a tool to help a team's decision making about its manager.

5. Concluding remarks

In this paper we have estimated the efficiency of baseball managers using a stochastic frontier model with a Bayesian approach. The data used for estimation are yearly data over the 1969~2010 seasons and daily data from the 2010 season. This paper combines earlier work by Aigner, Lovell, and Schmidt (1977), Schmidt and Sickles (1984) and Broeck, Koop, Osiewalski and Steel (1994). We find that the Bayesian stochastic frontier model provides narrower intervals for the estimated technical efficiencies than the classical method of Horrace and Schmidt (2000). It also provides a reasonable explanation for changes in team efficiency over time, using the case of the Oakland Athletics in the steroid era and the case study of Tony LaRussa. In particular, the estimation on daily data from the 2010 season provides evidence that the estimated efficiency is strongly related to the decisions front offices actually make: managers who show low efficiency are replaced at a very high rate. This means that the estimation model can provide tools that a team's front office needs for decision making.

Still, this paper has limitations in its application. After estimating the efficiencies, we tried to find a way to express them in monetary value. The first method we tried compares the contribution of free agent players with the contribution of managers. The reason for this method is that we regard managers as a kind of free agent in team operation.
But this method failed because free agent players do not show a contribution better than that of the average player in baseball, so we cannot estimate the value of their contribution. The second method tried to value efficiency starting from the contribution of money to a higher run ratio. This one also failed early, because additional payroll did not help teams raise their run ratio in the 2010 season. Future work should therefore focus on finding an appropriate method for valuing managerial efficiency.

References

Aigner, D., Lovell, C. A. K., and P. Schmidt, 1977, Formulation and estimation of stochastic frontier production function models, Journal of Econometrics, 6, 21-37.

Battese, G. E., and T. J. Coelli, 1995, A model for technical inefficiency effects in a stochastic frontier production function for panel data, Empirical Economics, 20, 325-332.

Broeck, J. V. D., Koop, G., Osiewalski, J., and M. F. J. Steel, 1994, Stochastic frontier models: A Bayesian perspective, Journal of Econometrics, 61, 273-303.

Dawson, P., Dobson, S., and B. Gerrard, 2000, Stochastic frontiers and the temporal structure of managerial efficiency in English soccer, Journal of Sports Economics, 1(4), 341-362.

Horrace, W. C., and P. Schmidt, 2000, Multiple comparisons with the best, with economic applications, Journal of Applied Econometrics, 15, 1-26.

Koop, G., Poirier, D. J., and J. L. Tobias, 2007, Bayesian econometric methods, Cambridge University Press, 236-239.

Lewis, M., 2003, Moneyball: the art of winning an unfair game, W. W. Norton, New York, NY.

Porter, P., and G. Scully, 1982, Measuring managerial efficiency - the case of baseball, Southern Economic Journal, 19, 642-650.

Rimler, M. S., Song, S., and D. T. Yi, 2010, Estimating production efficiency in men's NCAA college basketball: A Bayesian approach, Journal of Sports Economics, 11(3), 287-315.

Ruggiero, J., Hadley, L., and E. Gustafson, 1996, in J. Fizel, E. Gustafson, L. Hadley (editors) Baseball economics: Current research, Praeger, Westport, CT.

Schmidt, P., and R. C. Sickles, 1984, Production frontiers and panel data, Journal of Business and Economic Statistics, 2(4), 367-374.