Estimation of Managerial Efficiency in Baseball: A Bayesian Approach

Kwang-shin Choi
Arizona State University
This Version: May, 2011
Abstract
A stochastic frontier model can be used to measure inefficiency in production. In this paper, a
stochastic frontier function is used to measure the efficiency of baseball managers. We take a
Bayesian approach to stochastic frontier analysis because, on baseball data, it yields tighter
interval estimates of efficiency than the confidence intervals from the multiple-comparisons-with-the-best
method used in the classical approach. Using the estimation results, the paper tries to answer some
interesting questions in baseball that have often been discussed but not yet answered clearly.
1. Introduction
In this article, we estimate managerial efficiency in baseball using a stochastic frontier
function with a Bayesian approach. To explain the goal and the organization of this paper, we start
by briefly introducing the concept of the stochastic frontier function and the reasons for using a
Bayesian approach.
Stochastic frontier analysis is widely used in the estimation of firm efficiency and productivity.
The method was first introduced by Aigner, Lovell, and Schmidt (1977), who suggest
an approach to the estimation of frontier production functions. They define the frontier function for
firm $i$ as:
$$ y_i = f(x_i; \beta) \qquad (1) $$
where $y_i$ is the maximum level of output, $x_i$ is a vector of inputs, and $\beta$ is an unknown parameter
vector. They then assume that technical inefficiency exists as a deviation of actual output from the
maximum level of output. With technical inefficiency, the frontier production function becomes:
$$ y_i = f(x_i; \beta)\,\tau_i \qquad (2) $$
where $0 < \tau_i \le 1$ measures the firm-specific efficiency level, so that values below 1 indicate inefficiency.
The stochastic frontier function is obtained from equation (2) by adding a random shock that is outside the
control of the firm:
$$ y_i = f(x_i; \beta)\,\tau_i\,e^{u_i} \qquad (3) $$
where $e^{u_i}$ is the error term. Including the error term fixes a critical problem shared by all
deterministic frontier models: in those models, any deviation of an observation from the
frontier must be attributed to inefficiency, because a deterministic model does not allow for
statistical noise or measurement error.
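To make the role of the random shock concrete, the following is a minimal simulation sketch of a stochastic frontier with a Cobb-Douglas form. The function name `simulate_frontier`, the parameter values, and the choice of an exponential distribution for the inefficiency term are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def simulate_frontier(n_firms=200, beta=(0.6, 0.3), noise_sd=0.1,
                      ineff_mean=0.2, seed=0):
    """Simulate y = f(x; beta) * efficiency * exp(u) with a Cobb-Douglas f."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta)
    x = rng.uniform(1.0, 10.0, size=(n_firms, beta.size))  # input quantities
    frontier = np.prod(x ** beta, axis=1)                   # deterministic frontier f(x; beta)
    z = rng.exponential(ineff_mean, size=n_firms)           # inefficiency, z >= 0
    efficiency = np.exp(-z)                                 # efficiency factor in (0, 1]
    u = rng.normal(0.0, noise_sd, size=n_firms)             # two-sided random shock
    y = frontier * efficiency * np.exp(u)                   # observed output
    return x, y, frontier, efficiency

x, y, frontier, efficiency = simulate_frontier()
# Share of firms observed above the deterministic frontier because of noise.
share_above = float(np.mean(y > frontier))
```

In this simulation some firms end up above the deterministic frontier purely because of the random shock. A deterministic frontier model has no way to represent such observations, which is the problem the error term addresses.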
Schmidt and Sickles (1984) show how panel data can resolve some difficulties originally present in
Aigner, Lovell, and Schmidt (1977). They show that with panel data and a large enough number of
time periods T, three difficulties can be avoided:
1. The consistency question in the estimation of technical inefficiency
2. The choice of a distributional assumption for technical inefficiency
3. Probable correlation between regressors and inefficiency.
A Bayesian approach to stochastic frontier analysis was introduced by van den Broeck, Koop, Osiewalski
and Steel (1994). They show that when T is not large enough, the difficulty arising from the choice of
distributional assumption for technical efficiency can be reduced by a Bayesian approach. In this
paper, we additionally show that a Bayesian approach yields tighter interval estimates than other
approaches. We compare the interval estimates of technical efficiency from a Bayesian approach
with the Multiple Comparisons with the Best intervals provided by Horrace and Schmidt (2000).
Next, we explain why we choose managerial efficiency in baseball as the target
of estimation. The first reason is that the role of a manager on the field is the same as that
of a factory manager, who controls the technical efficiency of turning limited inputs into output.
The second reason is that there is no stat that describes the ability of a manager. Technical
efficiency gives us a way to compare the ability of managers.
Baseball data are used for the following reasons: baseball is the most quantified
of all sports, and baseball data are easy to access. In other sports, outputs are
not linked to inputs as clearly as in baseball. This strong link between the data and the actual game
makes baseball an ideal setting for testing stochastic frontier analysis.
In sports economics, there has been work on estimating the technical efficiency of coaches or
organizations in team sports using stochastic frontier functions. Dawson, Dobson and Gerrard (2000)
estimate the efficiency of coaches in the English Premier League from 1992 to 1998 using a panel data
stochastic frontier model. Their paper is limited by the unclear relationship between the output and
input variables. The input variables are player age, league experience, career
goals, number of previous clubs, goals in the previous season, and the player's previous division. These
input variables are not directly related to the output, winning percentage. Rimler, Song and Yi
(2010) estimate technical efficiency in the Atlantic 10 conference of NCAA basketball. They
argue that differences in managerial efficiency are trivial and focus more on the contribution of player
stats to winning percentage.
In baseball, Porter and Scully (1982) use a frontier model to estimate managerial efficiency, and
Ruggiero, Hadley and Gustafson (1996) evaluate managerial efficiency using Data Envelopment
Analysis. Both papers use non-stochastic models, so they inevitably share the
drawbacks of all deterministic frontier models.
This paper is organized as follows. Section 2 describes the model used for estimation. Section 3
explains the data. Section 4 uses the estimation results to answer some interesting questions
in baseball. Some concluding remarks follow in Section 5.
2. Model
The model for estimating managerial efficiency comes from equation (3). Assume the production
function $f(\cdot;\beta)$ is a Cobb-Douglas production function. Taking logs, equation (3)
becomes:
$$ \ln y_i = \ln x_i'\beta + u_i - z_i \qquad (4) $$
where $z_i = -\ln \tau_i \ge 0$ and $\tau_i$ is the firm-specific efficiency term in equation (2).
We use panel data for the estimation. The panel version of the model follows directly
from equation (4):
$$ \ln y_{it} = \ln x_{it}'\beta + u_{it} - z_i, \quad i = 1,\dots,N,\ t = 1,\dots,T \qquad (5) $$
Here $i$ indexes teams and $t$ indexes time periods, where one period is one baseball season. The value
$y_{it}$ is the output of team $i$ in period $t$, and $x_{it}$ is a vector of $K$ inputs. The $u_{it}$'s are error terms,
uncorrelated with the regressors. The term $z_i$ represents technical inefficiency, constant over all periods and
nonnegative for all $i$.
From equation (5), the model we use to estimate technical efficiency and the $\beta$'s is:
$$ \ln y_{it} = \ln x_{o,it}'\beta_o + \ln x_{d,it}'\beta_d + u_{it} - z_i, \quad i = 1,\dots,N,\ t = 1,\dots,T \qquad (6) $$
Here $y_{it}$ is the ratio of runs scored by team $i$ to runs allowed by team $i$ in season $t$:
$$ y_{it} = \frac{\text{Run}_{made,it}}{\text{Run}_{allowed,it}}. $$
$x_{o,it}$ is the vector of offensive stats for team $i$ in season $t$, including singles, doubles, triples, home runs (HR),
steals, walks, and strikeouts (K). $x_{d,it}$ is the vector of defensive stats, including singles allowed, HR allowed, walks
allowed, K made, errors, and double plays. Each value is the team's total over its
regular-season games in that year. $\beta_o$ and $\beta_d$ are the coefficient vectors for the offensive and defensive stats.
Most previous papers estimating technical efficiency in team sports use winning
percentage as the value of $y_{it}$. We use run ratio instead, for two reasons. First,
runs scored and runs allowed are directly linked to the offensive and defensive stats used as
regressors, and this link is more intuitive than the link between winning percentage and the
regressors. Second, run ratio is still strongly correlated with winning percentage: the R-squared of a
simple regression of winning percentage on run ratio alone
is 0.871 over the 1969~2010 seasons. Our results using run ratio as the output
value should therefore differ little from an analysis in which winning percentage is used as
the output value.
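For stochastic frontier analysis with a Bayesian approach, we add the following assumptions to
equation (6). For $i = 1,\dots,N$:
1. $u_{it} \sim N(0, h^{-1})$ and the $u_{it}$'s are independent of each other;
2. $h \sim \text{Gamma}(\underline{s}^{-2}, \underline{\nu})$;
3. $u_{it}$ and $z_i$ are independent of one another;
4. $z_i \sim \text{Exp}(\mu_z)$;
5. $\beta \sim N(\underline{\beta}, \underline{V})$,
where $\underline{s}$, $\underline{\nu}$, $\underline{\beta}$, and $\underline{V}$ are hyperparameters.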
Among all the assumptions, the most critical one is $z_i \sim \text{Exp}(\mu_z)$, because the
choice of distribution for technical inefficiency has always been the most difficult part of
applying stochastic frontier analysis. In a study of the most commonly used models, van den Broeck et al. (1994)
found that the exponential is the least sensitive to changes in prior assumptions, and we
follow that paper.
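From equation (6) and the additional assumptions, the likelihood function is:
$$ p(y \mid \beta, h, z) = \prod_{i=1}^{N} \frac{h^{T/2}}{(2\pi)^{T/2}} \exp\left[ -\frac{h}{2} (y_i - X_i\beta + z_i\iota_T)'(y_i - X_i\beta + z_i\iota_T) \right], \qquad (7) $$
where $X_i = [X_{i1}' \ \cdots \ X_{iT}']'$, $X_{it} = [x_{o,it}' \ \ x_{d,it}']$, $\beta = [\beta_o' \ \ \beta_d']'$, and $\iota_T$ is a $T$-vector of ones. As in equation (6), outputs and inputs enter in logs.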
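Using the likelihood function and the distributional assumptions, we can derive the posterior
distributions of the parameters. Starting with $\beta$, we multiply the likelihood function by the prior
$\beta \sim N(\underline{\beta}, \underline{V})$; the conditional posterior distribution of $\beta$ is then:
$$ \beta \mid y, h, z \sim N(\bar{\beta}, \bar{V}), \qquad (8) $$
where
$$ \bar{V} = \Big( \underline{V}^{-1} + h \sum_{i=1}^{N} X_i'X_i \Big)^{-1} $$
and
$$ \bar{\beta} = \bar{V} \Big( \underline{V}^{-1}\underline{\beta} + h \sum_{i=1}^{N} X_i'(y_i + z_i\iota_T) \Big). $$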
2
For the posterior distribution of h , we use likelihood function and h Gamma( s , ) . Then
distribution is given as:
h | y,  , z ,  z
1
Gamma( s , ) ,
(9)
where  TN  ,
N
2
and s 
 y  z
i
i 1
 X i    yi  ziT  X i    s
'
i T
2

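In the same way, we can find the posterior distribution of $\mu_z^{-1}$, rather than $\mu_z$:
$$ \mu_z^{-1} \mid y, \beta, h, z \sim \text{Gamma}(\bar{\mu}_z, \bar{\nu}_z), \qquad (10) $$
where $\bar{\nu}_z = 2N + \underline{\nu}_z$
and
$$ \bar{\mu}_z = \frac{2N + \underline{\nu}_z}{2\sum_{i=1}^{N} z_i + \underline{\nu}_z\,\underline{\mu}_z}, $$
with prior hyperparameters $\underline{\mu}_z$ and $\underline{\nu}_z$.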
After finding all these posterior distributions, we can derive the distribution used to generate
the technical inefficiencies. By Bayes' theorem,
$$ p(z \mid y, \beta, h, \mu_z) \propto p(y \mid z, \beta, h, \mu_z)\, p(z \mid \beta, h, \mu_z). $$
We already have the likelihood function and the assumption $z_i \sim \text{Exp}(\mu_z)$, so the posterior
distribution of $z_i$ follows directly:
$$ p(z_i \mid y_i, X_i, \beta, h, \mu_z) \propto f_N\big( z_i \mid \bar{X}_i\beta - \bar{y}_i - (Th\mu_z)^{-1},\ (Th)^{-1} \big)\, I(z_i \ge 0), \qquad (11) $$
where $f_N(\cdot \mid m, v)$ is the normal density with mean $m$ and variance $v$, $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, and $\bar{X}_i$ is a row vector containing the average value of each explanatory
variable. $I(z_i \ge 0)$ is the indicator function.
With equations (8)-(11), we can estimate the coefficients $\beta$, the parameters
$h$ and $\mu_z$, and the technical inefficiencies $z_i$ using Gibbs sampling.
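The sampling scheme built from the four conditional posteriors can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the paper: it assumes a balanced panel (the paper's panel is unbalanced), reads each Gamma distribution as parameterized by its mean and degrees of freedom, and uses illustrative hyperparameter defaults. The names `gibbs_frontier` and `draw_truncated_normal` and all hyperparameter arguments are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def draw_truncated_normal(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to z >= 0 by inverting the CDF."""
    lower = _STD_NORMAL.cdf(-mean / sd)            # mass below zero before truncation
    p = min(max(rng.uniform(lower, 1.0), 1e-12), 1.0 - 1e-12)
    return max(mean + sd * _STD_NORMAL.inv_cdf(p), 0.0)

def gibbs_frontier(y, X, n_iter=1500, burn=500, seed=0,
                   nu0=0.0, s0_sq=1.0, mu_z0=0.2, nu_z0=2.0):
    """Gibbs sampler for ln y_it = X_it beta + u_it - z_i on a balanced panel.

    y is (N, T) log output and X is (N, T, K) log inputs. The hyperparameter
    defaults are illustrative assumptions, not the paper's choices.
    """
    rng = np.random.default_rng(seed)
    N, T, K = X.shape
    b0, V0_inv = np.zeros(K), np.eye(K) * 1e-4     # diffuse normal prior on beta
    Xf, yf = X.reshape(N * T, K), y.reshape(N * T)
    XtX = Xf.T @ Xf
    x_bar, y_bar = X.mean(axis=1), y.mean(axis=1)  # team averages over seasons
    h, mu_z, z = 1.0, mu_z0, np.full(N, mu_z0)     # starting values
    beta_draws, eff_draws = [], []
    for it in range(n_iter):
        z_rep = np.repeat(z, T)                    # z_i repeated for each season
        # beta | y, h, z: normal, as in equation (8)
        V1 = np.linalg.inv(V0_inv + h * XtX)
        V1 = (V1 + V1.T) / 2.0                     # keep the covariance symmetric
        b1 = V1 @ (V0_inv @ b0 + h * (Xf.T @ (yf + z_rep)))
        beta = rng.multivariate_normal(b1, V1)
        # h | y, beta, z: gamma with mean 1/s1_sq and nu1 degrees of freedom, (9)
        resid = yf + z_rep - Xf @ beta
        nu1 = N * T + nu0
        s1_sq = (resid @ resid + nu0 * s0_sq) / nu1
        h = rng.gamma(shape=nu1 / 2.0, scale=2.0 / (nu1 * s1_sq))
        # 1/mu_z | z: gamma, as in equation (10)
        mu_z = 1.0 / rng.gamma(shape=N + nu_z0 / 2.0,
                               scale=2.0 / (2.0 * z.sum() + nu_z0 * mu_z0))
        # z_i | y, beta, h, mu_z: normal truncated at zero, as in equation (11)
        means = x_bar @ beta - y_bar - 1.0 / (T * h * mu_z)
        sd = 1.0 / np.sqrt(T * h)
        z = np.array([draw_truncated_normal(m, sd, rng) for m in means])
        if it >= burn:
            beta_draws.append(beta)
            eff_draws.append(np.exp(-z))           # efficiency = exp(-z_i)
    return np.mean(beta_draws, axis=0), np.mean(eff_draws, axis=0)

# Check on simulated data with known coefficients (all values are made up).
sim = np.random.default_rng(1)
N, T = 30, 8
beta_true = np.array([0.5, -0.3])
X = sim.normal(size=(N, T, 2))
z_true = sim.exponential(0.1, size=N)
y = X @ beta_true + sim.normal(0.0, 0.1, size=(N, T)) - z_true[:, None]
beta_hat, eff_hat = gibbs_frontier(y, X)
```

Each pass draws one block of parameters conditional on the current values of the others. After the burn-in draws are discarded, the retained draws approximate the joint posterior, and efficiency is recovered from each inefficiency draw as exp(-z_i).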
3. Data
For the estimation, the data are taken from Major League Baseball (MLB) over the 1969~2010 seasons and were
collected from www.baseball-reference.com. The 1972, 1981, 1994, and 1995
seasons are omitted because the number of games was severely reduced by labor disputes
between the players and MLB in those seasons. The number of teams changed over
the period as new teams joined. Table 1 summarizes the number of teams.
Table 1: The number of teams over 1969~2010 seasons

| Year | Number of seasons | Number of teams | Team added |
|---|---|---|---|
| 1998~2010 | 13 | 30 | ARI, TBR |
| 1993~1997 | 3 (except for 94, 95) | 28 | FLA, COL |
| 1977~1992 | 15 (except for 81) | 26 | SEA, TOR |
| 1969~1976 | 7 (except for 72) | 24 | |

Because the number of teams differs across periods, the estimation procedure must handle
an unbalanced panel.
The data set contains three kinds of values used for the analysis.
1. Output values - Run ratio. As briefly described in Section 2, run ratio is a fraction whose
numerator is the total runs scored by a team in a season and whose denominator is the
total runs allowed by that team in the same season. Run ratio is used as the output value rather than run
difference because run difference can be negative, so its logarithm cannot be taken.
2. Input values
2.a. Offensive inputs for run production - Single, Double, Triple, HR, Steal, Walk, and K.
Batting average, on-base percentage (OBP), slugging percentage (SLG), and OPS
(OBP + SLG) are not used, to prevent multicollinearity problems.
2.b. Defensive inputs for run allowance - Single allowed, HR allowed, Walk allowed, K made,
Error, and Double play. Because of multicollinearity concerns, ERA and WHIP are omitted from the inputs.
Descriptive statistics of the output and input data are provided in Table 2. The table shows some
characteristic changes over time. First, the number of extra-base hits increased:
between 1969~1976 and 2003~2010, the average numbers of doubles (38%) and home runs (41.26%) rose sharply
compared with singles (5.4%). Second, the number of errors decreased: while the other defensive stats
stayed nearly unchanged over the long period, the number of errors fell by 29.38%.
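The percentage changes quoted above can be reproduced from the period means reported in Table 2. The short sketch below does that arithmetic; the dictionary names and the helper `pct_change` are illustrative.

```python
# Period means copied from Table 2 (yearly team totals).
means_1969_1976 = {"Single": 1394.59, "Double": 214.99, "Homerun": 119.61, "Error": 143.41}
means_2003_2010 = {"Single": 1469.46, "Double": 296.58, "Homerun": 168.96, "Error": 101.27}

def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old

changes = {
    stat: round(pct_change(means_1969_1976[stat], means_2003_2010[stat]), 2)
    for stat in means_1969_1976
}
for stat, change in changes.items():
    print(f"{stat}: {change:+.2f}%")
```

Applied to the Table 2 means, this calculation gives roughly +38% for doubles, +41% for home runs, +5.4% for singles, and -29% for errors.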
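Table 2: Descriptive stats of input and output data (yearly values)

| Group | Variable | 1969~2010 Mean | 1969~2010 SD | 1969~1976 Mean | 1969~1976 SD | 2003~2010 Mean | 2003~2010 SD |
|---|---|---|---|---|---|---|---|
| Output | Run ratio | 1.0102 | 0.1428 | 1.0131 | 0.1644 | 1.010 | 0.1368 |
| Output | Run | 726.40 | 88.36 | 666.70 | 73.73 | 758.00 | 76.30 |
| Input | Single | 1444.74 | 83.08 | 1394.59 | 81.17 | 1469.46 | 74.63 |
| Input | Double | 262.62 | 39.69 | 214.99 | 23.79 | 296.58 | 27.30 |
| Input | Triple | 33.95 | 10.28 | 36.15 | 10.32 | 30.46 | 8.91 |
| Input | Homerun | 147.30 | 39.83 | 119.61 | 30.70 | 168.96 | 33.24 |
| Input | Steal | 107.53 | 41.36 | 92.99 | 43.03 | 92.25 | 30.48 |
| Input | Walk | 538.53 | 69.64 | 544.93 | 70.34 | 533.25 | 68.26 |
| Input | K | 953.30 | 144.87 | 855.24 | 104.51 | 1074.58 | 119.18 |
| Input | Error | 122.71 | 23.45 | 143.41 | 20.50 | 101.27 | 15.59 |
| Input | Double play | 151.84 | 18.26 | 154.78 | 15.94 | 152.12 | 16.76 |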
To check the normality of the data, we perform a graphical check using QQ plots. In Figure
1, all the variables appear approximately normal.
Figure 1: QQ plot of input and output data (1969~2010)
4. Estimation Results and Discussion
4.1 Comparison of estimation efficiency between classical approach and a Bayesian approach
In classical stochastic frontier analysis, confidence intervals for technical efficiency were rarely
provided before Horrace and Schmidt (2000). One advantage of the Bayesian
method is the ease of obtaining interval estimates of parameters. In this section, we compare:
(1) The estimated values of technical efficiency
(2) The estimated values of the $\beta$'s
(3) The estimated intervals of technical efficiency
(1) Estimated values of technical efficiency
A comparison of technical efficiencies across the two approaches to stochastic frontier analysis is
presented in Table 3. The classical results follow the method of Battese and Coelli
(1995), and the Bayesian results follow the method of Koop, Poirier, and Tobias (2007).
Table 3: Technical efficiency of MLB teams over 2003~2010

| Team | Classical efficiency | Classical rank | Bayesian efficiency | Bayesian rank |
|---|---|---|---|---|
| ATL | 0.9936 | 2 | 0.9977 | 2 |
| CHW | 0.9867 | 12 | 0.9812 | 8 |
| CHC | 0.9799 | 20 | 0.9546 | 25 |
| CLE | 0.9818 | 18 | 0.9621 | 20-21 |
| CIN | 0.9906 | 4 | 0.9815 | 7 |
| DET | 0.9798 | 21 | 0.9607 | 23 |
| HOU | 0.9853 | 15 | 0.9633 | 19 |
| KCR | 0.9879 | 10-11 | 0.9742 | 11 |
| LAD | 0.9797 | 22 | 0.9612 | 22 |
| LAA | 0.9942 | 1 | 0.9997 | 1 |
| MIL | 0.9765 | 27 | 0.9440 | 27 |
| MIN | 0.9882 | 9 | 0.9766 | 10 |
| NYM | 0.9928 | 3 | 0.9963 | 3 |
| NYY | 0.9786 | 24-25 | 0.9654 | 16 |
| PHI | 0.9900 | 5 | 0.9784 | 9 |
| OAK | 0.9887 | 8 | 0.9840 | 5 |
| PIT | 0.9879 | 10-11 | 0.9677 | 14 |
| TEX | 0.9892 | 6 | 0.9825 | 6 |
| SEA | 0.9781 | 26 | 0.9671 | 15 |
| SDP | 0.9621 | 29 | 0.9252 | 29 |
| SFG | 0.9857 | 14 | 0.9678 | 13 |
| TOR | 0.9835 | 17 | 0.9621 | 20-21 |
| STL | 0.9887 | 7 | 0.9892 | 4 |
| COL | 0.9792 | 23 | 0.9505 | 26 |
| WSN | 0.9858 | 13 | 0.9683 | 12 |
| FLA | 0.9849 | 16 | 0.9638 | 18 |
| BAL | 0.9803 | 19 | 0.9648 | 17 |
| ARI | 0.9528 | 30 | 0.9022 | 30 |
| BOS | 0.9640 | 28 | 0.9315 | 28 |
| TBR | 0.9786 | 24-25 | 0.9552 | 24 |
In Table 3, the values of technical efficiency depend on the method of analysis, but the ranking
is roughly preserved across the two methods. In particular, the teams with the top
3 and the bottom 3 technical efficiencies keep the same ranks under both
methods. This suggests that, when using results from stochastic frontier analysis,
more attention should be given to the ordering than to the values themselves.
(2) Estimated values of the $\beta$'s
Using the same methods as in Table 3, we estimate the values of the $\beta$'s. The results
are provided in Table 4.
Table 4: Estimated values of the $\beta$'s over 2003~2010

| Coefficient | Classical approach: Estimate (sd) | Bayesian approach: Estimate (sd) |
|---|---|---|
| Single | 0.8811 (0.0904) | 0.7747 (0.0804) |
| Double | 0.1291 (0.0438) | 0.1727 (0.0409) |
| Triple | 0.0376 (0.0109) | 0.0557 (0.0108) |
| Homerun | 0.2191 (0.0189) | 0.2197 (0.0186) |
| Steal | 0.0189 (0.0099) | 0.0097 (0.0098) |
| Walk | 0.1944 (0.0281) | 0.2053 (0.0272) |
| K | -0.0269 (0.0354) | -0.0122 (0.0337) |
| Single allowed | -1.1633 (0.0781) | -1.0337 (0.0730) |
| HR allowed | -0.1923 (0.0288) | -0.1825 (0.0277) |
| Walk allowed | -0.3382 (0.0264) | -0.3401 (0.0263) |
| K made | 0.0138 (0.0439) | 0.0620 (0.0403) |
| Error | -0.1087 (0.0207) | -0.0974 (0.0199) |
| Double play | 0.0686 (0.0310) | 0.0586 (0.0298) |
The results from the two methods do not differ much: the means and standard
deviations are similar. The choice of approach therefore
makes little difference to the estimated coefficients.
(3) The estimated intervals of technical efficiency
Figure 2 compares the technical efficiencies from the two methods using box
plots. As found in result (1), the ordering of the teams' efficiencies is largely unchanged across the two
approaches, but each box plot is much longer under the classical
approach. In the upper plot, the upper bound of every interval touches the technical efficiency level
of 1, which means that no team can be identified as less than fully efficient at the 95% confidence level. Under the
Bayesian approach, by contrast, 23 teams have an upper limit of the 95% interval
below an efficiency of 1. We can say that the run ratio of those 23 teams is below the frontier
level and that higher output could be achieved through better strategy by the manager on the field. The bottom line
of this comparison is that the Bayesian approach gives tighter intervals on baseball data
for this period, and therefore more power to distinguish levels of technical efficiency.
When we repeat the box plot comparison over the periods
1969~1976 and 1969~2010, the conclusion does not change.
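Under the Bayesian approach, an interval for a team's efficiency can be read directly from the posterior draws of its inefficiency term. The sketch below shows one way to do this with equal-tailed percentile intervals; the function name `efficiency_interval` and the simulated draws are illustrative assumptions, not output from the paper's estimation.

```python
import numpy as np

def efficiency_interval(z_draws, level=0.95):
    """Equal-tailed interval for efficiency exp(-z) from posterior draws of z."""
    eff = np.exp(-np.asarray(z_draws, dtype=float))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(eff, [tail, 1.0 - tail])
    return float(lower), float(upper)

# Made-up posterior draws for two hypothetical teams.
rng = np.random.default_rng(0)
draws_team_a = 0.02 + rng.exponential(0.01, size=5000)  # clearly below the frontier
draws_team_b = rng.exponential(0.0005, size=5000)       # essentially on the frontier

interval_a = efficiency_interval(draws_team_a)
interval_b = efficiency_interval(draws_team_b)
```

For the first hypothetical team the upper limit stays clearly below 1, which is the situation described for the 23 teams above. For the second team the interval reaches essentially up to 1, so it cannot be distinguished from a fully efficient team.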
Figure 2: Box plot of managerial efficiencies: classical and Bayesian approaches, 2003~2010
4.2 The analysis of estimated coefficients: Old time baseball (1969~1976) vs. Modern baseball (2003~2010)
Figure 3: Distribution of coefficients of offensive and defensive stats over 2003~2010
The qualitative implications of the estimated coefficients are consistent
with common knowledge in baseball. Figure 3, based on 4,500 Gibbs sampling draws,
shows the distribution of the coefficients over the 2003~2010 seasons.
Among offensive stats, singles, home runs, and walks have a critical impact on run ratio. Using the averages in
Table 2, a 10% increase in the number of singles ($1469.46 \times 10\% \approx 147$) increases a team's run ratio
by 7.6% ($10\% \times 0.7662$). Similarly, a 10% increase in home runs (17 more) increases
the run ratio by 2.0%, and a 10% increase in walks (53 more) results in a 2.1%
higher run ratio.
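Because the model is log-linear, each coefficient is an elasticity: a 10% increase in an input changes run ratio by approximately 10% times its coefficient. The sketch below compares that first-order approximation with the exact Cobb-Douglas effect, using the Bayesian estimates for 2003~2010 from Table 4. The function names are illustrative.

```python
# Bayesian coefficient estimates for 2003~2010, copied from Table 4.
coefficients = {"Single": 0.7747, "Homerun": 0.2197, "Walk": 0.2053}

def first_order_effect(beta, pct_increase):
    """Approximate % change in run ratio: elasticity times % change in input."""
    return beta * pct_increase

def exact_effect(beta, pct_increase):
    """Exact % change implied by the Cobb-Douglas form: (1 + g)^beta - 1."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** beta - 1.0)

linear = {stat: first_order_effect(b, 10.0) for stat, b in coefficients.items()}
exact = {stat: exact_effect(b, 10.0) for stat, b in coefficients.items()}
for stat in coefficients:
    print(f"{stat}: approx {linear[stat]:.2f}%, exact {exact[stat]:.2f}%")
```

For coefficients between 0 and 1 the exact effect is slightly smaller than the first-order approximation. Small differences between the two are therefore expected when elasticities are converted into percentage effects.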
A more interesting interpretation of the coefficients comes from comparing
old time baseball (1969~1976) with modern baseball (2003~2010). Table 5
shows the size of a 10% increase in each offensive stat and the resulting increase in run ratio.
Table 5: Comparison of estimated values of the $\beta$'s between 1969~1976 and 2003~2010

| Stat | 1969~1976: 10% increase (count) | 1969~1976: Run ratio increase | 2003~2010: 10% increase (count) | 2003~2010: Run ratio increase |
|---|---|---|---|---|
| Single | 139 | 8.9% | 147 | 7.2% |
| Double | 21 | 1.0% | 30 | 1.7% |
| Homerun | 12 | 1.8% | 17 | 2.2% |
| Walk | 54 | 3.7% | 53 | 2.1% |
| Steal | 54 | 0.005% | 53 | 0.1% |
| K | 86 | -0.7% | 107 | -0.1% |
Table 5 shows several interesting comparisons over time. First, the impact of extra-base
hits on run ratio is larger in modern baseball. This is easily verified by comparing the
coefficients of doubles and home runs, both of which show a marked increase in modern baseball.
On the other hand, the coefficients of walks and strikeouts are smaller in magnitude. Drawing
more walks has become less important because it requires more patience from batters. A more
patient batter has fewer chances of making an extra-base hit, which has a negative effect on
run ratio. Similarly, a higher number of strikeouts is one of the costs batters pay for producing more
extra-base hits, so more strikeouts can be compensated by the increased
number of extra-base hits. Overall, the comparisons in Table
5 show that extra-base hits are more important in current baseball than in the past. A small
conclusion from this analysis is that scouting in modern baseball should concentrate more on
aggressive hitters.
How can we apply this result to the management of a baseball organization? We illustrate with
recent free agent transactions. After the 2010 season, the Boston Red Sox acquired outfielder
Carl Crawford at an average annual salary of $21 million, and the Washington Nationals signed
outfielder Jayson Werth at an average annual salary of $18 million. Both
are considered to have good defensive skills, so we assume their defensive skills are at the same
level. On the depth charts of the Red Sox and the Nationals, Carl Crawford is ahead of Darnell McDonald as left
fielder and Jayson Werth is ahead of Jerry Hairston as right fielder. The additional
offensive production from Crawford and Werth is shown in Table 6, using 2010 season stats.
The difference in salary over the replacement player is also provided.
Table 6: Additional offensive production of Crawford and Werth over replacement player

| Stat | Boston Red Sox: Crawford | McDonald | Difference | Washington Nationals: Werth | Hairston | Difference |
|---|---|---|---|---|---|---|
| Single | 184 | 86 | 98 | 164 | 105 | 59 |
| Double | 30 | 18 | 12 | 46 | 13 | 33 |
| Triple | 13 | 3 | 10 | 2 | 2 | 0 |
| Homerun | 19 | 9 | 10 | 27 | 10 | 17 |
| Walk | 46 | 30 | 16 | 82 | 31 | 51 |
| Steal | 47 | 9 | 38 | 13 | 9 | 4 |
| K | 104 | 85 | 19 | 147 | 54 | 93 |
| Salary | 21 mil | 0.47 mil | 20.53 mil | 18 mil | 2 mil | 16 mil |
From Table 6, the additional offensive production provided by Crawford increases the run ratio by 5.35%
when the coefficient estimates for the 2003~2010 seasons are used. With the same estimates, Werth raises
the run ratio by 4.65%. The Boston Red Sox thus spent $3.84 million for each 1% increase in run ratio,
while the Washington Nationals invested $3.44 million for each 1%. We can conclude
that, of the two biggest contracts for offensive players signed after the 2010 season, the
Nationals' investment is the more cost efficient.
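The cost comparison is a simple ratio of extra salary to the estimated gain in run ratio. The sketch below reproduces it from the figures quoted above; the variable names are illustrative.

```python
# Extra salary over the replacement player (million USD) and estimated
# run ratio gain (percent), as reported in the text and in Table 6.
contracts = {
    "Crawford": {"extra_salary_mil": 20.53, "run_ratio_gain_pct": 5.35},
    "Werth": {"extra_salary_mil": 16.0, "run_ratio_gain_pct": 4.65},
}

# Cost in million USD of each one-percent gain in run ratio.
cost_per_point = {
    name: c["extra_salary_mil"] / c["run_ratio_gain_pct"]
    for name, c in contracts.items()
}
cheaper = min(cost_per_point, key=cost_per_point.get)

for name, cost in cost_per_point.items():
    print(f"{name}: {cost:.2f} million per 1% of run ratio")
```

The comparison depends on the assumption stated in the text that the two players' defensive contributions are equal, so that only offensive production differs.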
4.3 Steroid era and Moneyball
In the book "Moneyball: The Art of Winning an Unfair Game" (2003) by Lewis, Billy Beane, the general
manager of the Oakland Athletics, hires Art Howe as a manager who understands that he is not
the boss, so that the front office can implement its ideas with full control. Beane's idea is that anything that
increases the offense's chance of making an out is bad. Offensive strategies such as the sacrifice
bunt, the hit and run, and the steal are therefore considered harmful, and the manager is required to take an
extremely passive stance in running the offense. In addition, Beane uses a model
in which an extra point of on-base percentage is worth three times an extra point of slugging
percentage. Based on this model, the Athletics front office shows an obsession with a player's ability to
get on base. Art Howe managed the Athletics over 1996~2002. Figure 4 shows the distribution of
managerial efficiencies over this period.
Figure 4: Box plot of managerial efficiencies: 1996~2002
The Athletics show much better managerial efficiency than other teams in this period. This period is
known as the steroid era in baseball history and has different characteristics from other periods.
Table 7 shows the characteristics of the steroid era.
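Table 7: Descriptive stats of input and output data (yearly values)

| Group | Variable | Steroid era Mean | Steroid era SD | 1969~1976 Mean | 1969~1976 SD | 2003~2010 Mean | 2003~2010 SD |
|---|---|---|---|---|---|---|---|
| Output | Run ratio | 1.0121 | 0.1530 | 1.0131 | 0.1644 | 1.010 | 0.1368 |
| Output | Run | 791.27 | 86.15 | 666.70 | 73.73 | 758.00 | 76.30 |
| Input | Single | 1485.46 | 81.87 | 1394.59 | 81.17 | 1469.46 | 74.63 |
| Input | Double | 290.71 | 26.05 | 214.99 | 23.79 | 296.58 | 27.30 |
| Input | Triple | 30.92 | 8.48 | 36.15 | 10.32 | 30.46 | 8.91 |
| Input | Homerun | 176.72 | 34.49 | 119.61 | 30.70 | 168.96 | 33.24 |
| Input | Steal | 106.94 | 33.57 | 92.99 | 43.03 | 92.25 | 30.48 |
| Input | Walk | 564.98 | 75.47 | 544.93 | 70.34 | 533.25 | 68.26 |
| Input | K | 1055.39 | 91.87 | 855.24 | 104.51 | 1074.58 | 119.18 |
| Input | Error | 114.44 | 17.06 | 143.41 | 20.50 | 101.27 | 15.59 |
| Input | Double play | 152.91 | 19.16 | 154.78 | 15.94 | 152.12 | 16.76 |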
Table 7 shows that home runs are more frequent in the steroid era than in the other two periods,
and that doubles are far more frequent than in 1969~1976. It is possible that the increase in
extra-base hits comes from the use of steroids.
These differences lead to critical changes in the coefficients of the offensive inputs, shown in Table
8.
Table 8: Comparison of estimated values of the $\beta$'s over the steroid era and 2003~2010

| Coefficient | Steroid era: Estimate | 2003~2010: Estimate (sd) |
|---|---|---|
| Single | 0.9931 | 0.7747 (0.0804) |
| Double | 0.1274 | 0.1727 (0.0409) |
| Triple | 0.0323 | 0.0557 (0.0108) |
| Homerun | 0.1642 | 0.2197 (0.0186) |
| Steal | 0.0364 | 0.0097 (0.0098) |
| Walk | 0.3064 | 0.2053 (0.0272) |
| K | -0.0457 | -0.0122 (0.0337) |
| Single allowed | -1.0920 | -1.0337 (0.0730) |
| HR allowed | -0.1711 | -0.1825 (0.0277) |
| Walk allowed | -0.2742 | -0.3401 (0.0263) |
| K made | 0.0381 | 0.0620 (0.0403) |
| Error | -0.0598 | -0.0974 (0.0199) |
| Double play | 0.0314 | 0.0586 (0.0298) |
Because of the higher number of extra-base hits, the estimated coefficients of all the extra-base hits are
smaller in the steroid era. On the other hand, the coefficients of singles and walks are considerably higher
in the steroid era. Singles and walks are the two most important factors determining
on-base percentage, which the Athletics' strategy regards as the most important of all offensive
stats. In particular, Athletics general manager Beane praised players for their walks
and criticized them for swinging at pitches outside the strike zone. Their high number of walks worked
extremely well in the steroid era, when the estimated coefficient for walks was very high. But
Beane, who has worked as general manager since 1996, has been criticized for the declining performance
of the Athletics after the steroid era. One of the reasons can be found in the reduced managerial efficiency
shown in Figure 5.
Figure 5: Box plot of managerial efficiencies: 2003~2010
In Figure 5, the managerial efficiency of the Athletics no longer dominates the other teams as it does in Figure
4. As shown in Table 8, one possible explanation for this change in performance is the
increased coefficients of extra-base hits and the decreased impact of walks.
4.4 The evolution of managerial efficiency: the case study of Tony LaRussa
This section compares the managerial efficiency of Tony LaRussa, the current manager of the St. Louis
Cardinals, across his career. Tony LaRussa started his job as an MLB manager in 1979 with the White Sox. In his career
he has managed three MLB teams, the White Sox, the Athletics, and the Cardinals, and the length of his
service is 35 years. By comparing his managerial efficiency over different periods,
we show that the efficiency of the same manager can vary over time. Figure 6 shows the
managerial efficiencies over the 1979~1985 seasons, when Tony LaRussa managed
the White Sox.
Figure 6: Box plot of managerial efficiencies: 1979~1985
In Figure 6, the managerial efficiency of the White Sox is in the lower group, ranked
21st. During the 1986 season, LaRussa was hired by the Athletics. Figure 7 shows the managerial
efficiency of Tony LaRussa with the Athletics.
Figure 7: Box plot of managerial efficiencies: 1987~1992
In this period, Tony LaRussa performs very well and the Athletics are ranked 5th in
managerial efficiency. Before the LaRussa era, the managerial efficiency of the Athletics was ranked
19th. During this period, LaRussa and the Athletics made three World Series
appearances in 6 years. After 1992, team owner Walter Haas Jr., who had paid even the highest payroll
in baseball, left, and the new owners of the Athletics started to tighten payroll. By then Tony LaRussa was
already one of the most acclaimed managers in the game, and he was hired by the Cardinals. He has
been the manager of the Cardinals since 1996, and Figure 8 shows the managerial efficiencies over the
1996~2010 seasons.
Figure 8: Box plot of managerial efficiencies: 1996~2010
In this period, the managerial efficiency of the Cardinals is ranked 19th. Even though LaRussa has
made two playoff berths and won the World Series in 2006 with the Cardinals, his
managerial efficiency is not at the top level among the teams.
From this longitudinal analysis of LaRussa's managerial efficiency with three teams, we find that
the efficiency of a manager can vary over time and is strongly affected by the characteristics of the
team.
4.5 Estimation of efficiency based on daily data: the case of the 2010 season
In this section, we estimate the contribution of the manager to run ratio using 2010 season data. The data
are the daily game stats of the 2010 season for the 30 MLB teams. The same method is
applied, but each team now has 162 observations. So that the log
transformation required by the Cobb-Douglas production function can be applied, 1 is added to every data
value, since single-game counts can be zero. Of the stats used in the yearly analysis, steals and double plays are omitted because they
showed little impact on run ratio.
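The reason for adding 1 is that single-game counts are often zero, and the logarithm of zero is undefined. The sketch below shows the transformation on one made-up game line. Whether the paper adds 1 to runs scored and runs allowed before forming the ratio is not stated, so that step is an assumption here, as are the field names.

```python
import math

def log_plus_one(count):
    """Log transform that stays finite when the raw count is zero."""
    return math.log(count + 1.0)

# One hypothetical game line with some zero counts.
game = {"runs_made": 0, "runs_allowed": 5, "single": 6, "double": 1,
        "triple": 0, "homerun": 0, "walk": 3, "k": 9}

# Logged inputs for the regression (all counts shifted by 1).
log_inputs = {name: log_plus_one(value) for name, value in game.items()
              if name not in ("runs_made", "runs_allowed")}

# Assumed form of the logged output: log of (runs made + 1) / (runs allowed + 1).
log_run_ratio = log_plus_one(game["runs_made"]) - log_plus_one(game["runs_allowed"])
```

With this shift a count of zero maps to a logged value of zero, so games in which a team hits no triples or scores no runs remain usable observations.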
Table 9 provides the estimated managerial efficiency for the 2010 season.
Table 9: Estimated managerial efficiency in season 2010

| Team | Efficiency | Rank | Team | Efficiency | Rank |
|---|---|---|---|---|---|
| ATL | 0.9633 | 15 | CHW | 0.9953 | 10 |
| CHC | 0.9457 | 19 | CLE | 0.9456 | 20 |
| CIN | 0.9545 | 18 | DET | 0.9165 | 27 |
| HOU | 0.9982 | 9 | KCR | 0.8734 | 30 |
| LAD | 0.9417 | 21 | LAA | 0.9928 | 7 |
| MIL | 0.9132 | 28 | MIN | 0.9907 | 11 |
| NYM | 0.9999 | 3 | NYY | 0.9627 | 16 |
| PHI | 0.9996 | 6 | OAK | 0.9997 | 5 |
| PIT | 0.9231 | 24 | TEX | 0.9755 | 13 |
| SDP | 0.9999 | 2 | SEA | 0.9405 | 22 |
| SFG | 0.9761 | 12 | TOR | 0.9660 | 14 |
| STL | 0.9990 | 8 | COL | 0.9193 | 25 |
| WSN | 0.9532 | 17 | FLA | 0.9999 | 4 |
| BAL | 0.9129 | 29 | ARI | 0.9279 | 23 |
| BOS | 0.9173 | 26 | TBR | 0.9999 | 1 |
The interesting part of this result is that, even though there is no official record of managerial
efficiency, it matches the actual contract outcomes of the 2010 season well. Managers
ranked 21st to 30th in the table were fired at a high rate of 70%, compared with only 15%
of managers ranked 1st to 20th. In particular, the managers of the Baltimore Orioles
and the Kansas City Royals, who showed the two lowest efficiencies, were fired during the season. This
clear relationship between the estimated efficiency and the managers' contract outcomes
suggests that efficiency estimation can be used as a tool to support a team's decisions about
its manager.
5. Concluding remarks
In this paper we have estimated the efficiency of baseball managers using a stochastic frontier model
with a Bayesian approach. The data used for estimation are yearly data over the 1969~2010 seasons and
daily data for the 2010 season. The paper combines earlier work by Aigner, Lovell, and Schmidt
(1977), Schmidt and Sickles (1984), and van den Broeck, Koop, Osiewalski and Steel (1994). We find that
the Bayesian stochastic frontier model provides tighter intervals for technical efficiency
than the classical method of Horrace and Schmidt (2000). It also provides a reasonable
explanation for changes in team efficiency over time, as in the case of the Oakland Athletics in the
steroid era and the case study of Tony LaRussa. In particular, the estimation on daily data from the 2010
season provides evidence that the estimated efficiency is strongly related to the decisions
front offices actually make: managers who show low efficiency are
replaced at a very high rate. This means the estimation model can
provide tools that a team's front office needs for decision making.
Still, the paper is limited in
its application. After estimating the efficiencies, we tried to find a way to express them
in monetary value. The first method compared the contribution of free agent players
with the contribution of managers, on the grounds that
managers can be regarded as a kind of free agent in team operation. This
method failed because free agent players do not show a better contribution than the average player in
baseball, so we cannot estimate the value of their contribution. The second method tried to
value efficiency starting from the contribution of money to a higher run ratio. This also failed
early, because additional payroll did not help teams raise their run ratio in the 2010 season.
Future work should therefore look for an appropriate valuation method for managerial
efficiency.
References
Aigner, D., Lovell, C. A. K., and P. Schmidt, 1977, Formulation and estimation of stochastic frontier production function models, Journal of Econometrics, 6, 21-37.
Battese, G. E., and T. J. Coelli, 1995, A model for technical inefficiency effects in a stochastic frontier production function for panel data, Empirical Economics, 20, 325-332.
van den Broeck, J., Koop, G., Osiewalski, J., and M. F. J. Steel, 1994, Stochastic frontier models: A Bayesian perspective, Journal of Econometrics, 61, 273-303.
Dawson, P., Dobson, S., and B. Gerrard, 2000, Stochastic frontiers and the temporal structure of managerial efficiency in English soccer, Journal of Sports Economics, 1(4), 341-362.
Horrace, W. C., and P. Schmidt, 2000, Multiple comparisons with the best, with economic applications, Journal of Applied Econometrics, 15, 1-26.
Koop, G., Poirier, D. J., and J. L. Tobias, 2007, Bayesian econometric methods, Cambridge University Press, 236-239.
Lewis, M., 2003, Moneyball: The art of winning an unfair game, W. W. Norton, New York, NY.
Porter, P., and G. Scully, 1982, Measuring managerial efficiency - the case of baseball, Southern Economic Journal, 19, 642-650.
Rimler, M. S., Song, S., and D. T. Yi, 2010, Estimating production efficiency in men's NCAA college basketball: A Bayesian approach, Journal of Sports Economics, 11(3), 287-315.
Ruggiero, J., Hadley, L., and E. Gustafson, 1996, in J. Fizel, E. Gustafson, L. Hadley (editors), Baseball economics: Current research, Praeger, Westport, CT.
Schmidt, P., and R. C. Sickles, 1984, Production frontiers and panel data, Journal of Business and Economic Statistics, 2(4), 367-374.