An Empirical Analysis of Factors Affecting Major League Baseball

Busser 1
An Empirical Analysis of Factors Affecting Major League
Baseball Salaries: A Cross-Sectional Study
By: Joshua Busser
Submitted to Dr. Jacqueline Khorassani
ECON 421 – Empirical Research
April 15, 2009
Abstract:
This paper examines the effects of performance and non-performance variables attributed to Major League
Baseball players on the salary differences among a sample of 400 players in 2006. Using economic theory in
relation to specific knowledge of the Major League Baseball labor market, general concepts in labor
economics, and past empirical works related to this topic, variables are identified to create two empirical
models (one for pitchers and one for position players). The models are then estimated using the OLS
procedure. The estimation results reveal that years of experience in the league and eligibility for salary
arbitration have a positive and significant effect on salary differences among both groups of players. For
position players, the number of runs scored and the size of the market in which the player’s team plays have a
positive and significant effect, while batting average has a negative and significant effect on salary differences.
For pitchers, the number of innings pitched, the ratio of strikeouts to walks, and eligibility for free agency
have a positive and significant effect, while winning percentage has a negative and significant effect on salary
differences.
Introduction
The question that is the focus of this paper is “What determines the salaries of Major League
Baseball players?” Players in the league today regularly sign for larger and larger salaries, often
seemingly with no rhyme or reason to the sums being paid out. Fans of the game and people unfamiliar with
the labor market within the sport often wonder exactly what makes those particular players worth the
millions of dollars they are paid annually.
This paper addresses the above question using an approach similar to that of other empirical studies:
developing models containing variables that are theorized to be related to how a player’s salary is
determined. These variables fall into one of two categories: performance-based (factors and
statistics that are a direct result of the player’s on-field talent) or non-performance-related (factors that are
either out of the player’s control, such as the market size of the team a player works for, or unrelated to
the player’s direct on-field performance).
In this paper, a background section presents some history of Major League
Baseball as a whole, the evolution of the labor market within the league, and how the league’s labor market
differs from those of other major sporting leagues operating in North America. An empirical survey of other works
similar to this paper follows, with attention paid to what others have found significant in their studies
of salary determinants of Major League Baseball players. Following this, a theoretical section explains in depth
the labor market in Major League Baseball and factors that can play a role in salary determination.
Next, I present my model and the rationale behind its construction, noting why I chose specific variables,
data sets, and functional forms. A section on the observations used in the equations and
their descriptive statistics follows the presentation of the model.
Sections on multicollinearity and heteroskedasticity then address the problems that each
presents for formulating the model and what action (if any) could or needs to be taken to account for or
correct these concerns. Next, the regression results for the equations are presented, along with a
discussion of the results and how they compare to previous works. Finally, some concluding remarks and
implications of this work are discussed.
Background
Major League Baseball traces its roots to the late 19th century, with various leagues and teams that
existed in other forms slowly merging together into larger entities until an organized league of professional
baseball teams was formed under one unified structure. Most point to 1903 as the “official” start of the league as it is
structured today, with the “National Agreement,” which for the first time defined teams and a league
structure out of which the modern Major League Baseball construct has risen. Today, Major League Baseball has
thirty clubs operating under the league’s rules and regulations, 29 of which are in the United States, with the other
club operating in Toronto, Ontario, Canada. These teams are divided into two leagues, the American League and
the National League. These league designations carry little meaning today, although they were significant in the
earlier days of baseball as a sport (the leagues were, at one point, separate legal entities, although that distinction
has since been done away with). Within each league are three divisions of teams, separated geographically. The method
of dividing teams in Major League Baseball has changed over the years: originally, no divisions existed in the
leagues, but as teams were added, East and West divisions were instituted in 1969, and in 1994, a third Central
division was added to each league.
Players in Major League Baseball are considered today to be independent contractors, able to seek out
employment with teams on their own (though more commonly with the aid of a professional agent). This has
not, however, always been the case. In the early days of Major League Baseball, player movement
was stifled by the existence of the Reserve Clause in player contracts. This clause gave a team the “rights” to a
player who was under contract with that team, even after the contract expired. This meant that players were
restricted from seeking out contracts with other teams on their own, as they were still technically controlled by
the team that previously had them under contract. This type of control over players was argued to be in violation
of the Sherman Act, which stated that two or more non-affiliated businesses engaged in interstate commerce
cannot collude to fix prices, or in this case player salaries. Major League Baseball was exempted from this in
Federal Baseball Club v. National League, a 1922 Supreme Court case that held baseball was an “amusement” and
not commerce subject to the regulations of the Sherman Act. As a result, teams operating in baseball
exerted this control over players, working to suppress salaries and player movement for the following five and a
half decades.
The labor market shifted for players starting in 1953 with the formation of the current Major League
Baseball Players Association, or MLBPA. The new union set out with the intention of raising wages for
professional baseball players in the major league labor pool, and worked to develop a Collective Bargaining
Agreement, or CBA, with the league to define players’ rights. The first CBA was developed in 1968,
an effort that raised the minimum salary for league players to a more reasonable wage. The union
managed two years later to come to an agreement for the league to use arbitration between players and teams in
the case of salary disputes. Arguably the greatest achievement of the union, however, was to dispel the
Reserve Clause control that teams held over players. In 1975, an arbitrator working on the cases of two players
struck down the Reserve Clause as a mechanism for controlling players, although it wasn’t until the conclusion
of the 1976 season that the power of the Clause was completely dissolved.
The act of striking down the Reserve Clause helped create the current climate by which player
movement is dictated in Major League Baseball. Much of the movement of players between teams now occurs through
the mechanism of free agency. Starting with the off-season of 1977, free agency allowed players who were no
longer under contract with a team to freely solicit offers from other teams (as well as the team that previously
held their contract) for their services. The growth of free agency over time has been pointed to as the driving
force of the rising salaries for Major League Baseball players in recent decades. However, player movement as
a result of free agency, as noted in Hylan, Hage, and Treglia (1996), has not increased much for pitchers, and is
unknown as far as position players are concerned, so one could draw the conclusion that free agency hasn’t
profoundly changed player movement. The arbitration system in baseball allows teams and players to take salary
disputes to an independent arbitrator who will rule on what payment should be given to a player. In baseball
arbitration cases, teams and players bring their perceived best offers to the arbitrator, who then rules in
favor of one side or the other based on their own determination of a player’s value. As mentioned above, an
arbitrator can also settle disputes over the legitimacy of a contract or certain stipulations,
especially in cases where the union may feel a player has been wronged by a team or a breach of contract has
occurred.
In his paper considering the labor market in baseball, John Vrooman (1996) makes some observations as
to the nature of that market that illustrate these differences. He theorizes that Major League
Baseball has been operating to some degree as a monopsony, given that there is no major league anywhere else in
the world that can attract the level of talent that major league teams are able to sign. Because of this, American
baseball has been operating with a labor market that is uncontested by any other league worldwide. Many
smaller leagues have seen a decline in the quality of their top-level players, as those players have migrated to the
American baseball market to seek bigger paychecks and a chance to play at a perceived higher level of
competition.
Take the case of Ichiro Suzuki, for instance. As a player in the Japanese “major leagues,” Suzuki was
arguably the best pure hitter statistically (he earned the single season record for hits and held an exceptional .353
career batting average) but was earning wages similar to those of a top-level minor league player in the United
States. Suzuki was able to talk to major league teams, notably the Seattle Mariners (who have had Japanese
ownership for the last couple of decades), and earned a contract for $14 million over three years, which was far
more than he would have earned in Japan over that same period.
Despite its monopsony power, Major League Baseball lacks controls on player salary that exist in other
major sports, most notably a salary cap. In other professional leagues, a cap on the total amount of salaries that
can be paid to players on a roster is in place, though the caps can be hard (as is the case with the NFL and NHL)
or soft (as in the NBA, where a host of exceptions exist to allow teams to skirt the cap to some degree). In
addition, the NBA has a “max salary” provision in its Collective Bargaining Agreement limiting the total
compensation that a player can earn.
Empirical Survey
How can we gain a true measure of what is considered in the setting of salaries for major league
baseball players? A number of empirical models have been developed, each offering some insight into the
salary setting mechanisms that teams have used in this free agency era. The models developed by Lackritz
(1990), Marburger (1994), Hakes and Sauer (2006), Hoaglin and Velleman (1995), and Bollinger and
Hotchkiss (2003) all examine the independent variables that affect the
dependent variable of salary. In the case of Lackritz, salary was taken at its market value, but the more recent
models presented by the other four sets of authors use the natural logarithm of
salary. The data used in each of these models varies. Lackritz (1990) focused on select salary data from the
1985 and 1986 seasons. Marburger (1994) used a complete set of salary observations from 1991 and 1992.
Hoaglin and Velleman (1995) used 436 salary data points from 1986. Bollinger and Hotchkiss (2003) used data from
1987 to 1993. Hakes and Sauer (2006) used data from 2000 to 2004. Table 1 summarizes the literature
referred to, along with information about the variables, data sets, and estimation methods used in that research.
Table 1: Survey of Previous Empirical Works Related to this Paper

Paper Title / Authors: “Salary Evaluation for Professional Baseball Players” – Lackritz (1990)
Sample / Estimation Method Used: 1985 and 1986 Player Data – OLS Estimation
Dependent Var: Salary
Independent Vars:
Performance Measures:
(Offensive Average*, On Base Percentage*, Stolen Bases, Strikeout to Walk Ratio*, [Hits /
Inning Pitched*], [Saves / Wins*], Fielding Percentage*, Earned Runs / Innings Pitched*)

Paper Title / Authors: “Bargaining Power and the Structure of Salaries in Major League Baseball” – Marburger (1994)
Sample / Estimation Method Used: 1991 and 1992 Player Data – OLS Estimation
Dependent Var: Log Salary
Independent Vars:
Hitter’s Model:
Performance Measures:
(Runs*, Home Runs, RBI*, Career Runs, Career Home Runs, Career RBI*)
Non-Performance Measures:
(Experience, Experience^2*, Contract 1991 [dummy variable]*)
Pitcher’s Model:
Performance Measures:
(Innings*, ERA*, Saves*, Career Innings*, Career ERA*, Career Saves)
Non-Performance Measures: Same as Hitter

Paper Title / Authors: “A Critical Look at Some Analyses of Major League Baseball Salaries” – Hoaglin, Velleman (1995)
Sample / Estimation Method Used: 436 Players’ Data from 1986 – OLS Estimation
Dependent Var: Log Salary
Independent Vars:
Performance Measures:
(Career Runs Scored / Years*, Career RBI / Years*, √Runs scored in 1986*)
Non-Performance Measures:
(Years*, Years^2*, Years ≤7 [dummy variable]*)

Paper Title / Authors: “The Upside Potential of Hiring Risky Workers: Evidence from the Baseball Industry” – Bollinger, Hotchkiss (2003)
Sample / Estimation Method Used: 1987 to 1993 Player Data – OLS Estimation
Dependent Var: Log Salary
Independent Vars:
Performance Measures:
Career Runs Scored / On Base Pct,
Career Hits / Career At Bats*,
Career Home Runs / Career At Bats*,
Career Walks / Career Plate Appearances*,
Career Strikeouts / Career Plate Appearances*,
Career Stolen Bases / Career On Base Pct*,
Career Caught Stealing / Career On Base Pct,
Career Fielding Runs / Career Games*
Non-Performance Measures:
(ln of Television Revenues, Stadium Capacity, League Champion [dummy variable]*, Winning
Percentage of team*, Age*, Age^2*)

Paper Title / Authors: “An Economic Evaluation of the Moneyball Hypothesis” – Hakes, Sauer (2006)
Sample / Estimation Method Used: Player Data from 2000 to 2004 – OLS Estimation
Dependent Var: Ln Salary
Independent Vars:
Performance Measures:
(On Base Percentage*, Slugging Percentage*, Plate Appearances*)
Non-Performance Measures:
(Arbitration Eligible*, Free Agency Eligible*, Catcher*, Infielder [all 4 are dummy variables]*)

* Indicates Significant at 10% or better
Performance variables are a common thread among prior empirical works estimating the salary
of a player. In the cases of Lackritz and Marburger, the variables cover both pitchers and
position players. In baseball, different performance measures are used for these two categories of players,
and these two authors used specific non-overlapping variables to create models for both
player categories. Lackritz used the ratio of Strikeouts to Walks, the ratio of Hits to the number of Innings
Pitched, the ratio of Earned Runs to the number of Innings Pitched, and the ratio of the number of Saves to
the number of Wins as his pitcher performance measures. Marburger’s analysis looks at the number of
innings pitched, Earned Run Average (abbreviated ERA), saves, and career marks in these three areas. In
Marburger’s case, ERA is indexed to the league average. In both cases, the use of specific measures for
pitchers acknowledges that these players have a different salary setting mechanism than position players.
Marburger’s data is considered more readily available, while Lackritz’s measures are more derivative, although still
widely used by baseball statisticians as important performance measures.
All five papers considered here utilize measures for position players, although the measures differ in
number and quality. Lackritz uses a derivative measure called Offensive Average, which combines a number
of performance statistics into one indexed number. Bollinger and Hotchkiss use only career-level
statistics, with measures expressed relative to career games, at bats, plate appearances, or
on-base percentage. Lackritz also uses on-base percentage, as do Hakes and Sauer. Most of the papers use
runs scored or batted in as a primary evaluation of performance.
In the works of Marburger and of Bollinger and Hotchkiss, a measure of age or experience is used. As
players age, it is theorized by Vrooman that their salaries increase, primarily as they gain more experience
that can be considered independently from their recent performance. It is also theorized that as a player ages,
their recent performance is superseded by this career “body of work,” and their earnings therefore become based
more on their past performance than on the latest statistical period. In the case of Marburger, Hakes and
Sauer, and Bollinger and Hotchkiss, career statistics are used to try to capture the value of an entire career
versus the latest statistics.
The results of these works differ in the relevance and significance of
certain variables and evaluations in the models. Lackritz (1990) found in his model that Offensive Average,
On Base Percentage, the ratio of Strikeouts to Walks, the ratio of Saves to Wins, the ratio of Earned Runs to
the number of Innings Pitched, and Fielding Percentage were significant at 10% for the American League.
Offensive Average, On Base Percentage, the ratio of Earned Runs to the number of Innings Pitched, Fielding
Percentage, and the ratio of Hits to the number of Innings Pitched were found to be significant at 10% for
the National League. In both models, Fielding Percentage and On Base Percentage had the largest absolute
coefficient values.
Marburger (1994) tested his variables in three models based on whether a player was not eligible
for arbitration or free agency, eligible only for arbitration, or eligible for both arbitration and free agency. For
position players, at 10% significance, the non-eligible model saw only RBI and Experience^2 as significant.
The arbitration-eligible model saw Runs, RBI, Career Home Runs, Career RBI, Fielding, Experience, and
the dummy variable as significant at the same level. For the third model, Runs, RBI, and the dummy variable
were the only significant variables. Marburger also evaluated pitchers with the same three-model breakdown.
For the model of pitchers ineligible for both, the constant, innings, saves, career ERA, Experience^2, and the dummy
variable were significant at 10%. For the arbitration-eligible model, all variables except career saves were
found to be significant at 10%. For the model of pitchers eligible for both free agency and arbitration, the constant, innings, saves,
career innings, career ERA, and the dummy variable were found to be significant.
Hoaglin and Velleman (1995) found that all of the performance and non-performance
variables in their model were significant at 10%. This includes the career RBI variable that
Marburger found to be significant in his analysis.
Bollinger and Hotchkiss (2003) conducted their analysis like Marburger, breaking players
into three model groups based on whether they were eligible for free agency, had been traded, or were still under
contract with their current team. For those under initial contract with their original teams, all performance
statistics except for [Career Runs Scored / Career On-Base Percentage] and [Career Caught Stealing /
Career On-Base Percentage] were significant at 10%; among non-performance stats, only the League
Champion and Number of Seasons Played variables were found to be significant. With the non-traded group,
the same performance stats were significant at 10% except for [Career Home Runs / Career At Bats], and
Age, Age^2, Winning Percentage, and Seasons Played were significant at the same level among
non-performance stats. For the free-agent eligible group, the same performance stats were significant at 10% as in
the first model, but only the ln TV Revenues variable was significant among non-performance variables.
Hakes and Sauer (2006) found that all of the variables in their model were significant at 10% or better.
The significance of on-base percentage is consistent with Lackritz’s findings.
Part of the difficulty of making comparisons between these empirical studies is that each one uses
its own derivation of performance statistics suited to its analysis. In general, however, these studies tend
to use commonly available variables, notably runs, on-base percentage, and RBI in some fashion for position
players and ERA and innings pitched in some fashion for pitchers. These papers have also tried to connect
experience (whether through the use of career variables or experience/age measures) to salaries, with
mixed results.
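The specification that most of the surveyed studies share, the log of salary regressed by OLS on performance and experience terms, can be sketched as follows. This is only a minimal illustration: the eight players, the regressors (experience, experience squared, and an arbitration-eligibility dummy), and the coefficients in TRUE_BETA are all invented for the example and are not estimates from any of the papers above. The helper names solve_linear and ols are likewise hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Return OLS coefficients (intercept first) by solving the normal equations X'X b = X'y."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Hypothetical players: (years of experience, arbitration-eligible dummy).
players = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1), (7, 0), (9, 0), (12, 0)]
# Made-up coefficients: intercept, experience, experience^2, arbitration dummy.
TRUE_BETA = [13.0, 0.30, -0.015, 0.50]

rows = [(e, e ** 2, arb) for e, arb in players]
salaries = [
    math.exp(TRUE_BETA[0] + sum(b * v for b, v in zip(TRUE_BETA[1:], r)))
    for r in rows
]

# Regress ln(salary) on the regressors; the data are noise-free, so the fit
# should recover TRUE_BETA up to floating-point error.
beta = ols(rows, [math.log(s) for s in salaries])
print([round(b, 4) for b in beta])
```

Because the dependent variable is in logs, each coefficient reads approximately as a proportional change in salary per unit of the regressor, and a negative coefficient on the squared term captures returns to experience that flatten over a career.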
Theoretical Analysis
Major League Baseball exists in North America as a monopsony buyer of labor. While minor
independent leagues exist in the US and Canada, and regional leagues exist in Caribbean and Central
American countries, no other North American league can hire the same caliber of labor that
MLB can. A monopsonist is the only buyer in a particular market for a product, and
the product here is the highest-ability baseball players. The implication for
the labor market for baseball, and the structure of salaries, is that players would be paid below the
equilibrium wage rate that would exist if MLB were in a competitive market – that is, competing with
several other leagues of the same size and quality for the same pool of players.
The era of the Reserve Clause in Major League Baseball, from the early 20th century until its dissolution
in 1976, saw this market structure work to depress salaries of major league players. The
monopsonist outcome in this case would be to hire less labor than the competitive market would (labor here
is already finite due to the limited roster space available in the league as a whole, but economically the quantity
hired would be defined at the point where MLB’s marginal wage cost intersects the labor demand curve).
The pay that labor would receive in this type of market would be the wage read from the labor
supply curve at the level of labor the monopsonist employs. This wage is below the competitive wage for the
market, meaning that players’ salaries would be below what they would earn given a competitive
market for baseball players. Figure 1 provides a visual representation of a typical monopsonist market.
Figure 1: A typical representation of a Monopsony labor market
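The monopsony outcome described above can be worked through numerically. The sketch below assumes linear supply and demand curves, which the paper does not specify; the function name and the parameter values a, b, c, and d are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def monopsony_vs_competitive(a, b, c, d):
    """Compare outcomes in a linear labor market (an illustrative assumption).

    Inverse labor supply:  w = a + b*L   (wage needed to attract L workers)
    Labor demand (MRP):    w = c - d*L
    Marginal cost of labor for a single buyer: a + 2*b*L
    """
    # Competitive outcome: supply equals demand.
    l_comp = (c - a) / (b + d)
    w_comp = a + b * l_comp
    # Monopsony outcome: marginal cost of labor equals demand...
    l_mono = (c - a) / (2 * b + d)
    # ...and the wage is read off the supply curve at that employment level.
    w_mono = a + b * l_mono
    return {"competitive": (l_comp, w_comp), "monopsony": (l_mono, w_mono)}

# Hypothetical parameters, not calibrated to any baseball data.
result = monopsony_vs_competitive(a=1.0, b=1.0, c=10.0, d=1.0)
print(result)
```

With these parameters the single buyer hires 3.0 units of labor at a wage of 4.0, against 4.5 units at a wage of 5.5 under competition, which is the pattern of lower employment and lower pay that the theory predicts.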
Looking at available data, the theory of wages having been suppressed during the era of the Reserve
Clause holds true. Between 1950 and 1976, the final year of the Reserve Clause, average salaries in MLB (in 2008
dollars) ranged between $120,527 and $195,793; the nominal values were $13,300 in 1950 and
$51,000 in 1976. In 2008, meanwhile, the average wage was a nominal $2,824,751, which would still be
$735,795 in 1976 dollars, a marked increase over the wages paid at that time.
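The conversion between 1976 and 2008 dollars can be checked with the figures quoted above. The sketch below backs out the price-level ratio implied by the two 1976 figures; that ratio is an inference from the paper's own numbers, not a quoted price index, and the variable names are hypothetical.

```python
# Figures quoted in the paragraph above.
NOMINAL_1976 = 51_000        # average salary in 1976, nominal dollars
REAL_1976_IN_2008 = 195_793  # the same salary expressed in 2008 dollars
NOMINAL_2008 = 2_824_751     # average salary in 2008, nominal dollars

# Price-level ratio implied by the two 1976 figures (an inference, not a quoted index).
implied_factor = REAL_1976_IN_2008 / NOMINAL_1976

# Deflate the 2008 average into 1976 dollars with that implied ratio.
avg_2008_in_1976_dollars = NOMINAL_2008 / implied_factor

# Real growth of the average salary between 1976 and 2008.
real_growth_multiple = NOMINAL_2008 / REAL_1976_IN_2008

print(round(implied_factor, 2), round(avg_2008_in_1976_dollars), round(real_growth_multiple, 1))
```

The deflated 2008 figure should land close to the $735,795 reported; any small gap would come from rounding in the underlying price index.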
Though MLB still exists as a monopsonist, some factors have worked to bring salaries of players
higher over time. Elimination of the Reserve Clause and the creation of free agency have helped to elevate
salaries over the last three decades. Free agency in baseball allows the laborers (the players) to negotiate (by
themselves or with the help of a professional agent) with teams
to determine a fair contractual wage to be paid in return for a player’s labor for the team.
Not all players in MLB have agents, and the salary implications for players who do
not use an agent would be an interesting angle to explore. Agents act as facilitators for communication
between players and teams in the league, but also work to cut through much of the complex legal procedure
in the contract negotiation and signing process on behalf of the player. Whether agents provide a benefit for
the players in terms of salary is unknown; the literature has not addressed this question explicitly.
Controlling for such a variable in a salary determination model could capture some other effects that go into
determining what a player’s salary will be.
While free agency has worked to bring wages upward, the market for labor in baseball has also given a
little more power to the players over time thanks to the strengthening of the players’ union and collective
bargaining. The MLBPA, or Major League Baseball Players Association, is the official representative of
players in dealing with the league, and part of the increase in wages can be attributed to the power of the
union. The maximizing behavior of a labor union in baseball is focused on wages, as the amount of labor that
can be utilized is relatively fixed. Wages vary from player to player based on their individual contracts; the
union focuses on setting minimum wages for players in MLB rather than trying to create higher wages on
the upper end of the pay spectrum. In the Collective Bargaining Agreements (CBA) between the union and
the league, the minimum wage value is set explicitly: in 2008, that value was $390,000, and it has been
increasing continually since the current CBA was signed in 2006.
The union has been a proponent of revenue sharing as a means of creating competitive balance, as it
has been proposed that a league with a mix of large- and small-market teams and no
salary cap in place could see its competitiveness hindered. The common view is that this
would have an ill effect on player salaries. Vrooman (1996) dubbed this idea of competitive imbalance the
“Steinbrenner effect” after the owner of the New York Yankees, a team which has consistently paid the
highest salary totals in the Majors over the last eight years in order to be competitive. Vrooman
acknowledges that larger-market teams spending more to secure the services of more talented
players in the labor pool could eventually cause teams in smaller markets with more limited resources to
“settle” for second-tier players at lower wages, although the effect of this spending could be to pull the
overall average wage rates for all players in the labor pool so high that some organizations will be unable to
field a competitive team.
The players’ union has tried to counteract this through the inclusion of revenue sharing and luxury
tax provisions in the CBA, with the idea that sharing league income would allow teams in smaller
markets to benefit from the league’s success as a whole and be able to afford a competitive team on the field.
However, some owners have taken to pocketing these extra funds, fielding teams that may not be as
talented as could be afforded with revenue sharing funds, so that the team can profit
from the revenue sharing. The luxury tax was instituted as a way of limiting the spending of larger-market
teams by placing a tax on salaries paid above a threshold. The tax money, much like the revenues shared,
would go to teams that did not exceed that threshold for the purpose of competitive balance. For owners
who view a baseball team as a business operation in the purest sense, profitability will likely always
supersede performance and product quality. Of course, should competitive imbalance widen in baseball,
those owners may be compelled to spend those earnings to field a product of minimum quality to ensure
profitability in the future, hence creating a vicious cycle. This could create a two-tiered wage system: in the
future, theoretically, players employed by the largest-market teams would be paid wages that exceed
anything the teams in smaller markets could afford. These smaller-market teams would instead pay wages for
less talented players that may be higher than would otherwise have been paid in a competitive market.
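The luxury tax mechanism described above, a tax on payroll above a threshold with the proceeds going to teams below it, can be sketched in a few lines. The threshold, the tax rate, the payrolls, and the equal split among recipient teams are all assumptions made for illustration; none of them are taken from the CBA, and the function names are hypothetical.

```python
def luxury_tax(payroll, threshold, rate):
    """Tax owed on the portion of payroll above the threshold (zero if at or below it)."""
    return max(0.0, payroll - threshold) * rate

def redistribute(payrolls, threshold, rate):
    """Net transfer per team, splitting the tax pool equally among teams at or below the threshold.

    The equal split is an assumption of this sketch, not a rule taken from the CBA.
    """
    taxes = {team: luxury_tax(p, threshold, rate) for team, p in payrolls.items()}
    pool = sum(taxes.values())
    recipients = [team for team, p in payrolls.items() if p <= threshold]
    share = pool / len(recipients) if recipients else 0.0
    return {team: (share if team in recipients else 0.0) - taxes[team] for team in payrolls}

# Hypothetical payrolls (in $ millions), threshold, and rate; none are actual league figures.
payrolls = {"large_market": 200.0, "mid_market": 100.0, "small_market": 50.0}
net = redistribute(payrolls, threshold=150.0, rate=0.5)
print(net)
```

In this example the large-market team pays 25.0 and each of the other two teams receives 12.5. Whether a recipient spends that transfer on payroll or keeps it as profit is exactly the incentive problem the paragraph above raises.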
Arbitration is another salary-setting mechanism that has worked to raise the amount paid to players in
recent years, and it introduces another factor into the equation: the arbitrator. Many players who have been
with one club for a period of time, usually between three and six years of MLB service time, are eligible for
arbitration. There are some exceptions that allow for arbitration to be invoked after just two seasons, usually
in the case of exceptional player performance. These players may use the arbitration system to seek pay raises
or contracts that are better than those being offered to them. The system itself was developed
by Stevens (1966) as a means of settling salary disputes. Marburger (1994) explains in his paper that in this
system, players and teams submit their contract demands and offers, and a form of final-offer
arbitration occurs, in which the arbitrator selects the salary offer they feel is better suited to the player
(based on a value that they develop for the player) and renders a decision as to the pay the player should
receive. Faurot and McAllister (1992) explain this system further, but for this paper, the importance of
arbitration is that it is another method of setting the salaries of players in this market.
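The final-offer rule described above can be expressed as a short decision function: the arbitrator forms a private valuation and must pick whichever submitted figure is closer to it, with no compromise in between. The function name, the tie-breaking rule, and the dollar figures are hypothetical and serve only to illustrate the mechanism.

```python
def final_offer_arbitration(player_demand, team_offer, arbitrator_value):
    """Return the submitted figure closer to the arbitrator's own valuation.

    The arbitrator may not split the difference: one of the two figures is chosen as-is.
    Ties go to the team here, which is an arbitrary assumption of this sketch.
    """
    if abs(player_demand - arbitrator_value) < abs(team_offer - arbitrator_value):
        return player_demand
    return team_offer

# Hypothetical case (in $ millions).
award = final_offer_arbitration(player_demand=5.0, team_offer=3.5, arbitrator_value=4.5)
print(award)
```

Because an extreme submission is more likely to lose outright, the rule gives both sides a reason to submit figures near the player's expected value, which fits the observation that many eligible players settle before a hearing.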
Some have suggested that arbitration is more important than free agency to the market for players in baseball
and to salary determination, and to some degree this can be inferred. Arbitration allows
players and teams to determine a value for a player’s services, often using past arbitration cases and the
salaries of players with comparable performance as a reference. Most players who are eligible for arbitration
use these values as a starting point for contract negotiations outside of the arbitration system. Having
this information available to both sides allows for a fair negotiation of salaries between players and teams,
and also allows most cases that do go to arbitration to have a fair outcome. Not all players eligible for
arbitration proceed into the process with their teams; many use the information available to negotiate
contracts prior to reaching arbitration. Still, those salaries are typically higher than what a player previously
earned from a team.
Another situation in baseball’s labor market to consider is players’ rent-seeking behavior.
Rent-seeking behavior here can be seen as players trying to capitalize on factors other than
performance to obtain a higher wage rate than would otherwise be paid to them by a team. One of these
factors is the idea of “star power,” where a player can use their name and notoriety or past accomplishments
unrelated to current performance measures to obtain higher wages. A player with a popular
public persona, for instance, may be paid a higher wage by a team (compared to a player of equal talent) if the
team sees that player as having some extra ability to sell tickets. Players may also use their whole body of work
to secure employment at high wage levels later in their career, despite their skills having deteriorated to the
point where another player with better talent and a lower wage is passed over for employment.
While teams are more than willing to pay rents for players who they perceive could bring them
some additional benefit off the field, these players may not be able to produce enough on the field to justify
salaries that include high economic rents, especially if the player becomes a liability for the team in some way.
The labor union, in its practices to establish minimum salaries and various means of ensuring that
teams pay players higher wages over time, also engages in rent-seeking behavior. The union’s objectives
in raising salaries are not associated with any increase in performance or (as far as the owners are
concerned) any increase in the number of paying customers, but do seem aimed at capitalizing on increases in
the total revenue the league earns on an annual basis. While this behavior has been justified in the past as
compensation for the league working to suppress salaries through the Reserve Clause and through ownership
collusion in the 1980s, many feel that it is not as justified today, and that the union should back off from
trying to set salary parameters for players. Because of the nature of a labor union, and the fact that baseball
is not a perfectly competitive market, there will always be some economic rent and subsequent rent-seeking
behavior, but the amount of that rent might continue to rise unchecked if the union continues to be as
aggressive with regard to pay.
Marginal revenue product (MRP) does play a role in salary setting, though it may not seem as important
a consideration given some of the larger contract amounts paid in baseball. Teams benefit from paying
players below their marginal revenue product (defined here as the amount of revenue that the team is
expected to generate based on a player’s performance and their non-performance attributes, like the “star
power” idea mentioned above). The risk in paying players who have a large amount of this “star
power” a salary that includes the rents associated with that quality is that the player’s value may ultimately
change over the life of the new contract. A player could see their marketing power decline while their
performance remains at the same level or vice versa. The actual monetary benefit of a player to a team can
decline, causing the salary paid to the player to exceed the player’s MRP value. In developing salary figures
for a player a team would like to sign, they will actively conduct such a cost-benefit analysis and set that
MRP value, knowing that the team would like to sign the player for a salary that is less than or equal to that
MRP.
Empirical Model
In this study I use a cross sectional data set consisting of 200 position players and 200 pitchers in the
year 2006, developed using data found on the Major League Baseball web site and other sources. The models
are estimated using the Ordinary Least Squares (OLS) method. This method is a standard regression
estimation process in which the coefficient estimates are chosen to minimize the sum of the squared
residuals. Both models take a semi-log functional form: the dependent variable is transformed by the
natural logarithm, while the independent variables are not logarithmically transformed. Each estimated
coefficient, multiplied by 100, therefore approximates the average percentage change in salary given a
one-unit change in the corresponding independent variable (Halcoussis 110). The model I present for position players is:
Equation 1:
Log Salary_i = f(per Table 2) + error_i
The model proposed for pitchers is:
Equation 2:
Log Salary_i = f(per Table 2) + error_i
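To make the semi-log interpretation concrete, the short sketch below converts a coefficient into the percentage change in salary it implies. It is an illustration only and not part of the estimation in this paper: the function name `percent_change`, the coefficient value 0.05, and the `base` argument are all hypothetical, and the natural logarithm described in the text is assumed by default.

```python
import math


def percent_change(coefficient, delta=1.0, base=math.e):
    """Exact percentage change in salary implied by a semi-log coefficient.

    coefficient: slope on a regressor when the dependent variable is log(salary)
    delta:       size of the change in the regressor (1.0 = one unit)
    base:        base of the logarithm applied to salary (assumed natural log)
    """
    return 100.0 * (base ** (coefficient * delta) - 1.0)


# Hypothetical coefficient, chosen only for illustration.
example_coefficient = 0.05

approximate = 100.0 * example_coefficient       # the "times 100" rule of thumb
exact = percent_change(example_coefficient)      # exact conversion, natural log

print(f"approximate: {approximate:.2f}%  exact: {exact:.2f}%")
```

For small coefficients the rule of thumb and the exact conversion nearly coincide; the gap widens as the coefficient grows, and the conversion would differ if salary had been logged in a base other than e.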
Table 2: Definitions for Independent Variables Used in Equations 1 and 2 and Their Expected Sign of Coefficients

| Independent Variable | Definition | Expected Sign | Used in Equation 1 | Used in Equation 2 |
|---|---|---|---|---|
| Non-Performance Variables | | | | |
| Experience (Exp) | Number of seasons a player has in MLB | Positive | X | X |
| Large Market (LgMkt) | Dummy variable equal to 1 if the player’s team is in the top 5 MSAs, 0 otherwise | Ambiguous | X | X |
| Free Agency Eligible (FAElg) | Dummy variable equal to 1 if player was eligible for free agency, 0 otherwise | Positive | X | X |
| Arbitration Eligible (ArbElg) | Dummy variable equal to 1 if player was eligible for salary arbitration, 0 otherwise | Positive | X | X |
| Performance Variables | | | | |
| Runs Scored (Runs) | Number of times a player scored as the result of their own or other players’ actions | Positive | X | |
| Runs Batted In (RBI) | Number of runs scored as the result of the player’s offensive actions | Positive | X | |
| On Base Percentage (On Base) | Percentage of times a player’s at bats result in the player reaching base | Positive | X | |
| Hitter’s Strikeout to Walk Ratio (HSOBB) | Ratio of times a player strikes out during their plate appearances to the number of walks taken | Negative | X | |
| Batting Average (BA) | Percentage of at bats that result in the player reaching base due to a hit | Positive | X | |
| Earned Run Average (ERA) | Mean of earned runs given up by a pitcher per nine innings pitched | Ambiguous | | X |
| Innings Pitched (IP) | Number of official innings completed by a pitcher | Positive | | X |
| Pitcher’s Strikeout to Walk Ratio (PSOBB) | Ratio of the number of opposing batters a pitcher strikes out to the number of walks surrendered | Positive | | X |
| Batting Average Against (BAA) | Percentage of at bats in which a pitcher allows batters from the opposing team to reach base due to a hit | Negative | | X |
| Winning Percentage (WP) | Percentage of games won out of the total number of decisions a pitcher was part of in a season | Positive | | X |
| Fielding Percentage (Field) | Rate of successful fielding chances by a player | Positive | X | X |
The models that I am developing in finding what determines salary differences among players tie
into the theoretical ideas presented earlier in a number of ways. Performance is a primary focus of this
analysis, as the focus of the baseball labor market is on signing players whose performance is superior to that
of other peers in such a way that a team can be more successful by signing said players. As mentioned in the
empirical survey section above, there are five main components of performance (Runs, On Base Percentage,
and RBI for position players and ERA and Innings Pitched for pitchers) that are found to be statistically
significant across multiple datasets. Runs Scored and Runs Batted In are key components in illustrating the
offensive production of a player in terms of the overall game. In a baseball game, a player can have a lot of
hits without producing many RBI if his teammates are not in a position to score on his hits – those hits
become somewhat irrelevant to the outcome of the game if they don’t produce scoring. Runs scored is
another measure that has some relevance, as it implies that a team is successful enough to take advantage of
the offensive production by a player and turn that player into a run for the team. A high Runs Scored total
not only reflects a player’s own offensive success, but also acts as a proxy for the success of
the teammates who play around him, and of the team itself. On Base Percentage examines the ability of a
player to reach base, but is not necessarily strictly related to offensive production, as there are multiple ways
to get on base (besides hits, a player can walk, reach on an error, reach on a dropped third strike, etc.).
On the pitching side, using earned run average makes sense as it measures the effectiveness of a
pitcher in keeping the opposing team from scoring runs. A high ERA is indicative of poor performance for
any pitcher, no matter what – a higher earned run average indicates that the pitcher is allowing the other
team to score due to the pitcher’s own performance. Because errors in fielding and other circumstances can
allow an opposing team to get runners on base and score, the ERA is important to use as a measure of
performance, as it automatically filters out “unearned runs,” those that occur independent of the pitcher.
Innings pitched also could be a significant variable, as teams are seemingly more willing to pay for a
player who they perceive to be “durable” enough to pitch a high number of innings while contributing to the
success of a team. Effective pitchers who can shoulder a larger workload without injury risks command a
premium in many cases, even if their talent is less than that of a comparable pitcher with questionable health.
Having dummy variables to indicate whether a player was a free agent or arbitration eligible is also
important to note, because these situations may cause the salary negotiated to be higher or lower than what
the player was seeking or what the true value of the player is based on their performance. Free agents may
get a lower wage than they otherwise would have (if they signed a brand new contract, the first year pay
might be lower than in subsequent years) and players who went through the arbitration process may have
“won” their case and earned a salary that may be more than the team was otherwise going to pay the player.
A player may still see their pay rise even if they lose their arbitration case, although the size of the difference
between the pre-arbitration salary and the post-arbitration salary can differ from case to case.
Market size is also something that can be considered as a reasonable variable to include in this model,
as it can account for some of the salary that a player is paid, especially if the player is an average player on a
large market team. A player in that case might theoretically be paid more overall than a player of similar
talent on a smaller market team due to the market size and the willingness of the owner of the larger market
team to “overpay” for the player. A situation could also arise where a great player on a smaller market team is
earning less than they otherwise would if they signed to a larger market team. A good measure to use in this
case is a dummy variable for Metropolitan Area Size – in this case, denoting if the player is on a team in one
of the 5 largest Metropolitan Statistical Areas in the United States. (In the case of players from the Toronto
Blue Jays, a Canadian team, the market size would be considered as outside of the large market range – if
Toronto were listed with the US MSAs, it would fall between 20th and 21st in size.)
In addition to those mentioned above, there are other statistics that were shown to be significant and
can be used for performance analysis. For all players, fielding percentage is shown to be significant, and does
have some relevance – if a player is unable to field their position well, they become a liability to the
performance of a team. Lower fielding percentages mean that a player will commit more errors and fail to
make plays that would result in getting an opposing player out. This, in turn, will increase the chances that a
player on the opposing team will score a run, thereby putting the fielding player’s team under pressure to
score an extra run in order to win a game. Poor fielding players also have a tendency to make fielding
difficult for their teammates, which can have an adverse effect on others’ performances. Also, for all players,
I believe that experience will be a measure that may come out to be significant, despite the mixed results
that were seen in the other empirical papers. I believe experience will show that as a player gains more time
in playing at the major league level, their skills may improve to some degree. Of course, after a certain point,
performance may slip due to age and decline, but overall I believe that experience will have an effect on
salary, in part due to the rent-seeking behavior of players.
For pitchers, I believe that winning percentage would be a relevant variable to consider, as teams
will be more willing to take a chance on signing a player who won a larger proportion of their games than
they lost over the previous season. Pitchers who lose a lot of games in a season tend to do so because of their
performance rather than the performance of those around them, and pitchers who win a lot of the games in
which they pitch often do so because of their performance. There are exceptions to this, of course, but in
general, the more talented a pitcher is, the higher his winning percentage will be.
Another performance statistic that can be significant for both position players and pitchers is the
ratio of strikeouts to walks. This statistic is meaningful for position players because it provides a measure of
positive versus negative outcomes that can occur without a hit. A strikeout is the worst outcome a
player can have individually, as it keeps the player off base and does not advance
any runners already on base. A walk is a positive outcome, allowing the player to get on base without having to get
a hit. For a pitcher, the roles are reversed – a strikeout is a highly desirable outcome, as it prevents a player
from getting on base and prevents those on base from advancing, while a walk is an undesirable outcome.
Batting average is another two-way variable that pitchers and position players can be evaluated on.
For position players, a high batting average indicates a higher percentage of positive outcomes in reaching
base, while a lower batting average indicates a larger propensity to have a negative outcome at the plate. In
the pitchers’ case, batting average against indicates how successful batters are in hitting against pitchers –
higher values indicate lower success by a pitcher in getting a favorable outcome against hitters.
Descriptive Statistics
Table 3 lists the minimum, maximum, mean, median, and standard deviation values for the variables
included in the two equations. The table also lists which players in the dataset are associated with the
minimum or maximum values for each variable, if applicable, or lists “several” if multiple players’
observations share the same value.
Table 3: Descriptive Statistics for Included Variables

| Variable | Minimum Value | Maximum Value | Mean Value |
|---|---|---|---|
| Log of Salary (logSalary) | H: 5.579784 (Several) | H: 7.356189 (Alex Rodriguez) | H: 6.310912 |
| | P: 5.579784 (Several) | P: 7.204120 (Bartolo Colon) | P: 6.213695 |
| Experience (Exp) | H: 1 (Several) | H: 18 (Omar Vizquel) | H: 7.55 |
| | P: 1 (Several) | P: 22 (Jamie Moyer) | P: 7.58 |
| Runs Scored (Runs) | 7 (Kelly Shoppach) | 131 (Grady Sizemore) | 65.185 |
| Runs Batted In (RBI) | 6 (Luis Rodriguez) | 149 (Ryan Howard) | 61.39 |
| On Base Percentage (On Base) | .257 (Juan Uribe) | .431 (Manny Ramirez) | .344595 |
| Hitter’s Strikeout to Walk Ratio (HSOBB) | .543480 (Albert Pujols) | 16.500000 (Vance Wilson) | 2.255439 |
| Batting Average (Percentage) (BA) | .197000 (Tony Clark) | .344 (Joe Mauer) | .278305 |
| Earned Run Average (ERA) | .890000 (Jonathan Papelbon) | 8.14 (Russ Ortiz) | 4.04845 |
| Innings Pitched (IP) | 50 (Several) | 235 (Brandon Webb) | 114.2498 |
| Pitcher’s Strikeout to Walk Ratio (PSOBB) | .8571 (Chad Gaudin) | 10.5455 (Ben Sheets) | 2.516885 |
| Batting Average Against (Percentage) (BAA) | .158 (Joe Nathan) | .333 (Russ Ortiz) | .25458 |
| Winning Percentage (WP) | 0.00 (Several) | 1.00 (Several) | .534565 |
| Fielding Percentage (Field) | H: .9155 (Edwin Encarnacion) | H: 1.00 (Several) | H: .979724 |
| | P: .667 (Ron Mahay) | P: 1.00 (Several) | P: .954215 |

H = hitters (position players); P = pitchers.
The salary numbers indicate a healthy mix of players across the salary spectrum, from players
earning the minimum salary (in both sets of observations, there were multiple players earning
the league minimum salary) to the highest paid players in real dollar terms for those years. The log of the
league average salary, for comparison, was equal to 6.4321, or about 2.89 million dollars. This value is
slightly higher than the averages in my dataset, although the league average doesn’t account for some players
who earn minimum salary. As far as the statistics are concerned, the values obtained for the averages are
right around the league averages for MLB as a whole. The league average for Batting Average was .269, eight
points below the average for the sample. The league on-base percentage was .337, seven points below the
average in this sample. The overall league ERA was 4.54, which is higher than the mean of the dataset, but
the league figure does not exclude outliers who may have extremely high ERA values. The sample’s Batting Average Against was
somewhat below the league average of .269, which again could be explained by the exclusion of extreme
outliers from the dataset. The table also presents the names of the players, if only one individual achieved a
certain statistical value. From this, we can observe a number of different players being statistically superior
or inferior in some areas, but only one player (Russ Ortiz) repeating in more than one category. In his case,
he had the worst sampled ERA and Batting Average Against, but also was one of the players who earned a
league minimum salary. In some cases, several players were found to have the same value as the minimum or
maximum for a specific variable.
As far as the other non-performance variables are concerned, there are three dummy variables that are
included in both equations. Table 4 presents the descriptive statistics for these variables for each equation:
Table 4: Descriptive Statistics for Dummy Variables

| Variable | Hitter % Yes | Pitcher % Yes |
|---|---|---|
| Arbitration Eligible (ArbElg) | 16.5% | 20% |
| Free Agency Eligible (FAElg) | 17% | 19% |
| Plays in Large Market (LgMkt) | 24.5% | 27% |
The number of players who fit the criterion for arbitration or free agency in the sample is similar to the
overall rates observed in Major League Baseball. 151 players were arbitration eligible out of 750 possible
roster spots in the league, which calculates to around a 20% rate – similar to the rate found in the sample.
As far as free agency is concerned, the number of players who were eligible to file for free agency as major
league players was around 200, which comes out to around 26%, a higher rate than was found in the sample.
Eight out of the 30 teams fall under the umbrella of “Large Market” teams, meaning that around 27% of the
players overall in the league play for a team in a large market. This value is in line with the findings from the
samples used for both equations.
Test for Multicollinearity
In empirical studies, there is always the chance of multicollinearity occurring between variables.
Multicollinearity is the term used to describe a linear relationship, or correlation, between two or more
independent variables. In any regression, having multicollinearity can have consequences, as it can alter the
outcomes of the analysis. With multicollinearity, the standard error values for independent variables that are
affected will be high and the resulting t-statistics will be low, which can cause problems in hypothesis testing,
namely an increase in Type II errors (in which we fail to reject a null hypothesis that is in fact false).
To observe whether or not there is multicollinearity present in the two equations, a correlation
coefficient matrix can be developed. The values in the matrix show the correlation coefficient between two
independent variables. These coefficients take a value between -1 and 1. A value of 1 indicates perfect
positive correlation – both variables move together in the same direction. A value of -1 indicates that the two
variables are perfectly correlated in opposite directions. Optimally, we would like these values to be as
close to zero as possible. If the absolute value of a coefficient is above 0.8, as a rough rule of thumb, we have a fair amount of
concern that the two variables exhibit multicollinearity.
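The screening rule just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used to produce the tables in this paper: the helper names `pearson` and `high_correlation_pairs` are hypothetical, and the three short data columns are made-up numbers, not observations from the sample.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def high_correlation_pairs(columns, threshold=0.8):
    """Return (name1, name2, r) for every pair with |r| above the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Made-up values for five hypothetical players (not data from this study).
columns = {
    "runs": [10, 40, 65, 90, 120],
    "rbi": [12, 35, 70, 85, 110],
    "field": [0.98, 0.95, 0.99, 0.96, 0.97],
}

flagged = high_correlation_pairs(columns)
print(flagged)  # only the runs/rbi pair exceeds the 0.8 rule of thumb
```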
Tables 5 and 6 display the correlation coefficient matrices for independent variables of the two
equations.
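Table 5: Correlation Coefficient Matrix for Equation 1

| | ARBELG | BA | EXPERIENCE | FAELG | FIELD | HSOBB | LGMKT | ONBASE | RBI | RUNS |
|---|---|---|---|---|---|---|---|---|---|---|
| ARBELG | 1.000 | | | | | | | | | |
| BA | 0.094 | 1.000 | | | | | | | | |
| EXPERIENCE | -0.039 | 0.014 | 1.000 | | | | | | | |
| FAELG | -0.165 | -0.010 | 0.415 | 1.000 | | | | | | |
| FIELD | 0.068 | 0.090 | 0.240 | 0.059 | 1.000 | | | | | |
| HSOBB | -0.048 | -0.196 | -0.228 | -0.105 | -0.042 | 1.000 | | | | |
| LGMKT | -0.159 | 0.119 | -0.033 | -0.010 | -0.110 | -0.043 | 1.000 | | | |
| ONBASE | 0.024 | 0.744 | 0.110 | -0.059 | 0.056 | -0.512 | 0.088 | 1.000 | | |
| RBI | 0.034 | 0.379 | 0.041 | -0.034 | -0.007 | -0.314 | 0.101 | 0.520 | 1.000 | |
| RUNS | 0.090 | 0.409 | -0.001 | -0.127 | -0.062 | -0.385 | 0.081 | 0.564 | *0.835* | 1.000 |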
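Table 6: Correlation Coefficient Matrix for Equation 2

| | ARBELG | BAA | ERA | EXPERIENCE | FAELG | FIELD | IP | LGMKT | PSOBB | WP |
|---|---|---|---|---|---|---|---|---|---|---|
| ARBELG | 1.000 | | | | | | | | | |
| BAA | -0.044 | 1.000 | | | | | | | | |
| ERA | -0.032 | 0.783 | 1.000 | | | | | | | |
| EXPERIENCE | -0.131 | 0.048 | -0.035 | 1.000 | | | | | | |
| FAELG | -0.242 | 0.257 | 0.240 | 0.409 | 1.000 | | | | | |
| FIELD | 0.026 | 0.101 | 0.039 | -0.016 | 0.044 | 1.000 | | | | |
| IP | -0.116 | 0.235 | 0.140 | 0.088 | 0.074 | 0.108 | 1.000 | | | |
| LGMKT | -0.023 | -0.034 | -0.080 | 0.166 | 0.021 | -0.036 | -0.003 | 1.000 | | |
| PSOBB | -0.030 | -0.409 | -0.474 | 0.093 | -0.168 | -0.015 | 0.040 | 0.049 | 1.000 | |
| WP | -0.057 | -0.220 | -0.438 | -0.003 | 0.056 | -0.027 | 0.080 | 0.088 | 0.201 | 1.000 |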
As can be observed from the italicized value in Table 5, there is one pair of independent variables (Runs
and RBI) that exhibits a significant amount of multicollinearity, indicated by a positive correlation value over
the 0.8 threshold. While this high correlation is a concern, there are a few solutions available to
deal with the multicollinearity so that it does not have a detrimental effect on the regression output.
One solution to this problem would be to eliminate one of the variables if there is some sort of
redundancy present. This solution would work if both variables are essentially capturing the same
information in the equations. A second fix for multicollinearity would be to redesign the model,
transforming the variables into some other form that could be useful to the equation. This transformation
would work to put them in some form that doesn’t exhibit this multicollinearity. A third solution for this
would be to increase the sample size used to estimate the equation. This is by far the best
solution if one suspects that the variables are not as highly correlated in larger samples. A fourth solution
would be to leave the highly correlated variables alone, keeping in mind that the estimated coefficients on
these variables may not be reliable.
Given that the multicollinearity between Runs and RBI does not affect the estimated coefficients of
other independent variables, the best solution in dealing with these variables would be to leave them alone.
There are no more data from the given year to add to the sample to make it more robust. Additionally, the
two variables are different enough that we would lose some explanatory power by omitting one of them
from the equation. Transforming the variables into another form would not be especially useful, as no such
transformation has a clear logical basis. Leaving these variables as they are in the equations does pose
some issues with the interpretation of the results of the regression, however, as this high correlation would
likely have an effect on the size (and possibly the sign) of resulting coefficients in the regression.
Testing for Heteroskedasticity
The data set for this regression analysis contains observations of cross sectional data. This specific
type of data is susceptible to a problem called heteroskedasticity. For the Ordinary Least Squares, or OLS,
method of estimation to yield the best estimates, the error terms of Equations 1 and 2 must have a constant
variance (be homoskedastic) across observations. When heteroskedasticity is present, the variance of the
error terms varies across observations. Heteroskedasticity manifests itself in the results of a regression in a
couple of ways. It can cause the OLS results to show larger t-statistics, due to an underestimation of the
standard errors of the coefficients. This can be a serious problem, as it can lead to Type I errors, in which null
hypotheses about the independent variables may be falsely rejected.
Heteroskedasticity can be tested for using the White test. This test uses multiple steps to create an
output that checks for several types of heteroskedasticity at one time. The test involves running the
regression on the original equations and saving the residuals (the difference between the predicted and
actual values of the dependent variable) for use in the White test. Those residuals are squared and used as
observations on the dependent variable in a second regression model. The independent variables in the
second equation are the independent variables from the original equation; the squared values of each of the
independent variables from the original equation; and the product of each pair of independent variables from the
original equation. Under the null hypothesis, the coefficient of determination (R²) of this second equation
multiplied by the number of observations (n) approximately follows a chi-square distribution. The critical chi-square value is determined by the level of
significance we wish to use (typically, this value is 5%) and the number of degrees of freedom (which is the
number of independent variables included in the second equation). If the value of the nR² statistic is less than
the critical chi-square value, we fail to reject the null hypothesis of homoskedasticity. If the nR² value is
greater than the critical chi-square value, we reject the null hypothesis in favor of the alternative hypothesis, and
conclude that there is heteroskedasticity present.
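The steps of the White test described above can be sketched as follows. This is a minimal illustration and not the software routine used for the results in this paper: the helper names (`ols`, `residuals`, `r_squared`, `white_statistic`) are hypothetical, the data are constructed artificially so that the error variance grows with the first regressor, and only two regressors are used so that the auxiliary regression has five terms. The 5% critical value for 5 degrees of freedom (11.0705) is taken from a standard chi-square table.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X)b = X'y, Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(X, y, b):
    """Actual minus fitted values."""
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]


def r_squared(X, y):
    """Coefficient of determination of the regression of y on X."""
    e = residuals(X, y, ols(X, y))
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sum(ei ** 2 for ei in e) / sst


def white_statistic(x_rows, y):
    """Return (n * R^2 of the auxiliary regression, degrees of freedom)."""
    X = [[1.0] + list(row) for row in x_rows]
    e2 = [ei ** 2 for ei in residuals(X, y, ols(X, y))]  # squared residuals
    aux = []
    for row in x_rows:
        squares = [v * v for v in row]
        crosses = [row[i] * row[j]
                   for i in range(len(row)) for j in range(i + 1, len(row))]
        aux.append([1.0] + list(row) + squares + crosses)
    df = len(aux[0]) - 1  # regressors in the auxiliary equation, excluding constant
    return len(y) * r_squared(aux, e2), df


# Artificial data: the error magnitude is proportional to x1 (heteroskedastic).
x_rows = [(1.0 + i % 7, 1.0 + i % 5) for i in range(70)]
y = [2.0 + 0.3 * x1 + 0.1 * x2 + (0.5 * x1 if i % 2 else -0.5 * x1)
     for i, (x1, x2) in enumerate(x_rows)]

CRITICAL_5PCT_DF5 = 11.0705  # chi-square critical value, 5% level, 5 df
stat, df = white_statistic(x_rows, y)
print(df, stat > CRITICAL_5PCT_DF5)  # reject homoskedasticity when True
```

Note that when a regressor is a 0/1 dummy variable, its square is identical to the variable itself, so the redundant squared term has to be dropped from the auxiliary regression; this sketch uses only continuous regressors and does not handle that case.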
For the hitter’s equation, the nR² value of the White test is 54.58323. The chi-square value for a 5%
level of significance and 62 degrees of freedom is equal to 81.38102. (This value was derived using the
function =CHIINV(.05,62) in Microsoft Excel 2007.) In this case, the value of nR² is less than the
chi-square critical value, so we fail to reject the null hypothesis and find no evidence of
heteroskedasticity in this equation.
The nR² value of the White test for the pitcher’s equation is 92.96754. The chi-square value for a
5% level of significance and 61 degrees of freedom is equal to 80.2321. (This value was determined using
the same function as in the first equation, except that the value of 62 is changed to 61 [=CHIINV(.05,61)].) In
this case, the value of nR² is greater than the chi-square critical value, so we reject the null hypothesis in
favor of the alternative hypothesis. There is heteroskedasticity present in this equation.
As noted above, we have an equation (Equation 2) in which there is some level of heteroskedasticity
present. To minimize the effect of heteroskedasticity, I follow the White procedure to correct the standard
error values of the estimated coefficients in Equation 2. The method aims to obtain
better estimates of the standard errors and thereby improve the accuracy of the t-statistics. Typically, the
result is higher calculated standard errors, but more accurate (albeit often lower) t-statistics.
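The idea behind the corrected standard errors can be illustrated for the simplest case of a single regressor. The sketch below is an assumption-laden simplification, not the multi-variable procedure applied to Equation 2: it uses the basic form of White's estimator (often labeled HC0) for the slope of a one-variable regression, the function name `slope_standard_errors` is hypothetical, and the data are artificial.

```python
import math


def slope_standard_errors(x, y):
    """Slope of y on x with its conventional and White (HC0) standard errors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional formula: assumes one common error variance for all observations.
    s2 = sum(e * e for e in resid) / (n - 2)
    conventional = math.sqrt(s2 / sxx)

    # White (HC0) formula: each observation contributes its own squared residual.
    white = math.sqrt(sum((xi - x_bar) ** 2 * e * e
                          for xi, e in zip(x, resid))) / sxx
    return slope, conventional, white


# Artificial data in which the error magnitude is proportional to x.
x = [1.0 + i % 5 for i in range(10)]
y = [1.0 + 0.5 * xi + (0.2 * xi if i % 2 else -0.2 * xi)
     for i, xi in enumerate(x)]

slope, conventional_se, white_se = slope_standard_errors(x, y)
print(round(slope, 4), round(conventional_se, 4), round(white_se, 4))
```

The slope estimate is the same under both formulas; only the standard error, and therefore the t-statistic, changes. Whether the corrected standard error comes out higher or lower than the conventional one depends on how the error variance is related to the regressor.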
Estimation Results
As discussed above, for the hitter’s equation (Equation 1), I can complete the estimation without
any adjustment to the model outlined previously. For the pitcher’s equation (Equation 2), I use the
White method of correcting for heteroskedasticity in the estimation. Tables 7 and 8 present the results for the
hitter’s and pitcher’s equations.
Table 7: Regression Results for Hitter’s Equation

| Independent Variable | Coefficient Value | Standard Error | T-Statistic | Significance |
|---|---|---|---|---|
| C | 5.18553 | 1.479755 | 3.504316 | 99.94% |
| RUNS | 0.006113 | 0.001588 | 3.850105 | 99.98%*** |
| RBI | 0.004486 | 0.001402 | 3.199664 | 99.84%*** |
| ONBASE | 0.846241 | 1.281671 | 0.660264 | 49.01% |
| HSOBB | -0.007213 | 0.019612 | -0.367794 | 28.66% |
| BA | -2.957660 | 1.343628 | -2.201249 | 97.11%** |
| FIELD | 0.400801 | 1.510282 | 0.265382 | 20.90% |
| EXPERIENCE | 0.073126 | 0.007130 | 10.25646 | 100.00%*** |
| LGMKT | 0.168603 | 0.057574 | 2.928452 | 99.62%*** |
| FAELG | -0.046818 | 0.072723 | -0.643782 | 47.95% |
| ARBELG | 0.128306 | 0.067614 | 1.897620 | 94.07%* |

Number of Observations = 200
Adjusted R² = 0.612208
*** - Significant at 1% Error Level; ** - Significant at 5% Error Level; * - Significant at 10% Error Level
Table 8: Regression Results for Pitcher’s Equation

| Independent Variable | Coefficient Value | Standard Error | T-Statistic | Significance |
|---|---|---|---|---|
| C | 5.097949 | 0.527007 | 9.673394 | 100.00% |
| ERA | -0.031529 | 0.046339 | -0.680395 | 50.29% |
| IP | 0.004257 | 0.000485 | 8.783339 | 100.00%*** |
| PSOBB | 0.089062 | 0.021297 | 4.181866 | 100.00%*** |
| BAA | -0.106577 | 1.395424 | -0.076376 | 6.08% |
| WP | -0.319201 | 0.171240 | -1.864056 | 93.61%* |
| FIELD | 0.259975 | 0.457814 | 0.567861 | 42.92% |
| EXPERIENCE | 0.049797 | 0.005750 | 8.659995 | 100.00%*** |
| LGMKT | 0.038915 | 0.060691 | 0.641202 | 47.78% |
| FAELG | 0.248311 | 0.079497 | 3.123540 | 99.79%*** |
| ARBELG | 0.236802 | 0.056654 | 4.179815 | 100.00%*** |

Number of Observations = 200
Adjusted R² = 0.567136
*** - Significant at 1% Error Level; ** - Significant at 5% Error Level; * - Significant at 10% Error Level
With the regression results, a statistic known as Adjusted R² is generated. This statistic is a
measurement of the “goodness of fit” of the equation, or how well the regression explains the fluctuation of
the dependent variable around its mean. The closer the Adjusted R² value is to 1, the better the fit of
the equation. In the case of our two equations, the Adjusted R² values are 0.612208 and 0.567136 for the
hitter’s and pitcher’s equations, respectively. This can be interpreted as meaning that the equations explain
61.2208% and 56.7136% of the variation in the Log Salary around the mean Log Salary.
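For readers unfamiliar with the adjustment, the sketch below shows the standard relationship between the ordinary R² and the Adjusted R², which penalizes the fit for the number of independent variables used. The unadjusted R² values of the two equations are not reported in this paper, so the value 0.60 used here is purely hypothetical, as is the function name `adjusted_r_squared`; only n = 200 and k = 10 match the equations above.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n: number of observations
    k: number of independent variables (excluding the constant)
    """
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical unadjusted R^2; n and k follow the equations in this paper.
example = adjusted_r_squared(0.60, n=200, k=10)
print(round(example, 4))
```

Because (n - 1) / (n - k - 1) is greater than 1 whenever k is positive, the adjusted value is always below the unadjusted one (for R² below 1), and the gap grows as more independent variables are added.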
The regression results generated from our equations provide a few other useful statistics. One of the
most useful of these for assessing the significance of the independent variables is the t-statistic. With the
t-statistic, we can conduct a t-test to analyze the significance of the coefficient on a particular variable. The
t-test works to analyze the slope coefficient estimates of each independent variable and compare that value
to zero.
This test works in the form of a hypothesis test, allowing us to use the quantitative output of the test
to make a determination about the true effect of each independent variable on the dependent variable. For
each variable, we formulate a null and alternative hypothesis. The alternative hypothesis for a given variable
is the expected sign for the slope coefficient, while the null hypothesis is the opposite sign of the expected
outcome as well as a zero value. To conduct the t-test, we take the absolute value of the t-statistic value for a
particular variable and compare it to a critical value determined by the number of degrees of freedom and
the level of significance we want to test for.
The null and alternative hypotheses mentioned earlier are for a one-sided t-test; a two sided test
exists in which the null hypothesis is zero and the alternative a non-zero value. Such a test is useful for a
variable in which we are less concerned about what sign the slope coefficient is, but rather whether the
variable has any effect at all on the dependent variable. The level of significance is the probability of a Type I
error (rejecting a true null hypothesis). The degrees of freedom are determined by the expression (n-k-1), in
which n is the total number of observations used in the regression and k is the number of independent
variables in the equation. If the absolute value of the t-statistic is greater than the critical value, we can
reject the null hypothesis in favor of the alternative hypothesis, and assume that for the level of significance
we designate, the value of the coefficient is significantly different from zero. Otherwise, we fail to reject the
null hypothesis and cannot be confident that the independent variable under study has a significant effect on
the dependent variable.
For example, we can conduct a t-test on the RUNS statistic from the hitter’s equation. We expect
that a player who scores more runs will see a positive change in the difference between his salary and that of
others. For this coefficient, we can conduct a test at a 10% level of significance, which is our minimum
threshold for a variable to be considered significant. In this case, our null hypothesis is that the RUNS
coefficient is negative or zero, while the alternative hypothesis is our expected positive coefficient on the
RUNS variable. Our t-statistic for the value (as given by our regression) is 3.850105. The critical value for
this regression (as a one sided test with a 10% level of significance, and (200-10-1) degrees of freedom) is
1.286047. We compare the t-statistic value to the critical value and find that our t-statistic is greater.
Therefore, we can say that the slope coefficient of the RUNS variable is significant at a 10% level of
significance.
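The RUNS test above can be reproduced with a short script. The following is a minimal sketch rather than the procedure actually used for the paper: it uses only the Python standard library, approximates the Student t distribution numerically (Simpson's rule plus bisection), and the function names t_pdf, t_cdf, t_critical, and one_sided_t_test are hypothetical names introduced here for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF of the t distribution: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(alpha, df):
    """Upper-tail critical value c with P(T > c) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # tail area still too large, so c lies further right
        else:
            hi = mid
    return (lo + hi) / 2


def one_sided_t_test(t_stat, n, k, alpha):
    """One-sided test of H0: coefficient <= 0 against HA: coefficient > 0."""
    df = n - k - 1
    crit = t_critical(alpha, df)
    return {"df": df, "critical_value": crit, "reject_null": t_stat > crit}


# Values taken from the RUNS example in the text.
result = one_sided_t_test(t_stat=3.850105, n=200, k=10, alpha=0.10)
print(result)
```

With the inputs from the text, the computed critical value should come out close to the 1.286047 reported above, and the null hypothesis is rejected. For a two-sided test, the same helper could be called with alpha / 2 and the absolute value of the t-statistic.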
For the equations developed, we can examine the individual coefficients and their significance to see what
each result means for our dependent variable. A positive coefficient indicates that a one-unit increase in
that variable raises Log Salary, holding the other variables constant, while a negative coefficient indicates
the opposite. Table 9 lists the coefficients of the equations and the effect each has on Log Salary.
Table 9: Regression Coefficients and Their Effects on Log Salary

Independent Variable  Change in Value                                         Effect on Log Salary
RUNS                  1 Additional Run Scored                                 Increase by .006113
RBI                   1 Additional RBI                                        Increase by .004486
ONBASE                10 Additional Points (1%)                               Increase by .00846241
HSOBB                 1 Additional Unit                                       Decrease by .007213
BA                    10 Additional Points (1%)                               Decrease by .0295766
ERA                   1 Additional Unit                                       Decrease by .031529
IP                    1 Additional Inning                                     Increase by .004257
PSOBB                 1 Additional Unit                                       Increase by .089062
BAA                   10 Additional Points (1%)                               Decrease by .001065
WP                    10 Additional Points (1%)                               Decrease by .003192
FIELD                 10 Additional Points (1%)                               Hitters: Increase by .265382; Pitchers: Increase by .259975
EXPERIENCE            1 Additional Year                                       Hitters: Increase by .073126; Pitchers: Increase by .049797
LGMKT                 Change from Non-Large Market team to Large Market team  Hitters: Increase by .168603; Pitchers: Increase by .038915
FAELG                 Change from Ineligible to Eligible                      Hitters: Decrease by .046818; Pitchers: Increase by .248311
ARBELG                Change from Ineligible to Eligible                      Hitters: Increase by .128306; Pitchers: Increase by .236802
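The effects in Table 9 are stated as changes in Log Salary, which can be hard to read directly. The sketch below converts a change in log salary into an approximate percentage change in salary. It assumes that Log Salary is the natural logarithm of salary, which this section does not state explicitly; the function name pct_change is a hypothetical helper introduced here, and the coefficients are copied from Table 9.

```python
import math


def pct_change(log_effect):
    """Percent change in salary implied by a change in natural-log salary."""
    return (math.exp(log_effect) - 1) * 100


# Selected effects copied from Table 9 (change in Log Salary per stated change).
effects = {
    "RUNS, 1 additional run (hitters)": 0.006113,
    "EXPERIENCE, 1 additional year (hitters)": 0.073126,
    "EXPERIENCE, 1 additional year (pitchers)": 0.049797,
    "FAELG, ineligible to eligible (hitters)": -0.046818,
    "ARBELG, ineligible to eligible (pitchers)": 0.236802,
}

for label, effect in effects.items():
    print(f"{label}: {pct_change(effect):+.2f}% change in salary")
```

For small coefficients the percentage change is close to 100 times the coefficient, so the conversion matters mainly for the larger dummy-variable effects such as ARBELG and FAELG. If the regressions instead used a base-10 logarithm, 10 ** log_effect would replace math.exp(log_effect).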
For each equation, the coefficients of six variables are found to be significant at the 10% level or
better. In each equation, three of these are performance variables unique to that equation, and the
other three are variables that appear in both equations.
The variable FIELD (Fielding Percentage) is found not to have a significant effect (at the 10% level or
better) on Log Salary in either equation. I expected this variable to be relevant, but there is a logical
reason why it might be insignificant. Some players are paid a premium primarily for their defensive abilities
in the field, even if their offensive performance is below average. Judging by the result of this regression,
however, such players may be the exception rather than the norm. For pitchers, the FIELD variable has less
theoretical relevance to begin with, but it still had some theoretical value to add to the equation. Again, it
can be inferred from these results that fielding ability has little impact on the overall salary differences
among pitchers. Lackritz (1990) found this variable to be significant, but other studies have consistently
found it to be insignificant.
In both equations, the coefficient of the EXPERIENCE variable is found to be highly significant. This
falls in line with the theoretical idea that players who are in the league for a longer period of time are paid
a premium for their body of work later in their careers or, as discussed earlier, that players with a number
of years of experience in the league are, in theory, able to collect some sort of economic rent. In both
equations, additional years of experience increase the difference between that player’s salary and that of
others. This effect is roughly one and a half times as large for batters as for pitchers (.073126 versus
.049797), which could mean that additional years of experience are less valuable to a team in terms of
pitching. This is in line with Bollinger and Hotchkiss’s (2003) and Marburger’s (1994) findings, although in
the equations used here experience was found to be significant for all players, whereas Marburger found it to
be significant only for certain subsets of players.
In both equations, the ARBELG variable has a positive and significant effect on salary. Players who
are eligible for arbitration tend to see their salaries rise, usually because teams want to secure these
players under contract before they reach arbitration and because the pay a player should be earning at that
point is usually higher than his current salary. In practice, teams usually avoid arbitration by offering
contracts to players before they are able to take the team into the arbitration process. That the increase is
substantially larger for pitchers than for batters is a bit curious: the prior literature does not mention a
major difference in the arbitration-related salary change between pitchers and position players, and there is
no obvious reason why this should be the case. One possibility is that teams place a premium on younger
pitchers, which would show up here because arbitration is available to players with as little as two seasons
of experience.
The coefficient of the variable FAELG (Free Agency Eligible) is found to be significant for pitchers,
but not for hitters, in determining salary differences. Considering that the mean salary for pitchers is lower
than that of batters, this is also a bit surprising. Hylan, Lage, and Treglia’s (1996) work does support the
idea that free agent eligibility works over time to increase the salaries paid to pitchers. The theory
suggests that the free agent mechanism helps match pitchers more quickly to the teams offering the highest
value for their services. Why free agency would not be significant for position players as well is unclear,
though, since the same market system applies to them.
The coefficient of LGMKT is significant for position players, but not for pitchers. Market size, as
mentioned earlier, can play a role in paying higher salaries to players, as large-market teams can draw on
larger revenues and thus have more ability to pay a player a premium to secure his services. Vrooman
(1996) also finds that player movement from small-market to large-market teams has increased over time,
especially from the start of the 1990s onward. Teams in large markets are taking advantage of their extra
revenues and securing talent that, theoretically, will allow them to field the most competitive team in the
league. The insignificance for pitchers may be attributed to a perceived premium paid for pitching by teams
of all market sizes. It could also indicate that while large-market teams tend to spend more on salaries for
position players, there is a smaller gap in the spending on pitchers between those teams and their
smaller-market counterparts.
For batters, three of the five equation-specific performance variables are found to have a significant
effect on Log Salary: RUNS, RBI, and BA. The two that were not found to have a significant effect on Log
Salary, ONBASE and HSOBB, are far from significant. These findings are in contrast to Lackritz’s (1990). I
expected these two variables to have a significant effect on salary differences, as they both measure the
rate of positive outcomes from an at-bat, which theoretically gives a team a better chance to score runs and
thereby win more games. The coefficients on RUNS, RBI, and BA are significant, in line with the findings of
Lackritz (1990) and Marburger (1994). The coefficients on RUNS and RBI have the expected sign and are in
line with previous research, but the negative sign of the batting average coefficient is counterintuitive. A
decrease in the dependent variable for an increase in batting average makes little sense in terms of either
the theory or the actual data. The signs and sizes of the coefficients of HSOBB, RUNS, and RBI are in line
with expectations.
For pitchers, the three equation-specific significant coefficients are those of the variables IP, PSOBB, and
WP. The coefficients of ERA and BAA, however, are found to be insignificant. BAA in particular is found to
be a very poor predictor of the dependent variable. Recall that BAA has a theoretical connection with
performance, as a pitcher with a higher BAA value allows the opposing team more opportunities to score runs.
The sign of the coefficient, for what it’s worth, is negative as expected. The insignificance of the
coefficient of ERA is a bit surprising, as it was found to be significant by Lackritz (1990) and by
Marburger (1994). The sign of the coefficient is as expected. Among the three significant coefficients, the
sign of the coefficient for WP is troubling, as it differs from the expected sign. A decline in pay
differences for a higher winning percentage runs counter to the theory that players who win more games are
more valuable. Teams may not pay much mind to winning percentage when making hiring decisions, or may
penalize some players who have a higher percentage of wins, possibly hedging against some decline in form.
For IP, the size and sign of the estimated coefficient are consistent with Marburger’s (1994) findings. For
PSOBB, the results are broadly similar to those found by Lackritz (1990), although the sign of the
coefficient differs from his; the theory supports the outcome here rather than those found in previous
works. If a pitcher records more strikeouts than walks, especially in a higher ratio, he will create more
positive outcomes for his team. In theory, this would command a higher salary, as a team could benefit from
a higher percentage of these positive outcomes and would be willing to pay a pitcher a higher price to
ensure that it is on the beneficial end of that pitcher’s performance.
Conclusion
While the models presented in this paper have their faults, and there are areas where they could be
improved, there are some positive outcomes that I can draw from the findings. The findings show that some
performance and non-performance variables identified in previous empirical works remain significant in
determining salary for Major League Baseball players in the contemporary market. Much of the similar prior
research used datasets of players from around two decades before the observations used here, so one of the
objectives in developing the models presented in this paper was to find out whether the past research stands
the test of time and changes in the labor market. Indeed, the fact that several performance and
non-performance variables were found to be significant suggests that certain factors are universal in salary
determination.
For hitters, batting average, the number of runs scored, and the number of runs batted in all have a
direct effect on the outcomes of games, and the results support their significance. For pitchers, the number
of innings pitched, strikeout-to-walk ratio, and winning percentage are not as clearly defined measures of
performance, but are nonetheless indicators of a successful (or unsuccessful) pitcher. The finding that
experience is a significant factor for both pitchers and batters confirms that the market exhibits some sort
of rent-seeking behavior: a system in which players are paid more as they remain in the league for longer
periods. The free agency and arbitration systems are also shown to be significant in the setting of salaries
for players, having a large positive effect on the salary of a player eligible for one or both of those
mechanisms. Finding that these market systems significantly affect player salaries allows us to explore
further how they affect the labor market in baseball, specifically how changes in these systems would affect
player movement and wage negotiations in the league.
As Major League Baseball is a multi-billion dollar enterprise, drawing fans, players, and revenue
from sources domestically and internationally, the salaries paid to players have a large impact on the game.
The league is under some pressure to offer a product that will continue to draw fans into ballparks or
persuade them to pay for access to watch or listen to games, and the labor market implications of this study
bear on how teams view the tradeoff between wages and revenue. If wages rise too quickly, outpacing
revenues, teams may end up in poor financial situations, which could hurt the overall fiscal health of the
league. Overpaying or underpaying players may cause the wage rate for all players to rise or fall, and could
affect the level of talent willing to play in the league, and where those players would play.
As shown in this study, market size at the large end does affect salaries: batters who play for teams
in large markets have higher salary differentials than their counterparts in smaller markets. The struggle
for small-market teams to keep up with rising salaries is becoming more and more visible, especially at
times when revenue streams in cities like Cleveland, Pittsburgh, or Denver are not increasing as fast as
those in larger markets like New York, Los Angeles, or Philadelphia (if at all). Market size combined with
other mechanisms like free agency could create another concern: smaller-market teams whose players become
eligible for arbitration or free agency may be unable to pay the salary increases needed to retain them.
Player retention may slowly become a concern, leading smaller-market teams to stock rosters with younger,
generally inferior players to put a team on the field, while larger-market teams that can afford the higher
salaries will generally have more talented players.
Because there are significant salary increases for players who become eligible for free agency or
arbitration, or who play in a larger market, the question of salary controls becomes relevant. As mentioned at
the outset, Major League Baseball lacks a salary cap, contract value caps, or other salary controls at the
higher end. There is a luxury tax, but teams are exceeding that on an annual basis, and the number of teams
that cross or come close to crossing this threshold is increasing annually. Does the league need to institute
controls now to head off a salary crisis over the long term? Given the size of the increases for salaries found
in this study for market mechanism variables, one could certainly make that argument. The effectiveness of
such controls would depend on where they were applied. A salary cap might work, but it would cause havoc
in the short and medium term depending on the amount. Contract limitations like maximum annual salary
or maximum pay raise scaling (in which a cap is placed on the amount the base salary can increase from one
year to the next) could work to control costs in the long term, but could lead to more players being offered
the maximum contract rather than a lower contract value that they might actually “deserve.”
The models presented here leave out some factors, such as the aforementioned player representation idea
(in which it could be theorized that professional sports agents, or players who act as their own agents,
have some effect on pay) or an individual player’s merchandising revenue. Adding some of these factors to
the models could help to further explain the changes in salary for players. Because this paper uses only one
year of player data, as do many analyses of this type, multiple years’ worth of observations could be used
to determine whether variables that are significant in one year are significant over time, or only in
certain years. The possibility that insignificant variables from this dataset are significant in other years
also exists.
Future research in this area should focus on the effects on salaries and the labor market of some of
the aforementioned policy changes that could be brought to the labor market, and project whether some of
these changes could control salaries in the league over time. Further identification of factors that could
affect changes in salaries can help to explain changes over time. A number of external factors could play
into salary determination, and these non-performance variables may help to create a more explanatory model.
Ultimately, the models presented here could be expanded upon through future research, to offer some
further observations on what factors are significant in the current state of the Major League Baseball labor
market.
Works Referenced
Bollinger, Christopher, and Julie Hotchkiss. "The Upside Potential of Hiring Risky Workers: Evidence from
the Baseball Industry." Journal of Labor Economics 21.4(2003): 923-44.
Faurot, David, and Stephen McAllister. "Salary Arbitration and Pre-Arbitration Negotiation in Major League
Baseball." Industrial and Labor Relations Review 45.4(1992): 697-710.
Hakes, Jahn, and Raymond Sauer. "An Economic Evaluation of the Moneyball Hypothesis." The Journal of
Economic Perspectives 20.3(2006): 173-86.
Halcoussis, Dennis. Understanding Econometrics. 1st ed. USA: Thomson South-Western, 2005.
Hoaglin, David, and Paul Velleman. "A Critical Look at Some Analyses of Major League Baseball Salaries."
The American Statistician 49.3(1995): 277-85.
Hylan, Timothy, Maureen J. Lage, and Michael Treglia. “The Coase Theorem, Free Agency, and Major League
Baseball: A Panel Study of Pitcher Mobility from 1961 to 1992.” Southern Economic Journal
62.4(1996): 1029-42.
Lackritz, James. "Salary Evaluation for Professional Baseball Players." The American Statistician 44.1(1990):
4-8.
Marburger, Daniel. "Bargaining Power and the Structure of Salaries in Major League Baseball." Managerial
and Decision Economics 15.5(1994): 433-41.
Sommers, Paul, and Noel Quinton. "Pay and Performance in Major League Baseball: The Case of the First
Family of Free Agents." The Journal of Human Resources 17.3(1982): 426-36.
Stevens, Carl. "Is Compulsory Arbitration Compatible with Bargaining?" Industrial Relations 5(1966): 38-52.
Vrooman, John. "A Unified Theory of Capital and Labor Markets in Major League Baseball." Southern
Economic Journal 63.3(1997): 594-619.
Vrooman, John. "The Baseball Players' Labor Market Reconsidered." Southern Economic Journal
63.2(1996): 339-360.
Dataset information derived from data at the following sources: http://www.mlb.com/mlb/stats,
http://www.retrosheet.org, http://www.baseball-reference.com, http://espn.go.com/mlb/stats