Busser 1
An Empirical Analysis of Factors Affecting Major League Baseball Salaries: A Cross-Sectional Study
By: Joshua Busser
Submitted to Dr. Jacqueline Khorassani
ECON 421 – Empirical Research
April 15, 2009

Abstract: This paper examines how performance and non-performance variables attributed to Major League Baseball players affect salary differences among a sample of 400 players in 2006. Variables are identified from three sources: economic theory applied to the Major League Baseball labor market, general concepts in labor economics, and past empirical work on this topic. These variables are used to build two empirical models, one for pitchers and one for position players. The models are then estimated by ordinary least squares (OLS). The estimation results show that years of experience in the league and eligibility for salary arbitration have a positive and significant effect on salary differences in both groups of players. For position players, the number of runs scored and the size of the market in which the player’s team plays have a positive and significant effect, while batting average has a negative and significant effect on salary differences. For pitchers, the number of innings pitched, the ratio of strikeouts to walks, and eligibility for free agency have a positive and significant effect, while winning percentage has a negative and significant effect on salary differences.

Introduction
The question that is the focus of this paper is “What determines the salaries of Major League Baseball players?” Players in the league today are regularly signed to larger and larger salaries, often with seemingly no rhyme or reason to the sums being paid out. Fans of the game and people unfamiliar with the sport’s labor market often wonder exactly what makes particular players worth the millions of dollars they are paid annually.
This paper addresses the above question with an approach similar to other empirical studies: it develops models containing variables that are theorized to be related to how a player’s salary is determined. These variables fall into one of two categories. Performance-based variables are factors and statistics that result directly from the player’s on-field talent. Non-performance variables are factors that are either outside the player’s control, such as the market size of the player’s team, or unrelated to the player’s direct on-field performance.

A background section first presents some history of Major League Baseball as a whole, the evolution of the league’s labor market, and how that labor market differs from those of other major sports leagues operating in North America. An empirical survey of similar works follows, with attention paid to what other authors have found significant in their studies of the salary determinants of Major League Baseball players. A theoretical section then explains the labor market in Major League Baseball in depth, along with the factors that can play a role in salary determination. Next, I present my model and the rationale behind its construction, explaining my choice of specific variables, data set, and functional form. A section on the observations used in the equations and their descriptive statistics follows the presentation of the model. Sections on multicollinearity and heteroskedasticity come next. They address the problems each poses for the model and what action, if any, could or should be taken to account for or correct them. The regression results for the equations are then presented, along with a discussion of the results and how they compare to previous works. Finally, some concluding remarks and implications of this work are discussed.
Background
Major League Baseball traces its roots to the late 19th century. Various leagues and teams that existed in other forms slowly merged into larger entities until an organized league of professional baseball teams was formed under one unified structure. Most point to 1903 as the “official” start of the league as it is structured today. That year’s “National Agreement” defined, for the first time, the teams and league structure from which modern Major League Baseball has grown.

Today, Major League Baseball has thirty clubs operating under the league’s rules and regulations. Twenty-nine of these are in the United States, and the remaining club is in Toronto, Ontario, Canada. These teams are divided into two leagues, the American League and the National League. The league designations carry little meaning today, although they were significant in the earlier days of the sport; the leagues were at one point separate legal entities, although that distinction has since been done away with. Each league contains three divisions of teams, separated geographically. The method of dividing teams has changed over the years. Originally the leagues had no divisions, but as teams were added, East and West divisions were instituted in 1969, and in 1994 a third, Central division was added to each league.

Players in Major League Baseball are considered today to be independent contractors, able to seek employment with teams on their own, though more commonly with the aid of a professional agent. This has not always been the case. In the early days of Major League Baseball, player movement was stifled by the Reserve Clause in player contracts. This clause gave a team the “rights” to a player who had been under contract with that team, even after the contract expired.
Players were therefore restricted from seeking contracts with other teams on their own, because they were still technically controlled by the team that had previously held their contract. Critics argued that this control over players violated the Sherman Act, which prohibits independent businesses engaged in interstate commerce from colluding to fix prices, or in this case player salaries. Major League Baseball was exempted from the Act by Federal Baseball Club v. National League, a 1922 Supreme Court case in which the Court held that baseball was an “amusement” and not commerce subject to the Sherman Act. As a result, teams exerted this control over players for the following five and a half decades, suppressing both salaries and player movement.

The labor market began to shift for players in 1953 with the formation of the current Major League Baseball Players Association, or MLBPA. The new union set out to raise wages for players in the major league labor pool, and it worked to negotiate a Collective Bargaining Agreement, or CBA, with the league that defined players’ rights. The first CBA, reached in 1968, raised the minimum salary for league players to a more reasonable wage. Two years later, the union secured an agreement for the league to use independent arbitration to settle disputes between players and teams. Arguably the union’s greatest achievement, however, was ending the control that the Reserve Clause gave teams over players. In 1975, an arbitrator hearing the cases of two players struck down the Reserve Clause as a mechanism for controlling players, although the power of the Clause was not completely dissolved until the conclusion of the 1976 season.
The striking down of the Reserve Clause helped create the current climate that governs player movement in Major League Baseball. Much of the movement of players between teams now occurs through free agency. Starting with the off-season that followed the 1976 season, free agency allowed players who were no longer under contract with a team to freely solicit offers for their services from other teams, as well as from the team that previously held their contract. The growth of free agency has been pointed to as the driving force behind the rising salaries of Major League Baseball players in recent decades. However, as Hylan, Hage, and Treglia (1996) note, player movement as a result of free agency has not increased much for pitchers, and its effect on position players is unknown. One could therefore conclude that free agency has not profoundly changed player movement.

The arbitration system in baseball allows teams and players to take salary disputes to an independent arbitrator, who rules on what the player should be paid. In a baseball arbitration case, the team and the player each bring their best offer to the arbitrator, who then rules in favor of one side or the other based on his or her own assessment of the player’s value. As mentioned above, arbitrators can also settle disputes over the legitimacy of a contract or of particular stipulations, especially when the union believes a team has wronged a player or breached a contract.

In his paper on the labor market in baseball, John Vrooman (1996) makes some observations about the nature of that market that illustrate how it differs from the labor markets of other sports. He theorizes that Major League Baseball has operated to some degree as a monopsony, because no other league in the world can attract the level of talent that major league teams are able to sign.
Because of this, American baseball has operated in a labor market that no other league worldwide contests. Many smaller leagues have seen a decline in the quality of their top players, as those players have migrated to the American baseball market in search of bigger paychecks and a chance to play at a perceived higher level of competition. Consider the case of Ichiro Suzuki. As a player in the Japanese “major leagues,” Suzuki was statistically the best pure hitter, or close to it: he set the Japanese single-season record for hits and held an exceptional .353 career batting average. Even so, his wages were similar to those of a top-level minor league player in the United States. Suzuki was able to negotiate with major league teams, notably the Seattle Mariners, who have had Japanese ownership for the last couple of decades. He earned a contract for $14 million over three years, far more than he would have earned in Japan over the same period.

Despite its monopsony power, Major League Baseball lacks a control on player salaries that other major sports have: a salary cap. In other professional leagues, a cap limits the total salaries that can be paid to the players on a roster. The cap can be hard, as in the NFL and NHL, or soft, as in the NBA, where a host of exceptions allow teams to exceed the cap to some degree. In addition, the NBA’s Collective Bargaining Agreement includes a “max salary” provision that limits the total compensation a single player can earn.

Empirical Survey
How can we obtain a true measure of what teams consider when setting the salaries of Major League Baseball players? A number of models have been developed empirically, and each offers some insight into the salary-setting mechanisms that teams have used in the free agency era.
The models developed by Lackritz (1990), Marburger (1994), Hakes and Sauer (2006), Hoaglin and Velleman (1995), and Bollinger and Hotchkiss (2003) all analyze how a set of independent variables affects the dependent variable, salary. Lackritz uses salary measured in dollars. The more recent models presented by the other four sets of authors use the natural logarithm of salary instead. The data used in each of these models vary. Lackritz (1990) focused on select salary data from the 1985 and 1986 seasons. Marburger (1994) used a complete set of salary observations from 1991 and 1992. Hoaglin and Velleman (1995) used 436 salary observations from 1986. Bollinger and Hotchkiss (2003) used data from 1987 to 1993. Hakes and Sauer (2006) used data from 2000 to 2004. Table 1 summarizes the literature referred to, along with information about the variables, data sets, and estimation methods used in each study.

Table 1: Survey of Previous Empirical Works Related to this Paper
(* indicates significant at 10% or better)

1. Lackritz (1990), “Salary Evaluation for Professional Baseball Players”
   Sample / estimation method: 1985 and 1986 player data; OLS estimation
   Dependent variable: Salary
   Independent variables:
     Performance measures: Offensive Average*, On Base Percentage*, Stolen Bases, Strikeout to Walk Ratio*, Hits / Innings Pitched*, Saves / Wins*, Fielding Percentage*, Earned Runs / Innings Pitched*

2. Marburger (1994), “Bargaining Power and the Structure of Salaries in Major League Baseball”
   Sample / estimation method: 1991 and 1992 player data; OLS estimation
   Dependent variable: Log Salary
   Independent variables:
     Hitter’s model, performance measures: Runs*, Home Runs, RBI*, Career Runs, Career Home Runs, Career RBI*
     Hitter’s model, non-performance measures: Experience, Experience^2*, Contract 1991 [dummy variable]*
     Pitcher’s model, performance measures: Innings*, ERA*, Saves*, Career Innings*, Career ERA*, Career Saves
     Pitcher’s model, non-performance measures: same as the hitter’s model

3. Hoaglin and Velleman (1995), “A Critical Look at Some Analyses of Major League Baseball Salaries”
   Sample / estimation method: 436 players’ data from 1986; OLS estimation
   Dependent variable: Log Salary
   Independent variables:
     Performance measures: Career Runs Scored / Years*, Career RBI / Years*, √(Runs Scored in 1986)*
     Non-performance measures: Years*, Years^2*, Years ≤ 7 [dummy variable]*

4. Bollinger and Hotchkiss (2003), “The Upside Potential of Hiring Risky Workers: Evidence from the Baseball Industry”
   Sample / estimation method: 1987 to 1993 player data; OLS estimation
   Dependent variable: Log Salary
   Independent variables:
     Performance measures: Career Runs Scored / On Base Pct, Career Hits / Career At Bats*, Career Home Runs / Career At Bats*, Career Walks / Career Plate Appearances*, Career Strikeouts / Career Plate Appearances*, Career Stolen Bases / Career On Base Pct*, Career Caught Stealing / Career On Base Pct, Career Fielding Runs / Career Games*
     Non-performance measures: ln of Television Revenues, Stadium Capacity, League Champion [dummy variable]*, Winning Percentage of team*, Age*, Age^2*

5. Hakes and Sauer (2006), “An Economic Evaluation of the Moneyball Hypothesis”
   Sample / estimation method: player data from 2000 to 2004; OLS estimation
   Dependent variable: Ln Salary
   Independent variables:
     Performance measures: On Base Percentage*, Slugging Percentage*, Plate Appearances*
     Non-performance measures: Arbitration Eligible*, Free Agency Eligible*, Catcher*, Infielder* [all four are dummy variables]

Performance variables are a common thread among prior empirical works that estimate player salaries. Lackritz and Marburger each include variables suited to both pitchers and position players. Baseball uses different performance measures for these two categories of players, so these two authors used specific, non-overlapping variables for each category.
Lackritz used four pitcher performance measures: the ratio of strikeouts to walks, the ratio of hits to innings pitched, the ratio of earned runs to innings pitched, and the ratio of saves to wins. Marburger’s analysis looks at innings pitched, Earned Run Average (ERA), saves, and career marks in these three areas, with ERA indexed to the league average. In both cases, the use of pitcher-specific measures acknowledges that pitchers’ salaries are set differently from those of position players. Marburger’s measures are more readily available. Lackritz’s measures are derived from other statistics, although baseball statisticians still widely use them as important performance measures.

All five papers considered here include measures for position players, although the measures differ in number and quality. Lackritz uses a derived measure called Offensive Average, which combines a number of performance statistics into one indexed number. Bollinger and Hotchkiss use only career-level statistics, each expressing performance relative to career games, at bats, plate appearances, or on-base percentage. Lackritz also uses on-base percentage, as do Hakes and Sauer. Most of the papers treat runs scored or runs batted in as a primary measure of performance.

Marburger, Hoaglin and Velleman, and Bollinger and Hotchkiss also include a measure of age or experience. Vrooman theorizes that players’ salaries rise as they age, primarily because they gain experience that can be valued independently of their recent performance. A related theory holds that, as players age, their career “body of work” outweighs their recent performance. Their earnings therefore come to depend more on past performance than on the latest statistical period.
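Each of the pitcher measures described above is a simple ratio of counting statistics. The sketch below computes Lackritz’s four ratios and ERA from a single season line. ERA uses the standard definition, earned runs allowed per nine innings pitched. The league-indexing step is an assumption, because the exact form of Marburger’s index is not given here. The sketch divides the pitcher’s ERA by the league ERA, so a value below 1 means better than the league average. The `PitcherLine` record and all function names are hypothetical, and the sample season is invented for illustration.

```python
from dataclasses import dataclass
import math


@dataclass
class PitcherLine:
    """One season of counting stats for a pitcher (hypothetical record type)."""
    innings: float      # innings pitched as a true decimal: 200 1/3 innings -> 200.333...
    hits: int
    earned_runs: int
    strikeouts: int
    walks: int
    saves: int
    wins: int


def ratio(num, den):
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return num / den if den else math.nan


def lackritz_ratios(p):
    """The four pitcher ratios listed for Lackritz (1990)."""
    return {
        "strikeouts_per_walk": ratio(p.strikeouts, p.walks),
        "hits_per_inning": ratio(p.hits, p.innings),
        "earned_runs_per_inning": ratio(p.earned_runs, p.innings),
        "saves_per_win": ratio(p.saves, p.wins),
    }


def era(p):
    """Earned Run Average: earned runs allowed per nine innings pitched."""
    return 9.0 * ratio(p.earned_runs, p.innings)


def era_index(p, league_era):
    """ASSUMED index form: pitcher ERA / league ERA (below 1.0 = better than league)."""
    return era(p) / league_era


starter = PitcherLine(innings=200.0, hits=180, earned_runs=70,
                      strikeouts=200, walks=50, saves=0, wins=15)
print(lackritz_ratios(starter))
print(round(era(starter), 2), round(era_index(starter, league_era=4.50), 2))  # 3.15 0.7
```

Returning NaN for a zero denominator, such as a reliever with no wins, keeps the ratios defined for every pitcher in a sample. The regression would still need a rule for handling those cases.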
Marburger, Hakes and Sauer, and Bollinger and Hotchkiss use career statistics to capture the value of an entire career rather than only the latest statistics. The findings of these works differ in which variables prove relevant and significant.

Lackritz (1990) found that Offensive Average, On Base Percentage, the ratio of strikeouts to walks, the ratio of saves to wins, the ratio of earned runs to innings pitched, and Fielding Percentage were significant at 10% for the American League. For the National League, Offensive Average, On Base Percentage, the ratio of earned runs to innings pitched, Fielding Percentage, and the ratio of hits to innings pitched were significant at 10%. In both models, Fielding Percentage and On Base Percentage had the largest absolute coefficient values.

Marburger (1994) tested his variables in three models, grouping players by eligibility: eligible for neither arbitration nor free agency, eligible only for arbitration, or eligible for both. For position players, at 10% significance, the model for players eligible for neither found only RBI and Experience^2 significant. The arbitration-eligible model found Runs, RBI, Career Home Runs, Career RBI, Fielding, Experience, and the dummy variable significant at the same level. In the third model, Runs, RBI, and the dummy variable were the only significant variables. Marburger also evaluated pitchers with the same three-model breakdown. In the model for pitchers eligible for neither, the constant, innings, saves, career ERA, Experience^2, and the dummy variable were significant at 10%. In the arbitration-eligible model, all variables except career saves were significant at 10%. In the model for pitchers eligible for both free agency and arbitration, the constant, innings, saves, career innings, career ERA, and the dummy variable were significant.
Hoaglin and Velleman (1995) found that all of the performance and non-performance variables in their model were significant at 10%. These include the career RBI variable, which Marburger also found significant.

Like Marburger, Bollinger and Hotchkiss (2003) split players into three groups and estimated a separate model for each: players eligible for free agency, players who had been traded, and players still under contract with their current team. For players under initial contract with their original teams, all performance statistics except [Career Runs Scored / Career On-Base Percentage] and [Career Caught Stealing / Career On-Base Percentage] were significant at 10%. Among the non-performance statistics for this group, only the League Champion and Number of Seasons Played variables were significant. For the non-traded group, the same performance statistics were significant at 10% except [Career Home Runs / Career At Bats]. Among the non-performance statistics, Age, Age^2, Winning Percentage, and Seasons Played were significant at the same level. For the free-agent-eligible group, the same performance statistics as in the first model were significant at 10%, but only the ln TV Revenues variable was significant among the non-performance variables.

Hakes and Sauer (2006) found that all of the variables in their model were significant at 10% or better. Their finding that on-base percentage is significant is consistent with Lackritz’s results. Comparing these empirical studies is difficult partly because each derives its own performance statistics to suit its analysis. In general, however, the studies rely on commonly available variables. For position players these are runs, on-base percentage, and RBI in some form; for pitchers they are ERA and innings pitched in some form.
These papers have also tried to connect experience to salaries, whether through career variables or through experience or age measures, with mixed results.

Theoretical Analysis
Major League Baseball exists in North America as a monopsony buyer of labor. Minor independent leagues exist in the US and Canada, and regional leagues exist in the Caribbean and Central America, but no other league in the region can hire the same caliber of labor that MLB can. A monopsonist is the only buyer in a particular market; here, what it buys is the labor of the highest-ability baseball players. This has an implication for baseball’s labor market and salary structure: players would be paid below the equilibrium wage that would exist if MLB operated in a competitive market, that is, if it competed for the same pool of players with several other leagues of the same size and quality.

During the Reserve Clause era, from the early 20th century until the Clause’s dissolution in 1976, this market structure depressed the salaries of major league players. A monopsonist hires less labor than a competitive market would. In baseball, employment is already finite because roster space across the league is limited. Economically, however, the monopsony level of employment is defined where MLB’s marginal labor cost curve intersects the labor demand curve. The wage paid in this type of market is read off the labor supply curve at the monopsonist’s level of employment. This wage is below the competitive wage, which means players’ salaries are below what they would earn in a competitive market for baseball players. Figure 1 provides a visual representation of a typical monopsony market.
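The monopsony outcome described above can be made concrete with straight-line curves. The sketch below assumes a hypothetical inverse labor supply curve w = a + bL and a labor demand (MRP) curve w = c - dL. To hire one more worker, the monopsonist must raise the wage for every worker it employs, so its marginal labor cost is a + 2bL. The parameter values are illustrative only and are not estimates for MLB.

```python
def competitive_outcome(a, b, c, d):
    """Employment and wage where labor supply (a + bL) meets demand (c - dL)."""
    labor = (c - a) / (b + d)
    return labor, a + b * labor


def monopsony_outcome(a, b, c, d):
    """Employment where marginal labor cost (a + 2bL) meets demand (c - dL);
    the wage is then read off the supply curve at that employment level."""
    labor = (c - a) / (2 * b + d)
    return labor, a + b * labor


def mrp_at(labor, c, d):
    """Marginal revenue product of the last worker hired (the demand curve)."""
    return c - d * labor


a, b, c, d = 1.0, 1.0, 10.0, 1.0   # hypothetical curve parameters
L_c, w_c = competitive_outcome(a, b, c, d)
L_m, w_m = monopsony_outcome(a, b, c, d)
print(f"competitive: L={L_c}, w={w_c}")                          # L=4.5, w=5.5
print(f"monopsony:   L={L_m}, w={w_m}, MRP={mrp_at(L_m, c, d)}")  # L=3.0, w=4.0, MRP=7.0
```

With these numbers the monopsonist hires fewer workers (3.0 rather than 4.5) and pays 4.0 rather than 5.5, even though the last worker hired generates an MRP of 7.0. That gap between pay and productivity is the wedge the theory predicts.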
[Figure 1: A typical representation of a monopsony labor market]

The available data support the theory that wages were suppressed during the Reserve Clause era. Between 1950 and 1976, the final year of the Reserve Clause, average MLB salaries ranged from $120,527 to $195,793 in 2008 dollars; the nominal averages were $13,300 in 1950 and $51,000 in 1976. In 2008, by contrast, the average salary was a nominal $2,824,751. Even expressed in 1976 dollars, that is $735,795, a marked increase over the wages paid at that time.

Although MLB is still a monopsonist, some factors have pushed player salaries higher over time. The elimination of the Reserve Clause and the creation of free agency have helped raise salaries over the last three decades. Free agency allows the laborers (the players) to negotiate with teams, either by themselves or through a professional agent, so that the two sides can agree on a fair contractual wage for the player’s labor. Not all MLB players have agents, and the salary implications of playing without one would be an interesting question to study. Agents facilitate communication between players and teams, and they also handle much of the complex legal procedure of contract negotiation and signing on the player’s behalf. The literature has not explicitly addressed whether agents raise players’ salaries. Controlling for agent representation in a salary determination model could capture some additional influences on a player’s salary. Free agency has pushed wages upward, and players have also gained somewhat more power in the labor market over time as the players’ union and collective bargaining have grown stronger.
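The constant-dollar comparisons above follow the usual price-index rule: real value = nominal value × (index in target year / index in base year). The sketch below does not use official CPI figures. As a stand-in, it backs out the 1976-to-2008 price ratio implied by this section’s own numbers ($51,000 nominal in 1976 equals $195,793 in 2008 dollars) and applies that ratio to the 2008 average salary. The names `convert_dollars`, `INDEX_1976`, and `INDEX_2008` are hypothetical.

```python
def convert_dollars(nominal, index_from, index_to):
    """Re-express a nominal amount from one year's dollars in another year's dollars."""
    return nominal * index_to / index_from


# Stand-in "index" values implied by the 1976 average salary quoted above,
# not official Consumer Price Index figures.
INDEX_1976 = 51_000
INDEX_2008 = 195_793

avg_2008_nominal = 2_824_751
avg_2008_in_1976_dollars = convert_dollars(avg_2008_nominal, INDEX_2008, INDEX_1976)
print(round(avg_2008_in_1976_dollars))  # 735789
```

The result, about $735,789, matches the $735,795 reported above to within rounding in the underlying price ratio.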
The Major League Baseball Players Association (MLBPA) officially represents players in their dealings with the league, and part of the increase in wages can be attributed to the union’s power. Because the amount of labor that can be employed in baseball is relatively fixed, the union focuses its bargaining on wages. Wages vary from player to player according to individual contracts. The union concentrates on setting minimum wages for MLB players rather than on raising wages at the upper end of the pay spectrum. The Collective Bargaining Agreement (CBA) between the union and the league sets the minimum salary explicitly. In 2008 the minimum was $390,000, and it has risen steadily since the current CBA was signed in 2006.

The union has supported revenue sharing as a way to create competitive balance. Some have argued that a league with a mix of large- and small-market teams and no salary cap could become less competitive. Conventional wisdom suggests that such competitive imbalance would harm player salaries. Vrooman (1996) dubbed this idea of competitive imbalance the “Steinbrenner effect,” after the owner of the New York Yankees, a team that has consistently paid the highest total salaries in the Majors over the last eight years in order to stay competitive. Vrooman acknowledges that larger-market teams spending more to secure the best talent in the labor pool could eventually force smaller-market teams with more limited resources to “settle” for second-tier players at lower wages. On the other hand, this spending could pull the average wage for all players so high that some organizations become unable to field a competitive team.
The players’ union has tried to counteract this through revenue sharing and luxury tax provisions in the CBA. The idea is that sharing league income would let smaller-market teams benefit from the league’s overall success and afford a competitive team on the field. However, some owners have taken to pocketing these extra funds so that the team profits from revenue sharing, fielding teams that are less talented than the shared revenue could pay for. The luxury tax was instituted to limit spending by larger-market teams by taxing payroll above a threshold. Like the shared revenues, the tax money would go to teams that did not exceed the threshold, for the purpose of competitive balance.

Owners who view a baseball team purely as a business operation will likely always put profitability ahead of performance and product quality. Should competitive imbalance widen, those owners may be compelled to spend their earnings on fielding a product of at least minimum quality in order to remain profitable, creating a vicious cycle. This could in theory produce a two-tiered wage system in the future. Players employed by the largest-market teams would be paid wages beyond anything smaller-market teams could afford, while smaller-market teams would pay less talented players wages that might be higher than a competitive market would pay.

Arbitration is another salary-setting mechanism that has raised the amounts paid to players in recent years, and it introduces another party into the equation: the arbitrator. Many players who have been with one club for a period of time, usually between three and six years of MLB service time, are eligible for arbitration.
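The luxury tax mechanism described above reduces to one formula: tax = rate × max(0, payroll - threshold). The sketch below applies that formula. Following this section’s description, it then pools the proceeds and splits them equally among teams at or below the threshold. The equal split, the threshold, the rate, and the team payrolls are all hypothetical. The CBA’s actual thresholds, escalating rates for repeat payers, and rules for allocating the proceeds are not modeled.

```python
def luxury_tax(payroll, threshold, rate):
    """Tax owed on the portion of a team's payroll above the threshold."""
    return rate * max(0.0, payroll - threshold)


def tax_and_redistribute(payrolls, threshold, rate):
    """Tax each team, then split the pooled proceeds equally among teams
    at or below the threshold (a simplifying assumption)."""
    taxes = {team: luxury_tax(p, threshold, rate) for team, p in payrolls.items()}
    pool = sum(taxes.values())
    recipients = [team for team, p in payrolls.items() if p <= threshold]
    share = pool / len(recipients) if recipients else 0.0
    return taxes, {team: share for team in recipients}


# Hypothetical payrolls in millions of dollars.
payrolls = {"Big Market Club": 200.0, "Mid Market Club": 120.0, "Small Market Club": 60.0}
taxes, transfers = tax_and_redistribute(payrolls, threshold=150.0, rate=0.30)
print(taxes)      # Big Market Club owes about 15.0; the others owe 0.0
print(transfers)  # about 7.5 each to the two clubs under the threshold
```

Under these assumptions, every dollar the high-payroll club pays above the threshold costs it an extra 30 cents and transfers that amount to its lower-spending rivals.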
An exception, the so-called “Super Two” provision, also makes some players with between two and three years of service time eligible. This early eligibility is determined by service time rather than by performance. These players may use the arbitration system to seek raises or better contracts than the ones they are offered. Final-offer arbitration itself was originally proposed by Stevens (1966) as a means of settling bargaining disputes. Marburger (1994) explains how the system works. The player and the team each submit a salary demand or offer, and the arbitrator develops his or her own valuation of the player. The arbitrator then selects whichever submission better fits that valuation and renders a decision on what the player should be paid. Faurot and McAllister (1992) explain this system further. For this paper, the importance of arbitration is that it provides another method of setting player salaries in this market.

Some have suggested that arbitration matters more than free agency to baseball’s labor market and salary determination, and to some degree this can be inferred. Arbitration allows players and teams to establish a value for a player’s services, often using past arbitration cases and the salaries of players with comparable performance as references. Most arbitration-eligible players use these values as a starting point for contract negotiations outside the arbitration system. Because this information is available to both sides, players and teams can negotiate salaries fairly, and most cases that do go to arbitration reach a fair outcome. Not all arbitration-eligible players go through the process with their teams; many use the available information to negotiate contracts before reaching arbitration. Still, those salaries are typically higher than what the player previously earned.
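The final-offer rule described above can be stated precisely. The arbitrator forms a private valuation of the player and must award whichever submitted figure is closer to it, with no splitting of the difference. The sketch below encodes that rule. The text does not specify how an exact tie is broken, so awarding the player’s figure on a tie is an arbitrary assumption. The function name and dollar amounts are hypothetical.

```python
def final_offer_award(player_ask, team_offer, arbitrator_value):
    """Final-offer arbitration: award the submitted figure closer to the
    arbitrator's own valuation; no compromise figure is allowed.
    Ties go to the player (an arbitrary assumption)."""
    if abs(player_ask - arbitrator_value) <= abs(team_offer - arbitrator_value):
        return player_ask
    return team_offer


ask, offer = 4_000_000, 3_000_000   # hypothetical submissions in dollars
print(final_offer_award(ask, offer, 3_400_000))  # 3000000: team's figure is closer
print(final_offer_award(ask, offer, 3_800_000))  # 4000000: player's figure is closer
```

The usual rationale for this all-or-nothing design is that an extreme submission is likely to lose outright. Each side therefore has an incentive to submit a moderate figure, and this helps explain why those figures serve as reasonable starting points for settlements negotiated outside arbitration.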
Players’ rent-seeking behavior is another feature of baseball’s labor market to consider. Rent seeking here means players trying to capitalize on factors other than performance to obtain a higher wage than a team would otherwise pay them. One such factor is “star power”: a player uses their name, notoriety, or past accomplishments unrelated to current performance measures to obtain higher wages. For instance, a team may pay a player with a popular public persona more than an equally talented player if it believes that player has some extra ability to sell tickets. Players may also use their whole body of work to secure high wages late in their careers, even after their skills have deteriorated. In such cases, a better and cheaper player may be passed over in their favor. Teams are willing to pay rents to players they believe will bring additional benefits off the field. However, these players may not produce enough on the field to justify salaries that include high economic rents, especially if a player becomes a liability for the team in some way.

The labor union also engages in rent-seeking behavior through its efforts to establish minimum salaries and to ensure that teams pay players higher wages over time. The union’s push for higher salaries is not tied to any increase in performance or, as far as the owners are concerned, to any increase in the number of paying customers. Instead, it appears to capitalize on increases in the league’s total annual revenue.
While this behavior has been justified in the past as compensation for the league suppressing salaries through the Reserve Clause and through owner collusion in the 1980s, many feel it is less justified today and that the union should stop trying to set salary parameters for players. Because of the nature of a labor union, and because baseball is not a perfectly competitive market, there will always be some economic rent and some rent-seeking behavior, but the amount of that rent may continue to rise unchecked if the union remains as aggressive on pay. Marginal revenue product (MRP) does play a role in salary setting, even if it seems less important given some of the very large contracts in baseball. Here, MRP is defined as the revenue a team expects a player to generate through his performance and his non-performance attributes, such as the “star power” described above, and teams benefit from paying players less than their MRP. The risk in paying a player with a great deal of “star power” a salary that includes the rents associated with that quality is that the player’s value may change over the life of the contract. A player’s marketing power may decline while his performance holds steady, or vice versa. If the player’s actual monetary benefit to the team declines, his salary can end up exceeding his MRP. When developing salary figures for a player it would like to sign, a team therefore conducts this kind of cost-benefit analysis, estimating the player’s MRP and aiming to sign him for a salary less than or equal to it.
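The timing risk described above can be illustrated with a small Python sketch: a salary that is below the player’s expected MRP when the contract is signed can still exceed his MRP in later years. The MRP path and the salary are hypothetical figures in millions of dollars, the function names are illustrative, and averaging MRP over the contract years is a simplifying assumption that ignores discounting.

```python
def worth_signing(salary: float, expected_mrp: float) -> bool:
    """Decision rule from the text: sign only if salary <= expected MRP."""
    return salary <= expected_mrp


def years_overpaid(salary: float, mrp_by_year: list[float]) -> list[int]:
    """Contract years (1-based) in which a fixed salary exceeds that year's MRP."""
    return [year for year, mrp in enumerate(mrp_by_year, start=1) if salary > mrp]


if __name__ == "__main__":
    # Hypothetical five-year deal: MRP falls as the player's "star power" fades.
    mrp_path = [12.0, 11.0, 9.5, 8.0, 6.5]  # millions of dollars, illustrative only
    salary = 9.0
    expected_mrp = sum(mrp_path) / len(mrp_path)  # simple average, no discounting

    print(worth_signing(salary, expected_mrp))  # signing looks worthwhile up front
    print(years_overpaid(salary, mrp_path))     # but the late years are overpaid
```

The contract passes the team’s test on average while still producing years in which salary exceeds MRP, which is the situation the paragraph warns about.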
Empirical Model

In this study I use a cross-sectional data set consisting of 200 position players and 200 pitchers in the year 2006, compiled from data on the Major League Baseball website and other sources. The models are estimated using Ordinary Least Squares (OLS), the standard regression method, which chooses the coefficient estimates that minimize the sum of the squared residuals. Both models take a semi-log functional form: the dependent variable is the logarithm of salary, while the independent variables are not logarithmically transformed. The reported log salaries are base-10 logarithms (for example, a log salary of 5.579784 corresponds to $380,000). In a semi-log model, each estimated coefficient measures the change in log salary for a one-unit change in the independent variable, which can be read as an approximate percentage change in salary (Halcoussis 110); with base-10 logs, a coefficient b corresponds to a salary change of (10^b - 1) × 100 percent.

The model I present for position players is:

Equation 1: $\log_{10}(\text{Salary}_i) = f(\text{variables marked for Equation 1 in Table 2}) + \varepsilon_i$

The model proposed for pitchers is:

Equation 2: $\log_{10}(\text{Salary}_i) = f(\text{variables marked for Equation 2 in Table 2}) + \varepsilon_i$

Table 2: Definitions for Independent Variables Used in Equations 1 and 2 and Their Expected Sign of Coefficients

| Independent Variable | Definition | Expected Sign | Used in Equation 1 | Used in Equation 2 |
|---|---|---|---|---|
| Non-Performance Variables | | | | |
| Experience (Exp) | Number of seasons a player has played in MLB | Positive | X | X |
| Large Market (LgMkt) | Dummy variable equal to 1 if the player’s team is in one of the 5 largest MSAs, 0 otherwise | Ambiguous | X | X |
| Free Agency Eligible (FAElg) | Dummy variable equal to 1 if the player was eligible for free agency, 0 otherwise | Positive | X | X |
| Arbitration Eligible (ArbElg) | Dummy variable equal to 1 if the player was eligible for salary arbitration, 0 otherwise | Positive | X | X |
| Performance Variables | | | | |
| Runs Scored (Runs) | Number of times a player scored as the result of his own or other players’ actions | Positive | X | |
| Runs Batted In (RBI) | Number of runs scored as the result of the player’s offensive actions | Positive | X | |
| On Base Percentage (On Base) | Rate at which a player reaches base by a hit, walk, or hit-by-pitch per plate appearance | Positive | X | |
| Hitter’s Strikeout to Walk Ratio (HSOBB) | Ratio of the number of times a player strikes out to the number of walks he draws | Negative | X | |
| Batting Average (BA) | Proportion of at bats that result in a hit | Positive | X | |
| Earned Run Average (ERA) | Earned runs allowed by a pitcher per nine innings pitched | Ambiguous | | X |
| Innings Pitched (IP) | Number of official innings completed by a pitcher | Positive | | X |
| Pitcher’s Strikeout to Walk Ratio (PSOBB) | Ratio of the number of opposing batters a pitcher strikes out to the number of walks he allows | Positive | | X |
| Batting Average Against (BAA) | Proportion of opposing at bats against a pitcher that result in a hit | Negative | | X |
| Winning Percentage (WP) | Proportion of a pitcher’s decisions in a season that were wins | Positive | | X |
| Fielding Percentage (Field) | Rate of successfully handled fielding chances | Positive | X | X |

The models I develop to explain salary differences among players tie into the theoretical ideas presented earlier in several ways. Performance is a primary focus of this analysis, because the baseball labor market centers on signing players whose performance is superior to that of their peers, so that a team can become more successful by signing them. As mentioned in the empirical survey section above, five components of performance (Runs, On Base Percentage, and RBI for position players; ERA and Innings Pitched for pitchers) have been found to be statistically significant across multiple datasets. Runs Scored and Runs Batted In are key measures of a player’s offensive production in terms of the overall game.
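The estimation approach described in the Empirical Model section, OLS on a semi-log specification, can be sketched with NumPy’s least-squares solver. The data below are synthetic and the “true” coefficients are assumed values chosen only for illustration; just three of the Table 2 regressors are mimicked, and nothing here reproduces the paper’s estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # same size as each of the paper's two samples

# Synthetic regressors loosely shaped like Table 3 (illustrative only).
experience = rng.integers(1, 19, size=n).astype(float)  # 1 to 18 seasons
runs = rng.uniform(7, 131, size=n)
large_market = (rng.random(n) < 0.25).astype(float)      # 0/1 dummy

# Assumed "true" coefficients; the dependent variable is a log salary.
true_beta = np.array([5.2, 0.07, 0.006, 0.17])
X = np.column_stack([np.ones(n), experience, runs, large_market])
log_salary = X @ true_beta + rng.normal(0.0, 0.05, size=n)

# OLS: choose the coefficients that minimize the sum of squared residuals.
beta_hat, _, rank, _ = np.linalg.lstsq(X, log_salary, rcond=None)
residuals = log_salary - X @ beta_hat

print("estimated coefficients:", np.round(beta_hat, 4))
print("sum of squared residuals:", round(float(residuals @ residuals), 4))
```

Because a constant is included, the OLS residuals sum to zero, and any other coefficient vector yields a larger sum of squared residuals.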
In a baseball game, a player can collect many hits without producing many RBI if his teammates are not on base to score on those hits; such hits matter less to the outcome of the game if they do not produce runs. Runs scored is also relevant, because it indicates that the team is able to turn a player’s offensive production into runs. A player with many Runs Scored is not only successful offensively himself; the statistic also acts as a proxy for the success of the teammates around him and of the team as a whole. On Base Percentage measures a player’s ability to reach base, but it is not strictly a measure of offensive production, because a player can reach base without a hit (by drawing a walk or being hit by a pitch, for example). On the pitching side, earned run average is a natural choice because it measures how effectively a pitcher keeps the opposing team from scoring. A high ERA indicates poor performance for any pitcher: it means the pitcher is allowing the other team to score because of his own performance. Because fielding errors and other circumstances can allow an opposing team to put runners on base and score, ERA is a useful performance measure: it automatically filters out “unearned runs,” those that occur independently of the pitcher. Innings pitched could also be a significant variable, as teams seem more willing to pay pitchers they perceive as “durable” enough to pitch many innings while contributing to the team’s success. Effective pitchers who can shoulder a large workload without injury risk often command a premium, even if they are less talented than a comparable pitcher with questionable health.
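The two rate statistics discussed above have standard definitions that can be written as small Python functions. This sketch assumes official scoring conventions: ERA is nine times earned runs divided by innings pitched (box-score notation such as 114.2 means 114 and two-thirds innings), and OBP is times on base by hit, walk, or hit-by-pitch divided by at bats plus walks, hit-by-pitches, and sacrifice flies. The function names and the example numbers are hypothetical.

```python
def innings_from_box_score(ip: float) -> float:
    """Convert box-score notation (e.g. 114.2 = 114 and 2/3 innings) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the tenths digit counts outs: 0, 1 or 2
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return whole + outs / 3


def earned_run_average(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """(H + BB + HBP) / (AB + BB + HBP + SF); reaching on an error does not count."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


if __name__ == "__main__":
    ip = innings_from_box_score(200.1)  # 200 and 1/3 innings
    print(round(earned_run_average(80, ip), 2))
    print(round(on_base_percentage(150, 60, 5, 500, 5), 3))
```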
Dummy variables indicating whether a player was eligible for free agency or arbitration are also important, because these situations may cause the negotiated salary to be higher or lower than what the player was seeking or than his true value based on performance. Free agents may receive a lower wage than they otherwise would (if they signed a brand-new contract, the first-year pay might be lower than in subsequent years), and players who went through the arbitration process may have “won” their case and earned more than the team would otherwise have paid. A player’s pay may rise even if he loses his arbitration case, although the gap between the pre-arbitration and post-arbitration salary varies from case to case. Market size is also a reasonable variable to include in this model, as it can account for part of a player’s salary, especially for an average player on a large-market team. Such a player might be paid more than a player of similar talent on a smaller-market team because of the market size and the willingness of the larger-market team’s owner to “overpay.” Conversely, a great player on a smaller-market team may earn less than he would on a larger-market team. A good measure in this case is a dummy variable for metropolitan area size, denoting whether the player’s team is in one of the 5 largest Metropolitan Statistical Areas (MSAs) in the United States. (Players on the Toronto Blue Jays, a Canadian team, are treated as outside the large-market range; if Toronto were ranked with the U.S. MSAs, it would fall between 20th and 21st in size.)
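The construction of the large-market dummy can be sketched as follows. The team names and MSA ranks in the lookup table are placeholders, not the rankings used in this study; the only rules taken from the text are the top-5 cutoff and the treatment of the Toronto Blue Jays as outside the large-market range.

```python
# Placeholder ranks (NOT the study's data): team -> rank of its home MSA among U.S. MSAs.
MSA_RANK = {"Team A": 1, "Team B": 5, "Team C": 12}

# Teams outside the U.S. MSA ranking are coded as not large-market, per the text.
NON_US_TEAMS = {"Toronto Blue Jays"}


def large_market_dummy(team: str, top_n: int = 5) -> int:
    """Return 1 if the team's home MSA ranks in the top `top_n` U.S. MSAs, else 0."""
    if team in NON_US_TEAMS:
        return 0
    return 1 if MSA_RANK[team] <= top_n else 0


if __name__ == "__main__":
    for team in ["Team A", "Team B", "Team C", "Toronto Blue Jays"]:
        print(team, large_market_dummy(team))
```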
In addition to the variables mentioned above, other statistics have been shown to be significant and can be used for performance analysis. For all players, fielding percentage has been shown to be significant, and it is relevant: a player who cannot field his position well becomes a liability to his team. A lower fielding percentage means a player commits more errors and fails to make plays that would retire an opposing player. This in turn increases the chance that the opposing team scores, putting the fielder’s team under pressure to score an extra run to win. Poor fielders can also make fielding more difficult for their teammates, which can hurt their teammates’ performance. Also, for all players, I expect experience to be significant, despite the mixed results in other empirical papers. As a player spends more time at the major league level, his skills may improve to some degree. Beyond a certain point performance may slip with age, but overall I expect experience to affect salary, partly because of the rent-seeking behavior of players. For pitchers, winning percentage is a relevant variable, as teams will be more willing to sign a pitcher who won a larger share of his decisions than he lost over the previous season. Pitchers who lose many games in a season tend to do so because of their own performance rather than that of those around them, and pitchers who win many of the games in which they pitch often do so because of their performance as well. There are exceptions, but in general, the more talented a pitcher is, the higher his winning percentage will be.
Another performance statistic that may be significant for both position players and pitchers is the ratio of strikeouts to walks. For position players, this statistic compares positive and negative outcomes that occur without a hit. A strikeout is the worst individual outcome for a batter, as it neither puts him on base nor advances any runners already on base. A walk is a positive outcome, putting the player on base without a hit. For a pitcher the roles are reversed: a strikeout is highly desirable, preventing the batter from reaching base and runners from advancing, while a walk is undesirable. Batting average is another two-way variable on which both pitchers and position players can be evaluated. For position players, a high batting average indicates a higher percentage of at bats ending in a hit, while a lower batting average indicates a greater propensity for negative outcomes at the plate. For pitchers, batting average against indicates how successfully batters hit against them; higher values indicate less success by the pitcher in getting favorable outcomes against hitters.

Descriptive Statistics

Table 3 lists the minimum, maximum, and mean values of the variables included in the two equations. The table also lists which player in the dataset is associated with each minimum or maximum value, where applicable, or “several” if multiple players share that value.
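The remaining statistics discussed above (fielding percentage, winning percentage, strikeout-to-walk ratio, and batting average) reduce to simple ratios. The Python sketch below assumes the standard formulas; the same strikeout-to-walk function serves for both HSOBB and PSOBB, since only the interpretation differs (bad for hitters, good for pitchers), and batting average against is obtained by passing hits allowed and opponents’ at bats. The function names and counting stats are hypothetical.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Share of total chances (putouts + assists + errors) handled without an error."""
    return (putouts + assists) / (putouts + assists + errors)


def winning_percentage(wins: int, losses: int) -> float:
    """Wins as a share of decisions; undefined for a pitcher with no decisions."""
    decisions = wins + losses
    if decisions == 0:
        raise ValueError("no decisions")
    return wins / decisions


def strikeout_to_walk(strikeouts: int, walks: int) -> float:
    """K/BB ratio: HSOBB for a hitter, PSOBB for a pitcher."""
    if walks == 0:
        raise ValueError("ratio undefined with zero walks")
    return strikeouts / walks


def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at bat; for BAA pass hits allowed and opponents' at bats."""
    return hits / at_bats


if __name__ == "__main__":
    print(round(fielding_percentage(250, 80, 7), 4))
    print(winning_percentage(15, 5))
    print(strikeout_to_walk(66, 4))
    print(batting_average(150, 500))
```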
Table 3: Descriptive Statistics for Included Variables (H = hitters, P = pitchers)

| Variable | Minimum Value | Maximum Value | Mean Value |
|---|---|---|---|
| Log of Salary (logSalary), H | 5.579784 (several) | 7.356189 (Alex Rodriguez) | 6.310912 |
| Log of Salary (logSalary), P | 5.579784 (several) | 7.204120 (Bartolo Colon) | 6.213695 |
| Experience (Exp), H | 1 (several) | 18 (Omar Vizquel) | 7.55 |
| Experience (Exp), P | 1 (several) | 22 (Jamie Moyer) | 7.58 |
| Runs Scored (Runs) | 7 (Kelly Shoppach) | 131 (Grady Sizemore) | 65.185 |
| Runs Batted In (RBI) | 6 (Luis Rodriguez) | 149 (Ryan Howard) | 61.39 |
| On Base Percentage (On Base) | .257 (Juan Uribe) | .431 (Manny Ramirez) | .344595 |
| Hitter’s Strikeout to Walk Ratio (HSOBB) | .543480 (Albert Pujols) | 16.500000 (Vance Wilson) | 2.255439 |
| Batting Average (BA) | .197000 (Tony Clark) | .344 (Joe Mauer) | .278305 |
| Earned Run Average (ERA) | .890000 (Jonathan Papelbon) | 8.14 (Russ Ortiz) | 4.04845 |
| Innings Pitched (IP) | 50 (several) | 235 (Brandon Webb) | 114.2498 |
| Pitcher’s Strikeout to Walk Ratio (PSOBB) | .8571 (Chad Gaudin) | 10.5455 (Ben Sheets) | 2.516885 |
| Batting Average Against (BAA) | .158 (Joe Nathan) | .333 (Russ Ortiz) | .25458 |
| Winning Percentage (WP) | 0.00 (several) | 1.00 (several) | .534565 |
| Fielding Percentage (Field), H | .9155 (Edwin Encarnacion) | 1.00 (several) | .979724 |
| Fielding Percentage (Field), P | .667 (Ron Mahay) | 1.00 (several) | .954215 |

The salary figures show a healthy mix of players across the salary spectrum, from players earning the league minimum salary (both sets of observations include several such players) to the highest-paid players of the period. For comparison, the log of the league average salary was 6.4321, or about 2.89 million dollars. This value is slightly higher than the means in my dataset, although the league average does not account for some players who earn the minimum salary. The performance averages in the sample are close to the league averages for MLB as a whole.
The league batting average was .269, about nine points below the sample mean. The league on-base percentage was .337, about eight points below the sample mean. The overall league ERA was 4.54, higher than the sample mean, but the league figure does not exclude outliers who may have extremely high ERAs. The sample’s batting average against was somewhat below the league average of .269, which again could be explained by the exclusion of extreme outliers from the dataset. The table also gives the name of the player wherever only one individual achieved a given value. Many different players appear as statistically superior or inferior in particular areas, but only one player (Russ Ortiz) appears in more than one category: he had the worst ERA and batting average against in the sample, and he was also one of the players earning the league minimum salary. In some cases, several players share the minimum or maximum value of a variable.

As for the non-performance variables, three dummy variables are included in both equations. Table 4 presents descriptive statistics for these variables in each equation:

Table 4: Descriptive Statistics for Dummy Variables

| Variable | Hitters (% Yes) | Pitchers (% Yes) |
|---|---|---|
| Arbitration Eligible (ArbElg) | 16.5% | 20% |
| Free Agency Eligible (FAElg) | 17% | 19% |
| Plays in Large Market (LgMkt) | 24.5% | 27% |

The share of players in the sample who were eligible for arbitration or free agency is similar to the overall rates in Major League Baseball. In the league, 151 players were arbitration eligible out of 750 roster spots, a rate of about 20%, similar to the rate in the sample.
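Two calculations in this section can be reproduced with Python’s standard library: converting log salaries back into dollars and computing the league-wide shares used for comparison with Table 4. The conversion assumes base-10 logarithms, which is consistent with the reported values (10^5.579784 is about 380,000 and 10^7.204120 is about 16,000,000). The describe() helper, whose name is hypothetical, shows how the summary statistics in Table 3 could be computed from a list of raw observations.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Minimum, maximum, mean, median and sample standard deviation of a variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def log10_to_dollars(log_salary: float) -> float:
    """Invert a base-10 log salary (assumption: Table 3 uses base-10 logs)."""
    return 10 ** log_salary


# League-wide shares quoted in the text, for comparison with Table 4.
ARB_SHARE = 151 / 750        # arbitration-eligible players / roster spots
LARGE_MARKET_SHARE = 8 / 30  # large-market teams / all teams

if __name__ == "__main__":
    print(round(log10_to_dollars(5.579784)))  # minimum log salary in both samples
    print(round(log10_to_dollars(7.204120)))  # pitchers' maximum log salary
    print(f"{ARB_SHARE:.1%} {LARGE_MARKET_SHARE:.1%}")
```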
As for free agency, around 200 players were eligible to file for free agency as major league players, or around 26% of roster spots, a higher rate than in the sample. Eight of the 30 teams fall under the “Large Market” umbrella, meaning that around 27% of the league’s players play for a large-market team, in line with the samples used for both equations.

Test for Multicollinearity

In empirical studies there is always a chance of multicollinearity among the variables. Multicollinearity is a linear relationship, or correlation, between two or more independent variables. It can have consequences in any regression: the standard errors of the affected coefficients will be high and the resulting t-statistics low, which causes problems in hypothesis testing, namely an increase in Type II errors (failing to reject a null hypothesis that is actually false).

To check for multicollinearity in the two equations, a correlation coefficient matrix can be developed. Each value in the matrix is the correlation coefficient between two independent variables and lies between -1 and 1. A value of 1 indicates perfect positive correlation (the two variables move together exactly), and a value of -1 indicates perfect negative correlation (they move exactly in opposite directions). Ideally these values would be as close to zero as possible; as a rough rule of thumb, an absolute value above 0.8 raises a fair amount of concern that the two variables exhibit multicollinearity. Tables 5 and 6 display the correlation coefficient matrices for the independent variables of the two equations.
Table 5: Correlation Coefficient Matrix for Equation 1

| | ARBELG | BA | EXPERIENCE | FAELG | FIELD | HSOBB | LGMKT | ONBASE | RBI | RUNS |
|---|---|---|---|---|---|---|---|---|---|---|
| ARBELG | 1.000 | | | | | | | | | |
| BA | 0.094 | 1.000 | | | | | | | | |
| EXPERIENCE | -0.039 | 0.014 | 1.000 | | | | | | | |
| FAELG | -0.165 | -0.010 | 0.415 | 1.000 | | | | | | |
| FIELD | 0.068 | 0.090 | 0.240 | 0.059 | 1.000 | | | | | |
| HSOBB | -0.048 | -0.196 | -0.228 | -0.105 | -0.042 | 1.000 | | | | |
| LGMKT | -0.159 | 0.119 | -0.033 | -0.010 | -0.110 | -0.043 | 1.000 | | | |
| ONBASE | 0.024 | 0.744 | 0.110 | -0.059 | 0.056 | -0.512 | 0.088 | 1.000 | | |
| RBI | 0.034 | 0.379 | 0.041 | -0.034 | -0.007 | -0.314 | 0.101 | 0.520 | 1.000 | |
| RUNS | 0.090 | 0.409 | -0.001 | -0.127 | -0.062 | -0.385 | 0.081 | 0.564 | 0.835 | 1.000 |

Table 6: Correlation Coefficient Matrix for Equation 2

| | ARBELG | BAA | ERA | EXPERIENCE | FAELG | FIELD | IP | LGMKT | PSOBB | WP |
|---|---|---|---|---|---|---|---|---|---|---|
| ARBELG | 1.000 | | | | | | | | | |
| BAA | -0.044 | 1.000 | | | | | | | | |
| ERA | -0.032 | 0.783 | 1.000 | | | | | | | |
| EXPERIENCE | -0.131 | 0.048 | -0.035 | 1.000 | | | | | | |
| FAELG | -0.242 | 0.257 | 0.240 | 0.409 | 1.000 | | | | | |
| FIELD | 0.026 | 0.101 | 0.039 | -0.016 | 0.044 | 1.000 | | | | |
| IP | -0.116 | 0.235 | 0.140 | 0.088 | 0.074 | 0.108 | 1.000 | | | |
| LGMKT | -0.023 | -0.034 | -0.080 | 0.166 | 0.021 | -0.036 | -0.003 | 1.000 | | |
| PSOBB | -0.030 | -0.409 | -0.474 | 0.093 | -0.168 | -0.015 | 0.040 | 0.049 | 1.000 | |
| WP | -0.057 | -0.220 | -0.438 | -0.003 | 0.056 | -0.027 | 0.080 | 0.088 | 0.201 | 1.000 |

As Table 5 shows, one pair of independent variables (Runs and RBI) exhibits a significant amount of multicollinearity, with a positive correlation of 0.835, above the 0.8 threshold. While this high correlation is a concern, several remedies are available so that it does not have a detrimental effect on the regression output. One is to eliminate one of the variables if there is redundancy, which works if both variables capture essentially the same information. A second is to redesign the model, transforming the variables into some other form that does not exhibit this multicollinearity.
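The 0.8 rule of thumb can be applied mechanically to Table 5. The Python sketch below stores the table’s upper triangle (values copied from Table 5, diagonal omitted) and flags every pair whose absolute correlation exceeds a threshold; the function name is hypothetical.

```python
VARIABLES = ["ARBELG", "BA", "EXPERIENCE", "FAELG", "FIELD",
             "HSOBB", "LGMKT", "ONBASE", "RBI", "RUNS"]

# Table 5 as an upper triangle: row i holds the correlations of VARIABLES[i]
# with every variable listed after it (the 1.000 diagonal is omitted).
TABLE_5_UPPER = [
    [0.094, -0.039, -0.165, 0.068, -0.048, -0.159, 0.024, 0.034, 0.090],
    [0.014, -0.010, 0.090, -0.196, 0.119, 0.744, 0.379, 0.409],
    [0.415, 0.240, -0.228, -0.033, 0.110, 0.041, -0.001],
    [0.059, -0.105, -0.010, -0.059, -0.034, -0.127],
    [-0.042, -0.110, 0.056, -0.007, -0.062],
    [-0.043, -0.512, -0.314, -0.385],
    [0.088, 0.101, 0.081],
    [0.520, 0.564],
    [0.835],
]


def flag_high_correlations(names, upper, threshold=0.8):
    """Return (var_a, var_b, r) for every pair whose |r| exceeds the threshold."""
    flagged = []
    for i, row in enumerate(upper):
        for offset, r in enumerate(row, start=1):
            if abs(r) > threshold:
                flagged.append((names[i], names[i + offset], r))
    return flagged


if __name__ == "__main__":
    print(flag_high_correlations(VARIABLES, TABLE_5_UPPER))       # [('RBI', 'RUNS', 0.835)]
    print(flag_high_correlations(VARIABLES, TABLE_5_UPPER, 0.7))
```

Lowering the threshold to 0.7 also flags BA and ONBASE (0.744), which shows how much the diagnosis depends on the rule of thumb chosen.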
A third solution is to increase the sample size used in the regression, which is the best solution if the variables are suspected to be less highly correlated in larger samples. A fourth is to leave the highly correlated variables alone, keeping in mind that their estimated coefficients may not be reliable. Given that the multicollinearity between Runs and RBI does not affect the estimated coefficients of the other independent variables, the best approach here is to leave them alone. There are no more data from the given year to add to the sample. In addition, the two variables are different enough that omitting one of them would cost some explanatory power, and there is no logically motivated transformation of them that would be useful. Leaving both variables in the equations does complicate the interpretation of the regression results, however, as their high correlation may affect the size (and possibly the sign) of their estimated coefficients.

Testing for Heteroskedasticity

The data set for this regression analysis is cross-sectional, and this type of data is susceptible to a problem called heteroskedasticity. For OLS to yield the best estimates, the error terms of Equations 1 and 2 must have a constant variance across observations (that is, be homoskedastic). When heteroskedasticity is present, the variance of the error terms varies across observations. Heteroskedasticity can cause OLS to underestimate the standard errors of the coefficients, which inflates the t-statistics.
This can be a serious problem, as it can lead to Type I errors, in which null hypotheses about the independent variables are falsely rejected. Heteroskedasticity can be tested for using the White test, a multi-step procedure that checks for several forms of heteroskedasticity at once. The test begins by estimating the original equation and saving the residuals (the differences between the actual and predicted values of the dependent variable). These residuals are squared and used as the dependent variable in a second (auxiliary) regression. The independent variables in the auxiliary regression are the independent variables from the original equation, the square of each of those variables, and the product of each pair of them. Under the null hypothesis of homoskedasticity, the coefficient of determination (R²) of this auxiliary regression multiplied by the number of observations (n) approximately follows a chi-square distribution. The critical chi-square value depends on the chosen level of significance (typically 5%) and on the degrees of freedom, which equal the number of independent variables (excluding the constant) in the auxiliary regression. If the nR² statistic is less than the critical value, we fail to reject the null hypothesis of homoskedasticity. If nR² is greater than the critical value, we reject the null hypothesis in favor of the alternative and conclude that heteroskedasticity is present. For the hitters’ equation, the nR² value of the White test is 54.58323. The critical chi-square value for a 5% level of significance and 62 degrees of freedom is 81.38102 (obtained with the function =CHIINV(.05,62) in Microsoft Excel 2007).
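The White test procedure described above can be sketched with NumPy. This is an illustration under stated assumptions, not the software routine used for the paper: the regressor matrix is passed without a constant, the degrees of freedom are taken as the rank of the auxiliary regressor matrix minus one (so a redundant column such as the square of a 0/1 dummy is not counted; with ten regressors, three of them dummies, this gives 10 + 10 + 45 - 3 = 62), and the critical value is supplied by the caller. In SciPy, scipy.stats.chi2.ppf(0.95, df) returns the same kind of critical value as Excel’s CHIINV(0.05, df).

```python
from itertools import combinations_with_replacement

import numpy as np


def white_test(X: np.ndarray, y: np.ndarray) -> tuple[float, int]:
    """Return (nR^2, degrees of freedom) for White's general heteroskedasticity test.

    X holds the original regressors (n x k) WITHOUT a constant column.
    """
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e2 = (y - Z @ beta) ** 2  # squared residuals from the original regression

    # Auxiliary regressors: levels, squares, and pairwise cross products.
    levels = [X[:, j] for j in range(k)]
    products = [X[:, a] * X[:, b] for a, b in combinations_with_replacement(range(k), 2)]
    A = np.column_stack([np.ones(n)] + levels + products)

    gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
    fitted = A @ gamma
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

    # Redundant columns (e.g. a 0/1 dummy squared) add no degrees of freedom.
    df = int(np.linalg.matrix_rank(A)) - 1
    return float(n * r2), df


def reject_homoskedasticity(nr2: float, critical: float) -> bool:
    """Reject the null of homoskedasticity when nR^2 exceeds the chi-square critical value."""
    return nr2 > critical


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 1000
    x = rng.uniform(1, 5, size=n)
    y = 1 + 0.5 * x + x * rng.normal(size=n)  # error s.d. grows with x
    nr2, df = white_test(x[:, None], y)
    print(round(nr2, 2), df)
    print(reject_homoskedasticity(nr2, 5.991))  # 5% critical value for 2 df
```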
In this case nR² is less than the critical value, so we fail to reject the null hypothesis and find no evidence of heteroskedasticity in this equation. The nR² value of the White test for the pitchers’ equation is 92.96754. The critical chi-square value for a 5% level of significance and 61 degrees of freedom is 80.2321 (obtained with the same function, changing the degrees of freedom from 62 to 61: =CHIINV(.05,61)). In this case nR² is greater than the critical value, so we reject the null hypothesis in favor of the alternative: heteroskedasticity is present in this equation. To minimize its effect, I follow the White procedure to correct the standard errors of the estimated coefficients in Equation 2 for heteroskedasticity. This method leaves the coefficient estimates unchanged but produces more reliable standard errors and therefore more accurate t-statistics. Typically, the corrected standard errors are larger, so the t-statistics, while more accurate, are often lower.

Estimation Results

As discussed above, the hitters’ equation (Equation 1) is estimated without any adjustment to the model outlined previously, while the pitchers’ equation (Equation 2) is estimated using the White correction for heteroskedasticity. Tables 7 and 8 present the results for the hitters’ and pitchers’ equations.
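The White correction applied to Equation 2 replaces the conventional coefficient covariance matrix with a “sandwich” estimator. The NumPy sketch below implements the basic HC0 form; the text does not say whether the paper’s software used this exact variant or a degrees-of-freedom-adjusted one, so the choice of HC0 is an assumption. The data in the usage lines are synthetic.

```python
import numpy as np


def ols_with_white_se(X: np.ndarray, y: np.ndarray):
    """OLS coefficients with conventional and White (HC0) standard errors.

    X must already include a constant column. The coefficients are identical
    under both approaches; only the standard errors differ.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta

    # Conventional standard errors assume one common error variance.
    sigma2 = e @ e / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * xtx_inv))

    # White/HC0: (X'X)^-1 [sum of e_i^2 x_i x_i'] (X'X)^-1
    meat = X.T @ (X * (e ** 2)[:, None])
    cov_white = xtx_inv @ meat @ xtx_inv
    se_white = np.sqrt(np.diag(cov_white))
    return beta, se_conventional, se_white


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 5000
    x = rng.uniform(0, 10, size=n)
    # Error variance grows with the distance of x from the center of its range.
    y = 2 + 0.3 * x + np.abs(x - 5) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    beta, se_c, se_w = ols_with_white_se(X, y)
    print("coefficients:", np.round(beta, 3))
    print("slope s.e. conventional vs White:", round(se_c[1], 4), round(se_w[1], 4))
```

In this example the White standard error on the slope comes out larger than the conventional one, the pattern described above: the correction does not move the estimates, but it makes the t-statistics more conservative.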
Table 7: Regression Results for Hitters’ Equation (Equation 1)

| Independent Variable | Coefficient Value | Standard Error | T-Statistic | Significance |
|---|---|---|---|---|
| C | 5.18553 | 1.479755 | 3.504316 | 99.94% |
| RUNS | 0.006113 | 0.001588 | 3.850105 | 99.98%*** |
| RBI | 0.004486 | 0.001402 | 3.199664 | 99.84%*** |
| ONBASE | 0.846241 | 1.281671 | 0.660264 | 49.01% |
| HSOBB | -0.007213 | 0.019612 | -0.367794 | 28.66% |
| BA | -2.957660 | 1.343628 | -2.201249 | 97.11%** |
| FIELD | 0.400801 | 1.510282 | 0.265382 | 20.90% |
| EXPERIENCE | 0.073126 | 0.007130 | 10.25646 | 100.00%*** |
| LGMKT | 0.168603 | 0.057574 | 2.928452 | 99.62%*** |
| FAELG | -0.046818 | 0.072723 | -0.643782 | 47.95% |
| ARBELG | 0.128306 | 0.067614 | 1.897620 | 94.07%* |

Number of Observations = 200; Adjusted R² = 0.612208
Note: *** significant at the 1% error level; ** significant at the 5% error level; * significant at the 10% error level.

Table 8: Regression Results for Pitchers’ Equation (Equation 2, White heteroskedasticity-corrected standard errors)

| Independent Variable | Coefficient Value | Standard Error | T-Statistic | Significance |
|---|---|---|---|---|
| C | 5.097949 | 0.527007 | 9.673394 | 100.00% |
| ERA | -0.031529 | 0.046339 | -0.680395 | 50.29% |
| IP | 0.004257 | 0.000485 | 8.783339 | 100.00%*** |
| PSOBB | 0.089062 | 0.021297 | 4.181866 | 100.00%*** |
| BAA | -0.106577 | 1.395424 | -0.076376 | 6.08% |
| WP | -0.319201 | 0.171240 | -1.864056 | 93.61%* |
| FIELD | 0.259975 | 0.457814 | 0.567861 | 42.92% |
| EXPERIENCE | 0.049797 | 0.005750 | 8.659995 | 100.00%*** |
| LGMKT | 0.038915 | 0.060691 | 0.641202 | 47.78% |
| FAELG | 0.248311 | 0.079497 | 3.123540 | 99.79%*** |
| ARBELG | 0.236802 | 0.056654 | 4.179815 | 100.00%*** |

Number of Observations = 200; Adjusted R² = 0.567136
Note: *** significant at the 1% error level; ** significant at the 5% error level; * significant at the 10% error level.

The regression output also includes the adjusted R², a measure of the “goodness of fit” of an equation, that is, how well the regression explains the variation of the dependent variable around its mean. The closer the adjusted R² is to 1, the better the fit. The adjusted R² values are 0.612208 and 0.567136 for the hitters’ and pitchers’ equations, respectively.
This can be interpreted as meaning that, after adjusting for the number of independent variables, the equations explain about 61.2% and 56.7% of the variation in Log Salary around its mean. The regression results provide several other useful statistics. One of the most useful for judging the significance of the independent variables is the t-statistic, which is used to conduct a t-test on the coefficient of a particular variable. The t-test compares the slope coefficient estimate of each independent variable with zero. It takes the form of a hypothesis test, allowing us to use the regression output to draw a conclusion about the true effect of each independent variable on the dependent variable. For each variable, we formulate a null and an alternative hypothesis. The alternative hypothesis is that the slope coefficient has the expected sign, while the null hypothesis is that it is zero or has the opposite sign. To conduct the t-test, we compare the t-statistic for a variable with a critical value determined by the number of degrees of freedom and the chosen level of significance. The hypotheses just described are for a one-sided t-test; in a two-sided test, the null hypothesis is that the coefficient equals zero and the alternative is that it does not. A two-sided test is useful for a variable whose sign matters less than whether it has any effect at all on the dependent variable. The level of significance is the probability of a Type I error (rejecting a true null hypothesis). The degrees of freedom are n - k - 1, where n is the number of observations used in the regression and k is the number of independent variables in the equation.
If the absolute value of the t-statistic is greater than the critical value (and, for a one-sided test, the estimate has the expected sign), we can reject the null hypothesis in favor of the alternative and conclude that, at the designated level of significance, the coefficient is significantly different from zero. Otherwise, we fail to reject the null hypothesis and cannot be confident that the independent variable has a significant effect on the dependent variable. For example, consider a t-test on RUNS in the hitters’ equation. We expect players who score more runs to earn higher salaries. We test this coefficient at the 10% level of significance, our minimum threshold for a variable to be considered significant. The null hypothesis is that the RUNS coefficient is negative or zero, and the alternative hypothesis is the expected positive coefficient. The t-statistic from the regression is 3.850105. The critical value for a one-sided test at the 10% level with 189 (200 - 10 - 1) degrees of freedom is 1.286047. Because the t-statistic exceeds the critical value, the slope coefficient of RUNS is significant at the 10% level. For each equation, we can examine the individual coefficients and their significance to see what they imply for the dependent variable. A positive coefficient means that a one-unit increase in the variable raises Log Salary, and a negative coefficient means that it lowers Log Salary. Table 9 shows the effect of a change in each independent variable on Log Salary.
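The decision rule and the RUNS example can be expressed as a short Python sketch, together with the standard adjusted R² formula used to summarize fit. The function names are hypothetical; the critical value 1.286047 is taken from the text rather than computed, and the coefficient and standard error are the rounded Table 7 figures, so the recomputed t-statistic differs slightly from the reported 3.850105.

```python
def t_statistic(coef: float, se: float) -> float:
    """t-statistic for H0: coefficient = 0."""
    return coef / se


def one_sided_reject(t: float, critical: float, expected_sign: int) -> bool:
    """Reject H0 only if the estimate has the expected sign (+1 or -1) and |t| > critical."""
    return t * expected_sign > critical


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k independent variables (plus a constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def r2_from_adjusted(adj_r2: float, n: int, k: int) -> float:
    """Invert adjusted_r2 to recover the unadjusted R^2."""
    return 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)


if __name__ == "__main__":
    t_runs = t_statistic(0.006113, 0.001588)          # RUNS, Table 7
    print(one_sided_reject(t_runs, 1.286047, +1))      # True
    # BA has a negative estimate, so a one-sided test for a positive effect fails.
    print(one_sided_reject(-2.201249, 1.286047, +1))   # False
    print(round(r2_from_adjusted(0.612208, 200, 10), 4))
```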
Table 9: Regression Coefficients and Their Effects on Log Salary

Independent Variable | Change in Value | Effect on Log Salary
RUNS | 1 additional run scored | Increase by .006113
RBI | 1 additional RBI | Increase by .004486
ONBASE | 10 additional points (1%) | Increase by .00846241
HSOBB | 1 additional unit | Decrease by .007213
BA | 10 additional points (1%) | Decrease by .0295766
ERA | 1 additional unit | Decrease by .031529
IP | 1 additional inning | Increase by .004257
PSOBB | 1 additional unit | Increase by .089062
BAA | 10 additional points (1%) | Decrease by .001065
WP | 10 additional points (1%) | Decrease by .003192
FIELD | 10 additional points (1%) | Hitters: increase by .265382; Pitchers: increase by .259975
EXPERIENCE | 1 additional year | Hitters: increase by .073126; Pitchers: increase by .049797
LGMKT | Change from non-large-market team to large-market team | Hitters: increase by .168603; Pitchers: increase by .038915
FAELG | Change from ineligible to eligible | Hitters: decrease by .046818; Pitchers: increase by .248311
ARBELG | Change from ineligible to eligible | Hitters: increase by .128306; Pitchers: increase by .236802

For each equation, the coefficients of six variables are significant at the 10% level or better: three performance variables unique to that equation and three of the variables common to both equations. The coefficient of FIELD (Fielding Percentage) is not significant at the 10% level in either equation. I expected this variable to be relevant, but there is also a logical reason for its insignificance: although some players are paid a premium primarily for their defensive abilities even when their offensive performance is below average, the result of this regression suggests that such players may be the exception rather than the norm.
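Because Table 9 reports effects on Log Salary rather than on salary itself, it may help to translate some of the coefficients into approximate percentage changes in salary. The sketch below assumes that Log Salary is the natural logarithm of salary (the paper does not state the base; pass `base=10` if common logarithms were used). Under that assumption, a coefficient b implies a salary change of 100(e^b - 1) percent, which is close to 100b percent when b is small. The dictionary `TABLE9_SELECTED` and the function `pct_salary_change` are hypothetical names for this illustration; the coefficient values are copied from Table 9.

```python
import math

# Selected Table 9 coefficients: change in Log Salary for the stated change.
TABLE9_SELECTED = {
    "RUNS, hitters (1 additional run)": 0.006113,
    "EXPERIENCE, hitters (1 additional year)": 0.073126,
    "EXPERIENCE, pitchers (1 additional year)": 0.049797,
    "ARBELG, hitters (becoming eligible)": 0.128306,
    "ARBELG, pitchers (becoming eligible)": 0.236802,
}


def pct_salary_change(coef: float, base: float = math.e) -> float:
    """Percent change in salary implied by a change of `coef` in log salary.

    Assumes Log Salary = log_base(salary); the natural log (base e) is the
    default because the paper does not state which base it used.
    """
    return (base ** coef - 1.0) * 100.0


for label, coef in TABLE9_SELECTED.items():
    print(f"{label}: {coef:+.6f} in Log Salary -> {pct_salary_change(coef):+.2f}% salary")

# Ratio of the hitter and pitcher experience effects on Log Salary.
ratio = (TABLE9_SELECTED["EXPERIENCE, hitters (1 additional year)"]
         / TABLE9_SELECTED["EXPERIENCE, pitchers (1 additional year)"])
print(f"Hitter/pitcher experience effect ratio: {ratio:.2f}")  # about 1.47
```

For the eligibility dummies (ARBELG, FAELG, LGMKT), the e^b - 1 form matters more than for the small per-unit coefficients, since the linear approximation 100b understates the implied change as b grows.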
For pitchers, fielding matters less to begin with, but the FIELD variable still had some theoretical value to add to the equation. Again, these results suggest that fielding ability has little impact on the overall salary differences among pitchers. Lackritz (1990) found this variable to be significant, but other studies have consistently found it to be insignificant. In both equations, the coefficient of the EXPERIENCE variable is highly significant. This is in line with the theoretical idea that players who stay in the league longer are paid a premium for their body of work later in their careers or, as discussed earlier, that players with a number of years of experience in the league are, in theory, able to collect some economic rent. In both equations, additional years of experience increase the difference between a player’s salary and that of others. The effect is roughly one and a half times as large for batters (.073126) as for pitchers (.049797), which could mean that additional years of experience are less valuable to a team in its pitchers. This is in line with the findings of Bollinger and Hotchkiss (2003) and Marburger (1994), although in the equations used here experience is significant for all players, whereas Marburger found it to be significant only for certain subsets of players. In both equations, the coefficient of ARBELG is also significant. Players who are eligible for arbitration tend to see their salaries rise, usually because teams want to secure these players under contract before they reach arbitration and because the pay such a player should be earning at that point is usually higher than his current salary. In practice, teams usually avoid arbitration by offering players contracts before they are able to take the team into the arbitration process.
The increase being substantially larger for pitchers than for batters is somewhat curious: prior literature does not point to a major difference between pitchers and position players in the salary change associated with arbitration, and there is no obvious reason why this should be the case. One possibility is that teams place a premium on younger pitchers; since arbitration is available to players with as little as two seasons of experience, this could explain part of the gap. The coefficient of the variable FAELG (Free Agency Eligible) is significant for pitchers but not for hitters. Considering that the mean salary for pitchers is lower than that of batters, this is also somewhat surprising. The work of Hylan, Lage, and Treglia (1996) does support the idea that free agent eligibility increases the salaries paid to pitchers over time. In these cases, the theory suggests that the free agent mechanism can more quickly match pitchers to the teams that offer the highest value for their services. It is less clear why free agency is not also significant for position players, since the same market system applies to them. The coefficient of LGMKT is significant for position players but not for pitchers. As mentioned earlier, market size can play a role in paying higher salaries, as teams in larger markets can draw on larger revenue sources and thus have more ability to pay a premium to secure a player’s services. Vrooman (1996) also finds that player movement from small-market to large-market teams has increased over time, especially from the early 1990s onward. Teams in large markets are taking advantage of their extra revenues to secure talent that, theoretically, will allow them to field the most competitive teams in the league.
The insignificance for pitchers may reflect a premium that teams of all market sizes are willing to pay for pitching. It could also indicate that, while large-market teams tend to spend more on salaries for position players, there is a smaller gap in spending on pitchers between those teams and their smaller-market brethren. For batters, three of the five equation-specific performance variables have a significant effect on Log Salary: RUNS, RBI, and BA. The two that do not, ONBASE and HSOBB, are far from significant, which contrasts with the findings of Lackritz (1990). I expected these two variables to have a significant effect on salary differences, as they both emphasize the rate of positive outcomes from an at-bat, which theoretically gives a team a better chance to score runs and thereby win more games. The significance of the coefficients on RUNS, RBI, and BA is in line with the findings of Lackritz (1990) and Marburger (1994). The coefficients on RUNS and RBI have the expected sign, but the batting average coefficient is nonsensical: a decrease in Log Salary as batting average rises makes no sense in terms of either the theory or the actual data. Apart from BA, the signs and sizes of the coefficients, including that of HSOBB, are in line with expectations. For pitchers, the three equation-specific significant coefficients are those of IP, PSOBB, and WP, while the coefficients of ERA and BAA are insignificant. BAA in particular performs very poorly in explaining the dependent variable. Recall that BAA has an apparent theoretical connection with performance, as a pitcher with a higher BAA allows the opposing team more opportunities to score runs.
The sign of the coefficient, for what it’s worth, is negative as expected. The insignificance of the ERA coefficient is a bit surprising, as both Lackritz (1990) and Marburger (1994) found it to be significant; its sign, however, is as expected. Among the three significant coefficients, the sign of WP is troubling, as it is the opposite of the expected sign. A decline in pay differences for a higher winning percentage runs counter to the theory that players who win more games are more valuable. Teams may pay little attention to winning percentage when making hiring decisions, or may even penalize some pitchers who have a higher percentage of wins, possibly hedging against a decline in form. For IP, the size and sign of the estimated coefficient are consistent with the findings of Marburger (1994). For PSOBB, the sign of the coefficient differs from that found by Lackritz (1990), although the theory supports the outcome here rather than the one found in previous work. If a pitcher records more strikeouts than walks, especially at a higher ratio, he creates more positive outcomes for his team. In theory, this would command a higher salary, as a team could benefit from a higher percentage of these positive outcomes and would be willing to pay the pitcher a higher price to be on the beneficial end of his performance.

Conclusion

While the models presented in this paper have their faults, and there are areas where they could be improved, there are some positive outcomes that I can draw from the findings. The findings show that some of the performance and non-performance variables identified in previous empirical works are significant in determining salary for Major League Baseball players in the contemporary market.
Because much of the similar research used datasets of players from roughly two decades before the observations used here, one objective in developing the models presented in this paper was to find out whether past research can stand the test of time and the changes in the labor market. Indeed, the fact that several performance and non-performance variables remain significant suggests that certain factors may be universal in salary determination. For hitters, batting average, runs scored, and runs batted in all have a direct effect on the outcomes of games, and the results do not call their significance into question. For pitchers, innings pitched, strikeout-to-walk ratio, and winning percentage are less clearly defined measures of performance, but they are nonetheless indicators of a successful (or unsuccessful) pitcher. The finding that experience is a significant factor for both pitchers and batters is consistent with some form of rent-seeking behavior in the market, a system in which players are paid more as they remain in the league for longer periods. The arbitration system (for both groups) and the free agent structure (for pitchers) are also shown to be significant in setting player salaries, with a large positive effect on the salary of an eligible player. Finding that these market mechanisms significantly affect player salaries allows us to explore further how they shape the labor market in baseball, specifically how changes to these systems would affect player movement and wage negotiations in the league. As Major League Baseball is a multi-billion dollar enterprise, drawing fans, players, and revenue from domestic and international sources, the salaries paid to players have a large impact on the game.
The league is under some pressure to offer a product that keeps drawing fans into ballparks and persuades them to pay to watch or listen to games, and the labor market implications of this study bear on how teams view this tradeoff between wages and revenue. If wages rise too quickly and outpace revenues, teams may end up in poor financial situations, which could hurt the overall fiscal health of the league. Overpaying or underpaying players may cause the wage rate for all players to rise or fall, and could affect the level of talent willing to play in the league and where those players choose to play. As shown in this study, market size at the large end does affect salaries: batters who play for teams in large markets have higher salary differentials than their counterparts in smaller markets. The struggle of small-market teams to keep up with rising salaries is becoming more and more visible, especially at times when revenue streams in cities like Cleveland, Pittsburgh, or Denver are not growing as fast as those in larger markets like New York, Los Angeles, or Philadelphia (if they are growing at all). Market size combined with other mechanisms like free agency could create another concern: smaller-market teams with players becoming eligible for arbitration or free agency may be unable to pay the salary increases needed to retain those players. Player retention may slowly become a problem, theoretically forcing smaller-market teams to stock their rosters with younger, generally inferior players, while larger-market teams that can afford the higher salaries will have generally more talented players. Because salaries rise significantly for players who are eligible for free agency or arbitration, or who play in a larger market, the question of salary controls becomes relevant.
As mentioned at the outset, Major League Baseball lacks a salary cap, contract value caps, or other salary controls at the higher end. There is a luxury tax, but some teams exceed it every year, and the number of teams that cross or come close to crossing this threshold is increasing annually. Does the league need to institute controls now to head off a salary crisis over the long term? Given the size of the salary increases found in this study for the market mechanism variables, one could certainly make that argument. The effectiveness of such controls would depend on where they were applied. A salary cap might work, but depending on its level, it could cause considerable disruption in the short and medium term. Contract limitations like a maximum annual salary or maximum pay raise scaling (in which a cap is placed on the amount the base salary can increase from one year to the next) could control costs in the long term, but could lead to more players being offered the maximum contract rather than a lower contract value that they might actually “deserve.” The models presented here leave out some factors, such as the aforementioned player representation idea (in which it could be theorized that professional sports agents, or players who act as their own agents, have some effect on pay) or an individual player’s merchandising revenue; adding some of these factors to the models could help to further explain changes in player salaries. Because this paper considers only one year of player data, as do many analyses of this type, multiple years’ worth of observations could be used to determine whether variables that are significant in one year remain significant over time, or are significant only in certain years. It is also possible that variables that are insignificant in this dataset are significant in other years.
Future research in this area should focus on how some of the aforementioned policy changes would affect salaries and the labor market, and project whether some of these changes could control salaries in the league over time. Further identification of factors that affect salaries can help to explain changes over time. A number of external factors could play into salary determination, and these non-performance variables may help to create a more explanatory model. Ultimately, the models presented here could be expanded upon through future research to offer further observations on which factors are significant in the current state of the Major League Baseball labor market.

Works Referenced

Bollinger, Christopher, and Julie Hotchkiss. "The Upside Potential of Hiring Risky Workers: Evidence from the Baseball Industry." Journal of Labor Economics 21.4 (2003): 923-44.

Faurot, David, and Stephen McAllister. "Salary Arbitration and Pre-Arbitration Negotiation in Major League Baseball." Industrial and Labor Relations Review 45.4 (1992): 697-710.

Hakes, Jahn, and Raymond Sauer. "An Economic Evaluation of the Moneyball Hypothesis." The Journal of Economic Perspectives 20.3 (2006): 173-86.

Halcoussis, Dennis. Understanding Econometrics. 1st ed. USA: Thomson South-Western, 2005.

Hoaglin, David, and Paul Velleman. "A Critical Look at Some Analyses of Major League Baseball Salaries." The American Statistician 49.3 (1995): 277-85.

Hylan, Timothy, Maureen J. Lage, and Michael Treglia. "The Coase Theorem, Free Agency, and Major League Baseball: A Panel Study of Pitcher Mobility from 1961 to 1992." Southern Economic Journal 62.4 (1996): 1029-42.

Lackritz, James. "Salary Evaluation for Professional Baseball Players." The American Statistician 44.1 (1990): 4-8.

Marburger, Daniel. "Bargaining Power and the Structure of Salaries in Major League Baseball." Managerial and Decision Economics 15.5 (1994): 433-41.
Sommers, Paul, and Noel Quinton. "Pay and Performance in Major League Baseball: The Case of the First Family of Free Agents." The Journal of Human Resources 17.3 (1982): 426-36.

Stevens, Carl. "Is Compulsory Arbitration Compatible with Bargaining?" Industrial Relations 5 (1966): 38-52.

Vrooman, John. "A Unified Theory of Capital and Labor Markets in Major League Baseball." Southern Economic Journal 63.3 (1997): 594-619.

Vrooman, John. "The Baseball Players' Labor Market Reconsidered." Southern Economic Journal 63.2 (1996): 339-60.

Dataset information derived from data at the following sources: http://www.mlb.com/mlb/stats, http://www.retrosheet.org, http://www.baseball-reference.com, http://espn.go.com/mlb/stats