Baseball fundamentals

Baseball Fundamentals: Pitching, Hitting, Running and Fielding Your Way to Success
Statistics & Data Analysis
Data Analysis Project
Professor Jeffrey Simonoff
Overview of Analysis
Baseball is a “simple” game: you pitch, field, and hit and run. For our project, we evaluate how each of those
components contributes to the success of a team. Observations are taken for all Major League Baseball teams.
The 6 variables analyzed for each team are:
Hitting

On Base Percentage (OBP) – This is a measure of a batter’s contribution to his team’s offense by the rate
at which he reaches base.

Slugging Percentage (SLG) – This is a measure of a batter’s contribution to his team’s offense by
considering the total number of bases per at bat for a hitter.
Running

Stolen Bases (SB) – This is a measure of a team’s ability to use its speed and base-running skill for
offensive gain.
Fielding

Fielding Percentage (FP) – The proportion of fielding chances handled without an error. A high fielding
percentage indicates quality fielding and throwing.
Pitching

K/9IP – The average number of strikeouts per standard 9-inning game. This statistic is one of the ways
to evaluate a team’s pitching and is commonly known as a measure of “power pitching”, as strikeouts
are indicative of a pitcher’s ability to overpower a batter.

Walks Plus Hits Per Inning Pitched (WHIP) – This statistic is useful when evaluating the effectiveness of
a team’s pitching. It indicates how successful the pitchers are at keeping opposing batters off base.
This analysis evaluates how these 6 variables affect a team’s winning percentage. These 6 variables have been
chosen because they best represent the fundamental aspects of the game mentioned above. In theory, a team that
executes the fundamentals of baseball the best will win the most games. We can then see which aspect has the
most effect on a team’s ability to win.
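The six measures above can be sketched as simple ratios of season totals. The formulas below are the standard definitions of these statistics; the function names and the season totals in the example are hypothetical and are not taken from our data set.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP: times on base divided by the plate appearances that count toward OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases per at bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def fielding_percentage(putouts, assists, errors):
    """FP: share of fielding chances handled without an error."""
    return (putouts + assists) / (putouts + assists + errors)


def strikeouts_per_nine(strikeouts, innings_pitched):
    """K/9IP: strikeouts scaled to a standard 9-inning game."""
    return 9 * strikeouts / innings_pitched


def whip(walks_allowed, hits_allowed, innings_pitched):
    """WHIP: walks plus hits allowed per inning pitched."""
    return (walks_allowed + hits_allowed) / innings_pitched


# Hypothetical season totals, made up purely for illustration.
obp = on_base_percentage(hits=1500, walks=550, hit_by_pitch=50, at_bats=5600, sac_flies=50)
slg = slugging_percentage(singles=1000, doubles=300, triples=30, home_runs=170, at_bats=5600)
fp = fielding_percentage(putouts=4350, assists=1650, errors=100)
k9 = strikeouts_per_nine(strikeouts=1050, innings_pitched=1450)
team_whip = whip(walks_allowed=520, hits_allowed=1480, innings_pitched=1450)

print(f"OBP={obp:.3f} SLG={slg:.3f} FP={fp:.3f} K/9={k9:.2f} WHIP={team_whip:.2f}")
```

Stolen bases need no formula because they enter the analysis as a raw season count.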
Motivations for Analysis
This analysis is of interest because it considers the various aspects of the game and evaluates the impact each has
on the overall performance of a team. Each team is effectively a company, competing in the industry of baseball.
Ideally, each team determines and pursues a strategy that maximizes its resources (e.g. financial support) and
capabilities (e.g. scouting and player development). By analyzing data for specific teams, our objective is to
understand the aspects of the game that winning teams have emphasized.
Based upon this historical data, we will see the areas of the game in which winning teams have excelled. We
expect this analysis to provide insight into winning strategies, which would enable us to forecast the winning
percentage of future teams based upon basic measures of their hitting, running, fielding and pitching performance.
Overview of Data
The data analyzed in this project covers five years’ worth of season statistics (2003 through 2007) for all 30
teams in Major League Baseball, for a total of 150 team-season observations. All of our variables are numerical.
Winning percentage, our response variable, and several predictors, including OBP, SLG, K/9IP, WHIP and fielding
percentage, are continuous variables. Stolen bases are a discrete numerical variable. All data was obtained from
www.ESPN.com.
In determining what data to use, we wanted to select statistical measures that covered the traditional “5 tools” of
baseball: hitting for average, hitting for power, speed, fielding and throwing. These tools are the skills for which
individual players have been traditionally evaluated. Although our analysis looks at the overall team rather than
the individual player, we will use the traditional list of statistical measures to provide a view of team performance.
For example, to gain insight into batting for average and batting for power we used the OBP and SLG batting
statistics. Additionally, we used fielding percentage as a measure of both fielding and throwing for non-pitchers.
While these statistical values are useful in measuring teams’ performance in batting, running, fielding and
pitching, there are some limiting factors to their ability to tell the whole story:

Fielding performance depends both on the ability to execute a play without making an error and on a player’s
ability to reach a ball put in play, commonly known as “range”. Fielding percentage does not account for range,
and range is a very subjective statistic. Teams comprised of players with exceptional range may impact the game
by taking away hits that otherwise would have occurred, so the downstream impact of exceptional range may
be captured in WHIP. Conversely, fielders with limited range make no attempt on plays that fielders with
exceptional range could attempt and then be charged with an error. Since the impact of range on these statistics
may go in either direction, we deemed it acceptable to exclude.

A team’s proficiency at executing its running game is measured in many statistical and non-statistical ways.
In addition to stolen bases, the percentage of successful stolen base attempts and the ability to take an
additional base on a play are key components. A team’s ability to take the extra base or break up a
double play is a good indication of its ability to use good base running to its advantage. However, these
plays are not recorded in any statistical measure.
First Look at the Data:
Descriptive Statistics:
Variable      N     Mean      SE Mean    StDev     Minimum   Q1        Median    Q3        Maximum
OBP           150   0.33388   0.000983   0.01205   0.3       0.325     0.333     0.342     0.366
SLG           150   0.42446   0.0019     0.02329   0.368     0.40775   0.423     0.44325   0.491
SB            150   89.42     2.49       30.53     31        66        85        109       200
Fielding %    150   0.98311   0.000206   0.00253   0.977     0.981     0.983     0.985     0.989
WHIP          150   1.3938    0.00699    0.0856    1.22      1.33      1.39      1.45      1.62
K/9           150   6.5297    0.0481     0.5887    5.41      6.1175    6.44      6.8625    8.68
Winning %     150   0.49633   0.00573    0.07023   0.264     0.438     0.5015    0.549     0.644
The initial analysis of our data highlights no unusual distributions. For the most part the mean and the median of
each variable are very close, which is consistent with roughly symmetric, approximately normal distributions. We
then attempted to verify our findings by plotting a histogram for each of our variables. Looking at the histogram
for the K/9 variable, we saw a slight right tail. However, taking the log of K/9 produced no meaningful
improvement, so we kept K/9 in its original units.
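The closeness of mean and median can be checked with a rough calculation: the gap between mean and median expressed in standard deviations. The sketch below uses the means, medians and standard deviations from the descriptive statistics above; the gap measure itself is our illustrative choice and not a Minitab output.

```python
# (mean, median, standard deviation) for each variable, from the descriptive statistics.
SUMMARY = {
    "OBP": (0.33388, 0.333, 0.01205),
    "SLG": (0.42446, 0.423, 0.02329),
    "SB": (89.42, 85, 30.53),
    "Fielding %": (0.98311, 0.983, 0.00253),
    "WHIP": (1.3938, 1.39, 0.0856),
    "K/9": (6.5297, 6.44, 0.5887),
    "Winning %": (0.49633, 0.5015, 0.07023),
}


def mean_median_gap(mean, median, stdev):
    """Mean minus median in units of the standard deviation.

    Positive values hint at a right tail, negative values at a left tail.
    """
    return (mean - median) / stdev


gaps = {name: mean_median_gap(*values) for name, values in SUMMARY.items()}
for name, gap in gaps.items():
    print(f"{name:<11} {gap:+.3f}")

largest = max(gaps, key=lambda name: abs(gaps[name]))
print("largest gap:", largest)
```

Every gap is well under a quarter of a standard deviation, and the largest belongs to K/9, which matches the slight right tail seen in its histogram.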
[Figure: Histogram of K/9 and Histogram of Log K/9 (vertical axis: Frequency)]
Next we graphed our response (winning %) in a box plot to highlight any outlying data points. The only outlier
observed was the winning % of the 2003 Detroit Tigers, which ranks among the 10 lowest winning percentages in
baseball history. We will take this into consideration as we evaluate the quality of our model.
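A box plot typically flags points that lie more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule to the quartiles and extremes of winning % from our descriptive statistics; we assume this is the rule behind the plot, and the function name is ours.

```python
# Quartiles and extremes of Winning % from the descriptive statistics.
Q1, Q3 = 0.438, 0.549
MINIMUM, MAXIMUM = 0.264, 0.644


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_fence, upper_fence = iqr_fences(Q1, Q3)
low_outlier = MINIMUM < lower_fence    # is the worst team below the lower fence?
high_outlier = MAXIMUM > upper_fence   # is the best team above the upper fence?

print(f"fences: {lower_fence:.4f} to {upper_fence:.4f}")
print("minimum flagged:", low_outlier, "| maximum flagged:", high_outlier)
```

The minimum of .264 (the 2003 Detroit Tigers) falls just below the lower fence, while the maximum of .644 sits inside the upper fence, in line with the single outlier on the plot.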
[Figure: Boxplot of Winning %]
We then looked at the fitted line plot of each variable against the winning % to get a better understanding of the
relationship between each predictor and the response. These plots isolate each predictor and do not take into
account the combined effect of all variables on the response.
Fitted Line Plots (response is Winning %)
Predictor     Fitted equation                              S           R-Sq    R-Sq(adj)   Labeled points
OBP           Winning % = - 0.6627 + 3.471 OBP             0.0566129   35.4%   35.0%       LA Dodgers ‘03, Detroit ‘03
SLG           Winning % = - 0.1656 + 1.559 SLG             0.0603062   26.8%   26.3%       LA Dodgers ‘03, Detroit ‘03
SB            Winning % = 0.4825 + 0.000155 SB             0.0703042   0.5%    0.0%        NY Mets ‘03
Fielding %    Winning % = - 11.90 + 12.61 Fielding %       0.0628067   20.6%   20.0%       Detroit ‘03
WHIP          Winning % = 1.297 - 0.5744 WHIP              0.0502820   49.1%   48.7%       Detroit ‘03
K/9           Winning % = 0.2704 + 0.03460 K/9             0.0674335   8.4%    7.8%        Arizona ‘04, Detroit ‘03
From these plots we do not see an overwhelmingly strong correlation between any individual variable and the
team’s winning %, as each R-Sq is below 50%. The variables with the highest R-Sq are WHIP and OBP, and the
variable with the lowest R-Sq is SB. Although no individual variable shows a strong correlation with the
team’s winning %, this is not surprising since a team’s success depends on execution of all the fundamentals
of the game of baseball.
There appear to be potential outliers and/or leverage points in the fitted line plots above; the
impact of these points will be evaluated further after analyzing the best subsets regression.
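One quick sanity check on the single-predictor fits is that a least-squares line passes through the point of means, so each fitted equation evaluated at its predictor's mean should return roughly the mean winning % (0.49633). The sketch below uses the reported equations and means; small differences are expected because the published coefficients are rounded.

```python
MEAN_WINNING_PCT = 0.49633

# predictor: (intercept, slope, predictor mean), from the fitted line plots
# and the descriptive statistics.
FITS = {
    "OBP": (-0.6627, 3.471, 0.33388),
    "SLG": (-0.1656, 1.559, 0.42446),
    "SB": (0.4825, 0.000155, 89.42),
    "Fielding %": (-11.90, 12.61, 0.98311),
    "WHIP": (1.297, -0.5744, 1.3938),
    "K/9": (0.2704, 0.03460, 6.5297),
}

# Evaluate each simple regression line at the mean of its predictor.
fitted_at_mean = {
    name: intercept + slope * mean
    for name, (intercept, slope, mean) in FITS.items()
}

for name, value in fitted_at_mean.items():
    print(f"{name:<11} {value:.4f} (difference {value - MEAN_WINNING_PCT:+.4f})")
```

All six lines land within about 0.001 of the mean winning %, which suggests the equations were transcribed consistently with the summary statistics.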
Preliminary Multiple Regression Model:
Regression Analysis: Winning % versus OBP, SLG, ...
The regression equation is
Winning % = - 3.02 + 1.67 OBP + 0.891 SLG + 0.000098 SB + 3.35 Fielding % - 0.510 WHIP - 0.00089 K/9
Predictor     Coef          SE Coef       T        P
Constant      -3.020        1.107         -2.73    0.007
OBP           1.6664        0.3117        5.35     0.000
SLG           0.8914        0.1569        5.68     0.000
SB            0.00009752    0.00008467    1.15     0.251
Fielding %    3.345         1.123         2.98     0.003
WHIP          -0.50953      0.03450       -14.77   0.000
K/9           -0.00089      0.004759      -0.19    0.851
S = 0.0309459 R-Sq = 81.4% R-Sq(adj) = 80.6%
Analysis of Variance
Source            DF     SS          MS          F        P
Regression        6      0.597891    0.099649    104.06   0.000
Residual Error    143    0.136944    0.000958
Total             149    0.734835
The multi-variable model highlights the importance of considering several fundamentals, as it now accounts for
approximately 81% of the variability in team winning percentage. As expected, increases in OBP, SLG and
Fielding % are associated with higher winning percentages. Holding all other variables constant, the model
indicates that a team which gives up one additional hit or walk per nine innings (an increase in WHIP of 0.1111)
can be expected to have a winning percentage that is lower by 0.057, or nearly one standard deviation of
winning percentage (0.070). This result underscores the baseball adage that “pitching wins games.” In contrast,
our model reveals that the impact of stolen bases on team winning percentage is negligible. Even across the full
range (169), that is, the difference between the team with the most stolen bases and the team with the fewest, the
predicted difference in winning percentage is only .017 (169 x 0.000098). This is consistent with the high P value
for stolen bases of 0.251, which indicates insufficient evidence to reject the null hypothesis that stolen bases are
unrelated to team winning percentage once the other predictors are in the model.
One point of interest in the model is that when comparing two teams with identical statistics other than K/9, the
model predicts that the team with fewer K/9 will actually have a slightly higher winning percentage. However,
K/9 appears to have a minimal impact on team winning percentage. The difference between the teams with the
highest and lowest K/9 results in a predicted difference in winning percentage of only .003 (3.27 x 0.00089).
This is again consistent with the extremely high P value for K/9 of 0.851, which indicates insufficient
evidence to reject the null hypothesis that K/9 is unrelated to team winning percentage. The P values for
stolen bases and K/9 indicate that the inclusion of these variables in our model does not add to its predictive
power. This will be further analyzed in the “Model Improvement” section below.
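The effect sizes quoted above are each a coefficient multiplied by a change in the predictor. A minimal sketch of that arithmetic, using the rounded coefficients from the regression equation and the minimum and maximum values from the descriptive statistics (the helper name is ours):

```python
# Rounded coefficients from the preliminary regression equation.
COEF = {"WHIP": -0.510, "SB": 0.000098, "K/9": -0.00089}


def predicted_change(coefficient, change_in_predictor):
    """Change in predicted winning % when one predictor moves and the rest are held fixed."""
    return coefficient * change_in_predictor


# One extra hit or walk per nine innings raises WHIP by 1/9.
whip_effect = predicted_change(COEF["WHIP"], 1 / 9)

# Full observed ranges (maximum minus minimum) of SB and K/9.
sb_effect = predicted_change(COEF["SB"], 200 - 31)
k9_effect = predicted_change(COEF["K/9"], 8.68 - 5.41)

print(f"WHIP +1/9:      {whip_effect:+.3f}")
print(f"SB full range:  {sb_effect:+.3f}")
print(f"K/9 full range: {k9_effect:+.3f}")
```

A change of one baserunner per nine innings moves the prediction more than three times as far as the entire observed range of stolen bases.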
The standard error of the estimate of approximately 0.031 implies the model can predict winning percentage to
within ±0.062 (2 x 0.031) about 95% of the time. To put this further into perspective, over the course of a 162
game season, this translates into an error of the estimate of approximately ±10 wins (±0.062 x 162).
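To make the prediction band concrete, the sketch below evaluates the preliminary model for a team whose statistics all equal the league means from the descriptive statistics, then attaches the rough band of two standard errors. The coefficients and S come from the regression output; treating plus or minus 2S as a 95% band is the same approximation used in the text, and the function name is ours.

```python
# Coefficients and standard error of the estimate from the preliminary regression.
INTERCEPT = -3.020
COEF = {
    "OBP": 1.6664, "SLG": 0.8914, "SB": 0.00009752,
    "Fielding %": 3.345, "WHIP": -0.50953, "K/9": -0.00089,
}
S = 0.0309459
GAMES = 162

# Example input: league means from the descriptive statistics.
league_average_team = {
    "OBP": 0.33388, "SLG": 0.42446, "SB": 89.42,
    "Fielding %": 0.98311, "WHIP": 1.3938, "K/9": 6.5297,
}


def predict_winning_pct(team):
    """Fitted winning % for a dict of team statistics."""
    return INTERCEPT + sum(COEF[name] * team[name] for name in COEF)


prediction = predict_winning_pct(league_average_team)
half_width = 2 * S                    # rough 95% band on winning %
half_width_wins = half_width * GAMES  # the same band expressed in wins

print(f"predicted winning %: {prediction:.3f} +/- {half_width:.3f}")
print(f"predicted wins: {prediction * GAMES:.1f} +/- {half_width_wins:.1f}")
```

For the league-average team the prediction comes out near .496, as expected, with a band of about 10 wins on either side.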
Checking Assumptions
In order to evaluate the validity of the model assumptions, we must analyze the model errors through the use of
several residual plots. We will begin with the plot of residuals versus the fitted values as well as residuals versus
each predictor. These plots will be evaluated to identify any structure which may indicate that the model
assumptions are invalid.
[Figures: Residuals Versus the Fitted Values; Residuals Versus OBP; Residuals Versus SLG; Residuals Versus SB; Residuals Versus Fielding %; Residuals Versus WHIP; Residuals Versus K/9 (response is Winning %)]
The above plots reveal no apparent structure in the residuals, which suggests that our assumptions regarding the
distribution of errors are reasonable. That is, there are no well-defined subgroups and the variance of the errors
appears constant (homoscedastic).
Next we evaluate the normal probability plot of the residuals to ensure errors are normally distributed.
[Figure: Normal Probability Plot of the Residuals (response is Winning %)]
This plot indicates that the residuals roughly follow a normal distribution. As a further step to ensure our
assumptions hold, we will run a time-series plot of residuals, which will indicate any auto-correlation of results
across seasons.
[Figure: Residuals Versus the Order of the Data (response is Winning %; horizontal axis: Observation Order, 1 to 150)]
Our data was ordered from 2007 down to 2003, with 30 observations from each season. The time-series plot of
residuals does not reveal any apparent patterns to indicate auto-correlation.
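A numerical companion to the visual check is the Durbin-Watson statistic, which we did not compute in this project; it is named here only as one common alternative. Values near 2 suggest little first-order autocorrelation, values near 0 suggest positive autocorrelation and values near 4 suggest negative autocorrelation. The residual sequences below are made up to show the two extremes and are not our model's residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    squared_steps = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    squared_residuals = sum(e * e for e in residuals)
    return squared_steps / squared_residuals


# Made-up sequences, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips on every observation
constant = [1.0, 1.0, 1.0, 1.0]       # no change between observations

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```

Because our observations are grouped by season rather than forming one continuous series, such a statistic would be only a rough guide here.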
Model Improvement
Our multiple-regression model provides a significantly greater ability to predict the winning percentage of a
baseball team than any simple regression model with an individual predictor. However, we will now evaluate
opportunities to improve upon our model.
Simplifying the Model
As previously indicated, we believe that stolen bases and K/9 are the weakest predictors of winning percentage.
We will evaluate the best subset regression to determine which combination of predictors provides the strongest
ability to predict winning percentage.
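Best subsets regression considers every combination of predictors and reports the strongest candidates of each size. The sketch below shows only the enumeration and ranking; the scoring function is a stand-in that adds up the single-predictor R-Sq values from the fitted line plots, whereas a real run refits the regression for every subset. All function names are hypothetical.

```python
from itertools import combinations

PREDICTORS = ["OBP", "SLG", "SB", "Fielding %", "WHIP", "K/9"]

# Single-predictor R-Sq values (%) from the fitted line plots.
SINGLE_R_SQ = {"OBP": 35.4, "SLG": 26.8, "SB": 0.5,
               "Fielding %": 20.6, "WHIP": 49.1, "K/9": 8.4}


def all_subsets(predictors):
    """Yield every non-empty combination of predictors."""
    for size in range(1, len(predictors) + 1):
        yield from combinations(predictors, size)


def best_subsets(predictors, score, keep=2):
    """For each subset size, return the `keep` highest-scoring subsets."""
    best = {}
    for size in range(1, len(predictors) + 1):
        ranked = sorted(combinations(predictors, size), key=score, reverse=True)
        best[size] = ranked[:keep]
    return best


def stand_in_score(subset):
    """STAND-IN ONLY: sums single-predictor R-Sq values.

    R-Sq is not additive across predictors, so this is exact only for
    one-variable models. A real search would fit each regression.
    """
    return sum(SINGLE_R_SQ[name] for name in subset)


n_models = sum(1 for _ in all_subsets(PREDICTORS))
best = best_subsets(PREDICTORS, stand_in_score)

print("candidate models:", n_models)
print("best one-variable models:", best[1])
```

With six predictors there are 63 candidate models, which is small enough to search exhaustively.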
Best Subsets Regression: Winning % versus OBP, SLG, ...
Response is Winning %
Vars   R-Sq   R-Sq(adj)   Mallows C-p   S          Predictors
1      49.1   48.7        244.7         0.050282   WHIP
1      35.4   35.0        349.3         0.056613   OBP
2      76.3   76.0        37.7          0.034401   [marks illegible]
2      74.8   74.5        49.1          0.035467   [marks illegible]
3      80.1   79.7        10.8          0.031661   [marks illegible]
3      77.3   76.9        31.8          0.033765   [marks illegible]
4      81.2   80.7        4.3           0.030874   OBP, SLG, Fielding %, WHIP
4      80.2   79.6        12.0          0.031683   [marks illegible]
5      81.4   80.7        5.0           0.030842   OBP, SLG, SB, Fielding %, WHIP
5      81.2   80.5        6.3           0.030981   OBP, SLG, Fielding %, WHIP, K/9
6      81.4   80.6        7.0           0.030946   OBP, SLG, SB, Fielding %, WHIP, K/9
These results seem to support our initial conclusion that stolen bases and K/9 have negligible impact on the
model’s ability to predict team winning percentage. By eliminating these two variables from the model, we
reduce the complexity while slightly improving our ability to predict winning percentage, as noted by the slight
increase in adjusted R² (from 80.6 to 80.7). While the model eliminating only K/9 provides a slightly higher R²
and a lower standard error of the estimate, the benefits (R² increased by 0.2 and S decreased by 0.000032) are
nearly inconsequential compared to the simplicity of modeling based upon fewer variables.
The output of this simplified model is shown below. As expected, the model has produced a slightly lower R² of
81.2% with a standard error of the estimate of 0.030874. We also see slight changes to the coefficients for our
remaining variables. SLG and Fielding % each dropped slightly, while OBP and the magnitude of WHIP increased
slightly.
Regression Analysis: Winning % versus OBP, SLG, Fielding , WHIP
The regression equation is
Winning % = - 2.93 + 1.69 OBP + 0.876 SLG + 3.26 Fielding % - 0.510 WHIP
Predictor     Coef        SE Coef    T        P
Constant      -2.932      1.096      -2.68    0.008
OBP           1.6877      0.3101     5.44     0.000
SLG           0.8763      0.1552     5.64     0.000
Fielding %    3.26        1.116      2.92     0.004
WHIP          -0.51041    0.03172    -16.09   0.000
S = 0.0308740 R-Sq = 81.2% R-Sq(adj) = 80.7%
Analysis of Variance
Source            DF     SS         MS         F        P
Regression        4      0.59662    0.14916    156.48   0.000
Residual Error    145    0.13821    0.00095
Total             149    0.73483
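The fit statistics used to compare the two models can be cross-checked from the sums of squares in the Analysis of Variance tables. The sketch below uses the standard formulas for adjusted R-Sq and Mallows C-p; the function names are ours, and small differences from the printed output are expected because the published sums of squares are rounded.

```python
N = 150                 # team-season observations
SST = 0.734835          # total sum of squares
SSE_FULL = 0.136944     # residual SS, six-predictor model
SSE_REDUCED = 0.13821   # residual SS, four-predictor model


def adjusted_r_sq(sse, sst, n, p):
    """Adjusted R-Sq for a model with p predictors plus an intercept."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))


def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows C-p for a p-predictor subset, judged against the full model's MSE."""
    return sse_subset / mse_full - n + 2 * (p + 1)


mse_full = SSE_FULL / (N - 6 - 1)

adj_full = adjusted_r_sq(SSE_FULL, SST, N, 6)
adj_reduced = adjusted_r_sq(SSE_REDUCED, SST, N, 4)
cp_full = mallows_cp(SSE_FULL, mse_full, N, 6)
cp_reduced = mallows_cp(SSE_REDUCED, mse_full, N, 4)

print(f"adjusted R-Sq: full {adj_full:.3f}, reduced {adj_reduced:.3f}")
print(f"Mallows C-p:   full {cp_full:.1f}, reduced {cp_reduced:.1f}")
```

The reduced model's C-p of about 4.3 is below its parameter count of 5, which is the usual sign that dropping SB and K/9 does not introduce meaningful bias.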
[Figures: Residuals Versus the Fitted Values; Normal Probability Plot of the Residuals (response is Winning %)]
We will now return to the outliers and leverage points we have previously identified. The outliers are the
following:

2007 New York Mets and 2004 Arizona Diamondbacks – These two teams were identified as outliers
in the Stolen Base and K/9 fitted line plots respectively. Since these two variables have been removed
from our simplified model and they were not identified as outliers for any of the other variables, we can
presume that they no longer act as unusual observations.

2003 Detroit Tigers – With their .264 winning percentage, the 2003 Detroit Tigers were one of the worst
teams of all time. While their predictive indicators were consistently below the mean, the team’s
performance fell far short of their expected winning percentage of .319. Because this is such an extreme
data point, we will remove it from our analysis.

2003 Los Angeles Dodgers – The Dodgers’ actual winning percentage of .521 significantly exceeded
their expected winning percentage of .472. With a WHIP approximately 2 standard deviations below the
mean, the Dodgers compensated for their relatively pedestrian offensive and fielding statistics. As such
we will exclude them from our analysis.
We will next remove these four outliers from our regression analysis, leaving 146 observations, to determine their
impact on our regression. We will first revisit the best subsets regression to determine whether the simplified
model is still superior.
Best Subsets Regression: Winning % versus OBP, SLG, ...
Response is Winning %
Vars   R-Sq   R-Sq(adj)   Mallows C-p   S          Predictors
1      49.7   49.3        205.7         0.047430   [marks illegible]
1      32.7   32.3        322.8         0.054838   [marks illegible]
2      74.4   74.0        37.0          0.033960   [marks illegible]
2      73.7   73.3        41.8          0.034420   [marks illegible]
3      78.7   78.2        9.3           0.031093   [marks illegible]
3      75.8   75.3        29.3          0.033130   [marks illegible]
4      79.7   79.1        4.3           0.030445   OBP, SLG, Fielding %, WHIP
4      78.8   78.2        10.7          0.031139   [marks illegible]
5      79.8   79.1        5.2           0.030440   [marks illegible]
5      79.7   79.0        6.2           0.030543   OBP, SLG, Fielding %, WHIP, K/9
6      79.9   79.0        7.0           0.030523   OBP, SLG, SB, Fielding %, WHIP, K/9
Once again it appears that the simplified model based upon OBP, SLG, Fielding % and WHIP is the superior
model. Running the full regression yields the following results:
Regression Analysis: Winning % versus OBP, SLG, Fielding , WHIP
The regression equation is
Winning % = - 2.67 + 1.62 OBP + 0.878 SLG + 2.99 Fielding % - 0.496 WHIP
Predictor     Coef        SE Coef    T        P
Constant      -2.670      1.109      -2.41    0.017
OBP           1.6215      0.3112     5.21     0.000
SLG           0.8783      0.1536     5.72     0.000
Fielding %    2.994       1.123      2.67     0.009
WHIP          -0.49588    0.03204    -15.47   0.000
S = 0.0304448 R-Sq = 79.7% R-Sq(adj) = 79.1%
Analysis of Variance
Source            DF     SS         MS         F        P
Regression        4      0.51296    0.12824    138.36   0.000
Residual Error    141    0.13069    0.00093
Total             145    0.64365
The removal of these data points has resulted in a slight decrease of R² to approximately 79.7%. Additionally, the
standard error of the estimate has fallen slightly to approximately 0.0305, which implies the model can predict
winning percentage to within ±0.061 (2 x 0.0305) about 95% of the time. To put this further into perspective,
over the course of a 162 game season, this translates into an error of the estimate of approximately ±9.88 wins
(±0.061 x 162), thus tightening our interval by roughly 0.15 games on each side.
Most of our predictor coefficients have decreased slightly in magnitude too. Holding all other variables constant,
the model indicates that a team which gives up one additional hit or walk per nine innings (an increase in WHIP of
0.1111) can be expected to have a winning percentage that is lower by 0.055, which translates into nearly 9
fewer wins over the course of a 162 game season. To impact a team’s winning percentage by the same amount,
OBP, SLG and Fielding % would have to decrease by approximately 0.034, 0.063 and 0.018 respectively (0.055
divided by each coefficient). The very low P values for each predictor as well as the overall regression allow us
to reject the null hypothesis that the predictors are unrelated to the response.
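The equivalent changes for the other predictors follow from dividing the WHIP effect by each coefficient. A minimal sketch of that arithmetic, using the coefficients of the final four-predictor model (the variable names are ours):

```python
# Coefficients from the final model, fitted after removing the outliers.
COEF = {"OBP": 1.6215, "SLG": 0.8783, "Fielding %": 2.994, "WHIP": -0.49588}
GAMES = 162

# One extra hit or walk per nine innings raises WHIP by 1/9.
whip_increase = 1 / 9
drop_in_winning_pct = -COEF["WHIP"] * whip_increase
fewer_wins = drop_in_winning_pct * GAMES

# Decrease in each other predictor that costs the same amount of winning %.
equivalent_decrease = {
    name: drop_in_winning_pct / COEF[name]
    for name in ("OBP", "SLG", "Fielding %")
}

print(f"drop in winning %: {drop_in_winning_pct:.3f} ({fewer_wins:.1f} wins)")
for name, decrease in equivalent_decrease.items():
    print(f"equivalent decrease in {name}: {decrease:.3f}")
```

Each equivalent decrease is simply the same drop in winning % expressed in the units of another predictor, so predictors with larger coefficients need smaller moves.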
We must also recheck our assumptions to ensure they still hold. We can analyze the model errors through the use
of several residual plots below. The plots of residuals versus the fitted values as well as residuals versus each
predictor still do not appear to identify any structures to indicate that the model assumptions are invalid.
[Figures: Residuals Versus the Fitted Values; Residuals Versus SLG; Residuals Versus OBP; Residuals Versus WHIP; Residuals Versus Fielding % (response is Winning %)]
The elimination of several outliers is evident in this new normal probability plot, as the data points now appear to
follow a slightly more normal distribution.
[Figure: Normal Probability Plot of the Residuals (response is Winning %)]
Conclusion
This analysis evaluated how the fundamental aspects of the game – pitching, fielding, and hitting and
running – impact a team’s ability to win. We hypothesized that teams that can successfully perform all
these fundamentals will achieve a high winning percentage. The results of our analysis revealed helpful
insight into the relative significance of each of these aspects, and in some cases identified preferred statistics to
measure performance in the fundamentals.
Our analysis revealed that team performance in pitching, hitting and fielding is highly correlated with winning
percentage. For pitching and hitting, we analyzed two performance measures each: WHIP and K/9IP, and OBP
and SLG respectively. Per our analysis, WHIP, a measure of a team’s ability to prevent batters from reaching
base, is far more highly correlated with team winning percentage than K/9, a power-pitching measurement. For
hitting, our analysis indicated that both OBP and SLG are positively correlated with team winning percentage.
Overall, WHIP, OBP, SLG and fielding percentage have relatively strong predictive ability for winning
percentage. A team with a high OBP and SLG will usually score more runs, which leads to wins. A team with a
high fielding percentage and low WHIP will usually give up fewer runs. From the regression, WHIP is the
strongest predictor of winning percentage. Our model seems to support the saying that “good pitching will always
beat good hitting.”
Running best subsets regression analysis revealed that our regression model could be simplified by removing SB
and K/9, which each have very little predictive ability for winning percentage. The insignificance of SB can be
explained by two factors – the relative insignificance of stolen bases to the modern game of baseball and the
imperfection of stolen bases as a measure of running proficiency, as discussed in the Overview of Data. As for
K/9, a pitcher doesn’t have to strike out a lot of batters to be successful. A wild pitcher might have a high K/9 but
can also surrender many walks and runs.
The next question should be, “what should we do with this finding?” Over the past decade the science of
analyzing baseball through objective evidence, called “sabermetrics”, has evolved and reached significant
prominence. General Managers (GMs) of baseball teams use data such as ours to understand the relevance of
fundamentals as well as their key measurements when building their teams. Likewise, fans can use performance
metrics to evaluate the strength of management decisions in “upgrading” their favorite team for the upcoming
season.
Our regression analysis suggests that GMs and fans should emphasize quality pitching, though not necessarily
“power pitching”, before focusing on a balanced hitting attack that delivers both consistent base-runners (as
measured by OBP) and power (as measured by SLG). Additionally, solid defense is important to keep opposing
runners off the base-paths. While a team should put less emphasis on stolen bases, the running game should not be
ignored. A team’s ability to take the extra base or break up a double play is a good indication of its
ability to use good base running to its advantage, but these plays have not been specifically accounted for in our
analysis because they are not recorded in any statistical measure. These intricacies make the game fun and hard to
predict.