Offensive Strategies in Baseball
A Discrete Event Simulation Project
Kirby Hunt
DSES6620-Prof Gutierrez-Miravete
10/5/02
Table of Contents
Introduction
Objective and Scope
Collection and Analysis of Data
Construction of the Model
Model Verification and Validation
Experimentation and Results
Conclusions
Appendix
Introduction
The strategy of the game of baseball is a matter of intense interest to fans and
coaches at all levels of the game. This project investigates the dynamics of a baseball
team's offensive strategy using discrete event simulation. This problem is ideally suited to
discrete event simulation as it involves dynamic stochastic processes which influence
each other in ways that are difficult, if not impossible, to predict with other modeling
techniques. This project sheds some light on the traditional strategy of batting order (the
order in which the players bat) by modeling the offense of a real major league baseball
team and then exploring the relative merits of several batting order strategies.
Objective and Scope
The objective and scope of this project were laid out in the project proposal
written before work began (see the file “proposal.doc”). In brief, the main goal for this
project was for the author to gain experience in setting up a simulation that works
properly and represents the offensive side of the game of baseball with acceptable
fidelity. This simulation would then be exercised to produce some useful insight into the
most productive strategies for playing the game. All of these objectives were
accomplished.
The scope of the project was limited to the investigation of the merits of various
strategies of batting order. For this reason, only the batting, basic base running, and
scoring aspects of the game were modeled. No defensive interaction (neither pitching nor
fielding) was modeled. Base stealing, bunting, and hit and run plays were also omitted.
Finally, pressure effects such as game winning or losing situations or batting with runners
in scoring position were ignored. Limiting the scope of the project in this way permitted
focus to be maintained on getting the basic offensive functionality of the model to work
correctly in the time allotted for the project.
Collection and Analysis of Data
The data needed for this project was obtained from the Major League Baseball
official website (www.mlb.com). It consists of batting statistics recorded for each player
in the league for the entire 2002 baseball season. The result of each at-bat for the year is
represented in this data so it is a good statistical sample (300-600 at-bats per player). The
batting numbers for one team, the Seattle Mariners (not chosen at random), are shown in
the following table.
Seattle Mariners Offensive Statistics Summary (2002 Regular Season)
Statistic                               I Suzuki   B Boone    J Olerud   M Cameron  J Cirillo  C Guillen  R Sierra   D Wilson   M McLemore D Relaford
Position                                OF         2B         1B         OF         3B         SS         OF         C          OF         SS
G - Games Played                        157        155        154        158        146        134        122        115        104        112
AB - At Bats                            647        608        553        545        485        475        419        359        337        329
R - Runs Scored                         111        88         85         84         51         73         47         35         54         55
H - Hits                                208        169        166        130        121        124        113        106        91         88
1B - Singles                            165        108        105        74         95         85         77         83         65         67
2B - Doubles                            27         34         39         26         20         24         23         16         17         13
3B - Triples                            8          3          0          5          0          6          0          1          2          2
HR - Home Runs                          8          24         22         25         6          9          13         6          7          6
RBI - Runs Batted In                    51         107        102        80         54         56         60         44         41         43
TB - Total Bases                        275        281        271        241        159        187        175        142        133        123
BB - Bases on Balls (Walks)             68         53         98         79         31         46         31         18         61         33
SO - Strikeouts                         62         102        66         176        67         91         66         81         63         51
SB - Stolen Bases                       31         12         0          31         8          4          4          1          18         10
CS - Caught Stealing                    15         5          0          8          4          5          0          0          10         3
OBP - On-base Percentage                0.388      0.339      0.403      0.340      0.301      0.326      0.319      0.326      0.380      0.339
SLG - Slugging Percentage               0.425      0.462      0.490      0.442      0.328      0.394      0.418      0.396      0.395      0.374
AVG - Batting Average                   0.321      0.278      0.300      0.239      0.249      0.261      0.270      0.295      0.270      0.267
SF - Sacrifice Flies                    5          6          12         5          9          3          2          8          4          7
SH - Sacrifice Hits                     3          2          0          4          13         3          0          7          4          1
HBP - Hit by Pitch                      5          6          5          7          9          1          0          2          1          6
IBB - Intentional Walks                 27         4          6          3          0          4          5          1          1          2
GIDP - Ground into Double Plays         8          11         19         8          12         8          17         8          3          6
TPA - Total Plate Appearances           728        675        668        640        547        528        452        394        407        376
Plate Appearances per Game              4.6        4.4        4.3        4.1        3.7        3.9        3.7        3.4        3.9        3.4
RBI per Plate Appearance                0.07       0.16       0.15       0.13       0.10       0.11       0.13       0.11       0.10       0.11
NP - Number of Pitches                  2627       2502       2633       2612       1928       2063       1572       1442       1642       1337
XBH - Extra Base Hits                   43         61         61         56         26         39         36         23         26         21
SB% - Stolen Base Percentage            67.4       70.6       0          79.5       66.7       44.4       100        100        64.3       76.9
GO - Ground Outs                        238        204        155        100        136        120        125        88         92         88
AO - Fly Outs                           140        134        169        141        171        138        112        99         96         108
GO/AO - Ground Outs/Fly Outs            1.76       1.60       1.03       0.77       0.87       0.93       1.27       0.97       0.99       0.87
OPS - On-base Plus Slugging Percentage  0.813      0.801      0.893      0.782      0.629      0.719      0.736      0.721      0.774      0.713
In order to use these numbers in the discrete event simulation it was necessary to
categorize and tally the outcome for each at-bat in terms of a few possible outcomes.
Accordingly, each at-bat was put into one of the following categories:
1. Single
2. Double
3. Triple
4. Home Run
5. Walk or Hit by Pitch
6. On base on error
7. Strike Out
8. Fieldable Grounder
9. Pop Fly
The batting statistics above for each player then boiled down to the following
probabilities for each of the possible outcomes:
Seattle Mariners Hitting Probabilities (based on 2002 regular season statistics)
Outcome                         I Suzuki   B Boone    J Olerud   M Cameron  J Cirillo  C Guillen  R Sierra   D Wilson   M McLemore D Relaford
Single Probability              0.227      0.160      0.157      0.116      0.174      0.161      0.170      0.211      0.160      0.178
Double Probability              0.037      0.050      0.058      0.041      0.037      0.045      0.051      0.041      0.042      0.035
Triple Probability              0.011      0.004      0.000      0.008      0.000      0.011      0.000      0.003      0.005      0.005
Home Run Probability            0.011      0.036      0.033      0.039      0.011      0.017      0.029      0.015      0.017      0.016
Walk & HBP Probability          0.100      0.087      0.154      0.134      0.073      0.089      0.069      0.051      0.152      0.104
On Base on Error Probability    0.002      0.001      0.000      0.003      0.007      0.002      0.000      0.006      0.004      0.001
Strikeout Probability           0.085      0.151      0.099      0.275      0.122      0.172      0.146      0.206      0.155      0.136
Grounder Probability            0.327      0.302      0.232      0.156      0.249      0.227      0.277      0.223      0.226      0.234
Pop Fly Probability             0.192      0.199      0.253      0.220      0.313      0.261      0.248      0.251      0.236      0.287
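The tabulated probabilities are consistent with dividing each outcome count by the player's total plate appearances (TPA). The sketch below reproduces Ichiro Suzuki's column from his 2002 counts under that assumption. The function name `outcome_probabilities` and the dictionary keys are illustrative, not taken from the project files, and the on-base-on-error probability is omitted because the statistics summary does not list the counts it was derived from.

```python
def outcome_probabilities(stats):
    """Convert season counts into per-plate-appearance probabilities.

    Assumption: each probability is count / TPA, which matches the
    tabulated values to three decimals for the outcomes listed here.
    """
    tpa = stats["TPA"]
    counts = {
        "single": stats["1B"],
        "double": stats["2B"],
        "triple": stats["3B"],
        "home_run": stats["HR"],
        "walk_or_hbp": stats["BB"] + stats["HBP"],
        "strikeout": stats["SO"],
        "grounder": stats["GO"],
        "pop_fly": stats["AO"],
    }
    return {name: round(n / tpa, 3) for name, n in counts.items()}


# Ichiro Suzuki, 2002 regular season (from the statistics summary).
suzuki = {"TPA": 728, "1B": 165, "2B": 27, "3B": 8, "HR": 8,
          "BB": 68, "HBP": 5, "SO": 62, "GO": 238, "AO": 140}
probs = outcome_probabilities(suzuki)
print(probs)
```

Note that the listed outcomes need not sum to exactly one, so any sampling code built on them should either normalize the values or treat them as relative weights.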
Conceptual Model
Once the data had been gathered and analyzed, the conceptual model that would
form the framework or outline for the detailed model was generated. The conceptual
model for batting itself is as simple as generating a random variate based on the set of
probabilities just described that determines the outcome of each at-bat.
These probabilities were put directly into ProModel as "User Defined
Distributions" for each batter. They were then used to produce a random variate for each
at-bat during the simulation.
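Outside ProModel, the same idea can be sketched in a few lines of Python. This is only an illustration of drawing a discrete random variate, not the project's implementation: the function name `at_bat` and the fixed seed are assumptions, and the weights are Ichiro Suzuki's probabilities from the table above.

```python
import random

# Outcome codes 1-9, in the order the at-bat categories were listed.
OUTCOMES = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Ichiro Suzuki's probabilities, in the same order as OUTCOMES.
SUZUKI = [0.227, 0.037, 0.011, 0.011, 0.100, 0.002, 0.085, 0.327, 0.192]


def at_bat(weights, rng):
    """Draw one at-bat result (1-9) from a discrete distribution.

    random.choices treats the values as relative weights, so they do
    not have to sum to exactly 1 (the tabulated values sum to 0.992).
    """
    return rng.choices(OUTCOMES, weights=weights, k=1)[0]


rng = random.Random(2002)  # fixed seed so the run is repeatable
results = [at_bat(SUZUKI, rng) for _ in range(10_000)]
single_share = results.count(1) / len(results)
print(f"simulated single rate: {single_share:.3f}")
```

Over many draws the share of each outcome should settle near its tabulated probability, which is a quick check that the distribution was entered correctly.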
Although the batting order directly affects only the quality of the hits being
produced, it would not be sufficient to model the hitting alone to be able to evaluate
batting order strategies. This is because scoring runs is the final product of the batting
order and this is crucially linked to the relationship between base running and batting. In
other words, not only the number of hits but also their timing is important: if there are
no runners in scoring position when a hit is made, the hit may not produce any runs.
Thus, a modeling scheme had to be generated for base running as well as hitting.
The conceptual model for base runners proved to be significantly more
complicated than for batters. The following rules include many simplifications but
nevertheless capture the essence of how base runners behave in the real game based on the
outcome of the at-bat.
Type of Hit              Runner Response
A single base hit        The batter gets to first and each base runner advances two bases
A double base hit        The batter gets to second and all other base runners score
A triple base hit        The batter gets to third and all other base runners score
A home run               The batter and all base runners score
Walk or Hit by Pitch     The batter gets to first and any runner in a force position advances one
                         base (because only one runner can occupy a base, any runner who must
                         advance to make room for another runner during a play is in a force position)
On Base on Error         The batter gets to first and any runner in a force position advances one base
Strike Out               The batter is out and no runners advance
A fieldable grounder     Any two lead runners in force situations, including the batter, are out
A pop fly                The batter is out and no runners advance except any runner on third tags
                         for home if there are less than 3 outs
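The rules above can be expressed as a single state-transition function. The following is a sketch, not the ProModel logic itself: the name `resolve_play`, the tuple representation of the bases, and the outcome constants are all illustrative. The grounder rule leaves some details open, so the code assumes that forced runners who are not put out advance one base and that unforced runners hold.

```python
(SINGLE, DOUBLE, TRIPLE, HOME_RUN, WALK,
 ERROR, STRIKEOUT, GROUNDER, POP_FLY) = range(1, 10)


def resolve_play(bases, outs, outcome):
    """Apply one at-bat outcome; return (bases, outs, runs_scored).

    bases is a tuple of three booleans: (first, second, third) occupied.
    """
    first, second, third = bases
    runs = 0
    if outcome == SINGLE:            # batter to first, runners advance two
        runs = second + third
        bases = (True, False, first)
    elif outcome == DOUBLE:          # batter to second, all runners score
        runs = first + second + third
        bases = (False, True, False)
    elif outcome == TRIPLE:          # batter to third, all runners score
        runs = first + second + third
        bases = (False, False, True)
    elif outcome == HOME_RUN:        # everyone scores, including the batter
        runs = first + second + third + 1
        bases = (False, False, False)
    elif outcome in (WALK, ERROR):   # only forced runners move up one base
        runs = int(first and second and third)
        bases = (True, first or second, third or (first and second))
    elif outcome == STRIKEOUT:
        outs += 1
    elif outcome == GROUNDER:
        # Forced chain: batter, then runners on 1st, 2nd, 3rd while unbroken.
        # ASSUMPTION: surviving forced runners advance one base; others hold.
        if not first:
            outs += 1                        # only the batter is forced
        elif not second:
            outs += 2                        # batter and runner from first
            bases = (False, False, third)
        elif not third:
            outs += 2                        # runners from second and first
            bases = (True, False, False)
        else:
            outs += 2                        # runners from third and second
            bases = (True, True, False)
    elif outcome == POP_FLY:
        outs += 1
        if third and outs < 3:               # runner on third tags and scores
            runs = 1
            bases = (first, second, False)
    return bases, min(outs, 3), runs


# Runners on first and third, nobody out, batter singles: one run scores.
print(resolve_play((True, False, True), 0, SINGLE))
```

Keeping the rules in one pure function makes them easy to check case by case before they are wired into a game loop.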
Construction of the Model
The model for this project was created with the student version of a commercially
available discrete event simulation software package called ProModel.
The simulation space consists of the 4 bases of a baseball field, a batter’s box, and
some other holding areas that have no physical parallel but are convenient for model
functionality. The defensive team is not modeled but defensive effects are reflected in the
offensive batting probabilities mentioned above.
Because of the limitation on the number of different kinds of entities imposed by
the software, generic batters and base-runners are used in the simulation. Logic in the
processing of the batters keeps track of the batting order by counting the batters and then
using the appropriate player’s hitting probability distribution in turn to determine the
outcome of each at-bat. The batter becomes a generic runner upon leaving the batter’s
box and all runners behave exactly the same. In this way, each player’s batting skill is
represented in the desired order for batting but only two entity types are used.
The outcome of the at-bat depends on a random variate generated from the
player's batting distribution. The random variate is an integer between one and nine
corresponding to the possible at-bat results described above. As each at-bat occurs, the
random variate is generated and then assigned to a variable called “BatResult”. This
variable is then used in various places in decision logic to control the flow of the players.
A successful at-bat (1-6) will result in the batter ending up on base while a strikeout,
grounder, or pop fly (7, 8, 9) will result in one or more outs.
As each base-runner arrives at a base, the statement, “wait until BatResult > 0” is
used near the top of the processing logic to delay processing of the runner until another
batter has batted and the outcome of the at-bat is known. When the batter does hit, “if …
then” statements in the base’s processing logic route the base-runners according to the
BatResult variable. These “if … then” statements enforce the base-running rules
described in the conceptual model above.
Timing is controlled via a downtime at the batter’s box location. This location
experiences a downtime immediately after each at-bat. When the downtime is over, the
location sets the BatResult back to zero and all base runners are stopped at the next base
by the “wait until BatResult > 0” statements. The down time is scheduled according to the
outcome of the at-bat to allow runners to advance the appropriate number of bases. If a
double is hit, for example, the batter’s box is still down when the batter reaches first base
so BatResult is still equal to two and the base processes the batter and sends him on to
second. Once all the base-runners have settled, the next batter bats and his fate and the
fate of the base runners are once again determined by the at-bat outcome. Any runs scored
are tallied and, after three outs, the bases are cleared and a new inning starts. The model
keeps track of innings played and terminates after nine.
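The bookkeeping described here (a batting-order counter that carries over between innings, three outs per inning, nine innings per game) can be sketched independently of the base-running logic. The example below is a simplified illustration rather than the ProModel model: the function name `count_plate_appearances` is hypothetical, each batter is reduced to a single out probability, every out is a single out (no double plays), and runs are not tracked. The 0.65 out probability is a made-up value, not one taken from the data.

```python
import random


def count_plate_appearances(out_probs, games, rng, innings=9):
    """Count plate appearances per lineup slot over a number of games.

    out_probs[i] is the chance that the batter in slot i makes an out.
    ASSUMPTION: every out is a single out and runs are ignored.
    """
    appearances = [0] * len(out_probs)
    for _ in range(games):
        slot = 0                         # lead-off batter starts each game
        for _ in range(innings):
            outs = 0
            while outs < 3:
                appearances[slot] += 1
                if rng.random() < out_probs[slot]:
                    outs += 1
                # Next batter is up; the order wraps and carries
                # over from one inning to the next.
                slot = (slot + 1) % len(out_probs)
    return appearances


# Nine identical hypothetical batters over 157 games.
pa = count_plate_appearances([0.65] * 9, games=157, rng=random.Random(7))
print(pa)
```

Because each game starts at the top of the order, earlier lineup slots never receive fewer plate appearances than later ones, which is the mechanism by which batting order gives some players more opportunities at bat.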
There may be a better way to construct the base running routing and logic but part
of the complexity was driven by the desire to send each runner to each base in sequence
for the sake of the animation. It would have been easier to route all runners directly to
their final location after each at-bat but the animation would have looked wrong if
runners were skipping bases. In any case, for all its complexity, the model worked
correctly, as the next section shows.
Model Verification and Validation
The model was verified and validated to ensure good results. The first verification
checks involved simply watching the animation of the simulation and checking the
model parameters. Parameter displays were added to the animation to facilitate this
process. This allowed the verification that hitters were arriving to bat in the right order,
the outcome of their at-bat was varying randomly, and the hitters and base-runners were
being routed correctly. This also facilitated the verification that innings were ending after
three outs and the score was being tallied correctly. Because the model involved complex
synchronization of events, the trace facility proved to be invaluable during some debugging involving the timing of code execution at multiple locations.
The final validation checks of the simulation involved comparing 2002 season
statistics from the real players and the team as a whole with results for the simulation. For
example, during 157 games in real life, Ichiro Suzuki made 728 plate appearances.
During 157 replicated games, the simulation recorded 743 plate appearances for Ichiro.
With a 95% confidence band of 726-743 appearances, it is clear that the simulation is
closely matching reality in this respect. This indicates that the simulation is correctly
capturing the real life dynamics of the game in general (not just for this one batter)
because the number of plate appearances depends on the number of outs vs. hits for all
players. The more hits per inning, the more batters will bat, and the more times each
player will bat during a game. If the ratio between outs and base hits were grossly wrong
for any player, all other players' plate appearances would likely be affected.
Overall team statistics from the simulation were also compared against real life
data. The most important measure, runs per game, did not match as well as might be
desired with the initial model. The 2.9-3.6 runs per game (90% confidence) from the
simulation fell short of the 5 runs per game achieved by the real team. This prompted
re-examination of the simplifications that were used in the model for base running (namely
the omission of base stealing and the limitation of base runner advancement to one base
for a single and two bases for a double). This re-examination revealed that base stealing
is an infrequent enough occurrence that including it would not materially affect the results
of the model. The assumption that each base runner would advance only one base when a
single was hit and two bases when a double was hit was found to be in error. In reality,
runners more often advance two bases on a single and three on a double. The model was
adjusted accordingly and the average runs per game for the simulation changed to 4.2-5.0
(90% confidence), which matches the actual 5 runs per game of the real team well.
Experimentation and Results
The universally embraced strategy for arranging a line-up or batting order, from
little league to the major leagues, is to place the three batters with the highest on-base
percentages (but not necessarily home run hitting ability) first in the batting order with the
best power/home-run hitter in the fourth spot as the "clean-up" hitter. The rationale
behind this traditional strategy is that you can get at least one of your high percentage
batters on base and then your "clean-up" batter brings them in to score with a home run or
a double or triple. The objective of the experimentation in this project was to challenge
this notion.
An alternative strategy was proposed. The cleanup hitter, Mike Cameron, was
simply moved to the first spot and the lead off hitter, Ichiro Suzuki, was moved to the
fourth spot. In the case of the team being analyzed here, the second, third, and fourth
batters are all above average home run hitters while Suzuki is purely a base-hit batter.
Thus, moving Suzuki to the fourth spot puts the home-run hitters up front in general. No
other changes were made to the order. This scheme gives the power hitters more
opportunities at-bat as well as changing the dynamics of the game in other ways that are
difficult to predict.
The base case and the alternative were run with 200 replications (each replication
is a 9 inning game) and then compared against each other with a paired-t confidence
interval analysis (see the included spreadsheet “compare strategies.xls” for calculations).
The result of this comparison was that the alternative batting order produced 0.03-0.76
(90% confidence interval) more runs per game than the traditional order. In other words,
the team scored slightly more runs with the slugger batting first than with the traditional
lead-off man batting first. It is important to note that the difference fell slightly short of
being statistically significant at a confidence level of 95%.
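The paired comparison can be sketched as follows. This is an illustration of the technique only: the calculations actually used are in the spreadsheet, the function name `paired_confidence_interval` is hypothetical, and the runs-per-game values are synthetic. The sketch also substitutes the normal quantile for Student's t, an approximation that is close when there are a couple of hundred replications.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_confidence_interval(base, alt, confidence=0.90):
    """Confidence interval for the mean of (alt - base), paired by replication.

    APPROXIMATION: uses the normal quantile in place of Student's t.
    """
    diffs = [a - b for a, b in zip(alt, base)]
    quantile = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = quantile * stdev(diffs) / math.sqrt(len(diffs))
    centre = mean(diffs)
    return centre - half_width, centre + half_width


# Synthetic runs-per-game values, for illustration only
# (these are NOT the project's simulation output).
rng = random.Random(1)
base = [rng.randint(0, 10) for _ in range(200)]
alt = [rng.randint(0, 11) for _ in range(200)]

low, high = paired_confidence_interval(base, alt, confidence=0.90)
print(f"90% CI for the difference: {low:.2f} to {high:.2f}")
print("significant" if low > 0 or high < 0 else "not significant")
```

An interval that excludes zero indicates a significant difference at that confidence level. Raising the confidence from 90% to 95% widens the interval, which is how a difference can be significant at one level and fall short at the other.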
In order to investigate this question further, a trade was made for a player from
another ball-club. It was theorized that because the clean-up hitter used in the study was
sub-typical for a real power hitter, the difference in the two batting order strategies might
be highlighted by using a more successful player. Thus, statistics were gathered for
another player, Shawn Green of the Los Angeles Dodgers, who hit 42 home runs in 2002
compared to Mike Cameron’s 25. The simulation was run first with Shawn Green batting
in the traditional clean-up spot. Not surprisingly, the team’s runs per game increased by
0.03-0.85 runs (90% confidence interval) compared to the standard Seattle lineup. This is an
interesting result in itself and is the kind of information that would be useful to baseball
managers contemplating player trades. Next the simulation was run with the alternate
batting order strategy (Shawn Green batting first and Suzuki batting fourth). The results
of this run were somewhat surprising. The team averaged almost exactly the same number
of runs with Green batting first as they did with him in the fourth position. Thus, the
conclusion that switching the clean-up and lead-off hitters leads to more scoring was not
strengthened, but weakened by making this comparison. It also indicates that
generalizations about batting order strategies might be error prone because the merit of
different strategies might depend on the players involved.
One other batting strategy was investigated with the simulation. It explored the
relative merits of the Mariners’ clean-up hitter, Mike Cameron, and the Mariners’ lead-off
batter, Suzuki. To make the comparison, Cameron was removed from the order and
Suzuki was put into the order in his normal spot and in Cameron’s spot. The performance
of the team was then compared against the baseline performance. This comparison would
show the relative merits of having two Suzuki-like batters vs. one Suzuki and one
Cameron. The results were that the team scored approximately the same number of runs
(the difference was not statistically significant at the 90% confidence level). This
indicates that Suzuki’s superior batting average is about equal in scoring value to
Cameron’s combination of a low average and good power hitting ability.
Conclusions
With the results of these comparisons in mind there are a couple of conclusions
that can be drawn. First, the limited combinations explored here indicate that there is no
advantage to having the traditional clean-up batter bat in the clean-up position rather than
the first position. In the case of the Mariners’ 2002 lineup, the simulation showed that
there is actually a slight advantage to having the cleanup batter bat first and the lead-off
batter bat fourth. However, substituting one player, Shawn Green, into the
lineup changed the dynamics so that there was no advantage to having the cleanup batter
bat first. This demonstrated that it is difficult to make general rules about batting order
strategies as their effectiveness is largely dependent on the individual skills of the players
involved and the dynamics of the team. It would be folly to take the results generated with
these major league players and apply them universally to major league teams. It would be
even more erroneous to apply these results to amateur teams with vastly different talent
distribution. Rather, in order to get good results for this kind of analysis it is necessary to
model the players involved individually.
Appendix
The following files are included with this report for reference:
baseball.mod                   The ProModel model file
baseball.TXT                   The model text file
marinerstats.xls               The raw data and data analysis used for the model
compare strategies.xls         The results comparison analysis for the various configurations explored
various results output files   Output files generated for each batting configuration studied