Artificial Neural Network Prediction of Major League Baseball Teams' Winning Percentages
Scott Wiese
ECE 539
Professor Hu
Motivation
Current trends in managing player personnel focus heavily on statistics, weighing a player's expected future production against his potential salary.
- Used to decide whether or not to sign specific players
- Used to determine whether current players are overpaid
Motivation
- Claimed that statistics can be a valid predictor of both a player's and a team's production
- Claimed that one season (162 games) is a long enough trial period for statistics to predict a team's winning percentage
Goals
Can I develop an artificial neural network that, when given a team's statistics for a season, will accurately predict that team's winning percentage?
Data Collection
- Collected 3 years of data for all 30 Major League Baseball teams
- Gathered from the statistical database available on www.MLB.com
- Gathered 74 statistics per team, in addition to winning percentage
Neural Network Selection
Backpropagation-trained multilayer perceptron (MLP):
- Excellent at analyzing large feature sets
- Trained with supervised learning
- Good at classification problems
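The slides name the architecture but show no code. As a minimal sketch under that description, a backprop-trained MLP in the baseline configuration (1 hidden layer, 1 hidden neuron) might look like the following; the tanh activation, learning rate, and random stand-in data are assumptions, not details from the project.

```python
# Sketch (not the author's code) of a backpropagation-trained MLP:
# 1 hidden layer with 1 hidden neuron, tanh activation, linear output,
# trained by gradient descent on squared error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 74))            # 3 seasons x 30 teams, 74 stats each
y = rng.uniform(0.3, 0.7, size=(90, 1))  # placeholder winning percentages

W1 = rng.normal(scale=0.1, size=(74, 1)); b1 = np.zeros((1, 1))
W2 = rng.normal(scale=0.1, size=(1, 1));  b2 = np.zeros((1, 1))
lr = 0.01

for _ in range(500):
    h = np.tanh(X @ W1 + b1)                      # forward pass
    pred = h @ W2 + b2
    err = pred - y                                # backward pass
    gW2 = h.T @ err / len(X); gb2 = err.mean(0, keepdims=True)
    dh = (err @ W2.T) * (1 - h**2)                # tanh derivative
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0, keepdims=True)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(float(np.mean((pred - y) ** 2)))            # training MSE after 500 steps
```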
Preprocessing
- Normalized each feature vector
- Used singular value decomposition (SVD) to emphasize the most important features
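A hedged sketch of these two preprocessing steps in NumPy; the number of retained components (k = 10) is an assumption, since the slides do not state it.

```python
# Sketch of the described preprocessing: z-score each feature, then use SVD
# to keep the directions that explain the most variance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 74))                      # raw team statistics

Xn = (X - X.mean(axis=0)) / X.std(axis=0)          # normalize each feature
U, s, Vt = np.linalg.svd(Xn, full_matrices=False)  # singular value decomposition
k = 10                                             # assumption: top-10 components
X_reduced = Xn @ Vt[:k].T                          # project onto top-k directions
print(X_reduced.shape)                             # → (90, 10)
```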
Testing
Goal: determine which MLP configuration best predicts winning percentage.
- Baseline MLP: 1 hidden layer with 1 hidden neuron
- Tested MLPs: 1 through 5 hidden layers, with 1, 3, or 5 hidden neurons in every layer
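The search space above can be enumerated as hidden-layer-size tuples; this is a sketch of the grid, not the author's code.

```python
# Enumerate the configurations from the slides: 1-5 hidden layers,
# with 1, 3, or 5 hidden neurons in every layer.
baseline = (1,)                       # 1 hidden layer, 1 hidden neuron
configs = [(neurons,) * layers
           for layers in range(1, 6)
           for neurons in (1, 3, 5)]
print(len(configs))                   # → 15
print(configs[0], configs[-1])        # (1,) ... (5, 5, 5, 5, 5)
```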
Testing Results
Average success rates (%), exact matching:

             1 layer   2 layers   3 layers   4 layers
1 neuron      33.33     56.67      50.00      60.00
3 neurons     45.56     35.56      41.11      31.11
5 neurons     32.22     45.56      43.33      40.00
Testing Results
[Figure: Average Success Rates - Exact Matching. % success rate (20-65) plotted against number of hidden layers (0-6), with one curve each for 1, 3, and 5 hidden neurons per layer.]
Testing
Now that the 4-hidden-layer, 1-hidden-neuron network is known to perform best, test it again against the baseline on new data.
- A prediction counts as a success when the predicted winning percentage is within +/- 0.15 of the actual value.
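The success criterion can be written as a small helper; this sketch and the function name are mine, not from the slides.

```python
# Fraction of predictions within +/- tol of the true winning percentage,
# expressed as a percentage (the slides' success metric, tol = 0.15).
def success_rate(preds, trues, tol=0.15):
    hits = sum(abs(p - t) <= tol for p, t in zip(preds, trues))
    return 100.0 * hits / len(preds)

# Two of three predictions fall within the 0.15 tolerance here.
print(success_rate([0.500, 0.700, 0.300], [0.600, 0.400, 0.310]))
```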
Testing Results
Final testing, success rates (%):

            Trial 1   Trial 2   Trial 3   Mean
Baseline     23.33     16.67     33.33    24.44
Best MLP     73.33     43.33     26.67    47.78

The best MLP's mean performance was almost twice as good as the baseline's.
Preliminary Conclusions
- The advanced MLP structure is better at predicting a team's winning percentage.
- Unfortunately, the success rate is still under 50%, even with a 0.15 error bound.
- Can classification work better?
Classification Testing
Classify teams into 3 groups:
- Division winners (> .590)
- Winning teams (.500 - .589)
- Losing teams (< .500)
Then repeat the same training and testing process as above.
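A sketch of the three-way labeling. How the boundary between .589 and .590 is handled is an assumption, since the stated ranges leave that sliver ambiguous.

```python
# Map a winning percentage to one of the three classes from the slides.
# Assumption: .590 and above counts as a division winner.
def classify(win_pct):
    if win_pct >= 0.590:
        return "division winner"
    if win_pct >= 0.500:
        return "winning team"
    return "losing team"

print(classify(0.617), classify(0.525), classify(0.420))
```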
Classification Results
Average success rates (%), classification:

             1 layer   2 layers   3 layers   4 layers
1 neuron      55.56     61.11      56.67      53.33
3 neurons     58.89     61.11      66.67      57.78
5 neurons     66.67     62.22      73.33      60.00

The 3-hidden-layer, 5-hidden-neuron network performed best (73.33%).
Classification Results
[Figure: Average Success Rates - Classification. % success rate (40-75) plotted against number of hidden layers (0-6), with one curve each for 1, 3, and 5 hidden neurons per layer.]
Classification Results
Again, now that the best advanced network is known, test it against the baseline with more data.
Classification Results
Final testing, success rates (%):

            Trial 1   Trial 2   Trial 3   Mean
Baseline     66.67     63.33     60.00    63.33
Best MLP     66.67     63.33     63.33    64.44

Negligible difference between the two networks, even though the best MLP nearly doubled the baseline's success rate in the original prediction trials.
Conclusions
- The advanced network is better at pure prediction than the baseline.
  - Still a very modest success rate given the error bounds.
- The classification results are very promising.
  - They show that statistics are important in separating teams' results.