Neural Network Prediction of NFL Football Games

Joshua Kahn
ECE539 – Fall 2003
Overview
Introduction
Work Performed
  Data Collection
  Preliminary Study
  Training and Prediction Set Creation
  Data Preprocessing
  Making Predictions
Results
Conclusion
Introduction
The National Football League (NFL) is a multi-billion-dollar business
Many websites claim to be able to predict the outcome of NFL games
Some of these sites are trustworthy; others are downright seedy
Which ones are actually correct?
Project Goal
Most prognostications are based on human opinion
  Invariably, some degree of bias enters in
This project aims to create a completely objective, statistics-based system for predicting the outcome of NFL games
  The trouble lies in the “intangible” aspects of the game
  Nonetheless, it seems plausible to create such a statistical system
Why a Neural Network?
Teams can win in a variety of ways
  No linear mapping exists to determine the outcome
This problem essentially boils down to a pattern classification problem
  Neural networks are very good at solving such problems
  A neural network provides a non-linear mapping
Data Collection
All data had to be available from a typical NFL box score
A large data set was required to represent the many ways a team can win
Collected from NFL.com
  Used Excel’s web query feature to acquire tabular data, such as box scores and team averages
Data Collection
Data was extracted from the box scores using a Perl script
  Statistics could be selected from the box scores as desired
  Perl provides an interface to Excel
  Perl also allowed additional data processing
It remained to determine which statistics to use
Preliminary Study
Data was analyzed using Matlab to look for dependencies, redundant data, etc.
Statistical analysis showed that no hyperplane separates wins from losses
[Figure: 3-D scatter plot of games; axes: Total Yardage Differential, Time of Possession Differential, Turnover Differential]
Preliminary Study Results
The following statistics were found to be the most predictive:
  Total yardage differential
  Rushing yardage differential
  Time of possession differential (in seconds)
  Turnover differential
  Home or away
Differential statistics provide insight into both offensive and defensive performance
Scoring data was excluded because it would bias the network’s output toward a single feature
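A minimal sketch of how the five inputs listed above could be computed for one team in one game. The `TeamBoxScore` fields and the `game_features` helper are hypothetical names; the slides do not say how home/away was encoded, so +1/-1 is an assumption, and turnover differential is taken in the usual sense of takeaways minus giveaways. The box-score numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class TeamBoxScore:
    """One team's line from a single game's box score (hypothetical field names)."""
    total_yards: int
    rushing_yards: int
    possession_seconds: int  # time of possession, in seconds
    turnovers: int           # giveaways committed by this team


def game_features(team: TeamBoxScore, opponent: TeamBoxScore, is_home: bool) -> list[float]:
    """Return [total yd diff, rush yd diff, TOP diff (s), turnover diff, home flag]."""
    return [
        float(team.total_yards - opponent.total_yards),
        float(team.rushing_yards - opponent.rushing_yards),
        float(team.possession_seconds - opponent.possession_seconds),
        # Takeaways minus giveaways: the opponent's turnovers minus this team's.
        float(opponent.turnovers - team.turnovers),
        # Home/away encoded as +1/-1 (assumption; the slides do not specify).
        1.0 if is_home else -1.0,
    ]


# Made-up numbers for illustration only.
home = TeamBoxScore(total_yards=350, rushing_yards=120, possession_seconds=1900, turnovers=1)
away = TeamBoxScore(total_yards=300, rushing_yards=90, possession_seconds=1700, turnovers=3)
print(game_features(home, away, is_home=True))   # [50.0, 30.0, 200.0, 2.0, 1.0]
```

Because every input is a differential, the two vectors for one game are mirror images of each other, which is what lets both teams' rows appear in the training set.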
Training and Prediction Sets
Training sets include the statistics for both teams in each game
Each training vector also includes the outcome of the game
  Outcome marked for both teams: 1 = win, -1 = loss
Two prediction sets were created:
  One based on team season averages
  The other based on the average of the prior three weeks
Both sets were applied to compare their effectiveness
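The two kinds of sets can be sketched as below: each game yields two labeled training vectors, and a prediction vector is a team's averaged differentials plus the home/away flag for the upcoming game. The function names are hypothetical, tied games are assumed to have been dropped beforehand, and "prior three weeks" is approximated as the last three games played (bye weeks are ignored), all assumptions rather than details from the slides.

```python
def training_pairs(features_a, features_b, a_won):
    """Return one labeled vector per team for a single game (1 = win, -1 = loss).

    Ties are not handled (assumption: they are removed beforehand).
    """
    label_a = 1 if a_won else -1
    return [(features_a, label_a), (features_b, -label_a)]


def average_differentials(history, last_n=None):
    """Column-wise mean of the first four features over a team's games.

    history: per-game vectors in chronological order, laid out as
    [total yd diff, rush yd diff, TOP diff, turnover diff, home flag].
    last_n=None averages the whole season; last_n=3 averages the last
    three games, used here as a stand-in for "prior three weeks".
    """
    games = history if last_n is None else history[-last_n:]
    return [sum(column) / len(games) for column in zip(*(g[:4] for g in games))]


def prediction_vector(history, is_home, last_n=None):
    """Averaged differentials plus the home/away flag (+1/-1) for the upcoming game."""
    return average_differentials(history, last_n) + [1.0 if is_home else -1.0]
```

Calling `prediction_vector(history, is_home)` gives the season-average set and `prediction_vector(history, is_home, last_n=3)` the three-week set, so the two prediction sets differ only in the averaging window.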
Neural Network Selection
A back-propagation multi-layer perceptron provides a great deal of flexibility
  Good pattern classifier
  Supervised learning
Network parameters and structure were determined through testing
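A back-propagation multi-layer perceptron can be sketched in plain Python as follows. This is a generic one-hidden-layer network with tanh units, trained by online gradient descent on squared error; tanh is chosen here because its output range matches the 1/-1 targets. The layer size, learning rate, epoch count, and weight initialization are illustrative assumptions, not the settings the project arrived at through testing.

```python
import math
import random


class MLP:
    """One-hidden-layer perceptron with tanh units, trained by back-propagation.

    Sizes, learning rate, epochs and the uniform(-0.5, 0.5) initialization are
    illustrative assumptions, not the project's tested settings.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden row holds that unit's input weights followed by its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # Output weights, one per hidden unit, followed by the output bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _forward(self, x):
        # zip() stops before the trailing bias term, which is added separately.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + row[-1])
             for row in self.w_hidden]
        y = math.tanh(sum(w * hj for w, hj in zip(self.w_out, h)) + self.w_out[-1])
        return h, y

    def predict(self, x):
        """Network output in (-1, 1); positive leans toward 'win'."""
        return self._forward(x)[1]

    def train(self, samples, lr=0.05, epochs=500):
        """Online gradient descent on E = (y - target)^2 / 2, targets 1 or -1."""
        for _ in range(epochs):
            for x, target in samples:
                h, y = self._forward(x)
                # Output delta: dE/dy * dy/dnet, with tanh'(net) = 1 - y^2.
                d_out = (y - target) * (1.0 - y * y)
                # Hidden deltas, computed before the output weights change.
                d_hid = [d_out * self.w_out[j] * (1.0 - hj * hj) for j, hj in enumerate(h)]
                for j, hj in enumerate(h):
                    self.w_out[j] -= lr * d_out * hj
                self.w_out[-1] -= lr * d_out
                for j, row in enumerate(self.w_hidden):
                    for i, xi in enumerate(x):
                        row[i] -= lr * d_hid[j] * xi
                    row[-1] -= lr * d_hid[j]
```

For this project the network would take `n_in=5`, with the hidden-layer size chosen by testing. Raw inputs such as time of possession in seconds are large enough to saturate tanh units, so in practice the features would be scaled or projected before training.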
Data Preprocessing
All data was processed using singular value decomposition (SVD)
  Gives additional weight to the most pertinent features prior to network input
  Makes training more effective
  Performed using Matlab’s svd function
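The slides do not say exactly how the output of `svd` was fed to the network. One common reading, sketched here as an assumption, is to center each feature column and express every vector in the leading right singular directions of the data matrix. To stay dependency-free, this sketch finds those directions by power iteration with deflation on X^T X, whose eigenvectors are the right singular vectors of X; Matlab's `svd` or NumPy's `numpy.linalg.svd` would compute them directly. The function names and the choice of `k` are hypothetical.

```python
import math
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_right_singular_vectors(X, k, iters=200, seed=0):
    """Approximate the k leading right singular vectors of X (rows = samples).

    The eigenvectors of the Gram matrix X^T X are the right singular vectors
    of X and its eigenvalues are the squared singular values. Power iteration
    assumes those singular values are well separated.
    """
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration
            w = _matvec(gram, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        sigma_sq = sum(a * b for a, b in zip(v, _matvec(gram, v)))  # Rayleigh quotient
        vectors.append(v)
        # Deflate so the next pass converges to the next singular direction.
        gram = [[gram[i][j] - sigma_sq * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vectors


def svd_project(X, k):
    """Center the columns of X and return (coordinates along top-k directions, directions, means)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    centered = [[x - m for x, m in zip(row, means)] for row in X]
    V = top_right_singular_vectors(centered, k)
    projected = [[sum(a * b for a, b in zip(row, v)) for v in V] for row in centered]
    return projected, V, means
```

The column means and directions computed on the training set would then be reused, unchanged, to transform the prediction vectors, so both sets live in the same coordinate system.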
Making Predictions
Trained the network using the training data
Applied the prediction data three times
  Used both the season and three-week averages to compare the effectiveness of the two
  Averaged the results of the three trials
Classified the winner and loser of each game
  The winner was the team with the higher network output
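The prediction step can be sketched as follows, assuming each of the three trials is a separately trained network (for example, one started from different random initial weights); the slides do not say what varied between trials. `predict_game` and the `Predictor` type are hypothetical names: each team's prediction vector is fed to every trial, the outputs are averaged, and the team with the higher average is picked.

```python
from typing import Callable, Sequence

# A trained network reduced to "feature vector in, scalar output out".
Predictor = Callable[[Sequence[float]], float]


def predict_game(trials: Sequence[Predictor],
                 team_a: Sequence[float],
                 team_b: Sequence[float]) -> tuple[str, float, float]:
    """Average each team's network output over all trials; the higher average wins.

    Returns ("A" | "B" | "tie", mean output for team A, mean output for team B).
    """
    mean_a = sum(net(team_a) for net in trials) / len(trials)
    mean_b = sum(net(team_b) for net in trials) / len(trials)
    if mean_a > mean_b:
        return "A", mean_a, mean_b
    if mean_b > mean_a:
        return "B", mean_a, mean_b
    return "tie", mean_a, mean_b
```

Any callable works as a trial, so stand-ins such as `lambda x: 2 * x[0]` are enough to exercise the logic before real trained networks are plugged in.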
Results
Prediction Rate
Week       Season Average Data    Three-Week Average Data
Week 14    75%                    62.5%
Week 15    75%                    37.5%
Neural network classification was 94% correct when actual (not predicted) statistics were used
NFL teams seem to be consistent over the long term
Results
Weeks 14 and 15
Green Bay def. Chicago
Baltimore def. Cincinnati
Indianapolis def. Atlanta
Tennessee def. Buffalo
Philadelphia def. Dallas
Jacksonville def. Houston
Kansas City def. Detroit
Tampa Bay def. Houston
Indianapolis def. Tennessee
Pittsburgh def. Oakland
New England def. Jacksonville
Minnesota def. Chicago
San Diego def. Detroit
Minnesota def. Seattle
New York Jets def. Pittsburgh
St. Louis def. Seattle
Tampa Bay def. New Orleans
New York Giants def. Washington
Cincinnati def. San Francisco
Oakland def. Baltimore
San Francisco def. Arizona
Denver def. Kansas City
Denver def. Cleveland
Carolina def. Arizona
New England def. Miami
Buffalo def. New York Jets
Dallas def. Washington
Green Bay def. San Diego
Atlanta def. Carolina
St. Louis def. Cleveland
New Orleans def. New York Giants
Philadelphia def. Miami
Baseline Study
Prediction Rate
Week       Neural Network    ESPN.com
Week 14    75%               57%
Week 15    75%               87%
The neural network was more accurate on average (75% vs. 72% for ESPN.com)
Previous neural network predictors were accurate for 63% of games
Conclusions
Each of the eight misclassifications can be subjectively placed in one of three categories:
Game                             Misclassification Reasoning
Philadelphia def. Dallas         Misclassification
San Diego def. Detroit           Too close to call
Atlanta def. Carolina            Upset
Minnesota def. Seattle           Too close to call
New England def. Jacksonville    Misclassification
New York Jets def. Pittsburgh    Too close to call
Cincinnati def. San Francisco    Too close to call
Oakland def. Baltimore           Upset
Conclusions
Prediction rate could be improved by adding the “human element”
  Take immeasurables into consideration
    Las Vegas betting lines
    Subjective team rankings
Training set could be based on previous season data
  Ways in which teams win presumably do not change over time
Shows that a statistics-based system can be developed to predict the outcome of NFL games
References
ESPN.com. http://www.espn.com [Retrieved Dec 2003].
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Upper Saddle River, New Jersey: Prentice-Hall, Inc.
NFL.com. http://www.nfl.com [Retrieved Dec 2003].
Purucker, M. C. (1996). Neural Network Quarterbacking. IEEE Potentials, vol. 15, no. 3, pp. 9-15.
Questions?
Thank you…