Neural Network Prediction of NFL Football Games
Joshua Kahn
ECE539 – Fall 2003

Overview
- Introduction
- Work Performed
  - Data Collection
  - Preliminary Study
  - Training and Prediction Set Creation
  - Data Preprocessing
  - Making Predictions
- Results
- Conclusion

Introduction
- The National Football League (NFL) is a multi-billion dollar business
- Many web sites claim to be able to predict the outcome of NFL games
- Some of these sites are trustworthy, others are downright seedy
- Which are actually correct?

Project Goal
- Most prognostications are based on human opinion
- Invariably, some degree of bias enters in
- This project aims to create a completely objective, statistics-based system for predicting the outcome of NFL games
- The trouble lies in the “intangible” aspects of the game
- It seems plausible to create a statistical system

Why a Neural Network?
- Teams can win in a variety of ways
- No linear mapping exists to determine the outcome
- This problem essentially boils down to a pattern classification problem
- Neural networks are very good at solving these problems
- A neural network provides a non-linear mapping

Data Collection
- Data had to be available from a typical NFL box score
- A large data set was required to represent the large number of ways to win
- Collected from NFL.com
- Used Excel’s web query feature to acquire tabular data, such as box scores and team averages

Data Collection (continued)
- Data was extracted from the box scores using a Perl script
- Statistics could be selected from the box scores as desired
- Perl provides an Excel interface
- Perl also allowed additional data processing
- Needed to determine which statistics to use

Preliminary Study
- Data was analyzed using Matlab to look for dependency, redundant data, etc.
- No hyperplane exists to separate wins and losses based on statistical analysis

[Figure: plot with axes Total Yardage Differential, Time of Possession Differential, and Turnover Differential]

Preliminary Study Results
- Determined that the following statistics were most predictive:
  - Total yardage differential
  - Rushing yardage differential
  - Time of possession differential (in seconds)
  - Turnover differential
  - Home or away
- Differential statistics provide insight into offensive and defensive performance
- Scoring data was excluded, as it would bias the network’s output toward a single feature

Training and Prediction Sets
- Training sets include the statistics for both teams for each game
- Each training vector also includes the outcome of the game
  - Outcome marked for both teams
  - 1 = win, -1 = loss
- Two prediction sets were created:
  - One based on team season averages
  - The other based on the average of the prior 3 weeks
- Both sets were applied to determine effectiveness

Neural Network Selection
- A back-propagation multi-layer perceptron provides a great deal of flexibility
- Good pattern classifier
- Supervised learning
- Network parameters and structure were determined based on testing

Data Preprocessing
- Processed all data using singular value decomposition
- Gives additional weight to the most pertinent features prior to network input
- Makes training more effective
- Performed using Matlab’s svd function

Making Predictions
- Trained the network using the training data
- Applied the prediction data three times
  - Used both the season and three-week averages to determine the effectiveness of the two
- Found the average of the three trials
- Classified the winner/loser of each game
  - Winner had the higher network output

Results

Prediction Rate
Week       Season Average Data    Three Week Average Data
Week 14    75%                    62.5%
Week 15    75%                    37.5%

- Neural network classification is 94% correct when actual (not predicted) statistics are used
- NFL teams seem to be consistent over the long term

Results (continued)

Week 14                               Week 15
Green Bay def. Chicago                Indianapolis def. Atlanta
Baltimore def. Cincinnati             Tennessee def. Buffalo
Philadelphia def. Dallas              Kansas City def. Detroit
Jacksonville def. Houston             Tampa Bay def. Houston
Indianapolis def. Tennessee           New England def. Jacksonville
Pittsburgh def. Oakland               Minnesota def. Chicago
San Diego def. Detroit                New York Jets def. Pittsburgh
Minnesota def. Seattle                St. Louis def. Seattle
Tampa Bay def. New Orleans            Cincinnati def. San Francisco
New York Giants def. Washington       Oakland def. Baltimore
San Francisco def. Arizona            Denver def. Cleveland
Denver def. Kansas City               Carolina def. Arizona
New England def. Miami                Dallas def. Washington
Buffalo def. New York Jets            Green Bay def. San Diego
Atlanta def. Carolina                 New Orleans def. New York Giants
St. Louis def. Cleveland              Philadelphia def. Miami

Baseline Study

Prediction Rate
Week       Neural Network    ESPN.com
Week 14    75%               57%
Week 15    75%               87%

- Neural network was more accurate on average
- Previous neural network predictors were accurate for 63% of games

Conclusions
- Each of the eight misclassifications can be subjectively placed in one of three categories

Game                                  Misclassification Reasoning
Philadelphia def. Dallas              Misclassification
San Diego def. Detroit                Too close to call
Atlanta def. Carolina                 Upset
Minnesota def. Seattle                Too close to call
New England def. Jacksonville         Misclassification
New York Jets def. Pittsburgh         Too close to call
Cincinnati def. San Francisco         Too close to call
Oakland def. Baltimore                Upset

Conclusions (continued)
- Prediction rate could be improved by adding the “human element”
- Training set could be based on previous season data
- Take immeasurables into consideration
  - Las Vegas betting lines
  - Subjective team rankings
- Ways in which teams win presumably do not change over time
- Demonstrates that a statistically based system can be developed to predict the outcome of NFL games

References
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Upper Saddle River, New Jersey: Prentice-Hall, Inc.
ESPN.com, http://www.espn.com [Retrieved Dec 2003].
Purucker, M.C. (1996). Neural Network Quarterbacking. IEEE Potentials, vol. 15, no. 3, pp. 9-15.
NFL.com, http://www.nfl.com [Retrieved Dec 2003].

Questions?

Thank you
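Supplementary sketch

The feature construction and winner-selection steps described in the Preliminary Study Results and Making Predictions slides can be illustrated with a short Python sketch. This is not the project's Matlab or Perl code. The function names, the dictionary keys, the sign convention for the turnover differential, the +1/-1 home/away encoding, and the tie handling are all assumptions made for illustration, and the numbers are made up. The network itself is not modeled; its outputs are simply passed in as lists, one value per trial.

```python
from statistics import mean


def differentials(team, opponent):
    """Return the five inputs listed on the Preliminary Study Results slide."""
    return [
        team["total_yards"] - opponent["total_yards"],
        team["rush_yards"] - opponent["rush_yards"],
        team["top_seconds"] - opponent["top_seconds"],
        # Assumed sign convention: positive means the opponent
        # turned the ball over more often than this team did.
        opponent["turnovers"] - team["turnovers"],
        # Assumed encoding: +1 for the home team, -1 for the visitor.
        1 if team["home"] else -1,
    ]


def pick_winner(outputs_a, outputs_b):
    """Average each team's network outputs over the trials; higher average wins."""
    avg_a, avg_b = mean(outputs_a), mean(outputs_b)
    if avg_a == avg_b:
        return None  # assumption: the slides do not say how ties were handled
    return "a" if avg_a > avg_b else "b"


def prediction_rate(predicted, actual):
    """Percentage of games whose predicted winner matches the actual winner."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical teams with made-up statistics, for illustration only.
team_a = {"total_yards": 350, "rush_yards": 120, "top_seconds": 1900,
          "turnovers": 1, "home": True}
team_b = {"total_yards": 300, "rush_yards": 90, "top_seconds": 1700,
          "turnovers": 3, "home": False}

features_a = differentials(team_a, team_b)
# Three trials per team, mirroring "applied prediction data three times".
winner = pick_winner([0.8, 0.6, 0.7], [-0.2, 0.1, -0.3])
rate = prediction_rate(["x"] * 16, ["x"] * 12 + ["y"] * 4)
```

With these made-up inputs, `features_a` is `[50, 30, 200, 2, 1]`, `winner` is `"a"`, and 12 correct picks out of 16 give a `rate` of 75.0.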