Predicting Winners of NFL Games with a Neural Network

Matthew Gray
CS/ECE 539

Introduction

Predicting who will win a given football game in the NFL is something that many people have an interest in. Millions of people watch NFL games every Sunday and Monday during the regular season, and the focus then switches to the select teams that make the playoffs. At the start of the season, every team has a clean slate and a chance to make it to the Super Bowl. As the season goes on, the quality of the teams becomes more apparent: teams win their divisions, make the playoffs, or are knocked out of contention. Furthermore, the NFL Draft follows every season, and the teams' records in the recently concluded season determine the draft order, with the worst-record teams receiving the earliest picks.

Fans, analysts, and gamblers all try to predict who will win a game between two teams, and the motives differ for each group. Fans tend to care most about how their favorite team and division will do; if their team is doing poorly, they may try to predict the final draft order to see whom the team might draft. Analysts keep a more professional and realistic outlook in predicting who will make the playoffs, who will win the Super Bowl, and in what order teams will draft. Gamblers try to come up with predictions that will net them a profit. The problem with all of these predictions is that they rely on human opinion, and some predictions will be wrong because of how the predictors feel about certain players and teams. I will try to eliminate this human bias from the problem of predicting the winner of an NFL game by using a multi-layer perceptron.

One of the difficulties in predicting the winner of an NFL game is that the ability of individual players varies greatly, and individual players can affect the result of a game. Players get hurt from game to game, switch teams between seasons, or see their productivity rise or fall by large amounts from one season to the next. Some teams can put any player in at a position and get similar results. Because of these problems, I will not factor individual player performance into the prediction of who will win a game. Team statistics, on the other hand, tend to remain fairly constant throughout a season. I propose that, based on the statistics of the teams, it is possible to predict who will win a football game.

One problem inherent in choosing the statistics to predict a winner is that no single statistic besides the final score determines who wins. There are games where a team commits more turnovers and still wins, and there are games where a team wins while gaining fewer yards than its opponent. This makes the problem non-linearly separable, so a neural network capable of handling non-linearly separable problems is needed. Considering this, I will use a multi-layer perceptron, which is capable of handling such problems.
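To make that choice concrete, here is a minimal Matlab sketch of a forward pass through a small MLP with one sigmoid hidden layer. Everything in it is illustrative: the weights are random placeholders and the 10-3-2 shape is only an example; the actual training code used in this project is Professor Hu's bp.m, described under Work Performed.

    % Illustrative forward pass through a small 10-3-2 MLP.
    x  = randn(10, 1);                     % one game's input features (placeholder)
    W1 = randn(3, 10);   b1 = randn(3, 1); % hidden layer: 3 sigmoid neurons
    W2 = randn(2, 3);    b2 = randn(2, 1); % output layer: one node per class

    sigmoid = @(v) 1 ./ (1 + exp(-v));     % the nonlinearity is what lets an MLP
                                           % handle non-linearly separable data
    h = sigmoid(W1 * x + b1);              % hidden activations
    y = sigmoid(W2 * h + b2);              % two outputs (1-of-2 encoding)

    [val, winner] = max(y);                % larger output picks the predicted winner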
Work Performed

My first objective was to collect data. I decided that I should have data for a complete 16-game regular season, and since the current 2005 NFL season is still in progress, I opted to start by collecting game data from 2004. The NFL does keep all of the game box scores for the previous few years on its website, but the format of the data is somewhat awkward, especially for setting the data up in a table. I discovered a website called Statfox Sports Handicapping Community that also has this data (Statfox). The data there is sorted by team in a table with statistics such as passing yards gained and allowed, rushing yards gained and allowed, yards per play, etc. For each of the 32 teams, I collected the data tables and brought them into an Excel document.

Once in the document, I started converting the data so that I could use it in Matlab. Team names were replaced with team id numbers, and the home and away status of each game was represented by 1 or 0, respectively. I set up equations to compute the total yards gained and allowed per game for each team. I then computed a running average of yards gained and allowed per team as the season went on, and finally the running standard deviation of the total yards gained and allowed. I believe there should be some relation between a team's consistency, its seasonal averages, and the results of its games. After removing the column headings so that only the data remained, I saved the file as a tab-delimited text file so it would be readable by Matlab.

To prevent training or testing on the same game twice, and to take advantage of the averages computed in Excel, I first used Matlab to set the data up so that every game was listed with the home team first and the away team second. Each row was then [Team 1 Features, Team 2 Features, Team 1 Result, Team 2 Result]. To prevent the multi-layer perceptron from learning whether a specific team is going to win or lose, and to make the network applicable to games from other seasons where a team may be better or worse, the team ids were removed; the ids were used simply to organize the data. In addition, only the total yardage was used, not the separate rushing and passing yardage. One last manipulation was to convert the result from a single column to a 1-of-2 output.

With the data set up, I moved on to modifying the multi-layer perceptron code provided on the class website, http://homepages.cae.wisc.edu/~ece539/, originally written by Professor Yu Hen Hu. I wanted to be able to run the bp.m program multiple times in a row with different values, so I made the changes needed to let an outside driver program set its variables. I then wrote a driver program that let me change the variables as I tested the data. The driver program also randomized the order of the games and selected a subset of the data as training data, with the rest serving as test data.

My next step was to test this data. On my initial test, I used a 10-3-2 MLP structure with a default alpha of 0.1 and a default momentum of 0.8. I did not scale the input to the range -5 to 5 and ended up with a classification rate of 53.9844%, but the error rate shot to 100 because of divide-by-zero errors. Next, I used the same structure with the input scaled from -5 to 5, and the classification rate was 58.714%, with errors that remained fairly consistent throughout training.
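The following is a rough sketch of the data handling just described: loading the tab-delimited export, shuffling the games, splitting them into training and test sets, and scaling the inputs to the range -5 to 5. Variable names and the split fraction are hypothetical stand-ins; the real code lives in initData4.m and the driver program.

    % Sketch of the data handling (hypothetical variable names).
    data = load('outputEnd.txt');              % tab-delimited matrix from Excel

    % Shuffle the games so the train/test split is random, then split.
    data       = data(randperm(size(data, 1)), :);
    nTrainRows = round(0.5 * size(data, 1));   % training fraction is adjustable
    trainSet   = data(1:nTrainRows, :);
    testSet    = data(nTrainRows+1:end, :);

    % Scale each feature column to [-5, 5]; the last two columns hold the
    % 1-of-2 game result and are left alone. Real code should guard against
    % constant columns, where (hi - lo) is zero and the division would fail.
    nIn = size(data, 2) - 2;
    lo  = min(trainSet(:, 1:nIn));
    hi  = max(trainSet(:, 1:nIn));
    scaleCols = @(X) -5 + 10 * (X - repmat(lo, size(X, 1), 1)) ...
                           ./ repmat(hi - lo, size(X, 1), 1);
    trainSet(:, 1:nIn) = scaleCols(trainSet(:, 1:nIn));
    testSet(:, 1:nIn)  = scaleCols(testSet(:, 1:nIn));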
From this point on, I decided to use input scaling. Here is one of the output training error charts:

Graph 1

After seeing the results from these experiments, I decided I needed to build a stronger correlation between the season statistics of each team and the result of the game. To do this, I subtracted the away team's statistics from the home team's, the goal being to form a relationship based on the differential of the two teams' statistics. The result is evident in the classification rate, which jumped significantly: the initial run gave an average classification rate of 62.72855% with a standard deviation of 5.760351. After seeing this improvement, I decided to experiment with different numbers of hidden layers. A sketch of the differencing step follows below.

Additionally, after seeing the basic results from one season of data, I collected another season's worth: the 2002 NFL season. I then used the 2002 data as a testing set and the whole of the 2004 season data as the training set.
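Here is a sketch of that differencing step, assuming the row layout described earlier ([home-team features, away-team features, home result, away result]) and a per-team feature count of five, matching the 5-input structures in the results below; the ordering of the scaling and differencing operations here is illustrative.

    % Sketch of building differential features (assumed layout and sizes).
    nFeat    = 5;                           % statistics kept per team (assumed)
    makeDiff = @(D) [D(:, 1:nFeat) - D(:, nFeat+1:2*nFeat), D(:, 2*nFeat+1:end)];

    % Feed the network the matchup differential instead of two raw stat
    % lines, so it learns from the gap between the teams' season statistics.
    diffTrain = makeDiff(trainSet);
    diffTest  = makeDiff(testSet);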
Results

Tables:

Results from one hidden layer, no scaling of the input variables, and taking the difference of the teams' statistics:

1 Hidden Layer
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   60.1563             73.3203            1.6877               2.6941
2                   61.6016             69.3750            2.8885               7.4781
3                   60.1953             65.3906            4.0655               7.5593
4                   61.9141             73.0078            1.9070               2.9775
5                   60.3906             63.2031            3.0381               8.2347
6                   62.1875             69.4922            1.8396               5.9965
7                   60.7031             65.4297            2.7138               6.9494
8                   61.0156             67.9297            2.9965               4.6674
9                   61.9922             70.1953            1.6680               4.4919
10                  62.1484             70.0000            1.9700               3.7774
11                  61.9141             70.8203            1.4529               4.3069
12                  59.8047             68.9844            5.2098               4.2878
13                  60.5469             68.6328            2.0422               5.9756
14                  61.5625             67.1094            3.3714               4.8678
15                  60.6641             71.8359            1.5630               3.3726
Table 1

Results without taking the difference in the teams' averages:

1 Hidden Layer
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   57.1484             54.6484            1.4737               4.8175
2                   56.8359             54.9609            0.3796               3.4156
3                   57.4609             55.3906            1.7896               5.6120
4                   57.4219             56.5234            1.3279               6.3444
5                   56.6797             53.1641            0.1235               0.1235
6                   57.0313             54.6875            0.9916               4.5443
7                   56.9531             53.4375            0.9882               0.9882
8                   57.0703             54.0625            0.9822               1.8248
9                   56.8750             53.3984            0.7412               0.8647
10                  56.6406             53.1250            0                    0
11                  56.6797             53.1641            0.1235               0.1235
12                  57.1875             54.5313            1.2103               3.5905
13                  56.6797             53.0859            0.1235               0.1235
14                  56.9922             55.5469            0.7469               5.3680
15                  57.3047             55.3125            1.3663               6.2125
Table 2

2 Hidden Layers
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   57.6172             56.7969            1.8620               6.8831
2                   59.4531             58.9453            3.0967               6.4039
3                   59.2578             64.1797            1.7377               9.8788
4                   60.3906             59.8828            2.4512               6.3710
5                   58.9844             61.0938            2.8823               9.6428
6                   59.5313             59.3359            2.8598               7.8395
7                   59.7656             64.7266            2.5911               8.6685
8                   58.7891             63.3984            2.1573               10.6262
9                   58.3203             60.2344            1.8141               7.7969
10                  59.0625             61.7969            2.7176               10.3739
11                  58.8672             60.1172            2.3871               8.4934
12                  57.8125             57.6172            2.1156               9.0084
13                  57.5781             56.9531            1.7967               8.4722
14                  57.5000             54.6094            1.4476               2.6672
15                  57.4609             54.4531            1.7220               3.0658
Table 3

3 Hidden Layers
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   57.6172             56.7969            1.8620               6.8831
2                   59.4531             58.9453            3.0967               6.4039
3                   59.2578             64.1797            1.7377               9.8788
4                   60.3906             59.8828            2.4512               6.3710
5                   58.9844             61.0938            2.8823               9.6428
6                   59.5313             59.3359            2.8598               7.8395
7                   59.7656             64.7266            2.5911               8.6685
8                   58.7891             63.3984            2.1573               10.6262
9                   58.3203             60.2344            1.8141               7.7969
10                  59.0625             61.7969            2.7176               10.3739
11                  58.8672             60.1172            2.3871               8.4934
12                  57.8125             57.6172            2.1156               9.0084
13                  57.5781             56.9531            1.7967               8.4722
14                  57.5000             54.6094            1.4476               2.6672
15                  57.4609             54.4531            1.7220               3.0658
Table 4

Results with taking the difference in the teams' averages. These are the results from running the MLP 10 times for each hidden-neuron count from 1 to 15 (a condensed sketch of this experiment loop follows Table 7). The testing set here was the 2002 statistics; the training data was the 2004 statistics.

1 Hidden Layer
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   67.3438             72.8125            0.7856               0.7412
2                   69.1406             73.1641            0.5524               0.9216
3                   70.4297             73.9453            1.3412               0.7823
4                   71.1719             72.3047            1.2187               3.9770
5                   71.4453             72.2656            1.1566               2.3654
6                   72.6172             73.7109            1.5782               1.4622
7                   73.0469             71.9141            1.9661               1.7513
8                   72.8516             72.0313            1.8252               3.2380
9                   72.8906             71.8359            0.9951               2.6840
10                  73.5156             69.0234            1.7739               3.4255
11                  71.9531             71.0156            1.6756               2.6608
12                  72.8125             70.0781            0.7856               3.7386
13                  72.6563             69.8047            1.7274               4.6404
14                  73.0078             70.4688            2.4460               2.9646
15                  72.4219             71.1719            1.0286               3.3033
Table 5

2 Hidden Layers
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   68.9453             73.2422            0.5600               0.2762
2                   69.9609             73.8281            0.4677               0.6379
3                   72.7344             73.0859            0.7769               1.7220
4                   75.2734             70.5469            1.2897               2.8893
5                   77.3438             68.2422            1.6776               2.7808
6                   80.5078             68.1641            1.9614               1.8529
7                   83.0859             68.3984            1.8693               3.9169
8                   84.6875             67.6172            1.2995               2.1978
9                   87.2656             68.1250            1.9850               3.6699
10                  87.6563             66.5625            1.6074               2.5859
11                  89.4922             67.9297            2.0126               1.7416
12                  91.3281             68.1641            1.8579               3.2695
13                  91.5625             68.9844            1.7682               2.9931
14                  92.9688             68.1250            1.0737               1.9935
15                  93.5547             68.2031            1.1536               1.8340
Table 6

3 Hidden Layers
Neurons per layer   Training rate (%)   Testing rate (%)   Training std. dev.   Testing std. dev.
1                   57.6172             54.4922            2.0772               2.8838
2                   58.6719             59.7656            2.0817               8.5898
3                   59.0625             60.4688            3.0636               8.8308
4                   59.6094             62.1484            2.6314               8.8474
5                   59.6094             62.6172            2.8120               9.1998
6                   60.1563             61.6797            2.5382               7.1921
7                   60.1563             62.4609            2.7313               7.7787
8                   60.2734             60.7813            2.2557               6.3287
9                   59.3359             60.9766            2.2055               8.0381
10                  60.1953             62.0313            2.6840               8.7633
11                  58.5938             57.7734            2.2097               6.4724
12                  58.3984             58.8281            2.3091               7.6662
13                  58.2422             57.6563            2.1352               6.3180
14                  58.0859             55.8984            2.0427               4.6565
15                  59.7656             58.2813            3.1574               5.1619
Table 7
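Tables 5 through 7 come from the procedure described above: ten runs per configuration, sweeping 1 to 15 neurons per hidden layer. Below is a condensed sketch of that driver loop for the single-hidden-layer case, in the spirit of bpShell_project.m; trainMLP and classRate are hypothetical stand-ins for the modified bp_project.m routines.

    % Condensed sketch of the experiment driver (hypothetical routines).
    nRuns = 10;
    rates = zeros(nRuns, 15);               % testing rate per run and structure
    for nHidden = 1:15
        for run = 1:nRuns
            % Train a 5-nHidden-2 network with alpha 0.1 and momentum 0.8,
            % then score its classification rate on the held-out games.
            net = trainMLP(diffTrain, [5 nHidden 2], 0.1, 0.8);
            rates(run, nHidden) = classRate(net, diffTest);
        end
    end
    meanRate = mean(rates);                 % mean rate per structure (columns)
    stdRate  = std(rates);                  % its standard deviation, as tabulated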
Graphs:

Graph 2: Training error from running the training data through an MLP of 5-3-2 with the teams' statistics concatenated together.

Graph 3: Training error from running the training data through an MLP of 5-3-2 with the differences between the teams' statistics.

Discussion

The results from these experiments are interesting. First, look at Table 1, and note that this test was done by subtracting the statistics of the away team from the statistics of the home team. Not scaling the input caused some serious problems, as is evident from the fluctuation in the standard deviation of the testing set. The classification rate of the training set remains roughly constant around 60 percent for all of the multi-layer perceptron structures. The standard deviation, however, varied a great deal with no discernible pattern; part of the reason is that divide-by-zero errors occurred, causing problems with the training of the multi-layer perceptron. The classification rates on the testing set were around 70 percent, but the standard deviation was large, ranging from 2.69 to 8.23. So even though the classification results are decent, I still wanted to find a more consistent prediction structure.

To see the performance increase from scaling the input, refer to Table 5. This test used the same data setup, with the difference in statistics between the two teams as the input. Here, the classification rates of both the training set and the testing set are higher than in Table 1. For the training set, the classification rate jumped by roughly seven percent. The testing set did not improve as much, but the improvement is noticeable and the rate is more consistent across all the model structures. The most remarkable observation comes from the standard deviation: for both data sets it stays closer to zero and does not vary as wildly as in Table 1. Upon further inspection, the 5-3-2 multi-layer perceptron structure provides the best predicting model, as it has both a low standard deviation and the highest testing classification rate. If the model were picked based solely on the training set, the 5-9-2 or 5-12-2 structures would have proved to be the best models, as they provide similar classification rates and standard deviations.

Tables 6 and 7 show the results of using two and three hidden layers of neurons. In Table 6, the training set classification rates and standard deviations alone would lead one to believe that this network structure is better than those with one hidden layer: the classification rate climbs to 93 percent, a drastic increase over the single-hidden-layer models, and the standard deviation is very consistent. But looking at the testing set, the classification rates decrease, even though the standard deviations do not change much. What is happening is that the multi-layer perceptron is overfitting its training data. The two-hidden-layer structures with one and two neurons in each layer produce very good results and very good standard deviations.

The multi-layer perceptron structures with three layers did not respond the way I thought they would.
My initial thought was that more overfitting would occur with three layers. What I discovered instead was that, at three layers, the network seemed to get confused by the data. The standard deviations were once again large, but the classification rates were much lower than those produced by the structures in Table 1, which used one hidden layer and no input scaling. With three layers, the testing classification rate peaks around 62 percent at five neurons per layer, which is still much lower than what was found with the one- and two-layer structures.

The data from Tables 2, 3, and 4 emphasize the need to build a correlation between the season statistics of the two teams. When the two teams' features are fed simultaneously into the multi-layer perceptron, the pattern is hard for the network to discover because so many features are being passed in. Once again, the structures using two and three hidden layers see some improvement, bringing the testing classification rate up into the low 60s, but the standard deviation on the training set does not differ a great deal from one structure to another. Without building a stronger correlation, predicting who will win a football game remains difficult.

Taking all of this data into account, it seems very possible to predict the winner of a given NFL game; what is necessary is that a strong correlation be built first. Using the difference between each team's average yards gained and allowed, together with the standard deviations of those totals per game, appears to give a good prediction of who will win a game. It may be possible to increase performance further by including average turnovers per game and their standard deviation, as turnovers play a big role in the outcome of a game. For now, though, this shows that it is possible to predict the winner of a football game with reasonably high accuracy.

Program Listing

bpShell_project.m – Calls the data initialization and sets up the MLP variables that determine the number of layers, features, etc. Performs 10 test runs for each hidden-layer size from 1 to 15 neurons, stores the classification rate for each, and then calculates the mean and standard deviation for the testing and training data sets.
bp_project.m – Performs the MLP back-propagation algorithm.
bpconfig_project.m – Takes care of some variable initialization.
initData4.m – Initializes the 2004 NFL season data.
initData5.m – Initializes the 2002 NFL season data.
bptest.m – Used for testing the back propagation.
cvgtest.m – Used for the convergence test.

Data Files

outputEnd.txt – 2004 statistics
stats2002.txt – 2002 statistics

Programs can be found at the end of the printed document and are attached in a zip file for email.

References

Newman, M. E. J., and Park, Juyong. "A network-based ranking system for US college football." Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109. arXiv:physics/0505169 v4, 31 Oct. 2005.

NFL.com - Official Site of the National Football League. National Football League, 280 Park Avenue, New York, NY. Accessed 19 Dec. 2005. http://www.nfl.com

Purucker, Michael C. "Neural network quarterbacking: How different training methods perform in calling the games."
IEEE Potentials, August/September 1996. http://ieeexplore.ieee.org/iel1/45/11252/00535226.pdf?arnumber=535226

"Statfox | NFL Team Logs." Statfox Sports Handicapping Community. Accessed 19 Dec. 2005. http://www.statfox.com/nfl/nfllogs.htm