Predicting Winners of NFL Games with a Neural Network
Matthew Gray
CS/ECE 539
Introduction
Predicting who will win a given NFL football game is something that many
people have an interest in. Millions of people watch NFL games every Sunday and
Monday during the regular season, and the focus then shifts to the select teams that make
the playoffs. At the start of the season, every team has a clean slate and a chance to make
it to the Super Bowl. As the season goes on, the quality of the teams becomes more
apparent: teams win their divisions, make the playoffs, or are knocked out of contention.
Furthermore, the NFL Draft follows every season, and the records of the teams in the
recently concluded season determine the draft order, with the teams that have the worst
records getting the higher draft picks. Fans, analysts, and gamblers all try to predict who
will win a game between two teams, and the motives are different for each group. Fans
tend to be most interested in how their favorite team and division will do, and if their
team is doing poorly, they may try to predict the final draft order so that they can see
whom the team may draft. Analysts try to keep a more professional and realistic outlook
when predicting who will make the playoffs, who will win the Super Bowl, and in what
order teams will draft. Gamblers aim to come up with predictions that will net them a
profit. The problem with all of these predictions is that they rely on human opinion, and
this can cause some of the predictions to be wrong because of how the predictors feel
about certain players and teams. I will try to eliminate the human bias from the problem
of predicting the winner of an NFL game by using a multi-layer perceptron.
One of the problems presented by predicting the winner of an NFL game is
that the ability of players varies greatly, and individual players can affect the result of a
game. Unfortunately, players get hurt from game to game, switch teams between
seasons, or see their productivity rise or drop by large amounts from one season to the
next. Some teams are able to put any player in at a position and get similar results.
Because of these problems, I will not factor the individual performances of players
into the prediction of who will win a game. The team statistics, on the other hand, tend
to remain fairly constant throughout a season. I propose that, based on the statistics of
the teams, it is possible to predict who will win a football game. One of the problems
inherent in choosing statistics to predict a winner is that no single statistic besides the
final score of the game determines who wins. There are games where a team commits
more turnovers and still wins, and games where a team wins while gaining fewer yards
than its opponent. This makes the problem not linearly separable. Therefore, a neural
network capable of handling a non-linearly separable problem is needed. Considering
this, I will use a multi-layer perceptron, which is capable of handling problems like this.
Work Performed
My first objective was to collect data. I decided that I should have data for a
complete 16-game regular season. As the current 2005 NFL season was still in progress, I
opted to start with collecting game data from 2004. Now, the NFL does keep all of the
game box scores for the previous few years on its website, but the format of the data is
somewhat awkward, especially for setting the data up in a table. I discovered a website
called Statfox Sports Handicapping Community that also has this data (Statfox). The
data there was sorted by team in a table with statistics such as passing yards gained and
allowed, rushing yards gained and allowed, yards per play, etc. So, for each of the 32
teams, I collected the data tables and brought them into an Excel document. Once in the
document, I started converting the data so that I could use it in Matlab. The team names
were replaced with team id numbers, and the home or away status of each game was
changed to be represented by 1 or 0 respectively. I set up equations to compute the total
yards gained and allowed per game for each team. After that, I computed a running
average of yards gained and allowed per team as the season went on. Finally, I computed
the running standard deviation of the total yards gained and allowed. I believe that there
should be some relation between a team's consistency, its seasonal averages, and the
results of its games. After this was done, and after some data manipulation so that all I
was left with was the data and not the column headings, I saved the file as a tab-delimited
text file so it would be readable by Matlab.
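The running averages and standard deviations were built with Excel formulas, but the
equivalent computation is easy to express in Matlab once the file is loaded. The sketch
below is illustrative only: it assumes one team's sixteen games sit in consecutive rows
and that total yards gained is in a particular column, which may not match the actual
layout of outputEnd.txt.

    % Illustrative running mean and standard deviation of total yards
    % gained for one team's season (the column index is an assumption).
    raw   = load('outputEnd.txt');        % tab-delimited season data
    yards = raw(1:16, 3);                 % hypothetical: one team's total yards
    n = length(yards);
    runMean = zeros(n, 1);
    runStd  = zeros(n, 1);
    for g = 1:n
        runMean(g) = mean(yards(1:g));    % seasonal average through game g
        runStd(g)  = std(yards(1:g));     % consistency through game g
    end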
To prevent training or testing on the same game twice, and to take advantage of the
averages that were computed in Excel, I used Matlab to first set the data up such that all
games were listed with the home team first and the away team second. The setup per row
was then [Team 1 Features, Team 2 Features, Team 1 Result, Team 2 Result]. To prevent
the multi-layer perceptron from learning whether a specific team is going to win or lose,
and to make the neural network applicable to games from other seasons where a team may
be better or worse, the team ids were removed. The ids were used simply to organize the
data. In addition, the total yardage, rather than the separate rushing and passing yardage,
was used. One last manipulation was to convert the result from a single column into a
1-of-2 output, with one output per class.
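To illustrate the row layout and the output encoding, the sketch below assembles a single
hypothetical game. The choice of five features per team is my reading of the statistics
described above, and the numbers are made up; neither is taken from the actual project
files.

    % Hypothetical single-game row and 1-of-2 output encoding.
    homeFeat = [345.2, 310.7, 55.1, 48.3, 1];   % made-up averages, std devs, home flag
    awayFeat = [322.8, 335.4, 61.9, 52.6, 0];   % made-up away-team features
    row = [homeFeat, awayFeat];                 % team ids already stripped out

    homeWon = 1;                                % 1 if the home team won the game
    target  = [homeWon, 1 - homeWon];           % [1 0] = home win, [0 1] = away win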
With the data set up, I moved on to modifying the multi-layer perceptron code
that was provided on the class website, http://homepages.cae.wisc.edu/~ece539/, and
was originally written by Professor Yu Hen Hu. I wanted to be able to run the bp.m
program multiple times in a row with different values, so I made the necessary changes
to the code to allow variables to be changed from an outside driver program. I then
wrote a driver program that let me change the variables as I was testing the data. The
driver program also randomized the order of the games and selected a subset of the data
to be my training data, with the rest becoming the test data. My next step was to test
this data.
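In Matlab, the shuffle-and-split step looks roughly like this; the 80/20 proportion is
illustrative, and I am assuming outputEnd.txt holds one prepared game per row, so the
real driver may differ in both respects.

    % Shuffle the games and split them into training and testing sets.
    data    = load('outputEnd.txt');     % one prepared game per row (assumed)
    nGames  = size(data, 1);
    idx     = randperm(nGames);          % random game order
    nTrain  = round(0.8 * nGames);       % illustrative 80/20 split
    trainData = data(idx(1:nTrain), :);
    testData  = data(idx(nTrain+1:end), :);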
On my initial test, I used a 10-3-2 MLP structure with a default alpha of 0.1 and a
default momentum of 0.8. I did not scale the input to the range -5 to 5, and I ended up
with a classification rate of 53.9844%, but the error rate became one hundred because
divide-by-zero errors occurred. Next, I used the same structure but with the input scaled
from -5 to 5, and the classification rate was 58.714%, with errors that remained fairly
consistent throughout training. From this point on, I decided to use input scaling. Here
is one of the output training error charts.
Graph 1
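The scaling step itself is only a few lines. The helper below is my own sketch, not the
scaling routine from the class code, and it assumes the last two columns of each row are
the targets (reusing trainData from the split sketch above).

    % Scale each feature column of X linearly into the range [-5, 5].
    X    = trainData(:, 1:end-2);        % feature columns (last two are targets)
    lo   = min(X);                       % per-column minimum
    hi   = max(X);                       % per-column maximum
    span = max(hi - lo, eps);            % guard against constant columns
    Xs   = 10 * (X - repmat(lo, size(X,1), 1)) ./ repmat(span, size(X,1), 1) - 5;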
After seeing the results from these experiments, I decided I needed to build a
stronger correlation between the season statistics of each team and the result of the game.
To do this, I decided to subtract the away team's statistics from the home team's. The goal
of this was to form a relationship based on the differential of the two teams' statistics (a
sketch of this step follows below). The result of this was evident, as the classification rate
jumped significantly. The initial result was an average classification rate of 62.72855%
with a standard deviation of 5.760351. After seeing this improvement, I decided to
experiment with running the tests with different numbers of hidden layers.
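The differencing itself is a one-line change to the data preparation. A minimal sketch,
assuming five features per team with the two target columns at the end of each row:

    % Replace the concatenated features with the home-minus-away differential.
    nFeat    = 5;                               % features per team (assumed)
    homeFeat = data(:, 1:nFeat);
    awayFeat = data(:, nFeat+1:2*nFeat);
    targets  = data(:, 2*nFeat+1:end);          % the two result columns
    diffData = [homeFeat - awayFeat, targets];  % 5 inputs instead of 10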
Additionally, after seeing the basic results from one season of data, I collected
another season's worth: the 2002 NFL season. I then used the 2002 data as a testing set
and the whole of the 2004 season data as the training set.
Results
Tables:
Results from one hidden layer, with no scaling of the input variables, taking the
difference of the teams' statistics:
1 Hidden Layer

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             60.1563             73.3203            1.6877          2.6941
 2             61.6016             69.3750            2.8885          7.4781
 3             60.1953             65.3906            4.0655          7.5593
 4             61.9141             73.0078            1.9070          2.9775
 5             60.3906             63.2031            3.0381          8.2347
 6             62.1875             69.4922            1.8396          5.9965
 7             60.7031             65.4297            2.7138          6.9494
 8             61.0156             67.9297            2.9965          4.6674
 9             61.9922             70.1953            1.6680          4.4919
10             62.1484             70.0000            1.9700          3.7774
11             61.9141             70.8203            1.4529          4.3069
12             59.8047             68.9844            5.2098          4.2878
13             60.5469             68.6328            2.0422          5.9756
14             61.5625             67.1094            3.3714          4.8678
15             60.6641             71.8359            1.5630          3.3726

Table 1
Results without taking the difference in the teams' averages:
1 Hidden Layer

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             57.1484             54.6484            1.4737          4.8175
 2             56.8359             54.9609            0.3796          3.4156
 3             57.4609             55.3906            1.7896          5.6120
 4             57.4219             56.5234            1.3279          6.3444
 5             56.6797             53.1641            0.1235          0.1235
 6             57.0313             54.6875            0.9916          4.5443
 7             56.9531             53.4375            0.9882          0.9882
 8             57.0703             54.0625            0.9822          1.8248
 9             56.8750             53.3984            0.7412          0.8647
10             56.6406             53.1250            0               0
11             56.6797             53.1641            0.1235          0.1235
12             57.1875             54.5313            1.2103          3.5905
13             56.6797             53.0859            0.1235          0.1235
14             56.9922             55.5469            0.7469          5.3680
15             57.3047             55.3125            1.3663          6.2125

Table 2
2 Hidden Layers

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             57.6172             56.7969            1.8620          6.8831
 2             59.4531             58.9453            3.0967          6.4039
 3             59.2578             64.1797            1.7377          9.8788
 4             60.3906             59.8828            2.4512          6.3710
 5             58.9844             61.0938            2.8823          9.6428
 6             59.5313             59.3359            2.8598          7.8395
 7             59.7656             64.7266            2.5911          8.6685
 8             58.7891             63.3984            2.1573          10.6262
 9             58.3203             60.2344            1.8141          7.7969
10             59.0625             61.7969            2.7176          10.3739
11             58.8672             60.1172            2.3871          8.4934
12             57.8125             57.6172            2.1156          9.0084
13             57.5781             56.9531            1.7967          8.4722
14             57.5000             54.6094            1.4476          2.6672
15             57.4609             54.4531            1.7220          3.0658

Table 3
3 Hidden Layers

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             57.6172             56.7969            1.8620          6.8831
 2             59.4531             58.9453            3.0967          6.4039
 3             59.2578             64.1797            1.7377          9.8788
 4             60.3906             59.8828            2.4512          6.3710
 5             58.9844             61.0938            2.8823          9.6428
 6             59.5313             59.3359            2.8598          7.8395
 7             59.7656             64.7266            2.5911          8.6685
 8             58.7891             63.3984            2.1573          10.6262
 9             58.3203             60.2344            1.8141          7.7969
10             59.0625             61.7969            2.7176          10.3739
11             58.8672             60.1172            2.3871          8.4934
12             57.8125             57.6172            2.1156          9.0084
13             57.5781             56.9531            1.7967          8.4722
14             57.5000             54.6094            1.4476          2.6672
15             57.4609             54.4531            1.7220          3.0658

Table 4
Results taking the difference in the teams' averages:
Here are the results from running the MLP 10 times for each number of hidden
neurons from 1 to 15. The testing set here was the 2002 statistics; the training data
was the 2004 statistics.
1 Hidden Layer

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             67.3438             72.8125            0.7856          0.7412
 2             69.1406             73.1641            0.5524          0.9216
 3             70.4297             73.9453            1.3412          0.7823
 4             71.1719             72.3047            1.2187          3.9770
 5             71.4453             72.2656            1.1566          2.3654
 6             72.6172             73.7109            1.5782          1.4622
 7             73.0469             71.9141            1.9661          1.7513
 8             72.8516             72.0313            1.8252          3.2380
 9             72.8906             71.8359            0.9951          2.6840
10             73.5156             69.0234            1.7739          3.4255
11             71.9531             71.0156            1.6756          2.6608
12             72.8125             70.0781            0.7856          3.7386
13             72.6563             69.8047            1.7274          4.6404
14             73.0078             70.4688            2.4460          2.9646
15             72.4219             71.1719            1.0286          3.3033

Table 5
2 Hidden Layers

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             68.9453             73.2422            0.5600          0.2762
 2             69.9609             73.8281            0.4677          0.6379
 3             72.7344             73.0859            0.7769          1.7220
 4             75.2734             70.5469            1.2897          2.8893
 5             77.3438             68.2422            1.6776          2.7808
 6             80.5078             68.1641            1.9614          1.8529
 7             83.0859             68.3984            1.8693          3.9169
 8             84.6875             67.6172            1.2995          2.1978
 9             87.2656             68.1250            1.9850          3.6699
10             87.6563             66.5625            1.6074          2.5859
11             89.4922             67.9297            2.0126          1.7416
12             91.3281             68.1641            1.8579          3.2695
13             91.5625             68.9844            1.7682          2.9931
14             92.9688             68.1250            1.0737          1.9935
15             93.5547             68.2031            1.1536          1.8340

Table 6
3 Hidden Layers

Neurons per    Class. Rate of      Class. Rate of     Std. Dev. of    Std. Dev. of
Hidden Layer   Training Set (%)    Testing Set (%)    Training Set    Testing Set
 1             57.6172             54.4922            2.0772          2.8838
 2             58.6719             59.7656            2.0817          8.5898
 3             59.0625             60.4688            3.0636          8.8308
 4             59.6094             62.1484            2.6314          8.8474
 5             59.6094             62.6172            2.8120          9.1998
 6             60.1563             61.6797            2.5382          7.1921
 7             60.1563             62.4609            2.7313          7.7787
 8             60.2734             60.7813            2.2557          6.3287
 9             59.3359             60.9766            2.2055          8.0381
10             60.1953             62.0313            2.6840          8.7633
11             58.5938             57.7734            2.2097          6.4724
12             58.3984             58.8281            2.3091          7.6662
13             58.2422             57.6563            2.1352          6.3180
14             58.0859             55.8984            2.0427          4.6565
15             59.7656             58.2813            3.1574          5.1619

Table 7
Graphs:
Graph 2: Training error from running the training data through a 5-3-2 MLP with
the teams' statistics concatenated together.
Graph 3: Training error from running the training data through a 5-3-2 MLP with
the differences between the teams' statistics.
Discussion
The results from these experiments are interesting. First, look at Table 1, and
note that this test was done by subtracting the statistics of the away team from the
statistics of the home team. Not scaling the input caused some serious problems, as is
evident from the fluctuation in the standard deviation of the testing set. The classification
rate of the training set remains roughly constant around 60 percent for all of the different
multi-layer perceptron structures. The standard deviation, however, varied a great deal,
and no discernible pattern in it was apparent. Part of the reason is that division-by-zero
errors occurred, causing problems with the training of the multi-layer perceptron. Further,
the classification rates on the testing set were around 70 percent, but the standard
deviation was large, ranging from 2.69 to 8.23. Therefore, even though the classification
results are decent, I still wanted to find a more consistent prediction structure.
To see the performance increase from scaling the input, refer to Table 5. This test
used the same data setup, with the difference in statistics between the two teams as the
data. Here, the classification rates of both the training set and the testing set are higher
than those in Table 1. For the training set, the classification rate jumped by roughly
seven percent. The testing set did not improve as much, but the improvement is
noticeable, and the rate remains more consistent across all the model structures. The
most remarkable observation in this data comes from the standard deviation: for both
data sets it remains closer to zero and does not vary as wildly as the standard deviation
in Table 1. Upon further inspection, the multi-layer perceptron structure of 5-3-2
appears to provide the best predicting model, as it has both the lowest standard deviation
and the highest classification rate. If the model were to be picked based solely on the
training set, then the 5-9-2 or 5-12-2 structure would have proved best, as they both
provide similar classification rates and standard deviations.
Examining Tables 6 and 7 reveals the results of using two and three hidden layers
of neurons. In Table 6, looking only at the training set classification rates and standard
deviations would lead one to believe that this neural network structure is better than
those found using one layer. The training classification rate jumps to over 93 percent,
which is a drastic increase compared to the single-hidden-layer models, and the standard
deviation is very consistent. When you look at the results of the testing set, however, a
decrease in the classification rates occurs, though the standard deviations do not change
much. What is occurring here is that the multi-layer perceptron is overfitting its training
data. The two-hidden-layer structures with one and two neurons in each layer produce
very good classification rates and very good standard deviations. The multi-layer
perceptron structures with three layers did not respond the way I thought they would.
My initial thought was that more overfitting would occur with three layers. However,
what I discovered was that at three layers, the network seemed to get confused by the
data. The standard deviations were once again large, but the classification rates were
much lower than those produced by the structures in Table 1, with one hidden layer and
no input scaling. With three layers, the testing classification peaks around 62 percent at
five neurons per layer, yet this rate is much lower than what was found in the previous
one- and two-layer structures.
The data from Tables 2, 3, and 4 emphasize the need to build a correlation
between the season statistics of the two teams. When the two teams' features are fed into
the multi-layer perceptron simultaneously, the pattern is hard for the neural network to
discover, as there are many features being passed in. Once again, the structures using
two and three hidden layers see some improvement, bringing the testing classification
rate up into the low 60s. The standard deviation on the training set, though, does not
differ a great deal from one structure to another. Without building a stronger correlation,
predicting who will win a football game remains difficult.
Now, after taking all of this data into account, it seems very possible to predict
the winner of a given NFL game, provided that a strong correlation is built into the
inputs. Using the difference between each team's average yards gained and allowed,
along with the standard deviations of those totals, appears to give a good prediction of
who will win a game. It may be possible to increase performance by including average
turnovers per game and the corresponding standard deviation, as turnovers play a big
role in the outcome of a game. For now, though, this shows that it is possible to predict
the winner of a football game with reasonably high accuracy.
Program Listing
bpShell_project.m – Calls the data-initialization scripts and sets up the MLP variables
that determine the number of layers, features, etc. Performs 10 test runs for each of 1
to 15 hidden neurons, stores the classification rate for each, and then calculates the
mean and standard deviation for the testing and training data sets (a rough paraphrase
of this loop appears after this listing).
bp_project.m – Actually performs the MLP algorithm
bpconfig_project.m – Takes care of some variable initialization
initData4.m – Initializes the 2004 NFL season data.
initData5.m – Initializes the 2002 NFL season data.
bptest.m – Used for testing the back propagation
cvgtest.m - Used for convergence test
Data Files
outputEnd.txt - 2004 statistics
stats2002.txt - 2002 statistics
The programs can be found at the end of the printed document and are attached in a zip
file for email.
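For readers without the attached files, the experiment loop that bpShell_project.m
implements can be paraphrased roughly as below. Here trainAndTestMlp is a
hypothetical stand-in for the modified bp_project.m training and testing call, not an
actual function from the project.

    % Paraphrase of the bpShell_project.m experiment loop: 10 runs for
    % each hidden-layer size from 1 to 15 neurons, collecting the mean
    % and standard deviation of the classification rate.
    nRuns = 10;
    rates = zeros(nRuns, 15);
    for h = 1:15                          % neurons in the hidden layer
        for r = 1:nRuns
            % hypothetical call standing in for the real MLP train/test code
            rates(r, h) = trainAndTestMlp(trainData, testData, h);
        end
    end
    meanRate = mean(rates);               % average classification rate per structure
    stdRate  = std(rates);                % and its standard deviation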
References

Newman, M. E. J., and Park, Juyong. "A network-based ranking system for US college
football." Department of Physics and Center for the Study of Complex Systems,
University of Michigan, Ann Arbor, MI 48109. arXiv:physics/0505169 v4, 31 Oct. 2005.

NFL.com - Official Site of the National Football League. National Football League, 280
Park Avenue, New York, NY. 19 Dec. 2005. http://www.nfl.com

Purucker, Michael C. "Neural network quarterbacking: How different training methods
perform in calling the games." IEEE Potentials, August/September 1996.
http://ieeexplore.ieee.org/iel1/45/11252/00535226.pdf?arnumber=535226

Statfox | NFL Team Logs. Statfox Sports Handicapping Community. 19 Dec. 2005.
http://www.statfox.com/nfl/nfllogs.htm