NFL Game Simulator

Oliver Reimer, Matthew Crites, & Brian Jones
University of North Carolina Wilmington

Abstract

Our goal was to accurately simulate a National Football League game, given a number of meaningful statistics. To determine how important each statistic was to the result of a game, we multiplied each statistic by a weight. This implies an optimization problem: which combination of weights gives the best win-prediction percentage over simulated NFL games? We used a genetic algorithm to optimize these weights. The weights our algorithm produced came close to 70% accuracy in predicting a past NFL season. Our algorithm consistently favored high weights for fumbles recovered and interceptions, while consistently giving low weights to first downs allowed. However, there is still room for improvement: more statistics can be considered, and different ways of evaluating the cost of our statistics can be explored. We used a relatively simple algorithm and a small subset of the statistics available for an NFL game; this leads us to believe more ground can be covered and better algorithms can be created to forecast the winner of an NFL game.

Key Words: Genetic Algorithm, National Football League, Optimization, Simulation, Weights

1. Introduction

This project began as an amalgamation of our collective interest in football, statistics, and algorithms that could potentially be used to predict a National Football League game. Coming into the project we knew there were many factors that could play into an NFL game, and many of those factors are difficult to classify numerically. It is difficult to quantitatively represent every aspect of an NFL game. Our goal was to take the crucial factors of an NFL game and combine them in an algorithm that would, as accurately as possible, predict the outcome of an NFL game.

For any algorithm to forecast a future event, background knowledge is needed. Our background knowledge was based on statistics. Obtaining statistics is not difficult for professional sports; they are meticulously recorded and, for the most part, available online. Our primary focus was to figure out which statistics are the most important and how they affect an NFL game. We approached this as an optimization problem, where each statistic was assigned a modifier. Our goal was to optimize each modifier so that a given combination of weighted statistics would select a winner between two NFL teams. Our chosen weapon of optimization was a genetic algorithm.

2. Formal Problem Statement

Given two teams, denoted Team_H (home) and Team_A (away), determine the winner of an NFL game. This is done through statistical analysis using common NFL statistics and assigning each a specific weight. Because each statistic is weighted differently, those with heavier weights have a larger impact on the outcome of a game than those with lighter weights. To test the algorithms, the statistics from the previous 16 regular-season games for all teams were used. The offensive and defensive statistics used are detailed in Table 1. All statistics were per-game averages for each team over its past 16 regular-season games.

Offensive Statistics    Defensive Statistics
Passing Yards           Passing Yards Allowed
Rushing Yards           Rushing Yards Allowed
Points                  Points Allowed
Fumbles Lost            Fumbles Recovered
First Downs             First Downs Allowed
Interceptions           Interceptions

Table 1

This allowed a controlled environment where game outcomes were already known, making algorithm validation more accurate.
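To make the setup concrete, the sketch below shows one possible in-code representation of the Table 1 per-game averages. The Python dataclass and the field names are our own illustrative choices, not part of the original simulator.

```python
from dataclasses import dataclass

@dataclass
class TeamStats:
    """Per-game averages over a team's previous 16 regular-season games (Table 1).

    Field names are hypothetical; the original simulator's representation is not specified.
    """
    # Offensive statistics
    passing_yards: float
    rushing_yards: float
    points: float
    fumbles_lost: float
    first_downs: float
    interceptions_thrown: float
    # Defensive statistics
    passing_yards_allowed: float
    rushing_yards_allowed: float
    points_allowed: float
    fumbles_recovered: float
    first_downs_allowed: float
    interceptions: float
```

Each of these twelve statistics is later assigned a weight between 0.0 and 1.0 by the genetic algorithm.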
3. Background Information

Our project was conceived through our collective interest in sports and in how a computer might be able to predict the outcome of a game. We were able to find a number of methods previously used to some degree of efficacy. Our research started with Jack David Blundell's use of a linear regression model. He states, "As well as using data such as the two teams' previous results, novel features such as Stadium size and the distance the away team had to travel were incorporated" [3]. This gave us insight into how much of a factor a small number of statistics could be. In his research, Blundell used linear regression models to predict the outcome of a game. Interestingly, he found that choosing a winner with a bias towards home teams was generally more accurate than basing a victor off of previous matchups between the teams. This gave us the notion that home field advantage should be taken into account when predicting the outcome of a game.

We also gleaned information from the algorithm used by Ed, of Ed's Tech Cave. Ed hypothesized that:

    Prediction relies on estimating two numerical characteristics per team, an Offensive Strength Factor (Fo(t)) and a Defensive Strength Factor (Fd(t)). The model assumes that the score (S) achieved by a team 'A' when playing against team 'B' is given by S(A) = M * (Fo(A)/Fd(B)) [2].

We were able to take this model and adapt it to our genetic approach, which rewarded us with results very similar to Ed's; we found this encouraging.

We wondered how our algorithm stacked up against other methods of predicting a game, such as going off intuition or plain old guessing. The Betting Expert blog helped us compare our algorithm to other forecasting styles, some computerized, some not. The staff of Betting Expert did some algorithmic work predicting the results of soccer games, and their predictions were fairly consistent with the betting odds. This gave us hope that, with more advanced analytics and more statistics examined, we can get closer and closer to an effective football forecasting algorithm [1]. We were not able to find any public research claiming greater than 80% accuracy in forecasting a game. Our research indicates that using basic statistics can give a high degree of accuracy; we feel that utilizing more advanced statistics and analytics can only improve the percentage of correct predictions.

4. The Method

To determine the winner of a game we used the equations:

E_H = O_H / D_A
E_A = O_A / D_H

where O_H is the home offensive rating and D_A is the away defensive rating, making their quotient the home team's edge E_H. Likewise, the away offensive rating O_A divided by the home defensive rating D_H yields the away team's edge E_A. When the home team's edge is greater than the away team's edge, the home team is the predicted winner. If the away team's edge is greater, the away team is the predicted winner.

Defensive and offensive ratings are determined by normalizing and summing their respective statistics:

normalized offensive statistic = team statistic / league average
normalized defensive statistic = league average / team statistic

For example, if a team had 1,000 rushing yards and the league average was 2,000 rushing yards, that team would be given a rushing yard rating of 1,000/2,000 = 0.5. However, if the team's rushing yards allowed were 1,000 and the league average rushing yards allowed were 2,000, that team would be given a rushing yards allowed rating of 2,000/1,000 = 2.0.
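Below is a minimal sketch of the edge computation above, assuming (as Section 2 suggests) that each normalized statistic is multiplied by its weight before the ratings are summed. The dictionary-based representation and the function names are ours, not the original implementation; all inputs are dictionaries keyed by statistic name.

```python
def normalize(team_stats, league_avg, defensive=False):
    """Normalize per-game stats: team/league for offense, league/team for defense."""
    if defensive:
        return {k: league_avg[k] / v for k, v in team_stats.items()}
    return {k: v / league_avg[k] for k, v in team_stats.items()}

def rating(normalized_stats, weights):
    """A team's offensive or defensive rating: weighted sum of normalized stats."""
    return sum(weights[k] * v for k, v in normalized_stats.items())

def predict_winner(home_off, away_off, home_def, away_def,
                   league_off_avg, league_def_avg, off_weights, def_weights):
    """Return the predicted winner plus both edges E_H and E_A."""
    O_H = rating(normalize(home_off, league_off_avg), off_weights)
    O_A = rating(normalize(away_off, league_off_avg), off_weights)
    D_H = rating(normalize(home_def, league_def_avg, defensive=True), def_weights)
    D_A = rating(normalize(away_def, league_def_avg, defensive=True), def_weights)
    E_H = O_H / D_A   # home team's edge
    E_A = O_A / D_H   # away team's edge
    return ("home" if E_H > E_A else "away"), E_H, E_A
```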
It is important to have the league average in the numerator for defensive statistics, since a team always wants to be under the league average on defense.

5. The Genetic Algorithm

Genetic algorithms are excellent for solving optimization problems. These algorithms can be compared to the biological evolution of a population. At the start, a set of solutions is compiled into an initial population. Each solution is rated by its effectiveness in solving the problem. Attributes from the better-rated solutions are combined to form new solutions. After many iterations, or generations, of this process, an optimal or near-optimal solution is found; however, the gene pool can easily stagnate if initialized with poor solutions. To prevent this standstill in the population, mutations are introduced. These random changes in the solution set keep the population ever-changing. Algorithms of this nature are often easy to implement, yet they produce excellent solutions to optimization problems. Because our task was to optimize weights, a genetic approach was a natural fit.

First, a population of solutions must be prepared. The number of parent solutions can vary; in our simulations, the starting pool consisted of 12 solutions. Each solution was a set of randomly generated weight values ranging from 0.0 to 1.0 exclusive. The random set of solutions was then ranked based on fitness criteria. Four different fitness ratings were used, which are explained in more detail below. After each solution was rated, the higher-rated solutions were merged and the lower-rated solutions were replaced. Given n solutions, each solution ranked n/2 through n would not be present in the next generation of solutions. Solutions were merged by taking the average of their weight vectors. For example, given the sets of weights X and Y:

X     Y
0.3   0.5
0.6   0.4
0.4   0.6
0.9   0.7
0.1   0.3

merging set X with set Y produces the set Z:

Z
0.4
0.5
0.5
0.8
0.2

The initial population of solutions was bred to form the first generation. This process was repeated for a set number of generations; our simulation used five. At the conclusion of the fifth generation, the set of weights with the highest fitness rating was kept and used in the simulation of the game.

6. The Rating System

Two different methods were used to rate the solutions the genetic algorithm produced: edge magnitude and simulation. Edge magnitude is simply the absolute value of the difference between the two teams' edges in a particular matchup:

|E_H - E_A|

This method was designed to look at how evenly matched two teams appeared statistically. A lower edge magnitude implies that neither team had a large edge over the other, whereas a higher edge magnitude indicates that one team had a clear advantage in the matchup. The simulation method differed from the edge magnitude method because it took into account results from the previous season. The set of weights being rated was used to simulate each of the 257 games of the 2014 NFL season. Since the outcomes of these games are already known, the predictions could be classified as accurate or inaccurate. The percentage of these games that the set of weights predicted accurately was the rating assigned to the solution. For each of these two methods, the ratings were sorted in both ascending and descending order. The order in which the solution pool was sorted allowed the simulator to breed for different solutions.
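The sketch below is our own simplified reconstruction of one run of the genetic algorithm, combining the generation step from Section 5 with the two fitness ratings from Section 6. The way surviving solutions are paired, the mutation rate, and the game/prediction interfaces are assumptions, not details taken from the original simulator.

```python
import random

POP_SIZE = 12      # starting pool of 12 randomly generated weight sets
GENERATIONS = 5    # number of generations per run
NUM_WEIGHTS = 12   # one weight per statistic in Table 1

def edge_magnitude(e_home, e_away):
    """Fitness rating 1: |E_H - E_A| for a particular matchup."""
    return abs(e_home - e_away)

def simulation_accuracy(weights, past_games, predict_game):
    """Fitness rating 2: fraction of past games predicted correctly.

    predict_game(weights, game) -> "home" or "away"; each game records its
    known winner. Both interfaces are hypothetical.
    """
    correct = sum(1 for g in past_games if predict_game(weights, g) == g["winner"])
    return correct / len(past_games)

def merge(x, y):
    """Breed two solutions by averaging their weight vectors."""
    return [(a + b) / 2 for a, b in zip(x, y)]

def evolve(fitness, descending, mutation_rate=0.05):
    """One GA run: rank, drop the bottom half, breed, mutate, repeat for five generations."""
    population = [[random.random() for _ in range(NUM_WEIGHTS)]  # weights in [0, 1)
                  for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=descending)   # rank by the chosen rating
        survivors = population[: POP_SIZE // 2]            # solutions ranked n/2..n are dropped
        children = [merge(survivors[i], survivors[(i + 1) % len(survivors)])
                    for i in range(len(survivors))]        # pairing scheme is an assumption
        for child in children:                             # mutation prevents stagnation
            for i in range(NUM_WEIGHTS):
                if random.random() < mutation_rate:
                    child[i] = random.random()
        population = survivors + children
    population.sort(key=fitness, reverse=descending)
    return population[0]
```

In practice, fitness would be bound to the data, for example lambda w: simulation_accuracy(w, games_2014, predict_game), with descending=False for the simulation ascending condition.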
For example, by ordering the solutions by edge magnitude in descending order, the simulator was, in theory, searching for a set of weights that produced a lopsided result (a high edge magnitude). Sorting the pool by edge magnitude in ascending order would search for weights that produced a close result (a low edge magnitude).

7. The Perfect Weights

At the conclusion of five generations of the genetic process, one solution is produced. This solution is near optimal but is not always perfect. To ensure the simulator runs with the most accuracy, the genetic approach was expanded. For each condition (edge magnitude and simulation, each sorted both ascending and descending), the genetic algorithm was executed one million times. The average of the solutions across all one million trials was calculated, and this average set was coined the "perfect weights" for that condition. The perfect weights that were determined are outlined in Table 2. The labels SIMD, ASC, DES, and SIMA denote the fitness rating conditions simulation descending, edge magnitude ascending, edge magnitude descending, and simulation ascending, respectively.

OFFENSIVE   PPG       FLPG      FDPG      PASS_YPG   INTPG     RUSH_YPG
SIMD        0.586039  0.387547  0.530422  0.547612   0.590935  0.496791
ASC         0.4899    0.45325   0.490675  0.477683   0.513831  0.506735
DES         0.532413  0.464036  0.538195  0.534158   0.470165  0.536802
SIMA        0.501823  0.507713  0.505403  0.505435   0.506435  0.497627

DEFENSIVE   PAPG      FRPG      FDAPG     PASS_YAPG  INTPG_D   RUSH_YAPG
SIMD        0.603601  0.286293  0.57432   0.59289    0.327986  0.539183
ASC         0.59359   0.383621  0.583781  0.573764   0.43375   0.580627
DES         0.322485  0.711595  0.353089  0.383612   0.594453  0.358635
SIMA        0.363092  0.637764  0.352727  0.354634   0.727386  0.361065

Table 2

8. Results

Our genetic algorithm was a success in that we were able to correctly predict NFL games with a reasonably high level of accuracy, around 70%. The accuracy of the simulation was heavily dependent on the fitness rating method used by the genetic algorithm. A detailed account of the simulation results is presented in Table 3.

Simulation Descending        48.8%
Edge Magnitude Ascending     50.4%
Edge Magnitude Descending    70.7%
Simulation Ascending         70.7%

Table 3

The edge magnitude descending and simulation ascending conditions produced the best simulation results. These results differed from the hypothesized results. The edge magnitude descending condition was intended to mimic a lopsided matchup between two teams. While a skill gap exists between certain teams, the gap is normally small; all players in the league are professionals, and blowouts are rare occurrences. This condition was hypothesized to represent the antithesis of the reality of the league; however, it produced the best results.

The simulation ascending condition was designed to select weights which produced poor results in our simulation tests. Because the results were sorted in ascending order, the solutions with the lowest percentage of correctly picked games were kept and used. Given that these weights performed poorly in development, the final results were expected to be poor; however, like the edge magnitude descending condition, excellent results were achieved.

Our algorithm deemed fumbles recovered (FRPG) and defensive interceptions (INTPG_D) the most important statistics (highest weights) and first downs allowed (FDAPG) among the least important (lowest weights). These statistics stood out because, in the two best conditions (edge magnitude descending and simulation ascending), their weights deviated greatly from the mean weight.
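As a small illustration of that selection criterion, the helper below (our own, not part of the original simulator) orders one row of Table 2 by how far each weight sits from the row's mean weight.

```python
def rank_by_deviation(weights_row, stat_names):
    """Sort statistics by |weight - row mean|, largest deviation first."""
    mean = sum(weights_row) / len(weights_row)
    return sorted(zip(stat_names, weights_row),
                  key=lambda pair: abs(pair[1] - mean),
                  reverse=True)
```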
It was encouraging to find that the weights matched up with what we thought some of the most important statistics were. For instance, we hypothesized that first downs allowed per game (FDAPG) should not be weighted very heavily. We based this hypothesis on the bend-but-don't-break defensive philosophy in football; it makes sense because first downs themselves do not score points. Using similar logic, fumbles recovered and interceptions were given high rankings because these statistics create more possessions for a team. The more possessions a team has over the course of a game, the more opportunities it has to score.

9. Future Work

Future work will consist of adding more statistics to the algorithm; it could be argued that other statistics are equally important in dictating a winning team. This leads into using individual player statistics rather than just team statistics. Considering individual players could considerably change a defensive or offensive rating depending on particular matchups. An example would be the NFL's leading rusher missing a game due to injury while his team plays an average-rated rush defense. If the leading rusher were playing, the likelihood of touchdowns would be greater, which would give a higher edge value to that team.

Taking weather into consideration would be part of our future work as well. Some teams have home field advantage due to the climate in which they practice. A team from the north will be acclimated to cold weather, giving it an edge over a visiting team from the south. However, this is reversed when the northern team has to travel south to play, as the northern team will be more prone to dehydration and fatigue. This effect is even more evident when a team accustomed to playing indoors is forced to play outside in freezing weather or other sub-optimal conditions.

10. References

[1] http://www.bettingexpert.com/blog/how-to-build-a-football-game-prediction-model
[2] http://www.edscave.com/football_algorithm/footballalgorithm.htm
[3] http://www.engineering.leeds.ac.uk/eengineering/documents/JackBlundell.pdf
[4] http://www.pro-football-reference.com