Adam Sites
Senior Project
Monte Carlo Methods
Dr. Frey and Dr. Volpert

Ranking Division 1-A College Football Teams

At the end of every regular season, the best teams in college football wait for the BCS rankings to learn which bowl game they will play in and who their opponents will be. For those unaware, BCS stands for Bowl Championship Series, and it uses a complex formula designed to determine the top 25 college football teams over the course of the year. Over the course of this semester, we have learned how random walks simulated in R can be used to model certain systems. For my senior project, I set out to create my own ranking system built on random walks, based on the paper Random Walker Ranking for NCAA Division I-A Football by Callaghan, Mucha, and Porter. After studying their system, I added my own twist to it in hopes of getting a more accurate ranking based on certain factors.

College football is a billion dollar industry that brings in millions of dollars for schools across the country. Attendance is at an all-time high: reportedly 49,670,895 people attended games across 639 NCAA schools, an increase of almost 3% from the previous year (Hootens). Along with the increase in attendance comes an increase in viewers watching from home. Fans turn out in record numbers for bowl games as well: 1,813,215 people attended the 35 bowl games in 2010, in addition to the 134 million viewers tuning in from home. It has been estimated that 1.6 billion dollars was netted from travel and tourism for those 35 bowl games. For colleges across the country, football is seen as a way to promote the school’s image as well as raise money. According to CNN, of all the teams in the top 6 conferences in Division 1 football, only one reported a monetary loss for that season.
Three teams broke even, leaving the other 64 teams to report an average profit of around 15.8 million dollars over the 2010 season. That is over a million dollars a game. Money may not grow on trees, but colleges across the country can count on college football to bring it in.

Some people outside of the college football world might ask, “What is so important about ranking the top 25 teams correctly? What does it matter if the 2nd best team is technically labeled as the 3rd?” There are several answers. Schools that play in bowl games get their college or university recognized by millions of people at once, and this exposure alone will increase applications to that school. Partying and tailgating go hand in hand with college football, so the more successful the team is, the more the school’s reputation as a “party” school grows. As that reputation grows, there is a corresponding increase in applications from graduating high school seniors. There is also direct money at stake, because schools are paid to attend bowl games. Each bowl game’s sponsors pay the schools that play in their game. For example, this past year the Chick-Fil-A Bowl paid 3.25 million dollars to the teams participating in its game (O’Toole). Typically, the more important the bowl game, the higher the payout, which is one reason accuracy in ranking the teams is important. Along with that revenue, the school can also count on merchandise sales to increase.

However, not every motivation is strictly about the money. Coaches recruiting high school players will be able to boast about their program’s history of winning and attending bowl games. That history tells players that the program is successful, and with successful programs typically come successful athletes. These athletes are in turn scouted by professional football teams, so the more exposure the better.
Therefore it benefits the recruiting process to attend and win bowl games.

Currently, the BCS uses a variety of polls and computer rankings to determine the official top 25 teams. We will start with the polls. The Associated Press (AP) Top 25 poll is compiled from reporters who each rank the top 25 teams in Division 1-A. The coaches of Division 1-A also have a poll, where they rank the top 25 teams as well. After the last week of the regular season, the polls are taken and each team is assigned points for where it is ranked on each ballot. For example, if Coach A thinks Notre Dame is the best team, Notre Dame receives 25 points in Coach A’s rankings. If Coach B thinks it is the 3rd best team, it receives 23 points in Coach B’s rankings, and so on. Each team’s points are totaled across all the ballots in a poll and divided by the maximum possible points to give the team’s percentage of points earned. So if UCLA received 1,580 out of a possible 1,625 points from the writers, it is given a value of 97.2 percent.

These poll values are added to a computer average collected from 6 separate systems. Each computer system ranks the top 25 teams; for each team, the highest and lowest of its six computer rankings are dropped. The other four rankings are assigned points like the polls, and those four values are averaged to give the team’s computer average. The poll percentages and the computer average are then combined into one final value, and the team with the highest value is ranked first.

Some people believe the current BCS system is flawed and does not accurately represent the best 25 teams, and they call for a new system to take its place. One argument against the system is that an undefeated team can be placed below a team with 1 or even 2 losses; another is that the voting is subjective. Coaches can rank their own team number 1 on their ballot even if it does not deserve to be, and the same goes for reporters.
Reporters in the AP poll with ties to schools or coaches can vote for that school and are therefore biased. President Barack Obama was asked his thoughts on the current BCS system, and he told reporters that an 8 team playoff would be more fitting than the current system (Dufresne). Other criticisms include that the computer systems do not take lopsided victories or injuries into consideration. A star player hurt on a good team could have a huge impact on the outcome of future games, yet the computer systems fail to recognize that.

In a paper titled Random Walker Ranking for NCAA Division I-A Football, Thomas Callaghan, Peter Mucha, and Mason Porter propose a unique approach that differs from the BCS model. They suggest using a random walker to generate the rankings based on wins and losses. I decided to take their model and build it in R to determine how effective it was. To begin, I had to record the entire season’s schedule and outcomes for all of Division I-A. Luckily, James Howell has posted exactly that data on his website (Howell). I copied his data into an Excel spreadsheet and separated the columns into Team 1 name, Team 1 score, Team 2 name, and Team 2 score. I read that data into R and stored each column. Because my data set contains 195 teams, I created 2 matrices with 195 rows each. The first matrix was created to store opponents and the second was used to store the outcomes of those games. To fill those matrices, I first needed to break down the columns of the Excel file. I created a list of all the teams alphabetically and assigned them numbers from 1 to 195. I then filled the opponent matrix based on the corresponding team numbers. Each row in the matrix corresponded to a unique team, and that team’s opponents’ numbers were filled in across the row. Each column of the matrix corresponded to a week of the college football season.
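As a rough illustration of this bookkeeping, the sketch below builds both matrices from a made-up four-team, two-week game list. The team names, scores, and the data frame layout are assumptions for illustration only; the names opponentmatrix and outcomematrix follow the code shown later in this paper, and unplayed weeks are simply left as NA.

```r
# Hypothetical game list; these rows are invented and are not from Howell's data.
games <- data.frame(
  week   = c(1, 1, 2, 2),
  team1  = c("Alpha", "Gamma", "Alpha", "Beta"),
  score1 = c(21, 10, 14, 35),
  team2  = c("Beta", "Delta", "Gamma", "Delta"),
  score2 = c(17, 24, 20, 3),
  stringsAsFactors = FALSE
)

# Alphabetical team list; a team's number is its position in this list.
teams   <- sort(unique(c(games$team1, games$team2)))
n_teams <- length(teams)
n_weeks <- max(games$week)

# One row per team, one column per week; NA means no game that week.
opponentmatrix <- matrix(NA_integer_, nrow = n_teams, ncol = n_weeks)
outcomematrix  <- matrix(NA_integer_, nrow = n_teams, ncol = n_weeks)

for (k in seq_len(nrow(games))) {
  a <- match(games$team1[k], teams)   # team number of Team 1
  b <- match(games$team2[k], teams)   # team number of Team 2
  w <- games$week[k]
  a_won <- games$score1[k] > games$score2[k]

  # Record the opponent from both teams' points of view.
  opponentmatrix[a, w] <- b
  opponentmatrix[b, w] <- a

  # 1 marks a win and -1 a loss, again from each team's point of view.
  outcomematrix[a, w] <- if (a_won) 1L else -1L
  outcomematrix[b, w] <- if (a_won) -1L else 1L
}
```

Here teams is c("Alpha", "Beta", "Delta", "Gamma"), so opponentmatrix[1, 1] is 2 and outcomematrix[1, 1] is 1 because Alpha beat Beta in week 1.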
So if the opponent matrix read [2,6]=120, that meant that in the 6th week of the season team 2 played team 120. For the outcome matrix, I filled a corresponding matrix with 1 and -1 values representing a win (1) or a loss (-1) in that game. So if team 2 beat team 120 in week 6, the outcome matrix would read [2,6]=1, while [120,6]=-1. Now I had the entire season’s schedule and results in 2 matrices.

Next I created my random walker loop that would eventually rank each team from 1 to 195. To set this up, I first needed to establish how many steps I wanted my random walker to take. I decided on 10 million steps, which would give him enough steps to traverse the entire division. For each step from 1 to 10 million, I would start at a team and randomly select one of its opponents from the opponent matrix. After I selected that opponent, I would check the outcome matrix to see which team won. Once that was determined, I would generate a random number between 0 and 1. I would compare this number to the probability that the better team won the head to head matchup, which I call the p-value. The p value must be greater than .5 but less than 1. The reason is that if the p value equals one half, the team that won was no better than the team that lost, and wins and losses are basically a coin flip. If p is 1, the team that won would never lose to the team it beat if they played again, which is unlikely. Field conditions, injuries, and the nature of competition all point to the fact that a team has some chance of beating an opponent, regardless of how good that opponent is.

To better illustrate this walk, let’s take a look at an example. Suppose the walker is at team 2 and randomly selects team 120 as its opponent. Looking at the outcome matrix, we see team 2 won the regular season matchup between the teams. Now we create a random number between 0 and 1.
If it is less than the p value, the walker will remain on team 2 and team 2’s count increases by 1. If the random number is greater than p, the walker will leave team 2 and go to team 120, and team 120’s count increases by 1. However, if team 2 had lost the head to head matchup, the randomly generated number would be compared to (1-p) instead. If the number were less than (1-p) the walker would stay, and if it were greater than (1-p) he would leave. We do this walk for 10 million steps with the same p value throughout the entire process, counting the steps the walker has taken and how many times he has landed on each team. The higher a team’s count, the higher the ranking it receives.

The results my program produced, with their respective p-values, are listed in the appendix. As the p value gets closer to .5, the rankings reflect less of the teams’ records and become more of a toss-up. This is what we expected, because with a lower p value wins and losses matter less, and at p = .5 they become irrelevant.

What is nice about this system is that it takes strength of schedule into consideration. If a team plays opponents with better records, the walker is funneled toward those opponents more often than toward teams with fewer wins. Because the walker spends more time at teams that win often, he is also more exposed to the teams they play. So Wisconsin benefits from playing tougher opponents in the Big 10 conference, because as those teams accumulate victories the walker gets more exposure to Wisconsin. Thus Wisconsin’s ranking should increase if it was able to beat that competition.

What is interesting to note is that with the higher p values, a non-Division I-A team was actually ranked in the top 25. I believe this was caused by a “perfect storm” type of scenario: early in the season Virginia Tech lost to non-BCS opponent James Madison University.
Virginia Tech then went on to have a very good season, and its only other regular season loss was to a Boise State team that had 12 wins and only one loss. James Madison University played only one game against a Division I-A opponent and won it, and that opponent had a very good record and beat strong teams, so James Madison ended up with a top 25 ranking. I imagine that the walker would funnel his way toward Boise State and Virginia Tech (due to their high numbers of wins) and eventually land on James Madison University through Virginia Tech. Once there, the only way he could leave was to attempt to walk back to Virginia Tech, but the odds were against him because James Madison University won that head to head matchup. This anomaly is a flaw in the system and should be noted. To potentially correct it, one might modify the code so that only games between Division I-A opponents are candidates for the walker, and every other game is ignored. This might give a slight advantage to teams that played only Division I-A opponents, since they would have an extra game on their schedule and thus another opportunity for the walker to land on them, but that hypothesis was not examined in this experiment.

Looking back at the data, it is clear that the rankings change noticeably when compared to the BCS formula. It is also clear that the rankings differ when the p value is changed, which raises an obvious question: which p-value is best? There is no official answer; it depends on how much value you put in a win. Margin of victory does not matter in this random walk, so a win by 1 is just as valuable as a win by 35.
Therefore it would be up to a panel to determine which p-value to use, so that a win is valued highly enough to give credit to teams with winning records, but not so highly that wins alone determine the ranking. The more “excuses” one can come up with for why a team lost, the lower the p-value should be. Bad playing conditions, injuries, playing on a short week, and simply bad play execution are all reasons why a team might lose one game against an opponent yet still feel it was the better team. For this reason the p-value needs to be examined more closely.

After thinking about the code, I decided to add my own twist to the random walk experiment to see how it changed the rankings. I decided to change p-values based on the location of the game. The reasoning is that better teams should win at home, so a road win is more valuable than a win on a neutral field, which in turn is more valuable than a home win. I say this because college football teams playing at home should have an outstanding advantage against their opponent. The elite in college football call home stadiums that seat a hundred thousand people who are constantly cheering for the home team. Home teams also do not have to travel to play, and therefore have the luxury of being at home. Sleeping in their own beds instead of in a hotel with teammates, eating quality food instead of whatever is at the hotel buffet, and simple familiarity with the area can all put a player at ease before a big game. For a visiting team to overcome this disadvantage is quite an accomplishment, and I looked to reflect that in my p-value calculation.

In order to show this in my code, I decided to create 3 p values. One would be for home games, one would be for neutral games, and one would be for away games.
I decided to keep the p value at a neutral field the same, while I increased the p value for away games and decreased it for home games. I moved the p value half the distance toward either endpoint of my interval, so the p values were defined as

Code:
    pneut = .75
    phome = (pneut + .5) / 2    # halfway between .5 and pneut, here .625
    paway = (pneut + 1) / 2     # halfway between pneut and 1, here .875

Next, I needed to record the location of each game and include that in my data. I modified my original Excel spreadsheet to include the location of each game. I followed the exact steps I used for the outcome and opponent matrices to create a home matrix that recorded whether the team was at home, away, or at a neutral location. A 0 represented a neutral location, -1 an away game, and 1 a home game. Each row represented one of the 195 teams, and each column represented a given week of the season. So if location [54,8] = 0, that meant team 54 played a game at a neutral location in week 8.

When I performed my random walk, I added an extra condition in my for loop to determine the location of the game as well as the p value. At the start of each step, I called on the home matrix to tell me the numerical identifier of the location of the game. Then I called on the outcome matrix to tell me who won, and then on the randomly generated number to tell me whether I moved to my opponent’s location or not. The R code looks as follows.

Code:
    for (i in 1:steps) {                            # the entire random walk; steps is the 10 million steps
      whichopp <- sample(1:g[pos], 1)               # select one game at random from the current team's games
                                                    # (g[pos] is read here as the number of games team pos played)
      winlose <- outcomematrix[pos, whichopp]       # outcome of the game between the teams (1 or -1)
      homeoraway <- homematrix[pos, whichopp]       # location of the game (1, 0, or -1)
      u <- runif(1, 0, 1)                           # the randomly generated number
      if (homeoraway == 0) {                        # neutral playing field game
        if (winlose == 1) {                         # the team the walker is currently at won
          if (u > pneut) {                          # if the random number is greater than the p value...
            pos <- opponentmatrix[pos, whichopp]    # ...he leaves for the opponent
          }                                         # otherwise the walker stays where he is
        } else {                                    # the team the walker is currently at lost
          if (u > (1 - pneut)) {                    # (1 - p) is used; if the random number is greater...
            pos <- opponentmatrix[pos, whichopp]    # ...the walker leaves
          }                                         # otherwise he stays
        }
      }
      # the same block is repeated for home and away games,
      # and the count for the walker's location is then increased
    }

I repeat this process for home and away games, and at the end of each pass through the loop I increase both the count for the walker’s location and the step. That way the walker knows he has one less step to take, and the team’s ranking gets increased by a vote.

Once I completed the random walk, I generated the top 25 teams and compared them to the original random walk as well as the BCS’s rankings. The results can be found in the appendix. We run into the same problem regarding James Madison University, mainly because they won on the road against a very good opponent. The increased p-value for a road win makes it that much tougher for the walker to leave their location and move on to Virginia Tech. My rankings seem to be a mix of the original random walk and the BCS formula. There are a few teams that jump high or low in the rankings, notably Nevada jumping up 7 places and Stanford dropping 7 places. However, I am satisfied with my results. The results show that the top 25 teams are consistent with both the BCS and the original system; only their ordering is different. This is a good sign, because it means no surprise teams “snuck” their way into my rankings.
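The location-dependent walk described above can be sketched end to end on a toy schedule. This is only an illustrative sketch, not the code used for the results: the three-team schedule is invented, every team plays the same number of games (so no g vector is needed), the counts vector is a hypothetical name, and it assumes the p value is chosen by where the winning team played, which the description above leaves open.

```r
set.seed(1)

# Invented 3-team schedule with one column per game.
# Team 1 beat team 2 at home and beat team 3 on the road;
# team 2 beat team 3 at a neutral site.
opponentmatrix <- matrix(c(2L, 3L,
                           1L, 3L,
                           2L, 1L), nrow = 3, byrow = TRUE)
outcomematrix  <- matrix(c( 1L,  1L,
                           -1L,  1L,
                           -1L, -1L), nrow = 3, byrow = TRUE)
homematrix     <- matrix(c( 1L, -1L,
                           -1L,  0L,
                            0L,  1L), nrow = 3, byrow = TRUE)

pneut <- 0.75
phome <- (pneut + 0.5) / 2   # a home win counts for less
paway <- (pneut + 1) / 2     # a road win counts for more

# Assumption: p depends on where the WINNER of the game played.
p_for_winner <- function(winner_loc) {
  if (winner_loc == 1) phome else if (winner_loc == -1) paway else pneut
}

steps  <- 20000
counts <- integer(nrow(opponentmatrix))   # votes per team
pos    <- 1L                              # walker starts at team 1

for (i in seq_len(steps)) {
  game <- sample.int(ncol(opponentmatrix), 1)    # pick one of the team's games
  won  <- outcomematrix[pos, game] == 1
  loc  <- homematrix[pos, game]
  winner_loc <- if (won) loc else -loc           # flip the venue if the opponent won
  p <- p_for_winner(winner_loc)
  stay_prob <- if (won) p else 1 - p             # stay with p after a win, 1 - p after a loss
  if (runif(1) >= stay_prob) pos <- opponentmatrix[pos, game]
  counts[pos] <- counts[pos] + 1L                # one vote for wherever the walker now sits
}

ranking <- order(counts, decreasing = TRUE)      # team numbers, best first
```

On this toy schedule the undefeated team 1 collects the most votes and the winless team 3 the fewest. Setting phome and paway equal to pneut turns the sketch back into the original walk with a single p value.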
According to my system, the national championship game would be played between TCU and Auburn. The other 34 bowl games would then be filled according to their specific rules, such as the Rose Bowl being played between the Big Ten conference champion and the Pac-10 conference champion. Therefore it is my assumption that Oregon would be part of the Rose Bowl and would probably play Wisconsin (because Oregon and Wisconsin won their respective conferences and were not included in the national championship game). The rest of the bowl games would be decided in this fashion.

Going forward, I believe the original system created by Callaghan, Mucha, and Porter could be altered further, as I have done. My idea of changing the p value by location was a start, but it was not perfect. For example, if Penn State were to play UCLA at Villanova, it would technically be a neutral game but might give Penn State a slight advantage, because Penn State would travel a shorter distance and would not have to change time zones. One change that would be fairly easy to make is to factor in margin of victory. A win by 50 points should not be equal to a win by 1 point. Perhaps increasing the p-value based on the margin by which the game was won might be considered. Another idea might be to take into consideration how the teams are playing at the end of the season. A team that started slowly but turned it around and won the rest of its games might in fact be the best team. However, this begins a debate over what the national championship game means. Should it reflect the teams with the best overall seasons, or the teams who are playing the best football at the end of the season? Hypothetical questions that have no correct answer make ranking teams that much harder.

Regardless of how the teams are ranked, someone will always be unhappy.
Some teams will argue that one system favors one team over another, and call for a new system to take its place. I created a system that used a random walk with 10 million steps to determine who would be ranked in the top 25. Perhaps my system will be added to the BCS formula, or perhaps it will replace the BCS system entirely. Either way, I believe I created a fairly simple, yet effective, system that ranks teams based on wins and strength of schedule. It does have flaws, such as being biased toward teams that play more games; however, I believe it shows how powerful and creative a random walk can be.

Appendix

Poll and BCS rankings

Rank  AP                    USA             BCS
 1    Auburn                Auburn          Auburn
 2    TCU                   Oregon          Oregon
 3    Oregon                TCU             TCU
 4    Stanford              Wisconsin       Stanford
 5    Ohio State            Stanford        Wisconsin
 6    Oklahoma              Ohio St.        Ohio State
 7    Wisconsin             Michigan St.    Oklahoma
 8    LSU                   Arkansas        Arkansas
 9    Boise State           Oklahoma        Michigan State
10    Alabama               Boise St.       Boise State
11    Nevada                LSU             LSU
12    Arkansas              Virginia Tech   Missouri
13    Oklahoma State        Nevada          Virginia Tech
14    Michigan State        Missouri        Oklahoma State
15    Mississippi State     Alabama         Nevada
16    Virginia Tech         Oklahoma St.    Alabama
17    Florida State         Nebraska        Texas A&M
18    Missouri              Texas A&M       Nebraska
19    Texas A&M             South Carolina  Utah
20    Nebraska              Utah            South Carolina
21    UCF                   Mississippi St. Mississippi State
22    South Carolina        West Virginia   West Virginia
23    Maryland              Florida St.     Florida State
24    Tulsa                 Hawaii          Hawaii
25    North Carolina State  Connecticut     Central Florida

Random Walk, 10,000,000 steps, p = .95 to p = .75

Rank  p=.95            p=.90           p=.85           p=.80          p=.75
 1    Auburn           Auburn          Auburn          Auburn         Auburn
 2    TCU              TCU             TCU             TCU            TCU
 3    Oregon           Oregon          Oregon          Oregon         Oregon
 4    Ohio State       LSU             Stanford        Stanford       Stanford
 5    LSU              Stanford        LSU             LSU            Oklahoma
 6    Arkansas         Ohio St         Oklahoma        Oklahoma       LSU
 7    Stanford         Arkansas        Ohio St         Arkansas       Arkansas
 8    Nevada           Oklahoma        Arkansas        Ohio St        Boise St
 9    Boise State      Nevada          Boise State     Boise St       Ohio St
10    Oklahoma         Boise State     Alabama         Alabama        Alabama
11    Alabama          Alabama         Nevada          Nevada         Nevada
12    Wisconsin        Wisconsin       Wisconsin       Oklahoma St    Oklahoma St
13    Michigan State*  Oklahoma St     Oklahoma State  Wisconsin      Virginia Tech
14    Oklahoma St      Michigan St     Missouri        Virginia Tech  Missouri
15    Missouri         Missouri        Michigan St     Missouri       Wisconsin
16    South Carolina   Virginia Tech   Virginia Tech   Michigan St    Michigan St
17    Virginia Tech    South Carolina  Texas A&M       Texas A&M      Nebraska
18    Texas A&M        Texas A&M       South Carolina  Nebraska       Texas A&M
19    Nebraska         Nebraska        Nebraska        South Carolina South Carolina
20    Florida State    Florida St      Florida St      Florida St     Florida St

Ranks 21 to 25 in these five walks are filled by Mississippi St, Utah, Iowa, North Carolina St, and Hawaii (one Utah entry is marked with an asterisk). For p = .95 the column continues with 21. Mississippi State and 22. Hawaii; for p = .75 the order is 21. Utah, 22. Mississippi St, 23. North Carolina St, 24. Iowa, 25. Hawaii. The order for the other columns is given only where it is unambiguous.

Random Walk, 10,000,000 steps, p = .70 to p = .50

Rank  p=.70              p=.65              p=.60              p=.55            p=.50
 1    Auburn             Auburn             Auburn             Auburn           Central Florida
 2    Oregon             TCU                Oklahoma           Oklahoma         Nevada
 3    TCU                Oklahoma           Oregon             Nevada           Nebraska
 4    Stanford           Oregon             Stanford           Virginia Tech    Hawaii
 5    Oklahoma           Stanford           TCU                Stanford         Northern Illinois
 6    LSU                LSU                LSU                Oregon           Auburn
 7    Arkansas           Arkansas           Virginia Tech      TCU              Miami (Ohio)
 8    Ohio St            Nevada             Nevada             Florida St       Oklahoma
 9    Boise St           Alabama            Arkansas           LSU              Southern Methodist
10    Nevada             Ohio St            Alabama            Nebraska         South Carolina
11    Alabama            Virginia Tech      Ohio St            South Carolina   Virginia Tech
12    Virginia Tech      Boise St           Boise St           Boise State      Florida St
13    Oklahoma St        Oklahoma St        Oklahoma St        Alabama          Penn State
14    Missouri           Missouri           Florida St         Missouri         Toledo
15    Wisconsin          Wisconsin          Missouri           Arkansas         Ohio
16    Michigan St        Nebraska           Nebraska           Oklahoma St      Boise State
17    Nebraska           Michigan St        South Carolina     Ohio St          Iowa
18    Texas A&M          Texas A&M          Michigan St        Hawaii           Wisconsin
19    Florida St         Florida St         Texas A&M          Wisconsin        Southern Mississippi
20    South Carolina     South Carolina     Wisconsin          Michigan St      Idaho
21    Utah               Utah               Mississippi St     Texas A&M        Texas Tech
22    Mississippi St     Mississippi St     Utah               Utah             Michigan St
23    North Carolina St  Hawaii             Hawaii             Central Florida  Ohio St
24    Hawaii             North Carolina St  North Carolina St  USC              Michigan
25    USC                USC                USC                Mississippi St   Georgia

Team records

Team                  Wins  Losses
Auburn                14    0
TCU                   13    0
Oregon                12    1
Stanford              12    1
Ohio State            12    1
Oklahoma              12    2
Wisconsin             11    2
LSU                   11    2
Boise State           12    1
Alabama               10    3
Nevada                13    1
Arkansas              10    3
Oklahoma State        11    2
Michigan State        11    2
Mississippi State      9    4
Virginia Tech         11    3
Florida State         10    4
Missouri              10    3
Texas A&M              9    4
Nebraska              10    4
UCF                   11    3
South Carolina         9    5
Maryland               9    4
Tulsa                 10    3
North Carolina State   9    4

Random Walk, My System, and BCS (10,000,000 steps)

Rank  Random Walk (p=.75)  My System (pneut=.75)  BCS
 1    Auburn               Auburn                 Auburn
 2    TCU                  TCU                    Oregon
 3    Oregon               Oregon                 TCU
 4    Stanford             Ohio St                Stanford
 5    Oklahoma             LSU                    Wisconsin
 6    LSU                  Arkansas               Ohio State
 7    Arkansas             Stanford               Oklahoma
 8    Boise St             Nevada                 Arkansas
 9    Ohio St              Boise State            Michigan State
10    Alabama              Oklahoma               Boise State
11    Nevada               Alabama                LSU
12    Oklahoma St          Wisconsin              Missouri
13    Virginia Tech        James Madison*         Virginia Tech
14    Missouri             Michigan St            Oklahoma State
15    Wisconsin            Oklahoma St            Nevada
16    Michigan St          Missouri               Alabama
17    Nebraska             South Carolina         Texas A&M
18    Texas A&M            Virginia Tech          Nebraska
19    South Carolina       Texas A&M              Utah
20    Florida St           Nebraska               South Carolina
21    Utah                 Florida St             Mississippi State
22    Mississippi St       Mississippi St         West Virginia
23    North Carolina St    Hawaii                 Florida State
24    Iowa                 Iowa                   Hawaii
25    Hawaii               Utah                   Central Florida

Works Cited

Callaghan, Thomas, Peter Mucha, and Mason Porter. "Random Walker Ranking for NCAA Division I-A Football." The American Mathematical Monthly 114 (2007): 761-77. Web. <http://people.maths.ox.ac.uk/~porterm/papers/bcsmonthly.pdf>.

Dufresne, Chris. "The BCS and Barack Obama ..." The Fabulous Forum. Los Angeles Times, 4 Nov. 2008. Web. 01 May 2011. <http://latimesblogs.latimes.com/sports_blog/2008/11/the-bcs-and-bar.html>.

Hootens, Staff. "College Football Facts & Figures: Attendance, Viewership Keep Going up." Hootens.com. Hootens, 23 Mar. 2011. Web. 01 May 2011. <http://www.hootens.com/college-football-facts-figures-attendance-viewership-keep-goingup_335_a.aspx>.

Howell, James. "James Howell's College Football Scores." Wisc.edu. James Howell, 1999. Web. 01 May 2011. <http://homepages.cae.wisc.edu/~dwilson/rsfc/history/howell/>.

O'Toole, Thomas. "$17M BCS Payouts Sound Great, but ..." USATODAY.com. USA Today, 06 Dec. 2006. Web. 01 May 2011. <http://www.usatoday.com/sports/college/football/2006-12-06-bowl-payouts_x.htm>.