write_up

Soccer Match Predictor

This short program uses a set of input parameters to predict the outcome of a soccer match. The essential idea is to give players ratings, introduce an element of randomness into their performance, and then pick a winner. The main program declares eleven players for each team, with the positions of the declared player objects corresponding to the formation the team plays. The program then assigns each player a randomized rating and predicts the outcome from an overall team rating, which is simply the sum of the randomized player ratings.

Each position class is derived from a player base class, because players in every position have certain characteristics in common. They all have an intelligence rating, a “spirit” (determination) rating, and an athleticism rating, all of which are equally important to any player. Each player has a base rating equal to the average of these three characteristics. Each player also has a normal distribution associated with his overall rating, which reflects the fact that no player performs perfectly consistently. This normal distribution comes from the <random> header, which was added to the C++ standard library in C++11.

Every player also has an overall positional rating, defined as a weighted average of his position-specific characteristic ratings. The weights depend on the style the team plays, which is what the style data member of the player base class represents. A player with style one (1) plays for an attacking team, and a player with style two (2) plays for a team that employs a defensive system. In an attacking system the player's attacking attributes are weighted more heavily in the calculation of his positional rating, and in a defensive system his defensive attributes are weighted more heavily. This feature was included to account for overall team chemistry.
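
The rating scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the program's actual code: the class and member names, the single "passing" and "tackling" attributes, and the 0.7/0.3 weights are all hypothetical, since the write-up does not list the real attributes or weights.

```cpp
#include <cassert>

// Hypothetical names throughout; the write-up does not show the real class layout.
enum class Style { Attacking = 1, Defensive = 2 };  // style 1 and style 2 in the text

class Player {
public:
    Player(double intelligence, double spirit, double athleticism, Style style)
        : intelligence_(intelligence), spirit_(spirit),
          athleticism_(athleticism), style_(style) {
        assert(intelligence >= 0.0 && spirit >= 0.0 && athleticism >= 0.0);
    }
    virtual ~Player() = default;

    // Base rating: plain average of the three characteristics shared by all players.
    double base_rating() const {
        return (intelligence_ + spirit_ + athleticism_) / 3.0;
    }

    // Each position class supplies its own style-dependent weighted average.
    virtual double positional_rating() const = 0;

    Style style() const { return style_; }

private:
    double intelligence_;
    double spirit_;
    double athleticism_;
    Style style_;
};

// Example position with one attacking and one defensive attribute (both assumed).
class Midfielder : public Player {
public:
    Midfielder(double intelligence, double spirit, double athleticism,
               Style style, double passing, double tackling)
        : Player(intelligence, spirit, athleticism, style),
          passing_(passing), tackling_(tackling) {}

    double positional_rating() const override {
        // Assumed weights: 0.7 on the attribute favoured by the team's style.
        const double w_attack = (style() == Style::Attacking) ? 0.7 : 0.3;
        return w_attack * passing_ + (1.0 - w_attack) * tackling_;
    }

private:
    double passing_;   // attacking attribute (hypothetical)
    double tackling_;  // defensive attribute (hypothetical)
};
```

With these assumed weights, the same midfielder receives a higher positional rating in the system that matches his stronger attribute, which is the "right player for the right system" effect the style member is meant to capture.
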
Many teams have essentially equally talented players, but the most successful teams buy the right players for a particular system. There have been many instances where a team that, on paper, should have contended for championships based on the quality of its individual players ended up delivering a mediocre season. The style weighting lets the model capture this phenomenon.

Attacking players also tend to perform less consistently, since attacking soccer requires more risk and is often harder to execute. However, attacking players have more freedom and therefore do extraordinary things more often. For this reason, the standard deviation of an attacking player's normal distribution is greater than that of a defensive player, who is more consistent but less likely to perform significantly above or below his average. This is handled by the “create_distribution” method, which assigns the mean and standard deviation used to construct each player's normal distribution.

The friend template function “set_rand_rating” draws a random value from a player's normal distribution data member and assigns it to that player's rand_rating data member. The rand_rating data member is what chiefly determines the winner of the match. First, each player on each team is assigned a rand_rating. Then each team is assigned an overall team rating, equal to the sum of the rand_ratings of its players. The team with the higher team rating wins the match.

To test the model, a trial simulation was run using Arsenal F.C.'s and Chelsea F.C.'s 2014-2015 rosters, with the starting elevens chosen and rated by myself. Chelsea was the Premier League champion this season and the stronger of the two sides, although not by much. Arsenal finished 3rd.
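
The sampling and scoring steps described above (create_distribution, set_rand_rating, and the summed team rating) might look something like the sketch below. It reuses the function names from the write-up, but their signatures, the simplified Player struct, the team_rating helper, and the attacking spread of 2.0 are assumptions made for illustration; only the 1.5-point defensive spread is a value the write-up actually mentions.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical, simplified stand-in for the write-up's player classes.
struct Player {
    Player(double mean, int s) : mean_rating(mean), style(s) {}

    double mean_rating;                     // average performance, out of 10
    int style;                              // 1 = attacking team, 2 = defensive team
    std::normal_distribution<double> dist;  // replaced by create_distribution()
    double rand_rating = 0.0;               // rating drawn for one match
};

// Assumed spreads: attacking players are less consistent than defensive ones.
constexpr double kAttackingSigma = 2.0;  // illustrative value only
constexpr double kDefensiveSigma = 1.5;  // value quoted for the defensive side

// Builds the player's normal distribution from his mean rating and team style.
void create_distribution(Player& p) {
    assert(p.style == 1 || p.style == 2);
    const double sigma = (p.style == 1) ? kAttackingSigma : kDefensiveSigma;
    p.dist = std::normal_distribution<double>(p.mean_rating, sigma);
}

// Draws one value from the player's distribution and stores it in rand_rating.
template <typename Rng>
void set_rand_rating(Player& p, Rng& rng) {
    p.rand_rating = p.dist(rng);
}

// Team rating for one match: the sum of freshly drawn player ratings.
template <typename Rng>
double team_rating(std::vector<Player>& team, Rng& rng) {
    double total = 0.0;
    for (Player& p : team) {
        set_rand_rating(p, rng);
        total += p.rand_rating;
    }
    return total;
}
```

To decide a single match under this sketch, one would call team_rating once for each side with the same generator (for example a std::mt19937) and declare the side with the larger sum the winner.
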
Because Chelsea is the stronger side, the average ratings of the Chelsea players are a bit higher, which implies that without randomness in the model Chelsea would simply win every game. That is not realistic, so the Gaussian player-rating probability distributions were introduced. These needed to be tuned, however: if the distributions were chosen too close to one another, whichever team had the higher average team rating would win just about every time. Even much stronger teams lose to much weaker teams relatively often, a fact that needed to be taken into account.

For the trial simulation, Chelsea and Arsenal are fairly evenly matched, with Chelsea having a slight advantage. The model was tuned to produce statistical distributions that realistically correspond to hypothetical matches between these two teams. The histograms below show results for four simulations with different values of the standard deviation of the attacking team, which in this case is Arsenal. The Chelsea standard deviation was set to 1.5 rating points, a reasonable number given that player ratings are out of 10 possible points. The Arsenal standard deviation was adjusted from this base value. Each histogram corresponds to one simulation consisting of 1000 program runs, each run being 100 games between the two teams.

[Figure: histograms (a), (b), (c), (d)]

The histograms show the distribution of the difference between the number of Chelsea wins and the number of Arsenal wins, i.e. (Chelsea wins) - (Arsenal wins). A positive value means Chelsea won more games out of 100, and a negative value means Arsenal did. The first plot has a mean value of 41, meaning that in this simulation Chelsea won on average 41 more games out of 100 than Arsenal. This is unreasonably high given the high and comparable quality of both teams this season. The Arsenal standard deviation was therefore increased to improve this result, allowing Arsenal to over-perform more significantly in the model.
The Arsenal standard deviation used for plot (c), on the other hand, is probably too high. It is very unlikely that Arsenal would ever win more games than Chelsea out of 100. The values in (b) are probably closest to a realistic outcome: out of 100 games, Chelsea wins on average about 30 more times than Arsenal, and Arsenal essentially never wins more games out of 100.

The plots below correspond to the four simulations shown in the histograms above and help demonstrate the amount of randomness in the results. They consider only the first 100 of the 1000 runs so that the fluctuations can be seen more clearly. The plots on the left show the fluctuation of the results over time, where time is an implicit variable that runs from left to right, so the leftmost points were calculated earliest. The 100 points are connected by a smooth curve to emphasize the oscillation, and each point again corresponds to 100 games between the two teams. The plots on the right are simply scatter plots of the first 100 runs.

In the future, it would be interesting to run the model for two teams where the attacking side is the slightly stronger of the two, and to see whether the standard deviations chosen for the model also produce accurate results in that scenario. This seems the most likely outcome, but it would be important to verify it. Otherwise the model would have to be redesigned, since it is intended to predict accurately the outcomes of football matches between teams of all types.

It would also be interesting to build a model with subtler team styles. For example, instead of just attacking or defensive, one could add finer-grained styles, such as an attacking possession-based style or a defensive counter-attack style.
Since football managers often change such specific team tactics from game to game, this would allow the model to predict the outcomes of individual games more accurately by incorporating the tactics each manager is most likely to use against a specific opponent. The model may be a little input-intensive, but with further development and testing it could possibly be turned into an interesting phone application or something along those lines.
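
For anyone wanting to repeat the experiment described earlier (simulations of 1000 runs, each run being 100 games, recording Chelsea wins minus Arsenal wins), a loop along the following lines would be one way to do it. This is a simplified sketch, not the original program: the Team struct, the function names, and the use of one shared standard deviation per team are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical, simplified team: one mean rating per player and a shared spread.
struct Team {
    std::vector<double> mean_ratings;  // eleven entries in the write-up's setup
    double sigma;                      // standard deviation applied to every player
};

// Team rating for one match: sum of one normal draw per player.
template <typename Rng>
double sample_team_rating(const Team& team, Rng& rng) {
    double total = 0.0;
    for (double mean : team.mean_ratings) {
        std::normal_distribution<double> dist(mean, team.sigma);
        total += dist(rng);
    }
    return total;
}

// One run: play `games` matches and return (wins of a) - (wins of b).
template <typename Rng>
int win_difference(const Team& a, const Team& b, int games, Rng& rng) {
    int diff = 0;
    for (int g = 0; g < games; ++g) {
        const double rating_a = sample_team_rating(a, rng);
        const double rating_b = sample_team_rating(b, rng);
        if (rating_a > rating_b) {
            ++diff;
        } else if (rating_b > rating_a) {
            --diff;
        }
    }
    return diff;
}

// One simulation: repeat the run `runs` times. The returned differences are
// the values that would be binned into a histogram.
template <typename Rng>
std::vector<int> simulate(const Team& a, const Team& b, int runs, int games, Rng& rng) {
    assert(runs > 0 && games > 0);
    std::vector<int> diffs;
    diffs.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r) {
        diffs.push_back(win_difference(a, b, games, rng));
    }
    return diffs;
}

// Mean of the recorded differences, comparable to the histogram means quoted above.
double mean_of(const std::vector<int>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (int v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}
```

Calling simulate(chelsea, arsenal, 1000, 100, rng) with a std::mt19937 generator and then mean_of on the result would mirror one histogram. Rerunning it with different values of the attacking team's sigma is the tuning step, and swapping in a stronger attacking side would be the follow-up check suggested above.
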
