Result Prediction for Soccer Games
CS221 Project Final Report
Bo Shen, Shaohan Xu, Wenhao Chen

1 Introduction

Soccer is the world's most popular sport. The unpredictability of the game makes it exciting to watch and popular for various forms of betting. Models for predicting the results of soccer games for betting have existed for a long time. Rue and Salvesen [1] established a Bayesian network to predict the result of a soccer game and, building on this, Timmaraju et al. [2] incorporated key game statistics to improve prediction performance. Their main idea is to use recent performance as a key factor in predicting the current match. In reality, even though recent performance is a good benchmark for predicting a new game, the correlation is not strong. For example, at the end of 2003, after a 6-game winning streak, Inter Milan failed to make a single win in the next 7 games [3]. Goddard [4] compared studies that use goals scored and conceded by each team to predict a match result indirectly with studies that use other information to predict the match result directly, and suggested that a hybrid scheme could be used.

Unlike existing models, in this project the performance of a team is evaluated from fundamental factors, i.e. players' ability and coach ability, integrated with recent team performance. Our data source is a computer game called Football Manager (FM). FM is a series of football management simulation games with highly accurate and valuable data collected by professional soccer scouts; even professional soccer clubs have started to use FM data to assist their recruiting process [5]. As FM data is available to us, we are able to construct sophisticated machine learning models.

In this project, we aim to use a neural network to predict the results of soccer games. An artificial neural network, commonly referred to as a neural network, is a powerful machine learning model inspired by the nervous system of the human brain.
Neural networks currently provide robust solutions to complex problems in a wide range of disciplines (e.g. classification, prediction, filtering, optimization, pattern recognition) [6]. Predicting the result of a soccer game can be extremely complicated; it depends on many factors such as players' skills, coach abilities, home and away ground effects, team tactics, etc. Thus, we believe a neural network is an appropriate model for the problem. However, because of the complexity of the model, we expect a difficult learning process.

This report presents a proposed model and an algorithm for tackling the problem. In Section 2, we discuss our proposed neural network model in detail and present the backpropagation algorithm that improves the model using existing data. In Section 3, results of the proposed model are presented and compared with a simpler model, Naive Bayes. Finally, limitations of this project are discussed and conclusions are drawn in Sections 4 and 5.

2 Method

2.1 Neural Network Model

In this section, we describe the construction of our neural network model, with emphasis on the design of features and feature extractors. In FM, the database covers four aspects of soccer games, namely players' ability, coach ability, tactics and some other influence factors. Considering the scope and feasibility of this project, only some key factors from these four aspects are considered in our model.

Given that soccer is a game of possession and utilization of court space, the court is simply divided into 9 regions (Figure 1); in the last hidden layer, the output feature vector includes the possession (or control) of each region and team stability. The former are referred to as space features in this report, each taking a value of +1 (control) or -1 (no control). It should be noted that all regions stated here are from the perspective of the home team.
For example, the space feature FL refers to the forward left region of the home team, which corresponds to the DR region of the away team. With this in mind, we next explain the construction of the neural network from the input layer to the space features and the team stability feature, as shown in Figure 2.

\[ \phi(x) = [\mathrm{FL}, \mathrm{FC}, \mathrm{FR}, \mathrm{ML}, \mathrm{MC}, \mathrm{MR}, \mathrm{DL}, \mathrm{DC}, \mathrm{DR}, \mathrm{TS}] \]

where TS denotes team stability.

For the space features, the raw data are a set of player ability ratings, team formation information, and coach ability ratings. Following Football Manager by Sega, player ability is classified into eight attributes: attacking (A), creativity (C), technical (T), speed (S), physical (P), defending (D), mental (M) and aerial (E). For each team, players are assigned to regions according to the team formation (e.g. 4-4-2 or 3-5-2). For each region and each attribute category, the value of the attribute is first calculated as the sum of the players' attributes in this region plus the players' attributes in neighboring regions times a contribution weight. Then, to account for the coach's ability, the final team attribute for the region is this sum multiplied by the corresponding coach attribute; for instance, the home team attacking attribute is $A_h = A \times A_{\mathrm{coach}}$.

To determine the possession of each region, a set of attribute differences between the home and away teams is calculated in this region, namely $A_h - D_a$, $C_h - D_a$, $T_h - D_a$, $D_h - A_a$, $D_h - C_a$, $D_h - T_a$, $S_h - S_a$, $M_h - M_a$, $E_h - E_a$, and $P_h - P_a$; these are our input layer features for the space features. The nine regions correspond to 90 features in the first hidden layer.

For the team stability feature, we consider both short-term and long-term stability, and other influence factors such as home advantage and weather. According to our knowledge of soccer, for short-term stability, if the winning variance is small, the team is more likely to perform stably in the next game.
However, if the long-term winning variance is small, for instance during a 15-game winning streak, the team is more likely to show some unstable performance. Home advantage is important; a familiar court and cheering from a large group of fans usually lead to more stable performance. Weather also contributes to team stability: extreme weather such as heavy rain is highly likely to cause unstable performance. Thus, in the input layer, the features include short-term stability (previous 5 games), long-term stability (previous 15 games), home advantage and weather.

Figure 1. Nine court regions

Figure 2. Neural network structure

2.2 Backpropagation Algorithm

In their 1986 paper, David Rumelhart, Geoffrey Hinton, and Ronald Williams demonstrated that neural networks trained with the backpropagation algorithm learn far faster than with earlier approaches, which enabled neural networks to solve problems that had previously been insoluble [7]. Today, the backpropagation algorithm is still the main engine of learning in neural networks. The core of backpropagation is an expression for $\partial C / \partial w$, the partial derivative of the cost function $C$ with respect to any weight $w$ in the network. In this project, we use the hinge loss function, which has the form

\[ C = \frac{1}{n} \sum_x \max\bigl(0,\ 1 - a^L(x) \cdot y(x)\bigr) \tag{1} \]

where $n$ is the total number of training examples; $x$ is an individual training example; $y(x)$ is the corresponding desired output; $L$ denotes the number of layers in the network; and $a^L(x)$ is the output of the neural network given input $x$. In the stochastic gradient descent method, the cost function simplifies to

\[ C = \max\bigl(0,\ 1 - a^L(x) \cdot y(x)\bigr) \tag{2} \]

Backpropagation is based on two fundamental equations.
First, for the error in the output layer, $\delta^L$, its components are given by:

\[ \delta_j^L = \frac{\partial C}{\partial z_j^L} = \frac{\partial C}{\partial a_j^L}\, \sigma'(z_j^L) \tag{3} \]

Second, the error in layer $l$, $\delta^l$, in terms of the error in the next layer, $\delta^{l+1}$, is given by:

\[ \delta^l = \bigl((w^{l+1})^T \delta^{l+1}\bigr) \odot \sigma'(z^l) \tag{4} \]

where $\odot$ denotes the element-wise product. Combining Equation 3 with Equation 4, we can compute the error for every layer. We start with Equation 3 to compute $\delta^L$, then apply Equation 4 to compute $\delta^{L-1}$, then Equation 4 again to compute $\delta^{L-2}$, and so on, all the way back through the network. Together with stochastic gradient descent, the pseudocode of the learning algorithm can be written as:

[Pseudocode listing]

3 Result and Analysis

3.1 Neural Network Prediction Results

In the preliminary study, using stochastic gradient descent, we trained our predictor on limited features: the defense, midfield and attack rating for each team. The predictor yielded a 35% error rate (random guessing yields a 50% error rate). Additionally, the accuracy of the soccer betting company bwin was analyzed; it is around 87.5%. For our project, the target accuracy was set to a reasonable value of 75%.

As we focused on games in Serie A, we trained our predictor specifically on the 2012-2013 season of Serie A. Each of the 20 clubs in the league played thirty-eight games, and in total 380 games were played that season. Our data set covers all 380 games. We divided the data set randomly into a training set, covering 70% of the games, and a test set, covering the remaining 30%. After 94 iterations, the training error converged to around 20%, and the test error was about 25% (Figure 3). As a result, we were able to reach our target accuracy.

3.2 Principal Component Analysis

To test the feasibility of adopting simpler methods such as SVM and K-means, we ran PCA to reduce the feature space to fewer dimensions and to see whether visual patterns exist.
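The projection step just described can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the feature matrix `X`, its width of 94 columns, and the labels `y` are hypothetical random stand-ins, and `pca_project` is a helper name introduced only for this example. PCA is computed here through a singular value decomposition of the centered data.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto the top principal components."""
    # Center each column so the components describe variance, not the mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value (and therefore by decreasing explained variance).
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of the total variance carried by each component.
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return projected, explained[:n_components]


# Hypothetical stand-in data: 380 games, an assumed 94 features per game,
# and random win (+1) / loss (-1) labels. Real features would replace these.
rng = np.random.default_rng(0)
X = rng.normal(size=(380, 94))
y = rng.choice([-1, 1], size=380)

points, ratio = pca_project(X, n_components=2)
wins, losses = points[y == 1], points[y == -1]
print(points.shape, ratio)
```

Plotting `wins` and `losses` with different markers gives a scatter plot of the kind used to look for visual separation between the two classes. A low explained-variance ratio for the first two components would also suggest that a 2-D view hides much of the structure in the data.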
With two principal components, as in Figure 4, data points of wins (circles) and losses (crosses) are scattered and mixed; no clear pattern separates wins from losses. Therefore, for this project, we believe the neural network is more suitable, and we did not proceed further with SVM and K-means.

3.3 Naive Bayes

To test the superiority of the neural network, we also built a Naive Bayes classifier and compared its results with those of the neural network predictor. In the Naive Bayes model, the space features are the same as those in the neural network. Each delta in a given region is binarized: it is set to 1 if it is larger than zero and to 0 otherwise (e.g. if $\Delta_{AD} = 3.52$ in the neural network, then $\Delta_{AD} = 1$ in this model). Laplace smoothing was introduced to eliminate the problem of probabilities being estimated as zero. Ten-fold cross-validation over all 380 games was used to assess the performance of the Naive Bayes predictor. The results are shown in Table 1.

Table 1: Test Error of Naive Bayes

Subsample   Test Error
1           0.44
2           0.39
3           0.39
4           0.39
5           0.50
6           0.28
7           0.50
8           0.39
9           0.39
10          0.44
Average     0.41

Figure 3. Change of training and test errors through iterations using neural network

Figure 4. PCA analysis using two principal components

4 Discussion

The results indicate that the proposed neural network model can make reasonably good predictions of game results. However, the proposed model does not incorporate some important influence factors.

First, uncertainty is one of the most important parts of a soccer game. Unusual performance by players, small mistakes accidentally made by players or referees, wind speed, moisture on the grass surface, audience response and other uncontrollable factors can unpredictably influence the result. None of these effects can be captured in our model. A probabilistic model could be used to incorporate these factors.
Second, even though the team's short-term and long-term performance are considered in the proposed model, only match results are considered. A more detailed evaluation of team performance could provide a better prediction. Game statistics, such as goals, corners and shots on target [2], reflect team performance very well.

Third, as the objective of this project is to predict the match result before the match starts, it is reasonable that some critical in-game data are not considered in this model. These may include coach touchline instructions, injuries during the match, and team discipline.

5 Conclusion

In conclusion, we are able to predict soccer game results with a test error of about 25%, an improvement of about 10% over the baseline. Compared with the Naive Bayes model (about 41% test error, Table 1), the neural network performs far better. The neural network may be further improved by incorporating a detailed evaluation of team performance (e.g. goals, corners and shots on target).

References

[1] H. Rue and O. Salvesen, "Prediction and retrospective analysis of soccer matches in a league," Journal of the Royal Statistical Society: Series D (The Statistician), vol. 49, no. 3, pp. 399–418, 2000.

[2] A. S. Timmaraju, A. Palnitkar, and V. Khanna, "Game on! Predicting English Premier League match outcomes."

[3] "Serie A fixture." hoscored.com/regions/108/tournaments//5/Seasons/3512/Stages/6739/Fixtures/ItalypSerie-A-2012-2013. Accessed: 2014-11-30.

[4] J. Goddard, "Regression models for forecasting goals and match results in association football," International Journal of Forecasting, vol. 21, no. 2, pp. 331–340, 2005.

[5] R. Bleaney, "Football manager computer game to help premier league clubs buy players." http://www.theguardian.com/football/2014/aug/11/football-manager-computergame-premier-league-clubs-buy-players. Accessed: 2014-11-30.

[6] S. Haykin, Neural Networks: A Comprehensive Foundation, 2004.

[7] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by backpropagating errors," Cognitive Modeling, 1988.