Alberto Palacios Pawlovsky
SPM

1 Introduction

Soccer is one of the most popular sports in the world and, together with baseball, one of the two most popular sports in Japan. In soccer, two teams of eleven players each try to put the ball into the opposing goal, which is defended by a goalkeeper, the only player allowed to touch the ball with the hands, and only within a restricted zone of the field. The other players can only kick or head the ball anywhere on the field. In almost all soccer tournaments, a team is awarded three points for a win and zero points for a loss, and each team receives one point if the match ends in a tie (Brillinger (2010)). The distribution of goals in soccer has been the focus of much research. Reep, Pollard, and Benjamin (1971) showed that the number of goals scored by a team follows a Negative Binomial distribution. Maher (1982) contested this finding; he used a Poisson distribution instead and defined the mean number of goals scored by a team as the product of its attack strength and the defense weakness of its opponent. His model can also be used to predict scores. For game outcome prediction, ranking can also be used. Ranking systems are used in some sports to select or seed teams for pre- or post-season tournaments (Harville (2003)). Ranking teams before a match is also valuable to managers, because it can help them choose defensive and offensive strategies or the starting players for a game. The problem of rating teams and forming a ranking has been studied for a long time, and researchers from diverse disciplines and backgrounds have proposed several methods. The creation and open availability of databases for almost any popular sport has also fostered the development of many computer-based ranking systems. One that has attracted attention is the method proposed by Colley (2002), since it is one of the computer rankings used for college football’s Bowl Championship Series (BCS).
His ranking system can also be used for prediction. Ingram’s (2007) work using linear algebra is the basis of the ODM (Offense Defense Model) of Govan, Langville, and Meyer (2009). ODM is a model based on defense and offense ratings that can also be used for prediction. Stefani (2008) developed a least squares approach to predict the scores of rugby and soccer games that, like Maher’s, uses defense and offense ratings of the teams. Ranking itself is a constantly evolving area. There are models from the area of information processing, like the one of Callaghan, Mucha, and Porter (2007), that could also be applied to sports. For soccer, Hallinan (2005) ranks national soccer teams using a modified Bradley-Terry model (Bradley and Terry (1952)). There are even works that use computer voters to determine ratings and rankings (Gleich and Lim (2011)). A characteristic common to almost all computer-based methods is that they use the information currently available on sports association sites. This paper introduces two metrics we have developed to rate soccer teams. We evaluated the quality of these metrics by rating and ranking Japanese university soccer teams and by using the resulting ranking to predict the outcome of soccer games in the first and second divisions of JUFA (Japanese University Football Association).

2 Scores and Points Metrics (SPM)

In the case of soccer, the usual and minimal information available on association sites for tournament games is the date of each match and the corresponding score. We have been studying several metrics based on this basic information to rate teams, rank them, and predict the outcome of future games. We propose two performance metrics and one way to combine them into a team rating. One of the metrics uses the goals scored by a team and the other the points the team has earned up to a given match, so in what follows we call them the scores and points metrics (SPM).
2.1 SPM and Rating

We explain our metrics using two teams, i and a, that face each other on the k-th match day. We denote the points gained by team i before this game by $p_i^{k-1}$ and those of the adversary team a by $p_a^{k-1}$. Their initial values, before the start of the season, are $p_i^0 = 0$ and $p_a^0 = 0$. We use the point rules of soccer tournaments: if team i wins the k-th match, it earns three points and its points become $p_i^k = p_i^{k-1} + 3$. If the game ends in a tie, its points become $p_i^k = p_i^{k-1} + 1$, and if it loses, $p_i^k = p_i^{k-1}$. The total number of points any team could have earned up to the k-th match is given by equation (1).

$$p_{total}^k = 3k \tag{1}$$

One of our performance metrics evaluates the points gained by a team relative to all the points it could have earned. We call it the points metric, and it is defined by the following equation.

$$p_{i,k} = \frac{p_i^k}{p_{total}^k} \tag{2}$$

We also measure the performance of a team by its goals. The goals of team i in the j-th game are denoted by $g_{i,j}$, and all its goals up to the k-th match are given by equation (3).

$$g_i^k = \sum_{j=1}^{k} g_{i,j} \tag{3}$$

In the same way, we denote the goals conceded by a team in the j-th game by $cg_{i,j}$ and their total up to the k-th game by equation (4).

$$cg_i^k = \sum_{j=1}^{k} cg_{i,j} \tag{4}$$

The total number of goals scored and conceded by team i is then given by equation (5). If $tg_i^k$ is 0, it is set to 0.1 to avoid a division by zero.

$$tg_i^k = g_i^k + cg_i^k \tag{5}$$

Our second metric, which we call the scores metric (s), measures the performance of team i up to the k-th game and is defined by equation (6).

$$s_{i,k} = \frac{g_i^k}{tg_i^k} \tag{6}$$

We use the two metrics above (equations (2) and (6)) to rate a team up to the k-th game according to equation (7). Both metrics lie between 0 and 1, and so does their product.

$$r_i^k = s_{i,k} \times p_{i,k} \tag{7}$$

We use the rating of a team to compare it with other teams and, if needed, to rank them.
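As an illustration, equations (1) to (7) can be computed from a team’s match history as in the following Python sketch. It assumes the history is a list of (goals scored, goals conceded) pairs, one per match played so far; the function name `spm_rating` is hypothetical and is not taken from the programs used in this study.

```python
def spm_rating(history):
    """Points metric, scores metric and SPM rating after k = len(history) matches.

    history: list of (goals_scored, goals_conceded) pairs, one per match played.
    """
    k = len(history)
    points = goals_for = goals_against = 0
    for scored, conceded in history:
        if scored > conceded:
            points += 3              # win
        elif scored == conceded:
            points += 1              # tie (a loss adds nothing)
        goals_for += scored          # g_i^k, equation (3)
        goals_against += conceded    # cg_i^k, equation (4)
    p_total = 3 * k                                  # equation (1)
    p_metric = points / p_total if k else 0.0        # equation (2)
    total_goals = goals_for + goals_against          # equation (5)
    if total_goals == 0:
        total_goals = 0.1                            # avoid division by zero
    s_metric = goals_for / total_goals               # equation (6)
    return p_metric, s_metric, s_metric * p_metric   # rating, equation (7)
```

For example, a team with a 2-1 win, a 0-0 tie and a 1-3 loss has $p_{i,3} = 4/9$, $s_{i,3} = 3/7$ and $r_i^3 = 4/21 \approx 0.190$.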
We have also studied and evaluated the individual effects of the scores and points metrics on rating. When only the scores metric is used, equation (7) becomes

$$r_i^k = s_{i,k} \tag{8}$$

and when only the points metric is used, the rating is given by

$$r_i^k = p_{i,k} \tag{9}$$

We also evaluated our metrics combined as in equation (7), but with a weighting factor (w) to measure the effect of each metric on the rating. For this evaluation, we used the following (modified) rating.

$$r_i^k = (s_{i,k})^{1-w} \times (p_{i,k})^{w} \tag{10}$$

When w is 0 we obtain equation (8), and when it is 1 we obtain equation (9). The evaluation was carried out by setting w to 11 values, from 0 to 1 in increments of 0.1, and using the resulting ratings for game outcome prediction. The results are detailed in the following subsections.

2.2 Weighted SPM Evaluation : Prediction

As indicated above, we evaluated our metrics with the data of the last twelve years (1999 to 2010) of the first and second divisions of the Japanese University Football Association (JUFA, Kanto League). A characteristic of this league is that almost all its games are played in neutral stadiums, with no or negligible home advantage. We used the public data available on the JUFA site (JUFA (2011)). The rules governing JUFA have changed over the years, and the data collected have the following characteristics. The first and second divisions of JUFA had only 8 teams in 1999 and 2000, and in those seasons each team played only one game against every other team. From 2001 to 2004, the teams also played a return game, and the season’s games were divided into two terms. Since the 2005 season, the number of teams per division has been 12. In all these seasons, every team had played the same number of games before each match day. The only exception in the data is the 2006 season of the second division: one team was suspended in the middle of the second term, so that year there were only 119 games in that division.
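The weighted rating of equation (10), evaluated at the 11 weights used in this evaluation, can be sketched in Python as follows; `weighted_rating` and `WEIGHTS` are illustrative names, not part of the original programs.

```python
def weighted_rating(s, p, w):
    """Weighted SPM rating of equation (10): s**(1 - w) * p**w."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # With w = 0 this is s (equation (8)); with w = 1 it is p (equation (9)).
    return s ** (1.0 - w) * p ** w

# The 11 weights of the evaluation: 0.0, 0.1, ..., 1.0
# (rounded so that values such as 0.30000000000000004 do not appear).
WEIGHTS = [round(0.1 * n, 1) for n in range(11)]
```

For w = 0.5 the rating equals $\sqrt{s_{i,k} \times p_{i,k}}$, a monotonic function of the rating of equation (7), so both order teams in exactly the same way.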
The data gathering process required parsing and processing all the corresponding match-day pages, for which we used tailored programs written in Python.

JUFA’s games are all scheduled weekly, so for all weights (w) the scores and points metrics were computed from weekly results. We predicted only the results of the games after the first match day (from the second match day onward). A sample of the detailed results for one season is shown in Table 1. Match days are represented in the format "League x y", where x is the division and y is the match day. We used equation (10) to rate each team before its k-th match using data up to its previous, (k-1)-th, match. We then used those ratings to determine the winners of the k-th match day. We did not use pre-season data or any other data to improve the predictions.

Table 1: Prediction Results: Detailed Example (by match date, w = 0.5), 1999 Season, 1st Division

Match day  | Correct/games | Percentage
League 1 2 | 0/4           | 0.00%
League 1 3 | 2/4           | 50.00%
League 1 4 | 1/4           | 25.00%
League 1 5 | 1/4           | 25.00%
League 1 6 | 3/4           | 75.00%
League 1 7 | 2/4           | 50.00%
Total      | 9/24          | 37.50%

The total results for the first division and all weights are shown in Figure 1. It shows that the best setting over the twelve-year span is w = 0.5 (equal weights for the s and p metrics). However, the details of Table 2 show that, for the 1999 and 2000 seasons taken together, all weights between 0.1 and 0.6 give the same highest prediction percentage. Also, for the seasons between 2001 and 2004, the best weight is 0.9. Moreover, for the contemporary data (2005 onward) the best figures are obtained with w set to 0.0 (only the scores metric) or 0.1.

Figure 1: Weighted SPM : Foresight Prediction Percentages (1st Division JUFA).

Figure 2 shows the results for the second division. Those results show a slightly different distribution: the best setting is w = 0.6, and the second best is w = 0.5.
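The foresight step described in this subsection (rate the teams with data up to match day k-1, then predict the winner of each game of match day k from the higher rating) can be sketched as follows. Predicting a tie when both ratings are equal follows the rule stated in Section 3.1; the fixture format `(team_a, team_b, goals_a, goals_b)` and the function names are assumptions for illustration.

```python
def outcome(x, y):
    """Return 1 if x > y, -1 if x < y and 0 if equal (for goals or ratings)."""
    return (x > y) - (x < y)

def score_match_day(ratings, fixtures):
    """Count correct outcome predictions for one match day.

    ratings:  team -> rating computed from match days 1..k-1 only (foresight)
    fixtures: list of (team_a, team_b, goals_a, goals_b) for match day k
    """
    correct = 0
    for team_a, team_b, goals_a, goals_b in fixtures:
        predicted = outcome(ratings[team_a], ratings[team_b])  # equal ratings -> tie
        actual = outcome(goals_a, goals_b)
        if predicted == actual:
            correct += 1
    return correct, len(fixtures)
```

Summing the returned pairs over match days 2, 3, ... of a season gives counts such as the 9/24 of Table 1.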
From the details of Table 3, we can see that for the 1999 and 2000 seasons taken together, the best prediction percentages are obtained with values of w between 0.0 and 0.6. For the seasons between 2001 and 2004, the highest prediction percentages are obtained with w set to 0.6 or 1.0 (the second best values are for 0.5 and 0.7). And if we limit the span to the seasons from 2005 onward, the best values are obtained with w set to 0.3, 0.5, and 0.6. The best weight values for both divisions suggest that the best combination of the scores and points metrics gives equal weight to both metrics.

Figure 2: Weighted SPM : Foresight Prediction Percentages (2nd Division JUFA).

Table 2 shows the per-season details of the prediction results for the first division of JUFA. As already shown in Figure 1, over all twelve seasons the best total is obtained with w = 0.5. If we look at the best values per season in this table, the setting with the most seasons with best values is w = 1. It has five such seasons (1999, 2001, 2003, 2005, and 2009), two of them in the contemporary range. The next best setting is w = 0.7, with three seasons with best values, all of them in the contemporary range. Three values of w share the third position, 0, 0.5, and 0.6; each has three seasons with best values, two of them in the contemporary range. Table 3 shows the per-season details of the prediction results for the second division of JUFA. Over the twelve seasons, the best setting is w = 0.6. It has six seasons with best values, three of them in the contemporary range (2005 onward). The next best setting for w is 0.7, with five seasons with best values, two of them in contemporary seasons. Third place goes to w = 0.3, with four seasons with best values, three of them in contemporary seasons.
The values of w in fourth place are 0.5, 0.8, and 0.9, each with four seasons of best values, two of them from 2005 onward.

Table 2: Prediction Results (1st Division JUFA) : correctly predicted games out of all predicted games (Games), with the corresponding percentage in parentheses, for each weight w. * marks the best value(s) of each season.

Year | Games | w = 0 | w = 0.1 | w = 0.2 | w = 0.3 | w = 0.4 | w = 0.5 | w = 0.6 | w = 0.7 | w = 0.8 | w = 0.9 | w = 1
1999 | 24 | 8 (33.33) | 8 (33.33) | 8 (33.33) | 8 (33.33) | 9 (37.50)* | 9 (37.50)* | 9 (37.50)* | 7 (29.17) | 7 (29.17) | 7 (29.17) | 9 (37.50)*
2000 | 24 | 12 (50.00) | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 12 (50.00) | 12 (50.00) | 12 (50.00) | 12 (50.00) | 12 (50.00) | 12 (50.00) | 11 (45.83)
2001 | 52 | 19 (36.54) | 19 (36.54) | 20 (38.46) | 19 (36.54) | 19 (36.54) | 20 (38.46) | 19 (36.54) | 19 (36.54) | 19 (36.54) | 20 (38.46) | 21 (40.38)*
2002 | 52 | 24 (46.15) | 24 (46.15) | 24 (46.15) | 25 (48.08)* | 24 (46.15) | 23 (44.23) | 23 (44.23) | 23 (44.23) | 23 (44.23) | 23 (44.23) | 22 (42.31)
2003 | 52 | 23 (44.23) | 22 (42.31) | 23 (44.23) | 24 (46.15) | 26 (50.00) | 26 (50.00) | 26 (50.00) | 27 (51.92) | 28 (53.85)* | 28 (53.85)* | 28 (53.85)*
2004 | 52 | 30 (57.69)* | 30 (57.69)* | 30 (57.69)* | 29 (55.77) | 30 (57.69)* | 29 (55.77) | 29 (55.77) | 29 (55.77) | 29 (55.77) | 29 (55.77) | 27 (51.92)
2005 | 126 | 56 (44.44) | 55 (43.65) | 56 (44.44) | 55 (43.65) | 56 (44.44) | 56 (44.44) | 57 (45.24)* | 57 (45.24)* | 57 (45.24)* | 56 (44.44) | 57 (45.24)*
2006 | 126 | 69 (54.76)* | 67 (53.17) | 64 (50.79) | 64 (50.79) | 63 (50.00) | 65 (51.59) | 64 (50.79) | 64 (50.79) | 64 (50.79) | 65 (51.59) | 66 (52.38)
2007 | 126 | 66 (52.38)* | 65 (51.59) | 65 (51.59) | 64 (50.79) | 64 (50.79) | 63 (50.00) | 62 (49.21) | 61 (48.41) | 60 (47.62) | 60 (47.62) | 57 (45.24)
2008 | 126 | 59 (46.83) | 62 (49.21)* | 61 (48.41) | 60 (47.62) | 60 (47.62) | 61 (48.41) | 60 (47.62) | 59 (46.83) | 58 (46.03) | 58 (46.03) | 53 (42.06)
2009 | 126 | 62 (49.21) | 62 (49.21) | 62 (49.21) | 62 (49.21) | 63 (50.00)* | 63 (50.00)* | 63 (50.00)* | 63 (50.00)* | 62 (49.21) | 62 (49.21) | 63 (50.00)*
2010 | 126 | 63 (50.00) | 64 (50.79) | 64 (50.79) | 64 (50.79) | 64 (50.79) | 65 (51.59)* | 64 (50.79) | 65 (51.59)* | 64 (50.79) | 64 (50.79) | 63 (50.00)
Total | 1012 | 491 (48.52) | 491 (48.52) | 490 (48.42) | 487 (48.12) | 490 (48.42) | 492 (48.61) | 488 (48.22) | 486 (48.02) | 483 (47.73) | 484 (47.83) | 477 (47.13)

Table 3: Prediction Results (2nd Division JUFA) : correctly predicted games out of all predicted games (Games), with the corresponding percentage in parentheses, for each weight w. * marks the best value(s) of each season.

Year | Games | w = 0 | w = 0.1 | w = 0.2 | w = 0.3 | w = 0.4 | w = 0.5 | w = 0.6 | w = 0.7 | w = 0.8 | w = 0.9 | w = 1
1999 | 24 | 12 (50.00) | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 13 (54.17)* | 12 (50.00)
2000 | 24 | 13 (54.17)* | 12 (50.00) | 12 (50.00) | 12 (50.00) | 12 (50.00) | 12 (50.00) | 12 (50.00) | 11 (45.83) | 11 (45.83) | 11 (45.83) | 10 (41.67)
2001 | 52 | 20 (38.46) | 21 (40.38) | 21 (40.38) | 20 (38.46) | 20 (38.46) | 20 (38.46) | 20 (38.46) | 19 (36.54) | 20 (38.46) | 20 (38.46) | 22 (42.31)*
2002 | 52 | 27 (51.92) | 28 (53.85) | 28 (53.85) | 28 (53.85) | 28 (53.85) | 29 (55.77)* | 29 (55.77)* | 29 (55.77)* | 27 (51.92) | 27 (51.92) | 25 (48.08)
2003 | 52 | 21 (40.38) | 22 (42.31) | 22 (42.31) | 21 (40.38) | 22 (42.31) | 22 (42.31) | 22 (42.31) | 22 (42.31) | 22 (42.31) | 22 (42.31) | 24 (46.15)*
2004 | 52 | 21 (40.38) | 20 (38.46) | 20 (38.46) | 21 (40.38) | 22 (42.31) | 22 (42.31) | 23 (44.23)* | 23 (44.23)* | 23 (44.23)* | 23 (44.23)* | 23 (44.23)*
2005 | 126 | 63 (50.00) | 64 (50.79)* | 64 (50.79)* | 64 (50.79)* | 64 (50.79)* | 64 (50.79)* | 64 (50.79)* | 63 (50.00) | 63 (50.00) | 63 (50.00) | 58 (46.03)
2006 | 119 | 61 (51.26)* | 61 (51.26)* | 60 (50.42) | 59 (49.58) | 59 (49.58) | 60 (50.42) | 59 (49.58) | 59 (49.58) | 59 (49.58) | 59 (49.58) | 61 (51.26)*
2007 | 126 | 73 (57.94)* | 71 (56.35) | 71 (56.35) | 70 (55.56) | 71 (56.35) | 70 (55.56) | 70 (55.56) | 70 (55.56) | 69 (54.76) | 69 (54.76) | 68 (53.97)
2008 | 126 | 70 (55.56) | 70 (55.56) | 71 (56.35) | 74 (58.73)* | 72 (57.14) | 71 (56.35) | 71 (56.35) | 71 (56.35) | 71 (56.35) | 72 (57.14) | 73 (57.94)
2009 | 126 | 76 (60.32) | 78 (61.90) | 78 (61.90) | 79 (62.70) | 78 (61.90) | 81 (64.29) | 82 (65.08)* | 82 (65.08)* | 82 (65.08)* | 82 (65.08)* | 78 (61.90)
2010 | 126 | 58 (46.03) | 58 (46.03) | 59 (46.83)* | 59 (46.83)* | 59 (46.83)* | 59 (46.83)* | 59 (46.83)* | 59 (46.83)* | 59 (46.83)* | 59 (46.83)* | 58 (46.03)
Total | 1005 | 515 (51.24) | 518 (51.54) | 519 (51.64) | 520 (51.74) | 520 (51.74) | 523 (52.04) | 524 (52.14) | 521 (51.84) | 519 (51.64) | 520 (51.74) | 512 (50.95)

If we limit the analysis of Table 3 to the contemporary data, the best settings of w are 0.3 and 0.6, followed by all other settings except w = 1. Based on all the above, and taking as reference the number of games correctly predicted in the contemporary data of both divisions, we chose w = 0.5 as the best setting. With this value of w, 373 of the 756 first-division games are correctly predicted. The highest number is obtained with w = 0 or w = 0.1, but the difference is only two games. For the second division, the number of games correctly predicted with this setting is 405 (of 749). The same number is obtained with w = 0.6, but in the first division w = 0.6 predicts three games fewer (370 games). Setting w to 0.5 means that the scores and points metrics have the same weight. In other words, if we combine them to rate teams and use this rating for ranking or prediction, we should use equation (7): with w = 0.5, equation (10) gives $\sqrt{s_{i,k} \times p_{i,k}}$, which orders teams exactly as $s_{i,k} \times p_{i,k}$ does. This setting also seems to be the best tradeoff for the whole span of data and for both divisions. Of course, another possible choice would be to use a different weight setting for each division.

2.3 Weighted SPM Evaluation : Fitting

We also measured the fitting of the predictions obtained with our weighted metrics, which some authors call hindsight prediction. It is a way of measuring how well the models used fit the target data (the higher the hindsight prediction rate, the smaller the error).

Figure 3: Weighted SPM : Hindsight Prediction Percentages (1st Division JUFA).
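The difference between foresight and hindsight evaluation comes down to when a match day’s results are added to the team histories. The sketch below uses the same assumed fixture format as before and hypothetical function names; it rates teams with equation (10) and counts correct predictions from the second match day onward. With `hindsight=True` it adds the results of match day k before predicting that match day, as in the incremental hindsight measurement used in this paper.

```python
from collections import defaultdict

def weighted_spm(history, w):
    """Equation (10) computed from (goals_scored, goals_conceded) pairs."""
    if not history:
        return 0.0
    points = sum(3 if gf > ga else 1 if gf == ga else 0 for gf, ga in history)
    p = points / (3 * len(history))
    goals_for = sum(gf for gf, _ in history)
    total_goals = goals_for + sum(ga for _, ga in history) or 0.1  # 0 -> 0.1
    s = goals_for / total_goals
    return s ** (1 - w) * p ** w

def outcome(x, y):
    """1 if x > y, -1 if x < y, 0 if equal."""
    return (x > y) - (x < y)

def evaluate_season(match_days, w=0.5, hindsight=False):
    """Count correct outcome predictions from the second match day onward.

    match_days: list of match days, each a list of (team_a, team_b, goals_a, goals_b).
    Foresight rates teams with match days 1..k-1; hindsight also uses match day k.
    """
    history = defaultdict(list)

    def record(games):
        for a, b, ga, gb in games:
            history[a].append((ga, gb))
            history[b].append((gb, ga))

    correct = total = 0
    for k, games in enumerate(match_days):
        if hindsight:
            record(games)            # fitting: the day's results are already known
        if k > 0:                    # the first match day is not predicted
            for a, b, ga, gb in games:
                predicted = outcome(weighted_spm(history[a], w),
                                    weighted_spm(history[b], w))
                if predicted == outcome(ga, gb):
                    correct += 1
                total += 1
        if not hindsight:
            record(games)            # foresight: add results only after predicting
    return correct, total
```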
For hindsight prediction, we could use the data (rating, ranking) at the end of the season to predict the outcomes of all its games, but we opted to measure it incrementally, one match day at a time: we used the data up to and including the k-th match day to predict the results of that match day. The total results for all weights for the first division of JUFA are shown in Fig. 3, and those for the second division in Fig. 4.

Figure 4: Weighted SPM : Hindsight Prediction Percentages (2nd Division JUFA).

As expected, since the team with the most points is the top team at the end of a season (or of a match day), the best fitting is obtained with large weights for the points metric. However, both figures also show that the best weight is not 1. For the first division over the twelve seasons, the best fitting is obtained with w set to 0.9, and for the second division with w set to 0.8. The best hindsight percentages for the contemporary data (2005 season onward) are also obtained with these weight settings.

3 SPM Evaluation : Comparison to Other Methods

We also evaluated the performance of the ratings based on our metrics by comparing their prediction results with those we can derive using the methodology of Maher (1982), the score prediction method of Stefani (2008), the ranking method of Colley (2002), and the ODM of Govan et al. (2009). Maher supported the theory that the number of goals in soccer follows a Poisson distribution, and defined parameters to represent the defensive and offensive characteristics of a team. His approach defines four parameters for each team (two for home games and two for away games). However, he studied the importance of these four values and found that two of them, the offensive strength and the defensive weakness, suffice to describe the quality of a team (without distinguishing between home and away games).
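Maher’s two-parameter model can be fitted with the fixed-point estimators he derived (see the Annex), alternating between attack and defense parameters from his suggested starting values. The sketch below is our illustration, not Maher’s code. It assumes no home advantage (as in JUFA) and, as a further assumption, sums over the opponents actually faced so that it also works for a partially played season; the predicted number of goals is the mode of each Poisson distribution, $\lfloor \lambda \rfloor$. All function names are hypothetical.

```python
import math
from collections import defaultdict

def _safe_div(num, den):
    return num / den if den > 0 else 0.0

def maher_fit(games, iterations=200):
    """Fit attack strengths (alpha) and defense weaknesses (beta).

    games: list of (team_i, team_j, goals_i, goals_j); no home advantage.
    """
    scored = defaultdict(float)
    conceded = defaultdict(float)
    opponents = defaultdict(list)
    for a, b, ga, gb in games:
        scored[a] += ga
        conceded[a] += gb
        scored[b] += gb
        conceded[b] += ga
        opponents[a].append(b)
        opponents[b].append(a)
    teams = sorted(opponents)
    s_x = sum(scored.values())                        # all goals scored
    root = math.sqrt(s_x) if s_x > 0 else 1.0
    alpha = {t: scored[t] / root for t in teams}      # Maher's initial values
    beta = {t: conceded[t] / root for t in teams}
    for _ in range(iterations):                       # alternate the estimators
        alpha = {t: _safe_div(scored[t], sum(beta[o] for o in opponents[t]))
                 for t in teams}
        beta = {t: _safe_div(conceded[t], sum(alpha[o] for o in opponents[t]))
                for t in teams}
    return alpha, beta

def most_likely_goals(mean):
    """Mode of Poisson(mean); for an integer mean, mean - 1 is equally likely."""
    return math.floor(mean)

def predict(alpha, beta, team_i, team_j):
    """1 if team_i is predicted to win, -1 if team_j, 0 for a tie."""
    gi = most_likely_goals(alpha[team_i] * beta[team_j])
    gj = most_likely_goals(alpha[team_j] * beta[team_i])
    return (gi > gj) - (gi < gj)
```

For instance, an expected value $\hat{\alpha}_i \hat{\beta}_j = 1.7$ gives a most likely score of 1 goal for team i.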
Using his method, we can calculate the mean of the goal distribution of each team and determine the number of goals most likely to be scored in a given game. Once we know the scores, we can predict the result of a game between any two teams. Colley proposed a method for ranking (American) college football teams that uses only the number of games won and the number of games played as input. In this method, the ratings can be calculated with an iterative scheme or by solving a system of linear equations. The ratings can then be used to rank teams and to predict the winner of a game. In our implementation of this method, we used 0.5 as the initial rating of all teams. Stefani developed a least squares method and an exponential smoothing method for predicting scores, and applied them to the English Premier League and to the Super 12/14 rugby union competitions. His model predicts the scores of the home team and the away team using the offensive and defensive ratings of each team. In his method, these ratings have a smoothing factor that puts more weight on more recent game results. In our implementation of this method, we used 0.5 as the initial value of the offensive and defensive ratings of all teams. The Offense-Defense Model (ODM) of Govan et al. (2009) uses a matrix to define the offensive and defensive ratings of a team. Our implementation follows the details given in that paper.

3.1 Foresight Prediction

We used our metrics and the rating based on them for game outcome foresight prediction. All predictions, for all methods, are based on the data available before the game to be predicted, and all methods used the same information. There is no home advantage in the games, so no method uses it. Maher’s and Stefani’s methods predict the scores, while all other methods, including ours, compute ratings to compare teams and decide which one will win. If the ratings of both teams are equal, the game is predicted as a tie.
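Colley’s iterative scheme can be sketched as below, starting every team at the rating of 0.5 used in our implementation. Each step recomputes $r = (1 + n_{eff})/(2 + n_{tot})$ with $n_{eff} = (n_w - n_l)/2 + \sum r_{opp}$, the sum running over the opponents played. Treating a draw as a game played with neither a win nor a loss is an assumption here, since Colley designed the method for a sport without ties; the function name is hypothetical.

```python
from collections import defaultdict

def colley_ratings(games, initial=0.5, iterations=500):
    """Iterative Colley ratings from games given as (team_a, team_b, goals_a, goals_b)."""
    wins = defaultdict(int)
    losses = defaultdict(int)
    opponents = defaultdict(list)
    for a, b, ga, gb in games:
        opponents[a].append(b)
        opponents[b].append(a)
        if ga > gb:
            wins[a] += 1
            losses[b] += 1
        elif gb > ga:
            wins[b] += 1
            losses[a] += 1
        # draw: counted as a game played, but neither a win nor a loss (assumption)
    teams = sorted(opponents)
    r = {t: initial for t in teams}
    for _ in range(iterations):
        r = {
            t: (1 + (wins[t] - losses[t]) / 2 + sum(r[o] for o in opponents[t]))
            / (2 + len(opponents[t]))
            for t in teams
        }
    return r
```

Because each update divides by $2 + n_{tot}$ while the opponent sum has only $n_{tot}$ terms, the iteration is a contraction and converges to the solution of Colley’s linear system, whose ratings average 0.5.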
Table 4 and Table 5 show the prediction results for JUFA’s first and second divisions, respectively. From these tables, we can see that SPM gives the best total percentages in both divisions over the twelve seasons.

Table 4: Foresight Prediction Results Comparison: JUFA’s 1st Division. Correctly predicted games out of all predicted games (Games), with the corresponding percentage in parentheses. * marks the best value(s) of each season.

Season | Games | Maher | Stefani | ODM | Colley | SPM
1999 | 24 | 3 (12.50%) | 5 (20.83%) | 11 (45.83%)* | 9 (37.50%) | 9 (37.50%)
2000 | 24 | 8 (33.33%) | 8 (33.33%) | 12 (50.00%)* | 11 (45.83%) | 12 (50.00%)*
2001 | 52 | 21 (40.38%) | 23 (44.23%)* | 17 (32.69%) | 19 (36.53%) | 20 (38.46%)
2002 | 52 | 23 (44.23%)* | 21 (40.38%) | 22 (42.30%) | 23 (44.23%)* | 23 (44.23%)*
2003 | 52 | 18 (34.61%) | 24 (46.15%) | 24 (46.15%) | 28 (53.84%)* | 26 (50.00%)
2004 | 52 | 26 (50.00%) | 24 (46.15%) | 28 (53.84%) | 29 (55.76%)* | 29 (55.76%)*
2005 | 126 | 56 (44.44%) | 49 (38.88%) | 57 (45.23%)* | 57 (45.23%)* | 56 (44.44%)
2006 | 126 | 60 (47.61%) | 53 (42.06%) | 67 (53.17%)* | 62 (49.20%) | 65 (51.58%)
2007 | 126 | 43 (34.12%) | 47 (37.30%) | 63 (50.00%)* | 63 (50.00%)* | 63 (50.00%)*
2008 | 126 | 47 (37.30%) | 48 (38.09%) | 63 (50.00%)* | 52 (41.26%) | 61 (48.41%)
2009 | 126 | 47 (37.30%) | 52 (41.26%) | 63 (50.00%)* | 61 (48.41%) | 63 (50.00%)*
2010 | 126 | 50 (39.68%) | 52 (41.26%) | 62 (49.20%) | 61 (48.41%) | 65 (51.58%)*
Total | 1012 | 402 (39.72%) | 406 (40.11%) | 489 (48.32%) | 475 (46.93%) | 492 (48.61%)

However, if we look at the individual seasons of the first-division table, ODM’s method has seven seasons with best values, five of them in the contemporary range. It is followed by SPM’s method, with six seasons with best values, three of them in the contemporary seasons. For the second division, Colley’s method has six seasons with best values, four of them in the contemporary years (2005 onward). Second place corresponds to SPM’s method, with three seasons with best values, two of them in the contemporary seasons.

Table 5: Foresight Prediction Results Comparison : JUFA’s 2nd Division.
(Correctly predicted games out of all predicted games (Games), with the corresponding percentage in parentheses; * marks the best value(s) of each season.)

Season | Games | Maher | Stefani | ODM | Colley | SPM
1999 | 24 | 8 (33.33%) | 14 (58.33%)* | 11 (45.83%) | 12 (50.00%) | 13 (54.16%)
2000 | 24 | 9 (37.50%) | 13 (54.16%)* | 10 (41.66%) | 9 (37.50%) | 12 (50.00%)
2001 | 52 | 18 (34.61%) | 9 (17.30%) | 21 (40.38%) | 23 (44.23%)* | 20 (38.46%)
2002 | 52 | 23 (44.23%) | 23 (44.23%) | 28 (53.84%) | 25 (48.07%) | 29 (55.76%)*
2003 | 52 | 16 (30.76%) | 14 (26.92%) | 21 (40.38%) | 23 (44.23%)* | 22 (42.30%)
2004 | 52 | 13 (25.00%) | 20 (38.46%) | 24 (46.15%)* | 19 (36.53%) | 22 (42.30%)
2005 | 126 | 54 (42.85%) | 52 (41.26%) | 65 (51.58%)* | 65 (51.58%)* | 64 (50.79%)
2006 | 119 | 45 (37.81%) | 56 (47.05%) | 58 (48.73%) | 59 (49.57%) | 60 (50.42%)*
2007 | 126 | 55 (43.65%) | 55 (43.65%) | 67 (53.17%) | 71 (56.34%)* | 70 (55.55%)
2008 | 126 | 59 (46.82%) | 58 (46.03%) | 71 (56.34%) | 73 (57.93%)* | 71 (56.34%)
2009 | 126 | 54 (42.85%) | 67 (53.17%) | 73 (57.93%) | 77 (61.11%) | 81 (64.28%)*
2010 | 126 | 46 (36.50%) | 49 (38.88%) | 53 (42.06%) | 60 (47.61%)* | 59 (46.82%)
Total | 1005 | 400 (39.80%) | 430 (42.78%) | 502 (49.95%) | 516 (51.34%) | 523 (52.03%)

From Figure 5 and Figure 6, we can see that no method reaches the 60% line for the first division or the 70% line for the second division. For the first division, in the 1999 and 2000 seasons, where the number of games is small, Maher’s and Stefani’s methods did not give good results. From 2001 to 2004, when the number of games doubled, these methods improved their predictions, but for the contemporary data (2005 onward) their results stayed mostly around or below the 40% line. ODM’s, Colley’s, and the SPM-based predictions show the best results for almost all these seasons, the only exception being 2001, where Stefani’s method shows the highest prediction percentage.

Figure 5: Foresight Prediction Results Comparison (1st Division JUFA).

For the second division of JUFA and the 1999 and 2000 seasons, Stefani’s predictions show high figures, but its results for all other seasons are low.
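The per-season "best value" counts used in this comparison (for example, seven seasons for ODM and six for SPM in the first division) can be reproduced mechanically. In this sketch, a tie counts as a best value for every tied method, which is an assumption about how the counts were made; the data are the first-division foresight counts of Table 4, and the names are illustrative.

```python
# Correct foresight predictions per season, 1999-2010 (Table 4, 1st Division).
FIRST_DIVISION_FORESIGHT = {
    "Maher":   [3, 8, 21, 23, 18, 26, 56, 60, 43, 47, 47, 50],
    "Stefani": [5, 8, 23, 21, 24, 24, 49, 53, 47, 48, 52, 52],
    "ODM":     [11, 12, 17, 22, 24, 28, 57, 67, 63, 63, 63, 62],
    "Colley":  [9, 11, 19, 23, 28, 29, 57, 62, 63, 52, 61, 61],
    "SPM":     [9, 12, 20, 23, 26, 29, 56, 65, 63, 61, 63, 65],
}
SEASONS = list(range(1999, 2011))

def best_value_seasons(results, seasons=SEASONS):
    """Map each method to the seasons in which it reaches the season's best count."""
    best = {method: [] for method in results}
    for idx, season in enumerate(seasons):
        top = max(counts[idx] for counts in results.values())
        for method, counts in results.items():
            if counts[idx] == top:          # ties: every tied method gets the season
                best[method].append(season)
    return best
```

Running it on these data gives ODM seven seasons (five from 2005 onward) and SPM six (three from 2005 onward), matching the counts quoted for Table 4.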
Maher’s results started with low values, but they seem to be more stable for seasons with a larger number of games; for the contemporary data they average around the 40% line. In the first division, the predictions based on SPM are better than those of ODM’s method for every season from 2001 to 2004 (in 2000 both methods predict 12 of the 24 games correctly). For the contemporary data, the two methods alternate in giving the best results, without a clear difference between them. For the second division, ODM’s method gives better results in 2001, 2004, and 2005, but SPM’s method is better in all other seasons except 2008, where both predict 71 of the 126 games correctly. If we compare only Colley’s and SPM’s results for the first division, Colley’s method gives better results in 2003 and 2005, while SPM’s results are equal or better in all other seasons (Fig. 5). When comparing the results of the second division, we cannot see a clear predominance of either method. SPM gives better results for the seasons with a small number of games (1999 and 2000), but both methods alternate in giving the best results for almost all other seasons (Fig. 6).

Figure 6: Foresight Prediction Results Comparison (2nd Division JUFA).

3.2 Fitting : Hindsight Prediction

We also measured the fitting of all the methods we compared. The results are shown in Figure 7 and Figure 8, and detailed in Table 6 and Table 7, respectively. For the first division, Maher’s method gives the lowest results in almost all seasons, with one exception in the 2004 season, where it has one of the highest values. For the same data, Stefani’s method is fairly stable, and its results move around an average of 50% for the contemporary seasons. These two methods show almost the same total prediction percentage. Once again, ODM’s, Colley’s, and SPM’s methods stand above these two methods, with relatively small differences between the totals of any two of them.
ODM’s results stay around the 55% line, while Colley’s and SPM’s results move around the 60% line. Their hindsight prediction percentages are the best for the first division of JUFA (Fig. 7). The totals of Table 6 show that these methods are roughly 10 percentage points ahead of Maher’s and Stefani’s results. For the first division, Colley’s method has nine seasons with the best percentages, five of them in the contemporary data (2005 onward). It is followed by SPM’s method, with five seasons of best values, two of them in the contemporary range.

Figure 7: Hindsight Prediction Results Comparison (1st Division JUFA).

Figure 8: Hindsight Prediction Results Comparison (2nd Division JUFA).

Table 6: Hindsight Prediction Results Comparison : JUFA’s 1st Division. Correctly predicted games out of all predicted games (Games), with the corresponding percentage in parentheses. * marks the best value(s) of each season.

Season | Games | Maher | Stefani | ODM | Colley | SPM
1999 | 24 | 6 (25.00%) | 12 (50.00%) | 17 (70.83%)* | 16 (66.66%) | 17 (70.83%)*
2000 | 24 | 15 (62.50%) | 15 (62.50%) | 15 (62.50%) | 17 (70.83%)* | 17 (70.83%)*
2001 | 52 | 31 (59.61%) | 32 (61.53%)* | 27 (51.92%) | 27 (51.92%) | 28 (53.84%)
2002 | 52 | 28 (53.84%) | 26 (50.00%) | 28 (53.84%) | 31 (59.61%)* | 30 (57.69%)
2003 | 52 | 21 (40.38%) | 27 (51.92%) | 30 (57.69%) | 34 (65.38%)* | 34 (65.38%)*
2004 | 52 | 38 (73.07%) | 33 (63.46%) | 35 (67.30%) | 39 (75.00%)* | 38 (73.07%)
2005 | 126 | 61 (48.41%) | 54 (42.85%) | 65 (51.58%) | 71 (56.34%)* | 68 (53.96%)
2006 | 126 | 73 (57.93%) | 68 (53.96%) | 77 (61.11%) | 84 (66.66%)* | 81 (64.28%)
2007 | 126 | 54 (42.85%) | 60 (47.61%) | 73 (57.93%) | 81 (64.28%)* | 81 (64.28%)*
2008 | 126 | 58 (46.03%) | 60 (47.61%) | 70 (55.55%) | 79 (62.69%)* | 77 (61.11%)
2009 | 126 | 55 (43.65%) | 66 (52.38%) | 75 (59.52%) | 76 (60.31%) | 78 (61.90%)*
2010 | 126 | 61 (48.41%) | 61 (48.41%) | 77 (61.11%) | 84 (66.66%)* | 82 (65.07%)
Total | 1012 | 501 (49.50%) | 514 (50.79%) | 589 (58.20%) | 639 (63.14%) | 631 (62.35%)

For the second division (Fig. 8, Table 7), Maher’s and Stefani’s methods alternate in giving the lowest results for almost all seasons, and their totals are in about the same range. Above the 60% line are, again, ODM’s, Colley’s, and SPM’s results, which are roughly 10 percentage points better than Maher’s and Stefani’s. For both divisions, Colley’s and SPM’s methods give very close results. One season worth noting is the 2000 season of the second division, in which Stefani’s method shows the best result, more than 10 percentage points above any other method. Colley’s method also shows a similar value (75.00%) in the 2002 season.

Table 7: Hindsight Prediction Results Comparison : JUFA’s 2nd Division. Correctly predicted games out of all predicted games (Games), with the corresponding percentage in parentheses. * marks the best value(s) of each season.

Season | Games | Maher | Stefani | ODM | Colley | SPM
1999 | 24 | 13 (54.16%) | 17 (70.83%)* | 16 (66.66%) | 17 (70.83%)* | 17 (70.83%)*
2000 | 24 | 15 (62.50%) | 18 (75.00%)* | 14 (58.33%) | 14 (58.33%) | 15 (62.50%)
2001 | 52 | 25 (48.07%) | 21 (40.38%) | 35 (67.30%)* | 32 (61.53%) | 31 (59.61%)
2002 | 52 | 33 (63.46%) | 31 (59.61%) | 35 (67.30%) | 39 (75.00%)* | 38 (73.07%)
2003 | 52 | 24 (46.15%) | 24 (46.15%) | 30 (57.69%)* | 29 (55.76%) | 30 (57.69%)*
2004 | 52 | 24 (46.15%) | 28 (53.84%) | 35 (67.30%)* | 29 (55.76%) | 30 (57.69%)
2005 | 126 | 69 (54.76%) | 58 (46.03%) | 75 (59.52%) | 81 (64.28%)* | 80 (63.49%)
2006 | 119 | 54 (45.37%) | 62 (52.10%) | 70 (58.82%) | 80 (67.22%)* | 73 (61.34%)
2007 | 126 | 63 (50.00%) | 73 (57.93%) | 80 (63.49%) | 80 (63.49%) | 82 (65.07%)*
2008 | 126 | 72 (57.14%) | 75 (59.52%) | 81 (64.28%) | 86 (68.25%) | 87 (69.04%)*
2009 | 126 | 68 (53.96%) | 77 (61.11%) | 86 (68.25%) | 88 (69.84%)* | 88 (69.84%)*
2010 | 126 | 54 (42.85%) | 58 (46.03%) | 65 (51.58%) | 69 (54.76%)* | 63 (50.00%)
Total | 1005 | 514 (51.14%) | 542 (53.93%) | 622 (61.89%) | 644 (64.07%) | 634 (63.08%)

If we compare ODM’s and SPM’s results for the first division, SPM has a clear advantage over ODM: the fitting of SPM’s predictions is better than that of ODM in every season of this division except 1999, where both predict 17 of the 24 games correctly.
The differences between their results fall between 2.5% and 6%. The same is not true for the second division (Fig. 8). For the seasons with a small number of games (1999 and 2000), SPM’s results are better. For the seasons between 2001 and 2004 there is no clear difference, and for the contemporary data SPM is better in all but the last season (2010). For the seasons between 2005 and 2009, the differences between their results lie between 1.5% and 4%.

If we compare Colley’s and SPM’s results for the first division, there is no clear predominance of either method, although Colley’s results could be considered slightly better: both methods give the same results in three seasons, Colley’s results are better in six seasons, and SPM’s results are better in only three. Again, SPM’s results seem to be at least as good for the seasons with a small number of games (1999 and 2000). For the contemporary data, between 2005 and 2010, they alternate in giving the best results. For the second division (Table 7), Colley’s method has six seasons with best values, four of them in the contemporary range. It is followed by SPM’s method, with five seasons with best results, three of them in the contemporary seasons (2005 onward). Colley’s results are better than SPM’s in three of the six contemporary seasons. In particular, for the 2006 and 2010 seasons, Colley’s results fit better, with an improvement of about 5% to 6%. For all other seasons, there is no clear predominance of either method, and the differences between their results are small. From the above, we can say that the SPM-, ODM-, and Colley-based methods have an advantage over the other methods we compared. One way of improving the overall prediction percentages in both divisions would be to combine these methods using rank aggregation (Govan et al. (2009)).
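As a simple illustration of rank aggregation, the sketch below averages each team’s rank across several rating methods (a Borda-style average rank, with tied ratings sharing the average of their positions). This is only one elementary option and is not necessarily the aggregation approach described by Govan et al. (2009); the function names and the example ratings in the test are hypothetical.

```python
def ranks_from_ratings(ratings):
    """Rank 1 = highest rating; tied ratings share the average of their positions."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # extend j over the block of teams tied with ordered[i]
        while j + 1 < len(ordered) and ratings[ordered[j + 1]] == ratings[ordered[i]]:
            j += 1
        average_position = (i + j) / 2 + 1
        for team in ordered[i:j + 1]:
            ranks[team] = average_position
        i = j + 1
    return ranks

def aggregate_ranks(*rating_sets):
    """Average each team's rank over several rating methods (lower is better)."""
    rank_sets = [ranks_from_ratings(r) for r in rating_sets]
    teams = rank_sets[0].keys()
    return {t: sum(rs[t] for rs in rank_sets) / len(rank_sets) for t in teams}
```

Teams are then ordered by increasing average rank, and that aggregated order can be used for outcome prediction in the same way as a single rating.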
4 Conclusions

We have presented two metrics, and one way of combining them, for rating soccer teams. One of the metrics uses the goals scored by a team and the other the points it earns. We evaluated the combined use of these metrics, through a weighted rating, to rank the teams and predict the results of the games of the first and second divisions of the Japanese University Football Association (Kanto League). Based on the results of this evaluation, we determined that the two metrics should be given the same weight when combined for rating. To our knowledge, no other rating combines these metrics in this way. We also compared the game outcome predictions of our metrics with those obtained with four other methods. The comparison shows that SPM could be a good alternative to Colley’s method or the Offense Defense Model (ODM) for ranking teams and for prediction. Our metrics are easy to implement and could also be used in other sports: rugby (targeted by Stefani’s method), and basketball and American football (targeted by ODM) could probably use them without major changes. There are works, like that of Pasteur (2010), that aim to improve prediction results. Similar approaches, and others tailored to SPM, could be topics for further study.

Annex: Brief Description of Other Methods

In this paper we compare SPM predictions with the predictions derived from the methodology of Maher (1982), the score prediction method of Stefani (2008), the ranking method of Colley (2002), and the ODM method of Govan et al. (2009). The following subsections briefly describe these methods.

Maher (1982) Based Prediction

Maher supported the hypothesis that the number of goals in soccer follows a Poisson distribution, and defined parameters to represent the defensive and offensive characteristics of a team. In his model two teams, i (home team) and j (away team), face each other in a game that ends with a score $(x_{ij}, y_{ij})$.
He attributes these scores to occurrences of random variables $X_{ij}$ and $Y_{ij}$ that follow Poisson distributions with means $\alpha_i \beta_j$ and $\gamma_i \delta_j$, where $\alpha_i$ is the offensive strength of (home) team i, $\beta_j$ the defensive weakness of (away) team j, $\gamma_i$ the defensive weakness of team i, and $\delta_j$ the offensive strength of team j. Maximizing the log-likelihood of the scores gives the maximum likelihood estimators (MLE) for team i in Equation (11) (the values of $\gamma$ and $\delta$ can be determined in the same way from the scores $y_{ij}$).

$$\hat{\alpha}_i = \frac{\sum_{j \neq i} x_{ij}}{\sum_{j \neq i} \hat{\beta}_j} \quad \text{and} \quad \hat{\beta}_i = \frac{\sum_{j \neq i} x_{ji}}{\sum_{j \neq i} \hat{\alpha}_j} \tag{11}$$

Since $\hat{\alpha}$ depends on the values of $\hat{\beta}$ and vice versa, Maher suggests the following initial values,

$$\hat{\alpha}_i = \frac{\sum_{j \neq i} x_{ij}}{\sqrt{S_x}} \quad \text{and} \quad \hat{\beta}_i = \frac{\sum_{j \neq i} x_{ji}}{\sqrt{S_x}} \tag{12}$$

where the denominator is given by Equation (13) and $S_x$ is the total number of goals scored by all teams.

$$S_x = \sum_i \sum_{j \neq i} x_{ij} \tag{13}$$

He studied the importance of these parameters and found that two of them, the offensive strength and the defensive weakness, suffice to describe the quality of a team (without differentiating between home and away games). Equations (11) to (13) can be applied to each match day k using the data up to match k−1. After determining $\hat{\alpha}$ and $\hat{\beta}$ for each team, we can calculate the mean of its goal distribution and the number of goals it is most likely to score in game k. From these predicted scores we can predict the result of a game between any two teams.

Colley (2002) Based Prediction

Colley proposed a method for ranking college football teams that uses only the number of games won, $n_w$, and the number of games played, $n_{tot}$, as input. He uses the modified winning percentage in Equation (14) as the rating of a team.

$$r = \frac{1 + n_w}{2 + n_{tot}} \tag{14}$$

He also writes the number of wins as in Equation (15), where $n_l$ is the number of games lost.
$$n_w = \frac{n_w - n_l}{2} + \frac{n_{tot}}{2} = \frac{n_w - n_l}{2} + \sum_{j=1}^{n_{tot}} \frac{1}{2} \tag{15}$$

He then replaces each 1/2 in the second term with the rating of the corresponding opponent of team i, which adjusts for strength of schedule, as given by Equation (16).

$$n_{w,i}^{\mathrm{eff}} = \frac{n_{w,i} - n_{l,i}}{2} + \sum_{j=1}^{n_{tot,i}} r^i_j \tag{16}$$

This is the effective number of wins of team i, where $r^i_j$ is the rating of the jth opponent of i. The ratings can be calculated by an iterative scheme or by solving a system of linear equations. The ratings for a given game can then be used to rank teams and predict the winner of a game.

Stefani (2008) Based Prediction

Stefani developed a least-squares method with exponential smoothing for predicting scores and applied it to the English Premier League and to the Super 12/14 rugby union competitions. His model predicts the scores of home team i ($s_{ij}$) and away team j ($s_{ji}$) using Equations (17) and (18).

$$s^P_{ij} = ro_i + rd_j \tag{17}$$

$$s^P_{ji} = ro_j + rd_i \tag{18}$$

Here ro and rd are the offensive and defensive ratings of each team; for team i they are given by Equations (19) and (20) (team j’s ratings are calculated in a similar way).

$$ro_i^{n} = ro_i^{n-1} + \left[\frac{m-1}{nm-1}\right]\left(s_{ij} - \left(ro_i^{n-1} + rd_j^{m-1}\right)\right) \tag{19}$$

$$rd_i^{n} = rd_i^{n-1} + \left[\frac{m-1}{nm-1}\right]\left(s_{ji} - \left(ro_j^{m-1} + rd_i^{n-1}\right)\right) \tag{20}$$

Here n is the number of games of team i and m the number of games of team j. The fraction in the second term of these equations is the smoothing factor. To predict the scores of the first games with this method, we used 0.5 as the initial value of ro ($ro^0$) and rd ($rd^0$) for all teams.

ODM Based Prediction

The Offense-Defense Model (ODM) of Govan et al. (2009) uses a matrix $A = [a_{ij}]$, where $a_{ij}$ is the score of team j against team i. It also defines two ratings. The offensive rating of team j is given by Equation (21),

$$o_j = a_{1j}\left(\frac{1}{d_1}\right) + \cdots + a_{nj}\left(\frac{1}{d_n}\right) \tag{21}$$

and the defensive rating of team i by Equation (22).

$$d_i = a_{i1}\left(\frac{1}{o_1}\right) + \cdots + a_{in}\left(\frac{1}{o_n}\right) \tag{22}$$

For convergence, they define a new matrix $P = A + \epsilon e e^T$, where e is the vector of all ones, which is also the initial value of the defensive ratings ($d^{(0)} = e$). This makes it possible to calculate all the offensive ratings o as

$$o^{(k)} = P^T \frac{1}{d^{(k-1)}} \tag{23}$$

and then all the defensive ratings d as

$$d^{(k)} = P \frac{1}{o^{(k)}} \tag{24}$$

where the reciprocals of the vectors d and o are taken componentwise. The overall rating of team i is given by Equation (25).

$$r_i = \frac{o_i}{d_i} \tag{25}$$

For prediction, we used this overall rating to rank teams and determine the winner of a game.

References

Bradley, R. A. and M. E. Terry (1952): “Rank Analysis of Incomplete Block Designs I: The Method of Paired Comparisons,” Biometrika, 39, 324–345.

Brillinger, D. R. (2010): Wiley Encyclopedia of Operations Research and Management Science, John Wiley and Sons, Inc., chapter Soccer/World Football.

Callaghan, T., P. J. Mucha, and M. A. Porter (2007): “Random Walker Ranking for NCAA Division I-A Football,” American Mathematical Monthly, 114, 761–777.

Colley, W. N. (2002): “Colley’s Bias Free College Football Ranking Method.”

Gleich, D. F. and L.-H. Lim (2011): “Rank Aggregation via Nuclear Norm Minimization,” in Proceedings of the Conference on Knowledge Discovery and Data Mining, ACM, KDD ’11, 60–68.

Govan, A. Y., A. N. Langville, and C. D. Meyer (2009): “Offense-Defense Approach to Ranking Team Sports,” Journal of Quantitative Analysis in Sports, 5, 1–17.

Hallinan, S. E. (2005): “Paired Comparison Models for Ranking National Soccer Teams,” Technical report, Worcester Polytechnic Institute.

Harville, D. A. (2003): “The Selection or Seeding of College Basketball or Football Teams for Postseason Competition,” Journal of the American Statistical Association, 98, 17–27.

Ingram, L. C. (2007): Ranking NCAA Sports Teams with Linear Algebra, Master’s thesis, The Graduate School of the College of Charleston.

JUFA (2011): http://www.jufa-kanto.jp/.

Maher, M. J. (1982): “Modelling Association Football Scores,” Statistica Neerlandica, 36, 109–118.

Pasteur, R. D. (2010): “Extending the Colley Method to Generate Predictive Football Rankings,” number 43 in Dolciani Mathematical Expositions, Mathematical Association of America, chapter 10, 117–129.

Reep, C., R. Pollard, and B. Benjamin (1971): “Skill and Chance in Ball Games,” Journal of the Royal Statistical Society, Series A, 134, 623–629.

Stefani, R. T. (2008): “Predicting Score Difference Versus Score Total in Rugby and Soccer,” IMA Journal of Management Mathematics, 20, 147–158.
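The Colley (2002) Based Prediction described in the annex can be sketched in Python as follows. Substituting the effective number of wins of Equation (16) for $n_w$ in Equation (14) gives, for each team i, the linear equation $(2 + n_{tot,i})\,r_i - \sum_{j} n_{ij}\, r_j = 1 + (n_{w,i} - n_{l,i})/2$, where $n_{ij}$ (notation introduced here) is the number of games between teams i and j. The sketch builds this system and solves it by Gaussian elimination. The game-record format, the function names, and the handling of draws (counted as games played but leaving $n_w - n_l$ unchanged) are assumptions made for illustration, not necessarily the choices made in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical game record: (home_team, away_team, home_goals, away_goals).
Game = Tuple[str, str, int, int]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def colley_ratings(games: List[Game]) -> Dict[str, float]:
    """Colley ratings computed from all games played so far."""
    teams = sorted({team for g in games for team in g[:2]})
    idx = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    # Colley matrix: C[i][i] = 2 + n_tot,i and C[i][j] = -n_ij.
    c = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Right-hand side: b_i = 1 + (n_w,i - n_l,i) / 2.
    b = [1.0] * n
    for home, away, hg, ag in games:
        i, j = idx[home], idx[away]
        c[i][i] += 1.0
        c[j][j] += 1.0
        c[i][j] -= 1.0
        c[j][i] -= 1.0
        if hg > ag:
            b[i] += 0.5
            b[j] -= 0.5
        elif ag > hg:
            b[i] -= 0.5
            b[j] += 0.5
        # Draws (assumption): counted in n_tot, no change to n_w - n_l.
    return dict(zip(teams, solve_linear(c, b)))


def predict(ratings: Dict[str, float], home: str, away: str) -> str:
    """Predict 'home', 'away', or 'draw' from the rating order (draw on equal ratings)."""
    if ratings[home] > ratings[away]:
        return "home"
    if ratings[away] > ratings[home]:
        return "away"
    return "draw"
```

For example, with three hypothetical games in which A beats B, B beats C, and A beats C, `colley_ratings` gives ratings of 0.7, 0.5, and 0.3 for A, B, and C, so `predict` picks A in any match against B or C.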