Premier League Player Transfer Fee Prediction Algorithms

Authors: Colton Freund, Zachary Krepps
University of North Carolina Wilmington

Abstract

Several different algorithms have been implemented in an attempt to predict the future transfer value of soccer players within the English Premier League. These algorithms include a feed-forward neural network using back propagation, a topological sort, and an averaging algorithm. We compare these algorithms for time efficiency as well as accuracy. To determine the accuracy of our algorithms, we collected transfer market data from 2013, 2014, and 2015, and used the data from 2013 and 2014 to attempt to predict the outcomes of 2015.

Intro

In today’s professional soccer matches each team is allowed eleven players on the field and three substitutes per game. If a player is substituted off the field, he may not return at any point during the rest of the match. Soccer matches are ninety minutes long, consisting of two forty-five-minute halves. With eleven starters and three substitutes, the maximum number of players from a team to play in a game is fourteen. In these games each player has an important role in winning the match for his team. Since so few players see the field each match, it is vital for teams to have the very best players they can on the field.

Teams will attempt to get players from other teams who they believe can help them win. When a player moves between teams it is known as a transfer, and there are often transfer fees associated with these movements. A transfer fee is a financial payment between two soccer teams: the team that wants a player pays the team that currently has him to release him from his contract and allow him to switch teams. Transfer fees are not the same as player salaries.
Neural Network

Implementing a neural network with a back propagation approach to predict future player transfer values requires a training set of inputs with known outputs. The neural network implemented for this experiment took 30 inputs and had 15 hidden neurons in layer one, 10 hidden neurons in layer two, and 1 output neuron. The thirty inputs include age, position, appearances, minutes, tackles, goals, shots per game, and many others. Each hidden and output neuron has a weight associated with each of its inputs; these weights are randomly assigned at the beginning of the program. Each node produces one output, which is passed to the next layer as input.

After a player’s data from the training set goes through the network, the network output (predicted transfer value) is compared to the expected output (actual transfer value). The error is determined and used to update the weights of the nodes in the network. This process continues until the total sum squared error falls to an acceptably small value, at which point the network is considered trained; a threshold of 0.5 is used in this program. After the network is trained on the 2013/2014 data, the 2015 player statistics are run through the trained network to get their predicted values, which are then compared to their actual transfer values for 2015 to determine accuracy. Results of 15 iterations are shown in the following table.

Averaging Algorithm

The Averaging Algorithm was devised after looking at the 30 data points that we collected for each player and trying to devise a naive and simple approach to predicting transfer fee values. People use averaging every day; it is a simple way to predict a future value. If a student earned 80 percent on a first test and 90 percent on a second test, most people would use averaging and guess that the student would score around 85 on the final test. This score might be dead on or way off, because there are so many variables that go into performance on a test. Likewise, there are numerous factors that dictate how a soccer player will perform during a game.

The algorithm is designed to iterate over the 30 data points for the player whose fee we are trying to predict. At each data point it compares the value with the 107 comparison data points from other players. It finds any points that have exactly the same value and adds the corresponding transfer fees to a list to be averaged. If no exact match can be found, it finds the closest value and adds its corresponding transfer fee to the list. After all 30 data points have been matched and our transfer fee list is populated, we add up the transfer fees and divide by the total number of transfer fees in the list. This is what we use as the predicted transfer fee value.

Below is a table representing the algorithm’s proficiency at predicting transfer fees. We look at different blocks. The first block counts the fees that were predicted within 100,000 euros of the actual fee. The next block counts the players whose fees were predicted within 100,000 to 500,000 euros. The rest follow suit: 500,000 – 1,000,000; 1,000,000 – 2,000,000; 2,000,000 – 3,000,000; 3,000,000 – 4,000,000; 4,000,000 – 5,000,000; and > 5,000,000 euros. The algorithm was unable to predict any of the 103 test players within 100,000 euros, and was only able to predict 5 out of 103 players within 100,000 to 500,000 euros. Our maximum player transfer fee was 75 million euros. Looking at the table, we can see that 32 of the 103 test players (about 31 percent) were within 5 million euros. Another way to look at this is that these players were predicted within 15% of their actual fees. This algorithm allows you to get fairly close to predicting a fee quickly, minus some outliers. The algorithm takes less than a second to run and has an efficiency of O(n · m).
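The averaging procedure and the tally into difference blocks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the function names `predict_fee` and `tally_blocks` are hypothetical, every statistic is assumed to be encoded as a number (including position), ties for the closest value are broken by taking the first reference player, and a difference that falls exactly on a block boundary is counted in the higher block.

```python
from statistics import mean

# Upper bounds (in euros) of the difference blocks used in the results tables.
BLOCK_BOUNDS = [100_000, 500_000, 1_000_000, 2_000_000,
                3_000_000, 4_000_000, 5_000_000]


def predict_fee(target_stats, reference_players):
    """Predict a fee by averaging the fees matched to each data point.

    reference_players is a list of (stats, fee) pairs; each stats list holds
    numbers in the same order as target_stats (an assumption of this sketch).
    """
    matched_fees = []
    for i, value in enumerate(target_stats):
        # Collect the fee of every reference player with exactly this value.
        exact = [fee for stats, fee in reference_players if stats[i] == value]
        if exact:
            matched_fees.extend(exact)
        else:
            # No exact match: fall back to the fee of the closest value.
            _, closest_fee = min(reference_players,
                                 key=lambda player: abs(player[0][i] - value))
            matched_fees.append(closest_fee)
    return mean(matched_fees)


def tally_blocks(predicted, actual):
    """Count predictions per block of absolute difference from the actual fee."""
    counts = [0] * (len(BLOCK_BOUNDS) + 1)
    for p, a in zip(predicted, actual):
        diff = abs(p - a)
        # The block index is the number of bounds the difference reaches.
        block = sum(diff >= bound for bound in BLOCK_BOUNDS)
        counts[block] += 1
    return counts


# Toy data: two reference players with two statistics each (hypothetical).
references = [([25, 10], 1_000_000), ([30, 12], 3_000_000)]
print(predict_fee([25, 13], references))           # 2000000
print(tally_blocks([2_000_000], [2_300_000]))      # [0, 1, 0, 0, 0, 0, 0, 0]
```

In the toy example the first statistic matches the first reference player exactly, the second statistic has no exact match and falls back to the closest value, and the two matched fees are averaged.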
In the efficiency bound for the Averaging Algorithm, n is the number of players to judge against in the database and m is the number of data points collected on each player. The algorithm passes through each player and then through each of that player’s data points, using a simple compare function to see whether the test player matches the database player.

Averaging Algorithm results:

Difference (euros)        Number of players    Percent of players
< 100,000                 0                    0%
100,000 – 500,000         5                    4.85%
500,000 – 1,000,000       3                    2.91%
1,000,000 – 2,000,000     8                    7.77%
2,000,000 – 3,000,000     6                    5.83%
3,000,000 – 4,000,000     4                    3.88%
4,000,000 – 5,000,000     6                    5.83%
> 5,000,000               71                   68.93%

Topological Algorithm

The topological algorithm was built after looking at topological sort. A topological sort is a linear ordering of the vertices of a directed graph such that, for every edge xy from vertex x to vertex y, x comes before y in the ordering. Our data doesn’t necessarily represent a directed graph: when traversing our nodes, goals doesn’t necessarily come before minutes played. However, we used the idea of a linear ordering of our data. The algorithm goes through each of the data points and finds the closest ordering of the 107 reference players in our data.

The table below uses the same ranges as the table for the Averaging Algorithm. This algorithm was able to predict 7 out of the 103 test players within 100,000 euros of their actual transfer fee, which is an increase of 7 players over the Averaging Algorithm and of 4.8 players over the Neural Network algorithm. However, it starts to fail rapidly, with an identical number of players over the 5 million euro gap as the Averaging Algorithm.

The efficiency of the Topological Algorithm is the same as that of the Averaging Algorithm, where n is the number of players in the database to test against and m is the number of data points or statistics collected, in our case 30 for each player. The algorithm uses two for loops, one over the players and one over the statistics. This algorithm ran in under one second and was simple to implement.
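For readers unfamiliar with the definition of topological sort given above, the following minimal Python sketch shows the textbook ordering using Kahn's algorithm. It illustrates only the definition (every edge xy places x before y), not the prediction algorithm itself, which borrows just the idea of a linear ordering. The function name `topological_order` and the example graph are hypothetical.

```python
from collections import deque


def topological_order(vertices, edges):
    """Return the vertices in an order where x precedes y for every edge (x, y)."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for x, y in edges:
        successors[x].append(y)
        indegree[y] += 1

    # Start with the vertices that have no incoming edges.
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        x = ready.popleft()
        order.append(x)
        # Removing x frees any successor whose other predecessors are done.
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                ready.append(y)

    if len(order) != len(vertices):
        raise ValueError("graph has a cycle; no topological order exists")
    return order


# Hypothetical graph: a -> b, a -> c, b -> d, c -> d
print(topological_order(["a", "b", "c", "d"],
                        [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# ['a', 'b', 'c', 'd']
```

A cyclic graph has no such ordering, which is why the sketch raises an error in that case.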
Topological Algorithm results:

Difference (euros)        Number of players    Percent of players
< 100,000                 7                    6.8%
100,000 – 500,000         2                    1.94%
500,000 – 1,000,000       1                    0.97%
1,000,000 – 2,000,000     7                    6.8%
2,000,000 – 3,000,000     5                    4.85%
3,000,000 – 4,000,000     4                    3.88%
4,000,000 – 5,000,000     6                    5.83%
> 5,000,000               71                   68.93%

Comparing Algorithms

When comparing the algorithms, it is clear that both the averaging algorithm and the topological algorithm beat the neural network on running time: the averaging and topological algorithms each take less than a second to run, while the neural network takes over two minutes on average. However, speed is not all that needs to be accounted for when comparing algorithms. The results of the neural network show that almost 60% of the predicted transfer values were within 5 million euros of their actual transfer values. On the other hand, the topological and averaging algorithms were each only able to get about 30% of transfer fees within 5 million euros of the actual value. With results like these, the additional time is worth sacrificing for the extra accuracy provided by the network.

[Figure: Algorithm Comparison. Percentage of players in group (y-axis, 0% to 80%) against difference in price from actual in euros (x-axis, the eight ranges from < 100,000 to > 5,000,000), for the Average, Topological, and Neural algorithms.]

Conclusion

The neural network provides the best predicted fees for the datasets we used. Even though it took the longest, it provided the most accurate results. For future experimentation on transfer fees, more data would provide a stronger training set for the neural network and more comparison data for the other algorithms, bringing the predicted transfer values closer to the actual transfer values.

Sources

"European Leagues and Cup Competitions - Transfermarkt." Transfermarkt. N.p., n.d. Web. 02 Dec. 2015.
"Football Statistics | Football Live Scores | WhoScored.com." WhoScored.com. N.p., n.d. Web. 02 Dec. 2015.