Premier League Player Transfer Fee Prediction Algorithms
Authors:
Colton Freund
Zachary Krepps
University of North Carolina Wilmington
Abstract
We implemented several algorithms to
predict the future transfer value of
soccer players in the English Premier
League: a feedforward neural network
trained with backpropagation, an
algorithm based on topological sort,
and an averaging algorithm. We
compare these algorithms for time
efficiency as well as accuracy. To
determine the accuracy of our
algorithms, we collected transfer
market data from 2013, 2014, and
2015, and used the data from 2013 and
2014 to attempt to predict the
outcomes of 2015.
Intro
In today’s professional soccer
matches, each team is allowed eleven
players on the field and three
substitutes per game. A player who is
substituted off the field may not
return at any point during the rest
of the match. Soccer matches are
ninety minutes long, consisting of
two forty-five-minute halves. This
means that at most fourteen players
from a team can play in a game, and
each of them has an important role in
winning the match for his team. Since
so few players see the field each
match, it is vital for teams to field
the very best players they can. Teams
therefore attempt to acquire players
from other teams who they believe can
help them win. When a player moves
between teams it is known as a
transfer, and there are often
transfer fees associated with these
movements. A transfer fee is a
financial payment between two soccer
teams: the team that wants a player
pays the team that currently holds
the player to release him from his
contract and allow him to switch
teams. Transfer fees are not the same
as player salaries.
Neural Network
Using a neural network with
backpropagation to predict future
player transfer values requires a
training set of inputs with known
outputs. The neural network
implemented for this experiment took
30 inputs and had 15 hidden neurons
in layer one, 10 hidden neurons in
layer two, and 1 output neuron. The
thirty inputs include age, position,
appearances, minutes, tackles, goals,
shots per game, and many others.
Each hidden and output neuron has a
weight associated with each of its
inputs; these weights are randomly
assigned at the beginning of the
program. Each node produces one
output, which is passed to the next
layer as input. After a player’s data
from the training set goes through
the network, the network output
(predicted transfer value) is
compared to the expected output
(actual transfer value), and the
error is used to update the weights
of the nodes in the network. This
process continues until the total sum
squared error falls below a chosen
threshold, at which point the network
is considered trained; this program
uses a threshold of 0.5. After the
network is trained on the 2013/2014
data, the 2015 player statistics are
run through the trained network to
get their predicted values, which are
then compared to their actual
transfer values for 2015 to determine
accuracy. Results of 15 iterations
are shown in the following table.
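The training procedure described in
this section can be sketched as
follows. This is a minimal
illustration, not our original
program: the sigmoid activation, the
bias weight on each neuron, the
learning rate, the epoch limit, and
the scaling of statistics and fees to
the range 0 to 1 are all assumptions,
and the names Network, train_one, and
train are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Network:
    """Feedforward network of sigmoid neurons trained by backpropagation."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # weights[k][j] holds the weights of neuron j in layer k.
        # The last entry of each neuron is a bias weight (an assumption).
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def forward(self, inputs):
        """Return the activations of every layer, starting with the inputs."""
        activations = [list(inputs)]
        for layer in self.weights:
            prev = activations[-1] + [1.0]  # constant 1.0 feeds the bias weight
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(neuron, prev))) for neuron in layer]
            )
        return activations

    def predict(self, inputs):
        return self.forward(inputs)[-1][0]

    def train_one(self, inputs, target, rate):
        """One backpropagation step for one player; returns the squared error."""
        acts = self.forward(inputs)
        out = acts[-1][0]
        error = target - out
        # Delta of the output neuron: error times the sigmoid derivative.
        deltas = [error * out * (1.0 - out)]
        for k in range(len(self.weights) - 1, -1, -1):
            prev = acts[k] + [1.0]
            # Deltas for the layer below, computed before these weights change.
            below = []
            if k > 0:
                below = [
                    acts[k][i] * (1.0 - acts[k][i])
                    * sum(d * neuron[i] for d, neuron in zip(deltas, self.weights[k]))
                    for i in range(len(acts[k]))
                ]
            for neuron, d in zip(self.weights[k], deltas):
                for i in range(len(neuron)):
                    neuron[i] += rate * d * prev[i]
            deltas = below
        return error ** 2

    def train(self, samples, threshold=0.5, rate=0.5, max_epochs=10000):
        """Pass over the training set until the sum squared error < threshold."""
        sse = float("inf")
        for _ in range(max_epochs):
            sse = sum(self.train_one(x, t, rate) for x, t in samples)
            if sse < threshold:
                break
        return sse


# The layer sizes described in this section: 30 inputs, 15 and 10 hidden, 1 output.
paper_network = Network([30, 15, 10, 1])
```

Under these assumptions, train would
be called with a list of (statistics,
scaled fee) pairs built from the
2013/2014 data, and predict would
then be called on each 2015 player.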
Averaging Algorithm
The Averaging Algorithm was devised
after looking at the 30 data points
that we collected for each player and
trying to devise a naive and simple
approach to predicting transfer fee
values. People use averaging every
day; it is a simple way to predict a
future value. If a student earned an
80 percent on the first test and a 90
percent on the second test, most
people would use averaging and guess
that the student would score around
an 85 on the final test. This score
might be dead on or way off, because
there are so many variables that go
into performance on a test. Likewise,
there are numerous factors that
dictate how a soccer player will
perform during a game.
The algorithm is designed to iterate
over the 30 data points for the
player we are trying to predict. At
each data point it compares the value
with the corresponding data point of
each of the 107 comparison players.
It finds any players that have
exactly the same value and adds their
transfer fees to a list to be
averaged. If no exact match can be
found, it finds the closest value and
adds its corresponding transfer fee
to the list. After all 30 data points
have been matched and our transfer
fee list is populated, we add up the
transfer fees and divide by the total
number of transfer fees in the list.
The result is used as the predicted
transfer fee value.
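The steps above can be sketched as
follows. This is an illustration
rather than our original code: the
function name, the data layout, the
assumption that every statistic
(including position) is encoded as a
number, and the choice of the first
player on a tie for the closest value
are all hypothetical.

```python
def predict_fee_by_averaging(target_stats, reference_players):
    """Predict a transfer fee by averaging the fees of per-statistic matches.

    target_stats: numeric statistics of the player to predict.
    reference_players: list of (stats, fee) pairs, stats in the same order.
    """
    matched_fees = []
    for i, value in enumerate(target_stats):
        # Fees of every reference player whose i-th statistic matches exactly.
        exact = [fee for stats, fee in reference_players if stats[i] == value]
        if exact:
            matched_fees.extend(exact)
        else:
            # No exact match: use the reference player with the closest value.
            _, closest_fee = min(
                reference_players, key=lambda player: abs(player[0][i] - value)
            )
            matched_fees.append(closest_fee)
    return sum(matched_fees) / len(matched_fees)


# Hypothetical data with two statistics (age, goals) per player.
reference = [
    ([25, 10], 2_000_000),
    ([30, 4], 1_000_000),
    ([25, 7], 6_000_000),
]
print(predict_fee_by_averaging([25, 5], reference))  # prints 3000000.0
```

In the example, age 25 matches two
reference players exactly (fees of 2
million and 6 million), while 5 goals
has no exact match, so the closest
value, 4 goals, contributes 1
million. The average of the three
fees is 3 million.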
Below is a table representing the
algorithm’s proficiency at predicting
transfer fees, grouped into blocks.
The first block counts the fees that
were predicted within 100,000 euros
of the actual fee. The next block
counts the players whose predictions
were off by between 100,000 and
500,000 euros, and the rest follow
suit. The algorithm was unable to
predict any of the 103 players in the
test group within 100,000 euros, and
was only able to predict 5 out of 103
players within 100,000 to 500,000
euros. Our maximum player transfer
fee was 75 million euros. Looking at
the table, we can see that around 31
percent of the test group (32 of 103
players) were within 5 million euros.
Another way to look at this is that
about 31% of our players were
predicted within 15% of their actual
fees.

Difference (euros)       Number of players   Percent of players
< 100,000                0                   0%
100,000 – 500,000        5                   4.85%
500,000 – 1,000,000      3                   2.91%
1,000,000 – 2,000,000    8                   7.77%
2,000,000 – 3,000,000    6                   5.83%
3,000,000 – 4,000,000    4                   3.88%
4,000,000 – 5,000,000    6                   5.83%
> 5,000,000              71                  68.93%

This algorithm allows you to get
fairly close to predicting a fee
quickly, minus some outliers. The
algorithm takes less than a second to
run and has a time complexity of
O(n · m), where n is the number of
players to judge against in your
database and m is the number of data
points you’ve collected on each
player. It passes through each player
and then through each one of their
data points, using a simple compare
function to see if the test player
matches the database player.
Topological Algorithm
The topological algorithm was built
after looking at topological sort. A
topological sort is a linear ordering
of the vertices of a directed graph
such that for every edge xy from
vertex x to vertex y, x comes before
y in the ordering. Our data doesn’t
necessarily represent a directed
graph: when traversing our nodes,
goals doesn’t necessarily come before
minutes played. However, we used the
idea of a linear ordering of our
data.
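For reference, the textbook
topological sort defined above can be
sketched with Kahn's algorithm. This
illustrates the definition only and
is not our prediction algorithm; the
function name and the example graph
are hypothetical. A topological
ordering exists only when the
directed graph has no cycles.

```python
from collections import deque


def topological_sort(vertices, edges):
    """Return the vertices ordered so that x precedes y for every edge (x, y)."""
    successors = {v: [] for v in vertices}
    in_degree = {v: 0 for v in vertices}
    for x, y in edges:
        successors[x].append(y)
        in_degree[y] += 1

    # Start from the vertices that have no incoming edges.
    ready = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while ready:
        x = ready.popleft()
        order.append(x)
        for y in successors[x]:
            in_degree[y] -= 1
            if in_degree[y] == 0:
                ready.append(y)

    if len(order) != len(vertices):
        raise ValueError("graph has a cycle, so no topological order exists")
    return order


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(topological_sort(["a", "b", "c", "d"], edges))  # prints ['a', 'b', 'c', 'd']
```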
The algorithm goes through each of
the data points and finds the closest
ordering of the 107 reference players
in our data. The table below uses the
same ranges as the table for the
Averaging Algorithm. This algorithm
was able to predict 7 out of the 103
test players within 100,000 euros of
their actual transfer fee, which is a
7 player increase from the Averaging
Algorithm and a 4.8 player increase
from the Neural Network Algorithm.
However, it starts to fail rapidly,
with an identical number of players
over the 5 million euro gap as the
Averaging Algorithm.
Like the Averaging Algorithm, the
Topological Algorithm uses two nested
for loops, one over the players and
one over the statistics, which gives
a time complexity of O(n · m), where
n is the number of players in the
database to test against and m is the
number of data points or statistics
collected, in our case 30 for each
player. This algorithm ran in under
one second and was simple to
implement.

Difference (euros)       Number of players   Percent of players
< 100,000                7                   6.8%
100,000 – 500,000        2                   1.94%
500,000 – 1,000,000      1                   0.97%
1,000,000 – 2,000,000    7                   6.8%
2,000,000 – 3,000,000    5                   4.85%
3,000,000 – 4,000,000    4                   3.88%
4,000,000 – 5,000,000    6                   5.83%
> 5,000,000              71                  68.93%

Comparing Algorithms
When comparing algorithms, it is
clear that both the averaging
algorithm and the topological sort
beat the neural network on running
time: the averaging and topological
algorithms each take less than a
second to run, while the neural
network takes over two minutes on
average. However, speed is not all
that needs to be accounted for when
comparing algorithms. The results of
the neural network show that almost
60% of the predicted transfer values
were within 5 million euros of their
actual transfer values. On the other
hand, the topological and averaging
algorithms were each only able to get
about 30% of transfer fees within 5
million euros of the actual value.
When looking at results like these,
it is clear that the additional time
is worth sacrificing for the extra
accuracy provided by the network.
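The difference ranges used to compare
the algorithms can be computed from
lists of predicted and actual fees
with a sketch like the one below. The
function names are hypothetical, and
placing a difference that falls
exactly on a boundary in the higher
range is an assumption. The counts in
the usage example are those reported
for the Averaging Algorithm.

```python
# Upper bounds (in euros) of the difference ranges; the last range is open-ended.
BOUNDS = [100_000, 500_000, 1_000_000, 2_000_000, 3_000_000, 4_000_000, 5_000_000]


def bucket_counts(predicted, actual):
    """Count predictions per absolute-difference range (last bucket: over 5,000,000)."""
    counts = [0] * (len(BOUNDS) + 1)
    for p, a in zip(predicted, actual):
        diff = abs(p - a)
        # Index of the first range whose upper bound exceeds the difference.
        index = next((i for i, b in enumerate(BOUNDS) if diff < b), len(BOUNDS))
        counts[index] += 1
    return counts


def share_within(counts, n_buckets):
    """Percentage of players that fall in the first n_buckets ranges."""
    return 100.0 * sum(counts[:n_buckets]) / sum(counts)


averaging_counts = [0, 5, 3, 8, 6, 4, 6, 71]
# Share of the 103 test players predicted within 5 million euros.
print(round(share_within(averaging_counts, 7), 2))  # prints 31.07
```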
[Figure: Algorithm Comparison. Percentage of players in group (y-axis) against difference in price from actual in euros (x-axis), for the Average, Topological, and Neural algorithms.]
Conclusion
The neural network provides the best
predicted fees for the datasets we
used. Even though it took the longest
to run, it provided the most accurate
results. For future experimentation
on transfer fees, more data would
provide a stronger training set for
the neural network and more
comparison data for the other
algorithms, bringing the predicted
transfer values closer to the actual
transfer values.
Sources
"European Leagues and Cup Competitions - Transfermarkt." European Leagues and
Cup Competitions - Transfermarkt. N.p., n.d. Web. 02 Dec. 2015.
"Football Statistics | Football Live Scores | WhoScored.com." Football Statistics |
Football Live Scores | WhoScored.com. N.p., n.d. Web. 02 Dec. 2015.