Data Analysis in Football Scouting: Master Thesis

DTU Management
Department of Technology, Management and Economics
Data Analysis in Football Scouting
A Competitive Analysis of Scouting Applications
and Market Opportunities
Master Thesis
24th of June 2022
By José Dinis Cardoso
Reproduction of this publication in whole or in part must include the customary
bibliographic citation, including author attribution, report title, etc.
Published by: DTU, Department of Technology, Management and Economics,
Akademivej, Building 358, 2800 Kgs. Lyngby, Denmark
This thesis has been prepared at the Department of Technology, Management and Economics
at the Technical University of Denmark (DTU) in partial fulfilment of the requirements for
the degree of Master of Science in Industrial Engineering and Management (MSc Eng.), from
24th of January 2022 to 24th of June 2022.
It is assumed that the reader has a basic knowledge of statistics.
José Dinis Cardoso - s210328
Kathrin Kirchner, Associate Professor, DTU Management
Supervisor and reviewer of this thesis project.
Rasmus Aagaard, MSc Human Centered AI, DTU Compute
Creator of this thesis template.
This master thesis was developed with considerable support from people close to me. For
this reason, I would like to express my gratitude to Associate Professor Kathrin Kirchner
for her close supervision, enthusiastic support and constructive criticism during the
development of this thesis.
Moreover, I would like to thank my parents and grandparents for their never-ending support
and for giving me the freedom to pursue my areas of interest. I would like to thank my
friends, who accompanied me on my journey through university, and lastly my girlfriend,
who gave me invaluable encouragement throughout the development of this thesis.
The data revolution is observable in nearly all aspects of human life, from how decisions
are made at the highest levels of government to the content we are exposed to on the
internet. Sport is no exception. Although team sports have largely adopted data-driven
methodologies, football, owing to its randomness, was late to adopt data, particularly in
talent scouting and identification. This thesis investigates football scouting applications:
the functionalities present in the available software, the benefits these applications
bring, how users describe their experience with the software, and which conditions are
necessary for software to become customary among football clubs across different countries
and competitions. The thesis intends to advance the understanding of consumer behaviour
regarding football scouting software and of the processes within football’s governing
bodies and the recruitment departments of clubs. Through a research design with quantitative
and qualitative data collection, the thesis finds evidence of how users perceive football
scouting software and of the organizational and technological obstacles that are slowing
the mass adoption of digital football scouting methodologies. Based on one-on-one interviews
and association rule mining, the thesis identifies the most useful individual functionalities
present in football scouting software and their usefulness when grouped together in software
products. It also identifies the most relevant difficulties encountered by clubs and
individual users when using software, and the obstacles that prevent clubs from adopting
digital methods of scouting. Based on association rule mining, the thesis offers
recommendations for software manufacturers on altering the services offered and how these
are grouped in the final product. Moreover, the thesis recommends further investment in data
collection methods in sport, with regard to both the type of data collected and the regions
of the world where collection is deficient. The investment should also address the
technologies developed, since the thesis shows there is a need for predictive modelling of
player performance and for more reliable metrics. Finally, the thesis suggests that the
adoption of digital methodologies should be complemented by traditional methodologies
heavily linked with eye-for-talent and intuition.
Keywords: football scouting, software functionalities, scout, sports analytics
Data analytics has become an integral part of sports worldwide. Football is no exception,
and the use of data in the sport goes back further than some would expect. The first notes
taken on match data in football were those of Charles Reep. This English Air Force pilot
was “fascinated by the detailed descriptions of the tactics that were being used to create
goal-scoring opportunities for Arsenal’s wingers” [1]. This deep interest led him to develop
a notational system for documenting every on-the-ball action in a football game [1].
Although Charles Reep was a pioneer in the collection of football data, the interpretations
and implications he drew from it were wrong. This shows that data availability alone is not
enough for competitive advantage: extensive work must go into interpreting, visualizing and
ultimately deriving conclusions from data. The football community dismissed Reep’s work and
thereby also ignored a valuable part of his research, namely its data collection.
The publication of the book “Moneyball: The Art of Winning an Unfair Game” and the release
of the subsequent film “Moneyball” were essential in bringing the concept of sports
analytics to a wider audience. Both the book and the film highlight the achievements of
Billy Beane as General Manager of the Oakland Athletics, who achieved competitive success
on a smaller budget than their opponents by selling players who were overvalued and
recruiting players whose value was understated in the market. Billy Beane’s case was the
spark for the data revolution in sports recruitment and analytics.
The applications of data in football evidenced by Reep were mainly in the field of tactical
analysis. Since then, they have extended to numerous aspects of the game, player recruitment
being a relevant one. Moreover, the football industry has witnessed several successful cases
of data-focused approaches to player scouting and to decision-making on player acquisition.
Such cases include Barnsley [2] and Liverpool [3].
Several software companies producing applications have emerged in the football analytics
market. The services offered by these companies range from data on players, videos and
other statistics to data visualization tools in the form of graphics. The increasing
availability of software services for scouting has made prices accessible to smaller teams
with fewer resources, which can obtain a competitive edge by offsetting their lack of
financial resources with data-driven decisions and, ultimately, player acquisitions.
Furthermore, these platforms make it easy to share content within a club and enable the
observation of players without the need to attend games and training sessions.
Traditionally, football scouting relied entirely on physical observation, notes taken by
scouts during games and contact with people familiar with the player being observed. The
features brought by scouting software reduce the reliance on physical observation and thus
reduce scouting-related expenses such as travel costs and player transfer fees.
In a study conducted in the context of Italian football, Radicchi et al. found that
“Digital and multimedia tools for talent recruiting will most likely never replace
traditional scouting in full, though they may become an innovative and “supplemental”
source for the player evaluation process.” [4] Therefore, scouting software does not
substitute traditional scouting methods; it is instead a complement that allows the user to
extract as much information as possible on players.
Nonetheless, football clubs fail to realize the value added by data-focused methodologies
in player scouting. Radicchi et al. found that “Football clubs still need to develop
managerial skills and learn how to deal with new technologies to improve the decision-making
process and strategic value of talent scouting.” [4] Radicchi concludes that one of the
factors behind clubs’ resistance to using software is the advanced age of scouts and
football professionals who are not accustomed to using it [4].
The study by Radicchi et al. was restricted to Italian football, and studies covering other
geographical contexts may reach different conclusions. For this reason, an analysis of the
issues surrounding the mass adoption of football scouting software is warranted. The study
of football scouting software should help clubs and companies alike to understand the main
reasons for the non-usage of software, the issues faced by users and how well software
addresses their needs.
Project Motivation
The motivation behind this project is the analysis of football scouting software, its
applications, the profile of its users, the factors influencing its adoption and where the
industry is heading. A passion for football analytics led the author of this project to
embrace the opportunity to investigate football scouting software. Nowadays, decisions in
football clubs are made by stakeholders from different backgrounds.
On the one hand, there is the top management of football clubs, who think of football as a
business and whose main objective is to bring value to shareholders and positive financial
results to the club. On the other hand, there is the technical staff and, most importantly,
the coach, whose main priority is the competitive success of the team and player development.
Moreover, decisions in football, and player transfers in particular, involve increasing
sums of money, meaning that wrong decisions carry hefty costs. Therefore, it is important
to align the objectives of decision-makers from all backgrounds so as to balance the
competitive success of the team and its ability to generate profit.
Football analytics may bridge this gap between corporate and sports decision-making by
facilitating communication inside organizations and putting as much information as can
be collected in the hands of decision-makers.
Therefore, the outcome of this project aims at helping software manufacturers to adjust
their products to customer needs and at helping clubs and professionals to realize the
potential value of implementing digital methods of talent identification and scouting.
The purpose of this qualitative study is to explore the features of football scouting
software applications and to discover which features must be incorporated to achieve mass
adoption among football technical staff at professional-level clubs and academies.
Statement of the Problem
Based on the preliminary literature study and research design, the researcher has identified
the following question as being at the heart of the key problem statement:
Which data analysis features are missing in football for it to be mass adopted by
Research Questions
In order to answer the key problem statement, a few research sub-questions were formulated
to structure the diverse aspects of the study:
• Which new additions do football technical staff believe are necessary in football
scouting applications?
• Which features exist in football applications, which ones are missing and which are
• What is the user’s satisfaction with functionalities present in available software?
• How does the relation between functionalities influence the way software is built?
How do software users perceive the relation between software functionalities?
• Which competitors exist in the football scouting platforms market?
• What are the barriers preventing the adoption of the software? Is the root cause
usability, data availability or quality, or distrust in the end product of this software
(the statistics and KPIs provided)?
Learning Objectives
It is relevant to outline a specific set of learning objectives. When completing this master’s
thesis, the student should be able to:
1. Arrange and execute a research project with a comprehensive structure that encompasses
the project statement, review of literature, methods, analysis and discussion.
2. Compose a motivation to research the implementation of digital methodologies in
football scouting.
3. Enumerate the current literature on football data analytics concerning talent
identification, management and the digital methodologies employed in these activities.
4. Scrutinize the concepts of football data analytics and the selected theoretical
background for deeper analysis.
5. Establish both quantitative and qualitative research designs that allow the research
problem to be answered.
6. Establish contact with appropriate professionals (in football scouting) and search for
relevant football analytics companies so as to gather sufficient data within the focus
of the research.
7. Conduct interviews and analyze the answers in order to gather insights on user
experience regarding football scouting software.
8. Perform and analyze an association rule mining method to identify possible patterns
between software functionalities.
9. Present results in a visually appealing, comprehensive, systematic and complete
10. Report the obtained results in a critical and transparent manner while drawing
conclusions from them.
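As a rough illustration of the association rule mining named in objective 8, the following
minimal sketch computes support, confidence and lift for rules between pairs of software
functionalities. The survey responses, functionality names and thresholds below are
hypothetical and are not taken from this thesis; the actual analysis may rely on a different
algorithm and tooling.

```python
from itertools import combinations

# Hypothetical data: each set lists the functionalities one respondent finds useful.
responses = [
    {"video", "player_stats", "data_visualization"},
    {"video", "player_stats"},
    {"video", "report_sharing"},
    {"player_stats", "data_visualization"},
    {"video", "player_stats", "report_sharing"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules lhs -> rhs between single items as (lhs, rhs, support, confidence, lift)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        joint = support({a, b}, transactions)
        if joint < min_support:
            continue  # the pair occurs together too rarely to be of interest
        for lhs, rhs in ((a, b), (b, a)):
            confidence = joint / support({lhs}, transactions)
            lift = confidence / support({rhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, joint, confidence, lift))
    return rules


rules = pair_rules(responses)
for lhs, rhs, s, c, l in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

In this toy data, every respondent who values data visualization also values player
statistics, so that rule has a confidence of 1.0 and a lift above 1, which hints that the
two functionalities might reasonably be grouped in one product.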
Structure of the Thesis
The structure of this master’s thesis is summarized below, highlighting chapters and sub­
Figure 1.1: Master’s thesis structure
The Introduction chapter provides the context of the research problem, background on the
subject of football scouting, the research problem and the learning objectives. The
Literature Review offers definitions and a critical analysis of the key concepts related to
the research problem, together with an overview of the talent identification process and
structure. The Methods section covers the research design (both qualitative and
quantitative) and the data collection methods. Next, the Findings chapter reports the
results obtained from the methods employed. The Discussion chapter covers the
interpretation of the results, the answers to the research question and sub-questions, and
other relevant takeaways from the research. Finally, the Conclusion summarizes the major
findings and their meaning, a critical analysis of the methods employed and recommendations
for future research.
Review of Literature
Given the research questions and the research problem at hand, several areas of knowledge
emerge as relevant for examining existing theory and how scholars have approached such
topics in the past. The review of the literature addresses topics related to football
scouting: the process itself and the individuals involved in it. In addition, an outlook on
Data Analytics in Sports will provide use cases for Big Data in Sport and which data is being
Key Concepts:
This section lists the concepts relevant to this study. These concepts need to be defined
in the theoretical framework so as to improve the critical analysis of the data collected
and the results derived. Ideally, the resulting theoretical framework will be sufficient to
answer the research questions and to deepen the analysis of future findings and hypotheses:
1. “Sports Scouting” – more specifically football scouting
2. “Talent Identification”
3. “Talent Management”
4. “Data Analysis in Sports”
5. “Big Data in Sports”
6. “Software Features”
Definitions and Critical Analysis
Sports Scouting
a “Talent scouting involves the process of monitoring and evaluation of talented football
players, whose final stage can result in a player signing a contract with the
scout’s club.” [5]
b “In many sports scouting is the activity where interpretations and assessments about
the potential and prospects of a (usually) young player are made and shared.” [6]
Scouting offers numerous benefits to professional football clubs that strive to find the best
talents: the competitive advantage over other teams in the same competition and the
financial compensation from future offers made for scouted players. [7]
Therefore, the process of scouting is firmly ingrained in the context of football and
extends its influence to all age groups of football players. Signing young talent has
become a priority for football clubs, with the objective of developing elite athletes from
an early stage [8]. Football scouting is done by scouts who observe and record data and
make intuitive judgements based on on-field performance [4]. Observations are usually
undertaken repeatedly until a decision is made [9]. Scouts not only watch player
performance but also communicate regularly with players and their families [7] so as to
further assess the player’s mental and physical factors.
Bergkamp et al. found that the attributes most frequently used for predicting player
performance were technical abilities, match sense and awareness of surroundings,
physiological factors and sprinting speed, and a winner’s mentality [10].
When looking at the Key Performance Indicators used by football scouts to evaluate player
performance, Hughes et al. found the following indicators for goalkeepers and on-field
Figure 2.1: KPIs for performance evaluation of football scouts [11]
Technology opens the way for the collection of a much larger number of parameters. The
observation of “physical efforts, technical elements and tactical movements” [12] allows
the scout to collect information on the progression of these parameters during a game and
over the entire season, the level of the player compared to the team, and the level of the
team in comparison with other teams in the same competition [12].
The usage of media and other digital data collection methods is common in football scouting
at senior level. However, for younger age groups or at academy level, data collection is
done through a variety of qualitative and quantitative indicators on individuals that are
loosely defined by the club’s scouting department and playing style [13].
Of the definitions found in the literature and shown above, definition a. is more
applicable to the research subject because it directly links football scouting to its final
outcome, the signing of the player, and ultimately to the decision-makers. Moreover, it is
not restricted to one age group and mentions the several stakeholders in scouting: the
athlete, the club and the scout.
Definition b., by contrast, is restricted to a youth player development perspective in
which the outcome is not a player signing but the maximization of an athlete’s potential.
Although youth football scouting has a central role in clubs today, digital methods of
scouting are not well implemented at that level, and it therefore offers little insight
into how football scouting software employs data collected by various methods.
Talent Identification
a “Regnier et al. define talent identification as the process of identifying and matching
the physical and task demands of a chosen activity that are congruent with expert
performance. There are a number of different models that have been used to identify
talent and develop expertise in athletes.”[14]
b “Talent identification in soccer is the complex process of recognizing and selecting
players that have the greatest potential to excel in the future.” [15]
c “The ‘coach-driven’ method of talent identification rests on a multifaceted intuitive
knowledge comprised of socially constructed “images” of the perfect player. When
a coach selects a talent, he/she usually has the feeling of doing something
self-evident, logical, and inevitable as he/she distinguishes between different talented
soccer players without being explicit about the generative principles that guide his
observation.” [16]
d “A multi-dimensional approach of talent identification should include a battery of
sport specific skills (e.g., dribbling, shooting, ball control, passing, etc.) in
combination with physical, physiological and psychological tests.” [16]
Talent identification has become increasingly relevant in sports performance [17].
Based on observations of football games, scouts forecast a player’s potential future
performance to influence decisions on squad management [18].
There is increasing evidence of factors that heavily influence talent identification and
evolution [19]. Relative age [19], growth, maturation and training age [20] [21] are
central examples of such factors.
At academy level, frameworks have been put in place to identify and develop players.
These frameworks are named TIDS (Talent Identification and Development Systems) and are
structures aimed at the performance, physical health and psychosocial evolution of young
players [22].
These systems are widespread in professional clubs and are mentioned throughout the
literature, especially when searching for keywords such as “Sports Scouting” and “Talent
Identification”. Despite this, their healthiness and usefulness have been questioned in
recent research [23]. Further research on the topic points out the multi-dimensional
biological and emotional aspects of TIDS [23].
Taken together, the four definitions of talent identification offer an overview of how
talent identification in football has been done throughout the history of the sport. Since
the creation of football, broader and more complex methodologies have been implemented.
These evaluate not only physical factors (directly related to the game) but also social
factors, to assess how the player would fit in with his or her peers in the squad, and the
player’s medical history, to assess whether the player suffers frequent injuries and which
types of injury he or she is most likely to suffer.
The first definition treats talent identification as a general concept. Its most remarkable
aspect is the statement that talent identification should start with the establishment of
the performance thresholds to look for [14]. The second definition is rather incomplete and
vague in the context of the research object, football scouting. The third definition offers
a method of talent identification based on the experience of one of the professionals
involved in player development, the coach, and thus drifts away from the scouts’
perspective, which is central to the research questions.
Furthermore, I believe the fourth definition is the most appropriate for the research, as it
encompasses the most variables and sets the most measurable metrics. The approach referenced
in the fourth definition implies the collection of a multitude of quantitative and
qualitative data, which is collected through digital technologies and facilitates the usage
of scouting software.
Talent Management
a “activities and processes that involve the systematic identification of key positions
which contribute to the organizations’ sustainable competitive advantage, the
development of a talent pool of high potential and high performing incumbents to fill
these roles, and the development of a differentiated human resource architecture
to facilitate filling these positions with competent incumbents and to ensure their
continued commitment to the organizations” [24]
b “Talent management develops, employs, and rewards a multiplicity of abilities across
an entire workforce rather than focusing upon a narrow distribution of perceived high
performers. A deep and broad talent inventory is the single best way to mitigate the
risks of an uncertain threat environment and an increasingly competitive labor
market.” [25]
Talent Management is at the heart of Human Resources (HR) management. Nonetheless, there is
no consensus among scholars on a definition of Talent Management. Lewis and Heckman (2006)
highlighted three schools of thought regarding Talent Management [26].
The first is aligned with definition a. and defines the concept of talent pools. According
to this school of thought, Talent Management is a range of processes intended to establish
a suitable flow of collaborators [26]. The second school of thought focuses on a “generic
view” [26], considers that talent should be managed according to performance levels [26]
and is in line with definition b. Such management should not be done by any specific
department within an organization. The third conception of Talent Management considers it
a set of traditional Human Resources activities [26].
The generic approach to Talent Management is not aligned with the focus of this research.
The research questions aim at tackling possible issues inside a specific department within
a football club, namely the scouting department. Moreover, the HR activities described
by the third school of thought are complemented by definition a. through the introduction
of “talent pools”.
The reference to “talent pools” is in line with the football scouting approach to talent
management. Usually, teams keep several players under observation, who act as their pool.
This number of athletes decreases as filters get narrower and constraints such as financial
restrictions arise. All in all, both definitions are suitable for the project at hand, but
definition a. encompasses the human resource capabilities needed to maintain a “talent
pool”, which in the context of football involves the scouting professionals whose experience
is at the heart of this research.
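The narrowing of a talent pool described above can be sketched as a sequence of filters.
This is only an illustrative sketch: the Prospect fields, the players, the order of the
filters and the thresholds are hypothetical and do not describe the process of any actual
club.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    position: str
    age: int
    asking_fee_m: float  # hypothetical asking fee, in millions of euros


def narrow_pool(pool, position, max_age, budget_m):
    """Apply successively narrower filters and record the pool size after each one."""
    stages = [
        ("position", lambda p: p.position == position),
        ("age", lambda p: p.age <= max_age),
        ("budget", lambda p: p.asking_fee_m <= budget_m),  # financial restriction
    ]
    sizes = [("initial", len(pool))]
    for label, keep in stages:
        pool = [p for p in pool if keep(p)]
        sizes.append((label, len(pool)))
    return pool, sizes


# Hypothetical pool of players under observation.
observed = [
    Prospect("Player A", "winger", 21, 4.0),
    Prospect("Player B", "winger", 27, 2.0),
    Prospect("Player C", "winger", 19, 12.0),
    Prospect("Player D", "striker", 20, 3.0),
    Prospect("Player E", "winger", 22, 1.5),
]

shortlist, sizes = narrow_pool(observed, position="winger", max_age=23, budget_m=5.0)
for label, size in sizes:
    print(f"{label}: {size} players")
print([p.name for p in shortlist])
```

Recording the pool size after each stage makes it visible which constraint removes the most
candidates, which in this toy example is shared evenly between position, age and budget.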
A team’s talent management policy is closely related to how the team improves its talent
pool and raises the level of its squad. Football clubs adopt several talent management
strategies to ensure good financial and competitive performance. Relying solely on the
transfer market for acquiring players is an ineffective choice that carries financial
costs [27].
In that sense, teams should adopt a “make and buy” [27] strategy: investing in youth
academies that develop talent in-house with a focus on personal and athletic evolution,
using second teams (or B teams) that allow developing players to evolve into first-team
players, and at the same time researching the market to spot the talent missing from the
team [27].
Even the best talent management strategies carry numerous challenges. The challenges
pointed out by Soriano are the following:
1. The lowering of the age at which players are evaluated and recruited: the challenge
here resides in the labor status of a minor who cannot be bound to a working contract,
the financial education of teenagers who start earning enormous sums of money at a
very young age, and the normal academic education of these players.
2. The assessment of talent: what criteria are used to spot talent, who is capable of
identifying talent and what methodologies they use.
3. The human resource challenge of managing very well-paid young players: the
remuneration of the players and its relation to their motivation. [28]
Soriano points out that none of these challenges has been properly answered but outlines
several actions taken by FC Barcelona on the path to solving them. The first challenge can
be tackled by establishing academies or clubs around the world so that talented players can
grow up close to home and family. The second challenge may be tackled by developing
scientific methodologies for identifying talent and using them in parallel with
eye-for-talent methodologies. Finally, the third challenge may be addressed by implementing
variable (performance-driven) remuneration, guidelines for justifying salaries based on
talent, and management initiatives to adapt salaries to player development. [28]
Data Analytics in Sports
a “this term can be understood as ’statistics in sports’, and it encompasses data
collection and management, predictive modeling, and computational methods that are
used to find valuable information for sport-related decision making.” [29]
b “Sports analytics is the investigation and modeling of sports performance,
implementing scientific techniques. More specifically, sports analytics refers to the
management of structured historical data, the application of predictive analytic models
that use these data, and the utilization of information systems, to inform decision
makers and enable them to assist their organizations in gaining a competitive
advantage on the field of play.” [30]
c “Drawing on the definition of big data given by the McKinsey Global Institute sports
big data can be defined as a sports data collection that is so large that it can
acquire, store, manage, and analyze far beyond the capabilities of traditional database
software tools, including five features: volume, variety, velocity, veracity, and value
Sports analytics attempts to solve challenges in sports science by employing data mining,
network science and statistics [32] [33]. The statistical analysis described in the
literature involves technologies such as machine learning, data mining and predictive
analysis. Moreover, the network science approach to sports analytics has produced results on
the relation between teenagers’ social networks and their behaviour [34], passing
networks leading to goals and the role of positioning (in the pitch) variables in the final score
Definitions derived from the literature point to the management of data and the formulation
of predictive models, as is the case for definitions a. and b. In the context of football,
extensive research has been conducted on the prediction of game results.
Scholars have concluded that Machine Learning is a suitable method for sports results
predictions [35]. However, the same research stated that further machine learning models
with increased accuracy are necessary to answer the increasing volumes of sports betting
and the demand from sports managers that use predictions to form future match tactics.
A framework for football games results prediction has been proposed and it was concluded
that through the use of Artificial Intelligence, predictive models achieve better accuracy
than human experts [36].
Regarding football scouting, there is a need for predictive models in this topic. Results
from player transfers and market value prediction studies have been found to be suitable to
help a scout or a team manager in building teams [37]. Nonetheless, the unpredictability of
football is a constraint to the accuracy of machine learning models [37] and thus increased
accuracy should be pursued.
The definition found to be the best fit for this project is definition b. Compared with
the first definition, definition c. covers more stages of data analysis and includes more
quality assurance features (as reflected in the five features mentioned above). Hence,
definition c. theorizes data analysis beyond the traditional stages and ensures its
coherence, the possibility of further collection and the quality of the data. Definition
b., in turn, points towards the usability of sports analytics by decision-makers such as
coaches and football scouts. For this reason, definition b. is aligned with the aim of the
project, which is to perform a decision-maker-centered study involving contact with, and
data collection from, football scouting professionals.
Big Data in Sports
a “Sports big data management mainly applies data management techniques, tools,
and platforms to deal with sports big data, including storage, preprocessing, pro­
cessing, and security.” [31]
Big Data in Sports is deeply connected to Data Analytics in Sports. The data analysis
tools that have been developed in recent years enable analysts to explore the full value of
big data [34]. In the sports industry, big data services cover exercise performance,
health condition data and training metrics [34]. Big data collection is mainly done through
the Internet. Several companies, identified and analyzed later in this thesis, provide data on
a number of sports including football.
One major concept that should be addressed is the type of data collected for talent
identification. Besides video and physical observation, there is also statistical data collection.
Several types of data collected by recruitment departments and software companies may
be outlined: event data, tracking data and physical data.
Tracking data may be referred to as ”the positional co­ordinates of players and ball through­
out the course of a match, which equates to more than one million data points per 90 min­
utes.” [38] In recent years, technological advances have allowed the existence of devices
for collecting positional tracking data [39]. A particularly interesting application of tracking
data is the study of tactical behaviour, which refers to how a collective of players position
themselves over time to score a goal while under the constraints imposed by the opponent team [40].
Moreover, event data may be defined as ”what happens on the ball such as passes,
tackles and shots” [41]. Literature suggests data has been used in the past to evaluate
team performance. Player actions such as shots (combined with positional data) have
been used to calculate goal­scoring probability of a team [42]. Furthermore, sequences
of player actions combined with the frequency of such actions and dependencies between
such actions have been analyzed through process mining techniques and have proven
effective in evaluating team performance and influencing decision-making by coaches
and technical staff [43].
Finally, physical data may be described as the physical indicators derived from “their
physical output” [44], such as speed and distance covered. Objective physical data may give
insight into how training should be adapted and planned so as to maintain a player's
long-term fitness across a season constrained by travel and a large number of games [45].
Moreover, the combined analysis of physical and other types of data may be found in the
literature, for example in studies employing physical data to analyze the extent to which
physical requirements (i.e. sprinting and endurance) are position and player specific [46].
Another challenge facing Big Data in Sports is the security risk of data sources [34]. In
football scouting, several clubs often have the same player targets. With this in mind, data
leaks among clubs pose a serious risk to player transfers. Scholars have proposed
a framework for secure data collection based mainly on blockchain technology [47]. As
mentioned in the above definition, Big Data in Sports mainly involves storage, processing
and security.
Software Feature
a “a product characteristic from user or customer views, which essentially consists of
a cohesive set of individual requirements” [48]
b “a triplet, f = (R,W, S), where R represents the requirements the feature satisfies,
W the assumptions the feature takes about its environment and S its specification”
The employment of the software feature concept in this research concerns the customer’s
point of view and user experience when using football scouting software to answer the
first research sub­question. The first definition aims directly at defining a software feature
through the customer’s eyes. However, it is rather incomplete and fails to address the
interoperation between features. The assumptions mentioned in the second definition
about the context (W) describe the relations between the different features that make
products when combined. To cite Classen et al: “This means that a system cannot be
proven correct by proving each feature separately.” [49] Thus, the definition deemed
the most appropriate in the context of this project is the second definition. Furthermore,
research questions require the analysis of patterns in a sporting context and particularly
among software features.
Data mining techniques have been employed to establish patterns and relations among
different variables in sports. In particular, data mining centered on association rules has
been put to the test in tactical analysis of teams [50]. It was demonstrated that statistics
such as ball possession and passing time were sufficient to establish association rules
that evidenced the team’s tactics [50].
Furthermore, association rules have been applied in a cricket context. Frequent patterns
found in cricket match data were found to be useful in helping coaches decide on game
strategies [51]. However, no literature was found that addressed possible patterns in the
football scouting process or in scouting software.
Scouting Department Structure
The aim of this project also imposes the need for several definitions related to football
clubs' structure and the manner in which the recruitment process is carried out. Most professional
clubs have a department dedicated to talent identification and observation. The existence
of this department and the amount of human and capital resources allocated to it depend on the
club's purchasing power. This department is usually called the recruitment department,
although its name differs from club to club. Similarly, the structure of football
clubs varies from league to league and from country to country. Consequently, no fixed
structure of the scouting department or equivalent was found across all
leagues and competitions. There are, however, constant hierarchical relationships inside
football clubs and particularly among the recruitment personnel. The following chart
illustrates a typical hierarchical structure within a football club that would be behind
talent identification and recruitment. It was built by consulting multiple online sources
[52] and the LinkedIn profiles of scouting professionals, and recording their roles.
Figure 2.2: Football Club Recruitment Hierarchy
According to Figure 2.2, the recruitment department comprises a range of functions starting
with the head of department. The name of this role varies from club to club. Some clubs
have a head scout, others prefer naming this role the chief scout or head of scouting but
the responsibilities carried out by this professional are very similar at their core. Taking
the example of the job description for a vacancy at Salford City FC (an English League
Two club), some relevant responsibilities of a Head of Scouting are the following:
• The heading up of the local, regional and national scouting network to identify play­
ers for Salford City
• Understand The Club’s philosophy on identifying players to maximise talent and add
value to the 1st Team squad
• Ensure all scout reports are accurate, detailed and recorded on club database which
is available to Sporting Director and senior stakeholders at the Club
• Develop a monitoring policy which includes a robust due diligence process on target
• Be pro­active in the search for latest trends and techniques within the field of scout­
ing. [53]
Within the recruitment department, several other roles can be identified. The most com­
mon role is that of the scout. The function of the scout may be described as ”responsible
for identifying players for the club's developmental academy or first team.” [10]
Professionals undertaking scouting roles have usually spent considerable time playing or coaching
football and are not usually required to have any formal qualification [7]. Very little is
known about how scouts detect talent. However, it has been found that coaches consider
”player’s speed, play intelligence and learning the game as criteria” [54] to identify talent.
The identification method mostly used is the traditional physical observation of players.
Analysts at football clubs also weigh in on talent identification. Taking the example of
Brentford Football Club, Scouting Analysts’ main responsibility ”is to create, collate and
monitor scouting intelligence effectively, using all available resources within the Scouting
Department.” [55] The resources used vary with the type of analyst. For example,
video analysts conduct an in-depth analysis of videos of
players, while data analysts look at the in-game statistics and performance indicators of
scouted players.
The hierarchy of a scouting department is directly linked to how the scouting process is
done. The process starts with observations of players that fit the team’s philosophy. As­
sociated with this, the flow of information regarding talent identification moves towards the
head of department, putting this professional and sports director/manager in the position
to perform the talent selection.
Case Study Definition
The design of the research methodology and data collection framework requires the
definition of the case study. The case study chosen aims at scoping the problem at hand
and concerns the usage of digital technologies and applications in the football scouting
Football teams' success relies heavily on their talent identification. Methodologies
that streamline the scouting process, and thus improve the effectiveness of talent
identification, are regarded by football clubs as competitive advantages.
This view was expressed by Radicchi et al.: “The most successful teams are those
that are able to identify the better athletes earlier than their competitors. Finding alterna­
tive, innovative, and less costly “channels” for advance recognition of promising athletes
with the potential to excel in sports might induce clubs to sustain lower investments for
development and training programs and therefore grant them a competitive edge.” [4]
Moreover, an analysis of digital applications directed at football scouting is
relevant not only to aid football clubs with their implementation but also to supply software
manufacturers with information on what scouting professionals need when the time
comes to develop new functionalities for their applications.
In addition, the boundaries of the case study were established considering the scope of
the analysis and the phases that usually make up the process of football scouting. According
to scholars, the phases of the talent scouting process are the following: talent detection,
identification, development, confirmation, and selection [56].
As Lazarević et al [57] stated:
1. Talent detection – detection of talents that have certain predispositions, without
active participation in the sport at hand;
2. Talent identification – identification of professionals as well as forecast of
achievements “based on a multidisciplinary approach to assessment of their physical,
physiological, psychological, genetic, sociological and technical potentials and
characteristics”;
3. Talent development – supplying the athletes with tools to evolve and maximize their
potential according to their attributes;
4. Talent confirmation – observation and authentication of the professional's
inclinations to do well in the sport;
5. Talent selection – choosing the player or group of players that will be capable of
successfully performing a series of functions in a sports setting. [57]
Scouting applications provide data­driven functionalities to their clients to simplify the tra­
ditional scouting tasks. “Traditionally, sports team personnel observe competing teams,
compile reports on player weaknesses and opposing teams’ strategies, and gather other
useful information that may generate a competitive advantage (match analysis).” [4]
Consequently, scouting software functionalities range from video analysis tools to statis­
tics of in­game player/team performance and player databases that allow filtering between
players according to selected categories.
Taking into consideration the functionalities performed by football scouting applications,
the research and data collection methods will focus mainly on the talent identification stage
which is addressed by the multivariate big data functionalities provided by the applications.
Research Design
In recent years, digital technologies have opened the way for new methodologies and
processes to be put into place so as to increase the effectiveness of football scouting.
The variety of platforms that provide tools to aid scouts and other football professionals in
the talent identification process has multiplied, which implies an increase in interest in and
usage of these technological breakthroughs.
Digital technologies do not position themselves as substitutes for in situ scouts.
Nonetheless, football scouts fail to realize the benefits of data analytics usage in talent identification.
To shed light on what is hindering the mass adoption of software as a vital scouting tool,
qualitative data research was found to be the most appropriate method.
On the one hand, the variety of software tools imposes an analysis of the functionalities
present in them. The analysis aims to understand which “talent identification” tasks are
already covered by the functionalities and how seemingly independent features relate to
each other. Therefore, the software features offered by a range
of manufacturers (companies) will be collected and categorized, and the patterns among them analyzed.
On the other hand, the user's perspective should be taken into consideration so as to
comprehend how end users use the products, what is working and what is not adapted to the
user's needs.
In this case, the data collection will be done by one­on­one interviews with football scouting
software users. For this reason, the Data Collection Methodology may be divided into two
1. Software Feature Collection
2. Interviews with Football Scouting Software Users
Software Feature Collection
This phase mainly aims at answering the first research sub-question. The software feature
collection aims at listing the most relevant features and establishing a reference
inventory of features to include in the one-on-one interviews in order to gather
impressions on these specific features.
Additionally, the software feature analysis aims at finding the patterns among the software
features and how they relate to each other and influence each other’s presence across
the range of software products.
The analysis of the features of the different software tools distinguishes between
the products sold by the companies that develop them (referred to as products)
and the features/functionalities included in these products (referred to as
features). A total of 40 products were identified through web searches
using keywords such as “scouting software”, “football software”, “soccer scouting
applications” and “football recruitment applications”.
After visiting a software company’s website, their products, services, and product plans
were analyzed. In case of absence of reference to a certain feature on the software tool’s
website, the specified feature was considered as not present in the software.
Therefore, when visiting the website of each manufacturer, the presence of every specific
feature found was expressed as a binary variable with 1 indicating the presence of a
feature in a specific manufacturer’s software and 0 indicating its absence.
This interpretation seemed appropriate in the context of the data collected given the small
number of software websites that did not list their features in the Products/Services or
similar section. These variables were recorded in a spreadsheet in which the columns
corresponded to the functionalities and the rows to the software products.
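The binary recording scheme described above can be illustrated with a minimal sketch. The product names, the three features shown and every 0/1 value below are hypothetical placeholders chosen for illustration, not data from this study; the function name feature_prevalence is likewise an assumption.

```python
# Hypothetical feature columns; the study itself uses eleven categories.
FEATURES = ["Scouting Reports", "Database", "Player Comparison"]

# One row per software product, one column per feature:
# 1 = feature mentioned on the vendor website, 0 = not mentioned.
# All names and values are made up for illustration.
presence = {
    "Product A": [1, 1, 0],
    "Product B": [0, 1, 1],
    "Product C": [1, 1, 1],
}


def feature_prevalence(matrix, features):
    """Count, for each feature, how many products offer it."""
    counts = {name: 0 for name in features}
    for row in matrix.values():
        for name, flag in zip(features, row):
            counts[name] += flag
    return counts


prevalence = feature_prevalence(presence, FEATURES)
# Print features from most to least prevalent.
for name, count in sorted(prevalence.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count} of {len(presence)} products")
```

Summing each column of such a matrix gives the kind of per-feature prevalence count that a bar chart of features across products would display.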
The software functionalities were separated into a variety of categories following “similarity­
based relations” [58] criteria. These relations require “resemblance or common features;
their identification is based on comparison” [58]. Regarding the case study, the scope
of categorization “occurs within a particular case so that the contextual relationships are
harder to lose sight of.” [58]
Hence, the division into categories highlights the talent identification stage and the
corresponding tasks performed by the software functionalities. Consequently, categorization
was carried out through successive comparisons of similar functionalities across the
websites, functionalities that perform the same task but have different names, and
functionalities that involve the same underlying concept.
For instance, in the case of Team Management, the concept concerns the decision­
making and accountability of members of a team. The activities involved in this concept
may not have evident similarities, but they are often sequential tasks (e.g. the analysis of
the injury situation in the team and the consequent decision on the starting line­up based
on physical condition).
In this sense, the tasks that fall under the Team Management category are intimately
linked to the actual management performed by software users. The categories found are
listed below together with what they refer to:
1. Scouting Reports: feature that allows users to build their own scouting reports.
2. Database: feature that contains a player database or allows users to build
their own scouted player database.
3. Match Analysis: feature that allows for a posteriori (after the game) statistical
and/or video analysis of games played by the user’s team.
4. Opponent Analysis: feature that allows for a priori (before a game) statistical
and/or video analysis of games played by opponent teams.
5. Player Comparison: feature that allows the user to filter players and compare them
pairwise on specific criteria and in-game performance.
6. Team Management: feature that allows for squad management, selection of the
starting line­up, overview of the physical condition of players, injuries.
7. Shadow Teams: feature that allows for the definition of an ideal line­up of players
taking into consideration player’s roles, positions, style of play and team’s game
8. AI generated statistics: feature that displays AI­powered performance indicators
teamwise or individual player­wise.
9. Video Analysis: feature that allows the upload, playback, editing and
video-tagging of scouted players' in-game or in-training videos.
10. Player Registration: feature that allows the management of the player’s labor and
legal status, inscription in the team’s competitions and other administrative tasks.
11. AI generated performance prediction: feature that predicts player
performance and evolution using machine learning algorithms with scouts' input
as variables.
Interviews with Football Scouting Software Users
This phase aims at answering the second and third research sub-questions. Contact
with scouting staff and professionals involved in player recruitment is the most direct
way of obtaining valuable perspectives on how the end users' needs are being met.
Moreover, the interview process is expected to shed light on the areas of scouting that
can be optimized, the difficulties encountered by the end users and ultimately what holds
them back from adopting such software.
Qualitative data research concerns processes and deep meanings. One of the most
common methods of collecting qualitative data is the one-on-one interview. One-on-one interviews
should be interactive so as to maximize the exchange of experiences between the
interviewee and the interviewer.
Interactivity can be drawn from the spoken format of the interview which allows for un­
scripted topics to be brought up, thus increasing the range of answers obtainable. As
Frances et al [59] stated “qualitative interviews have the advantage of being interactive
and allowing for unexpected topics to emerge”.
Written interviews were structured and carried out with scouting staff to draw insights from
their experiences. The targets of the interviews are the users that habitually use football
scouting software inside organizations such as football clubs and scouting companies.
These include coaches, football scouts and roles more involved in the final
decision-making, such as the Head of Scouting or Sports Director.
The interview targets were identified and approached through LinkedIn and filtered taking
into consideration their role inside the organization and the competition in which their club
competes. This last criterion was employed to avoid obtaining a significant part of the
answers from one single background or league. The interviews were
conducted via Zoom, and the interviewees remain anonymous so as to avoid
social-desirability bias [60] and thus obtain true feelings and thoughts not influenced by society
The information displayed gives an overview not only of the scouting professionals'
roles but also of the tools they use and the competitions their clubs play in. The
club competitions help to paint a picture of the geographical areas they operate
in and the financial power of their employers.
Figure 3.1: Overview of Interviewees’ Profiles
Figure 3.2: Overview of Organizations and Tools Used
The sample size of the interviews was not set at the beginning of the
interview process but was instead determined by the saturation of the answers obtained. Once it
became evident that similar or identical answers were repeatedly given to questions, further
interviews offered no additional value and the interview process was concluded.
The format of the interview was semi-standardized, using a combination of open-ended
and closed-ended questions, thus taking advantage of the flexibility of the format and the
possibility of exploring unprompted issues raised by the respondents, derived from their
experiences: ”The flexibility of the semi­standardized interview allows the interviewer to
pursue the exploration of spontaneous issues raised by the interviewee to be explored.”
One­on­One Interviews Structure
1. What is your department (formal name)?
2. What is your role (formal name)?
3. Which league does your team compete in?
4. Do you use any software that aids in the scouting process/talent identification? If
yes, which?
5. Do you think there are any barriers today that prevent clubs from using scouting
software? Which?
6. Which format is the data from the scouting process stored in? Is it stored in sheets
of paper, Word, Excel Sheets, etc.?
7. Do you feel that the data provided by the data providers have quality, reliability?
8. Would you feel safe trusting scouting data with a third­party platform?
9. In the study made of available software, we identified several groups of functional­
ities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.”, 2 being ”Not very useful./Low usefulness.”, 3 being
”Moderately useful.”, 4 being ”Very useful” and 5 being “Extremely useful.”
(a) Scouting Reports
(b) Player Database
(c) Match Analysis
(d) Opponent Analysis
(e) Player Comparison
(f) Team Management
(g) Shadow Teams
(h) AI generated statistics
(i) Video Analysis
(j) Player Registration
(k) AI generated Player performance prediction
10. What areas of the scouting process do you find repetitive or that do not add value?
11. What type of functionalities/information do you think could exist or be used to in­
crease scouting accuracy/efficiency or player potential evaluation?
Reliability Measurements
The interviews were conducted in a conventional manner, with the interviewer controlling
the flow of the interview and the topics raised. Technical difficulties that may
occur during Zoom meetings, such as frozen video, poor internet connection
or poor-quality audio, were addressed by regularly pausing the interview, giving the
interviewer time to take note of the answers and resolve any technical anomalies.
To ensure the consistency of the answers and the sample, answers from different
interviewees were compared to find out how much they differed. Once that difference
became tenuous and answers became repetitive, the continuation of the interview
process was re-evaluated.
Data Collection
This phase will target the two data collections employed: the software feature analysis
and the one­on­one interviews. Therefore the procedures will differ considering the type
of data analyzed:
1. Football Scouting Software Features Collection
2. One­on­one Interviews Answers Analysis Method
Football Scouting Software Features Collection
After the conclusion of the web search and construction of the dataset, a preliminary
analysis of the data was performed. The goal of the preliminary data visualization was to
give an overall perspective of the most common features across the manufacturers. For
this reason, the dataset was loaded into a Python script to build data visualizations and
graphics that provide a general overview of the features and to detect the most prevalent
Figure 3.3: Prevalence of Features across Software Products Source: Own study
The above bar chart indicates that the most prevalent features are databases, scouting reports
and player comparison, offered in 30, 26 and 21 of the products analyzed respectively,
while Player Registration is the least prevalent, being offered by only 3 of the products
Lastly, an association rules analysis was employed to identify patterns and/or
co-occurrences in apparently independent information repositories such as transactional and
relational databases.
”Similar to the idea of correlational analysis (although they are theoretically different), in
which relationships between two variables are uncovered, ARM (Association Rule Mining)
is also used to discover variable relationships, but each relationship (also known as an
association rule) may contain two or more variables.” [61]
The first step was to develop an Apriori model. This model aims at finding the most
frequent itemsets (features and combinations of features in this case) by ”calculating
rules that express the probable co­occurrence of items within frequent itemsets” [62]. The
model has three fundamental components:
1. Support: It is the likelihood of an event occurring or of two itemsets (events) occurring
together.
2. Confidence: It is a measure of conditional probability: the likelihood that one event
occurs given that the other has occurred.
3. Lift: It is the probability of all items occurring together divided by the product of the
probabilities of the antecedent and the consequent, that is, the joint probability expected
if they were independent. The Lift component can be read in the following manner: how much
more likely the events are to occur together than would be expected if they were independent.
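The three components can be made concrete with a small worked example. This is a minimal sketch under stated assumptions: the four "products" and their feature sets are invented for illustration, and the function names support, confidence and lift are chosen here rather than taken from the script used in this study.

```python
# Toy data: each set lists the features one hypothetical product offers.
products = [
    {"Database", "Scouting Reports"},
    {"Database", "Scouting Reports", "Player Comparison"},
    {"Database"},
    {"Video Analysis"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent.

    Assumes the antecedent appears in at least one transaction.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Joint support divided by the product of the individual supports."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (
        support(antecedent, transactions) * support(consequent, transactions)
    )


a, c = {"Database"}, {"Scouting Reports"}
print("support   :", support(a | c, products))
print("confidence:", confidence(a, c, products))
print("lift      :", lift(a, c, products))
```

In this toy data, Database appears in 3 of 4 products, Scouting Reports in 2 of 4, and both together in 2 of 4, so the support of the pair is 0.5, the confidence of "Database implies Scouting Reports" is 0.5 / 0.75, and the lift is 0.5 / (0.75 × 0.5), which is greater than one.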
The following step of the association rules analysis was to use the most frequent com­
binations of features and identify the if­then statements that constitute the most relevant
association rules. In our case, the association rules deemed significant were the ones
with lift greater than one, in which the presence of some features positively influences the occurrence of other features. A few
concepts are relevant to the following step of the analysis. As IBM declares:
1. A lift value greater than 1 indicates that the rule body and the rule head appear more
often together than expected, this means that the occurrence of the rule body has
a positive effect on the occurrence of the rule head.
2. A lift smaller than 1 indicates that the rule body and the rule head appear less of­
ten together than expected, this means that the occurrence of the rule body has a
negative effect on the occurrence of the rule head.
3. A lift value near 1 indicates that the rule body and the rule head appear almost as
often together as expected, this means that the occurrence of the rule body has
almost no effect on the occurrence of the rule head. [63]
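The two steps described above, finding frequent feature combinations and keeping the if-then rules with lift greater than one, can be sketched as follows. This is an illustrative sketch only: it enumerates itemsets by brute force instead of applying the candidate pruning of the Apriori algorithm, the toy data are invented, and the minimum support of 0.5 is an assumed threshold that is not taken from this study.

```python
from itertools import combinations

# Toy data: each set lists the features of one hypothetical product.
products = [
    {"Database", "Scouting Reports"},
    {"Database", "Scouting Reports", "Player Comparison"},
    {"Database", "Player Comparison"},
    {"Video Analysis"},
]
MIN_SUPPORT = 0.5  # assumed threshold, chosen only for this example


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets by brute force and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
    return frequent


def rules_with_positive_lift(transactions, min_support):
    """Return (antecedent, consequent, lift) for if-then rules with lift > 1."""
    rules = []
    for itemset, joint in frequent_itemsets(transactions, min_support).items():
        # Split each frequent itemset into every antecedent/consequent pair.
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                value = joint / (
                    support(antecedent, transactions)
                    * support(consequent, transactions)
                )
                if value > 1:
                    rules.append((set(antecedent), set(consequent), value))
    return rules


rules = rules_with_positive_lift(products, MIN_SUPPORT)
for antecedent, consequent, value in rules:
    print(f"IF {sorted(antecedent)} THEN {sorted(consequent)} (lift = {value:.2f})")
```

Brute-force enumeration is only practical for a small number of feature categories; for larger item catalogues an Apriori implementation that prunes infrequent candidates would be the usual choice.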
The findings are expected to show how the packages provided by companies are
composed in terms of features. This, combined with the evaluation of the usefulness of the features
by the interviewees, helps to form a picture of the bundles of features offered
as opposed to the bundles of most useful features as expressed by the software users.
Furthermore, it helps to realize whether the needs of the users are being met or whether
there is a mismatch between what is offered and what is sought after by the end user.
One­on­one Interviews Answers Analysis Method
The method used to analyze the answers was Qualitative Content Analysis, which
can be defined as “a research method for subjective interpretation of the content of text
data through the systematic classification process of coding and identifying themes or
patterns” [64]. As part of this method, it is very important to fit the answers to a
“model of communication” [22], which entails defining the part of the communication
about which conjectures will be formed.
In this case, the conjectures will revolve around the interviewees’ experiences, feelings
towards software and day­to­day tasks. Therefore, the unit of analysis chosen is the
individual experiences and feelings of scouting professionals interviewed towards football
scouting software.
After collection of the data and considering the object of the research questions, it becomes
necessary to interpret the words and expressions present in the interviews. Hence,
categories were generated using a mixed inductive and deductive category
development approach to categorize the data obtained from the interview process.
Research questions and sub­questions suggest the following initial categories: Features,
Scouting Tasks, Barriers to Adoption, Used Software, Software Problems, Scouting Data
Storage Format and Unused Information. After the collection of ”10­50 %” [22] of interview
answers, these categories were re-evaluated to fit the given answers and replaced
where they fitted those answers poorly.
Within the category of Features, sub-categories were developed to address the
categorization of the features according to their specific use and their degree of usefulness.
Barriers to Adoption sub-categories were developed to target the causes of these
barriers. Scouting Tasks sub-categories were developed to reflect the repetitiveness of, or
value added by, the tasks. Software Problems sub-categories were developed to target
the specific problems pointed out by respondents.
Checks for the reliability of the interviews were also carried out, especially concerning
the number of interviews collected, the variability of answers, and open
and honest communication with respondents. The interview recordings were transcribed
using online AI voice-to-text converters that produce PDF transcripts of the audio
files. After this transcription, the transcripts were proofread to ensure they were faithful
to what was said by the interviewees.
To ensure validity of results, a ”negative case analysis” [65] methodology was used. Es­
sentially, this methodology involves the analysis of individual answers of respondents that
do not fit patterns identified in the rest of the answers. The analysis of such cases aims at
finding the underlying explanation for such a difference in answers to ”reduce the threat
of researcher bias” [65] and put the validity of results to the test.
The interviews required the interviewees to classify the software functionalities
identified on a scale of 1 to 5 according to their usefulness. The usefulness of the
functionalities is directly related to the relevance of these functionalities to each of the
respondents’ day-to-day tasks. The aim of this question was to find out which functionalities
the software users considered most important to their job and ultimately to the scouting
In order to analyze the classifications given by each respondent, the values were entered
in an Excel sheet where the columns were the functionalities, the rows were the
interviewees’ identifiers and the cells held values from 1 to 5. These functionalities were
examined to find the ones with the best and worst classifications. Moreover, the most
common bundles of functionalities identified in the analysis of available football scouting
software and the ones surfacing in the association rules were scrutinized to discern
their usefulness level. The results will be shown in charts as to make information more
The research sub­questions suggest the enumeration and interpretation of barriers and
features that should be present in the football scouting software. Therefore, the interpre­
tation of results will approach these factors according to their prevalence across answers
and their singularity. Moreover, the analysis will also provide results from which conclu­
sions may be derived regarding the user’s satisfaction with current functionalities offered.
Additionally, the results will take into account the way the software users/interviewees
perceive the most significant patterns among functionalities.
Scouting Software Feature Analysis
Scouting Software Tools Feature Analysis
Following the methodology described earlier, the first step was the generation of the Apriori
model. The table below allows for an understanding of the most frequent itemsets, which
in this case are the most common features or combinations of features to appear across
the software products analyzed.
Figure 4.1: Most frequent itemsets
The minimum support parameter was set to 0.1, so the Apriori algorithm retains
only the itemsets whose support, i.e. the share of software products in which they
appear, is at least 0.1. Moreover, it was found that [Database, Scouting_Reports] had a
probability of 50 % of appearing together in software products, the combination
[Scouting_Reports, Player_Comparison] had a probability of 37.5 % of occurring and
[Player_Comparison, Database] had a probability of 40 % of occurring in the software products.
Similarly, the combination [Video_Analysis, Database] had a probability of 30 % of being
present in the software services of the manufacturers. The following step was the
implementation of the association rules model.
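As a minimal sketch of how the support of an itemset may be computed, the following Python snippet enumerates single features and pairs by brute force and keeps those meeting the minimum support. It is only an illustration: the product list is hypothetical toy data (not the thesis dataset), the function names are assumptions, and the exhaustive enumeration stands in for the candidate pruning that the Apriori algorithm performs.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the features of one software product.
products = [
    {"Database", "Scouting_Reports", "Video_Analysis"},
    {"Database", "Scouting_Reports"},
    {"Database", "Player_Comparison"},
    {"Scouting_Reports", "Player_Comparison"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for all itemsets of up to `max_size` items
    whose support is at least `min_support` (brute force, no pruning)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= min_support:
                result[itemset] = s
    return result


frequent = frequent_itemsets(products, min_support=0.1)
# Two of the four toy products offer both features, so the support is 0.5.
print(frequent[frozenset({"Database", "Scouting_Reports"})])
```

In this toy example the pair [Video_Analysis, Player_Comparison] never occurs together, so it is filtered out by the minimum support.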
The minimum threshold parameter was set to a minimum confidence of 90 %. The confidence
signals the likelihood of the consequent occurring, given that the antecedent
has already occurred, which in essence measures the reliability of the association rule [63].
The value of 0.9 was obtained by varying the minimum confidence parameter so as to obtain
a satisfactory set of association rules considering lift and support. Below we may observe
the different data points corresponding to association rules for several values of minimum
confidence.
Figure 4.2: Sensitivity Analysis to Association Rules
The variation in the transparency of the markers signaling the association rules reflects
the overlapping of data points. The more opaque a marker is, the more data
points overlap there with the same values of support and lift. The association
rules with confidence between 0.8 and 0.85 either present high support but low lift
values (close to 1) or present high values for lift but very
low support, in many cases very close to the minimum support of 0.1. The association
rules with confidence between 0.85 and 0.9 have mostly lift values close to 1, while the
association rules with confidence superior to 0.9 register the highest values of lift.
Therefore, the set of association rules with confidence above 0.9 was found to offer
the best suited values of support and particularly lift while retaining a reasonable number
of association rules. In the following graphic, the association rules in which the lift is
greater than 1 are displayed. Each line links one antecedent to one consequent. Moreover,
each line and respective color correspond to one association rule.
Figure 4.3: Overview of Association Rules of Software Functionalities
One evident conclusion is the prevalence of Scouting Reports as a common consequent,
indicating that the combinations of features displayed in the antecedents column
appear together with Scouting Reports more often than expected and that the occurrence of
the displayed antecedents has a positive effect on the presence of Scouting Reports. All
in all, 21 association rules were retained to carry on the analysis. The threshold of lift
greater than 1 was put in place so as to select only the association rules whose confidence
exceeds the expected confidence, i.e. the support of the consequent.
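To make the relationship between support, confidence and lift concrete, the sketch below computes the three measures for a single rule. The data are hypothetical and the function name `rule_metrics` is an assumption introduced for illustration; it uses the standard definitions, in which lift is the confidence divided by the support of the consequent, so a lift greater than 1 is equivalent to the confidence exceeding the consequent's support.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    supp_a = sum(1 for t in transactions if a <= t) / n
    supp_c = sum(1 for t in transactions if c <= t) / n
    supp_ac = sum(1 for t in transactions if (a | c) <= t) / n
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = supp_ac / supp_a   # P(consequent | antecedent)
    lift = confidence / supp_c      # confidence relative to the expected confidence
    return {"support": supp_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data: one set of features per software product.
products = [
    {"Shadow_Teams", "Match_Analysis"},
    {"Shadow_Teams", "Match_Analysis"},
    {"Database"},
    {"Database", "Scouting_Reports"},
]

metrics = rule_metrics({"Shadow_Teams"}, {"Match_Analysis"}, products)
print(metrics)
```

In this toy case the rule has a confidence of 1.0 while Match_Analysis alone appears in half of the products, giving a lift of 2.0: the two features occur together twice as often as would be expected if they were independent.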
As seen in Figure 4.4, the highest lift values surpass 3. A lift of 3 means that the bundle
formed by antecedent and consequent occurs three times as often, i.e. 200 % more often, than
would be expected if antecedent and consequent were independent. Association rules including
Match Analysis have particularly high lift values, even reaching 3.6 when put together with
Database, as can be seen in Figure 4.4. Moreover, the highest lift is 4 and corresponds to the
association rule [(Shadow Teams, Team Management),(Scouting Reports, Match Analysis)]
expressed in the form [antecedent,consequent]. The following graphic presents the lift values
of each association rule.
Figure 4.4: Overview of Association Rules Software Functionalities ­ Lift
The association rules analyzed have very high confidence levels, indicating strong
associations, as shown in Figure 4.5. A confidence of 1 is a frequent occurrence, meaning
that, in the data analyzed, the consequent appears in every software product in which the
antecedent is present. The following graph displays the confidence parameters of the
association rules analyzed:
Figure 4.5: Overview of Association Rules Software Functionalities ­ Confidence
Furthermore, a closer look is taken at the support parameter of the association rules
analyzed. The support of an association rule signifies the probability of antecedent and
consequent occurring together [63]. The support values are considered satisfactory
and thus the association rules are deemed strong enough to proceed with further analysis.
In particular, the association rules whose support is higher than the minimum support of
10 % are the most interesting for analysis. The highest support is 0.23, as can be seen in
Figure 4.6:
Figure 4.6: Overview of Association Rules of Software Functionalities ­ Support
Finally, the figure below shows the antecedent and consequent support of each associ­
ation rule. They were numbered in accordance with the legend in Figure 4.3. The most
notable observation would be the high values of consequent support as opposed to an­
tecedent support.
Figure 4.7: Overview of Association Rules Software Functionalities - Antecedent/Consequent Support
Lastly, we should select which association rules are most interesting and significant among
the ones considered. It was observed that Database and Scouting Reports appeared in
75 % and 65 % of the software services analyzed, respectively. Additionally, they appear as
consequents very often, which is due more to their prevalence across the dataset than to real
causal relationships with other functionalities. For this reason, the association rules that
involve Database or Scouting Reports as individual itemsets, either in the place of antecedent
or consequent, were removed before displaying further results. The remaining rules are
displayed in the table below.
Figure 4.8: Most significant association rules
One key takeaway from Figure 4.8 is the frequency with which Match Analysis appears
as a consequent. The most significant association rules shown above will be compared
with the usefulness of each of the same functionalities as expressed by the interviewees.
This aims to assess the level of satisfaction of the users with the bundles of functionalities
most relevant among software functionalities.
Software Users Feature Analysis
The study was directed at analyzing the bundles of functionalities that were given the
highest values of usefulness. Therefore, the following bar chart presents the classification
of each Software Functionality by each Interviewee. Interviewee 13 did not classify a few
software functionalities because he did not use them.
Figure 4.9: Overview of Functionalities Classifications
Because some interviewees did not rate every functionality, the usefulness of each
functionality will be expressed as the average usefulness rated by the users
interviewed. The following chart allows for an understanding of the average classifications
of all software functionalities.
Figure 4.10: Software Functionalities Average Usefulness
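A minimal sketch of how such averages may be computed when some ratings are missing is given below. The ratings are hypothetical, the interviewee labels and the function name are assumptions, and missing ratings (stored as `None`) are assumed to be left out of the average rather than counted as zero.

```python
# Hypothetical ratings on the 1-5 scale; None marks a functionality
# that the interviewee did not rate.
ratings = {
    "Interviewee A": {"Database": 5, "Video_Analysis": 4, "Team_Management": None},
    "Interviewee B": {"Database": 4, "Video_Analysis": 5, "Team_Management": 2},
    "Interviewee C": {"Database": 5, "Video_Analysis": 3, "Team_Management": 3},
}


def average_usefulness(ratings):
    """Average rating per functionality, ignoring missing (None) ratings."""
    totals, counts = {}, {}
    for answers in ratings.values():
        for functionality, score in answers.items():
            if score is None:
                continue  # assumption: unrated functionalities are skipped
            totals[functionality] = totals.get(functionality, 0) + score
            counts[functionality] = counts.get(functionality, 0) + 1
    return {f: totals[f] / counts[f] for f in totals}


averages = average_usefulness(ratings)
print(averages["Team_Management"])  # averaged over the two interviewees who rated it
```

Skipping missing values keeps an unrated functionality from being penalized, at the cost of each average resting on a different number of respondents.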
Figure 4.10 outlines the most useful functionalities: Database, Scouting
Reports, Video Analysis and Shadow Teams. After this finding, the interviewees were
asked via message on LinkedIn whether there were any relationships between these
functionalities and what the reason is behind their high values. Only a few interviewees
answered this follow-up, namely Interviewees 7, 9 and 10. All of these interviewees agreed
these functionalities were central to the daily tasks of scouts. Interviewee 7 went one step
further and stated: ”Basically, those 4 features are the basis of scouting itself. I believe that all
clubs that work well in scouting use these features in some way.” The next step involved
taking a look at how software users regarded the bundles of functionalities present in the
most significant association rules displayed in Figure 4.8. The figure below shows the
percentage of users that gave classifications of 4 or 5 (high to very high usefulness) to all
functionalities belonging to the bundles shown in Figure 4.8.
Figure 4.11: Percentage of Users who consider the bundles of functionalities (in significant
association rules) as highly useful
Notably, the majority of interviewees do not consider the groups of functionalities
highly useful, indicating that the associations among functionalities offered by the software
are not regarded as very useful by software users. This observation is supported by the low
classifications given to functionalities such as Team Management present in the bundles
represented in green and red in Figure 4.11.
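The percentages of this kind can be obtained along the following lines. This is a sketch under stated assumptions: the ratings are hypothetical, the function name is introduced for illustration, and an interviewee who did not rate one of the functionalities in a bundle is assumed not to count the bundle as highly useful.

```python
# Hypothetical ratings on the 1-5 scale; None marks a missing rating.
ratings = {
    "R1": {"Shadow_Teams": 5, "Team_Management": 2, "Match_Analysis": 4},
    "R2": {"Shadow_Teams": 4, "Team_Management": 4, "Match_Analysis": 5},
    "R3": {"Shadow_Teams": 5, "Team_Management": 3, "Match_Analysis": None},
    "R4": {"Shadow_Teams": 4, "Team_Management": 5, "Match_Analysis": 4},
}


def share_highly_useful(bundle, ratings, threshold=4):
    """Fraction of respondents who rated every functionality in `bundle`
    at or above `threshold`; missing ratings count as below the threshold."""
    hits = 0
    for answers in ratings.values():
        if all((answers.get(f) or 0) >= threshold for f in bundle):
            hits += 1
    return hits / len(ratings)


print(share_highly_useful(("Shadow_Teams", "Team_Management"), ratings))
print(share_highly_useful(("Shadow_Teams", "Match_Analysis"), ratings))
```

Because a bundle only counts as highly useful when all of its functionalities are rated 4 or 5, a single low-rated functionality such as Team_Management in this toy data is enough to pull the share of the whole bundle down.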
Further analysis compared the most common functionality bundles
offered by the software with the classification given to those bundles by the software
users/interviewees. This comparison was done in the hope of obtaining results that showed
how the software users regard (in terms of usefulness) the most frequent groups of features
appearing in software.
Therefore, the most common functionality bundles were computed and their minimum
probability of occurrence was established at 20 %. This threshold was set taking into
consideration the size of the dataset. Setting the minimum probability to 15 % yielded 38
software bundles and setting it to 10 % yielded 79 bundles. Both 38 and 79 bundles were
too many to analyze in depth, so the minimum probability of 20 % was chosen as the one
giving the most manageable number of bundles. The following pie chart displays the most
frequent groups of functionalities found across available scouting software:
Figure 4.12: Most frequent bundles of functionalities present in software
The analysis resulted in the calculation of the percentage of respondents that
evaluated the most common bundles (shown in Figure 4.12) as very useful to extremely
useful (values of 4 and 5). This is presented in Figure 4.13. In addition, the percentage of
respondents that evaluated the same most frequent bundles as not useful to moderately
useful (values of 1 to 3) is presented in Figure 4.14. The bundles not displayed were rated
from 1 to 3 in usefulness by 0 % of users.
Figure 4.13: Percentage of Users considering the most common bundles as highly useful
Figure 4.14: Percentage of Users considering the most common bundles as not useful to
moderately useful
One­on­One Interview Answers Analysis
The interviewees will be referenced by number so as to maintain anonymity. During the
interview process, one question was added inquiring about the reliability and quality of the
data available in scouting software. This alteration is due to observations made by
one interviewee, who mentioned concerns about the reliability of statistics and other data in
software [66]. These observations were reiterated by the next two interviewees and therefore
reliability of data surfaced as a relevant topic to approach during the interviews.
After the majority of interviews were done, the transcripts were analyzed and the initial
codes were used to categorize expressions, words and other forms of speech produced by
the respondents. The coding phase is iterative so it is expected that the number of codes
will be adjusted to reflect their frequency throughout the interviews and their importance
as well as the development of sub­categories. Each group of codes or categories was
given a separate table that holds the expression itself, the category and the sub­category
it belongs to.
Used Software
Interviewees were asked to name which scouting platforms they used. The categorization
of used software was done taking into consideration what functionalities they provided
for the interviewees. The bar chart below shows the number of users employing each
Figure 4.15: Used Software
Below we may see the table of used software and respective sub­categories.
Figure 4.16: Scouting Software Categorization
The sub­category of Data Providers holds software tools whose core functionality is the
supply of data in the form of statistical indicators and video of player performance. The
name of this sub­category was drawn from interviewees themselves as they referred to
this software as data providers. This sub­category is the most frequent one among the
interviewees as displayed in Figure 4.15: Wyscout, InStat and StatsBomb.
Moreover, the sub-category of Video Editing encompasses software tools whose main
functionality is the editing and tagging of in-game or in-training video excerpts of players.
Furthermore, the sub-category of Data Visualization holds tools whose prime functionality
is the production of data visualizations in the form of graphics that showcase the statistical
performance of players and data analytics charts.
Reporting tools, encompassed by the sub-category of the same name, allow for the production
of customizable scouting reports of players.
Finally, the sub-category Internal encompasses internal software tools produced for specific
organizations/clubs whose functionalities are diverse and centralized in these tailored
platforms. No specific internal platform names were mentioned by the interviewees
due to discretion concerns.
Barriers To Adoption
The interview included a question directly targeted at finding Barriers to Adoption.
These are factors that prevent the usage of software by clubs or scouting agencies. To
explore the nature of these barriers, several sub-categories were created according
to the input given by the interviewees. The following table holds the categories and
sub-categories and their respective expressions:
Figure 4.17: Barriers to Adoption
Going into the sub­categories:
1. None: represents inputs where no obstacle to scouting software adoption
was identified.
2. Cost: represents inputs where the cost associated with scouting software was
pointed out as a barrier to its usage.
3. Lack of Professionals: represents the absence of scouting
professionals knowledgeable in the various available software.
4. Data Coverage: represents the shortage of data available regarding
leagues, competitions and players in various parts of the world.
5. Generational Gap: represents the reluctance or difficulty experienced by older
scouts when using scouting platforms.
Negative Case Analysis
The interviewer found that some interviewees noted there were no barriers to the adoption
of software. Moreover, most interviewees enumerated a single barrier, adding that
software usage was widespread in the scouting industry. The respondents who did
not identify barriers were Interviewee 2 and Interviewee 6.
Both of these interviewees were young and worked for organizations that competed in the
Eredivisie (Dutch First League) and Polish First League. For this reason, they had access
to various platforms and operated in well-documented contexts where abundant and reliable
data is gathered. Thus, they did not face the barriers pointed out by other
interviewees. The context of the clubs and the resulting purchasing power and availability
of data were identified as core reasons for the absence of obstacles.
Scouting Data Formats
The interviewees were asked about which formats they used to store the data resulting
from the scouting process. The data ranged from scouting reports and videos to player
databases and data visualization tools such as graphs. The category of Scouting Data
Formats will be divided into sub-categories named after the format in which the information
is stored or the platform in which it is stored.
The mentioned data formats were analyzed considering the number of interviewees that
use them and the degree of digitalisation of clubs. The role of each interviewee has to be
taken into consideration as well because the tools used are intimately related to the tasks
performed by every role in the recruitment department. The table below was generated
as to provide an overview of the aforementioned subjects of analysis:
Figure 4.18: Overview of Scouting Data Formats and Platforms Used
Negative Case Analysis
PSD stands for ProSoccerData and is an online management tool for clubs that allows
for consolidation of communication, reporting on player development, creation of player
profiles and scheduling of club activities [67]. When Interviewee 5 was interviewed, he
mentioned PSD as ”an industry standard”.
However, throughout the answers obtained, it was mentioned only once. PSD is used
worldwide, in ”21 countries by more than 500 organizations” [67], and
covers many functionalities while also allowing integration of other tools displayed on its
website, which range from video editing tools such as Hudl SportsCode to video data
providers such as Ortec Sports.
Although PSD gathers a lot of functionalities in one place, other data storage
formats/platforms are less costly, as is the case of Excel and Word, or even open-source/free,
such as R. A promotional offer on the website shows that a 2-month trial of PSD for
organizations costs 462 euros [68], leading to a monthly cost of 231 euros per organization.
Meanwhile, a monthly package of Microsoft 365 Business Premium (including Word and Excel)
costs 18.60 euros per user [69], meaning a scouting department would need roughly 12
members to pay as much for Excel as for PSD. Although it varies from club to club, employing
12 members in the scouting department is not viable for most clubs taking
budget into account.
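The break-even point of this cost comparison can be checked with a few lines of Python. The prices are the ones quoted above from [68] and [69]; the variable names are introduced here for illustration only.

```python
import math

psd_trial_cost = 462.0        # euros for the 2-month PSD trial [68]
trial_months = 2
ms365_cost_per_user = 18.60   # euros per user per month [69]

# Monthly PSD cost per organization implied by the trial offer.
psd_monthly_cost = psd_trial_cost / trial_months

# Number of Microsoft 365 users whose combined monthly cost equals PSD's.
break_even_users = psd_monthly_cost / ms365_cost_per_user

print(psd_monthly_cost)             # 231.0
print(round(break_even_users, 2))   # 12.42
print(math.ceil(break_even_users))  # 13
```

The exact break-even lies at about 12.4 users: with 12 users the Microsoft 365 package is still slightly cheaper than PSD, and from 13 users onwards it becomes more expensive.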
Interviewee 5 collaborated with a Dutch 1st Division club, which has greater purchasing
power than most organizations/clubs. Clubs with lower budgets for such platforms
opt for other formats, which explains why Excel is the most common storage format.
Scouting Tasks
The interviews included a question aimed directly at asking which tasks performed by
scouting professionals were repetitive or added no value to the scouting process. The
initial sub­categories reflected the question by dividing the expressions into Non­Value
Adding Tasks and Repetitive.
However, the answers given by the interviewees varied between repetitive tasks
and labor-intensive tasks. For example, it was mentioned that tasks such as reporting and
watching games are repetitive and demanding but add value and insight to the scouting
process [70] [71] [72].
Therefore, the development of sub-categories into Laborious and Repetitive tasks was
found to be a better fit for the answers obtained. Moreover, the interviewees who did
not classify any tasks as repetitive or labor-intensive fall within the sub-category of None.
The table displayed below shows an overview of expressions employed to classify chosen
scouting tasks:
Figure 4.19: Overview of Scouting Tasks
Figure 4.20: Overview of Scouting Tasks
Negative Case Analysis
Some interviewees pointed out that none of the tasks involved in scouting were repetitive
or non-value adding. The adjectives repetitive and non-value adding may be regarded as
negative, which could make interviewees hesitant about associating the tasks they perform
with adjectives they regard as negative. As Interviewee 1 points out: ”It’s a difficult question
because I like it and can’t say it’s some point of negative or not.”
In addition to this, the repetitiveness of scouting tasks is not regarded as non-value adding.
For example, watching a video several times is found to offer more complete perspectives
and attention to details that would not have been detected with one single viewing. In
essence, repetitive does not mean non-value adding.
Software Problems
The first three respondents mentioned concerns about the reliability and quality of the
data supplied by software data providers. Recognizing the relevance of data
reliability to the interviewees, a question was added to target concerns about data
accuracy while remaining open-ended so as to allow respondents to express
their concerns about further areas of scouting software. Therefore, the subsequent
interviewees were asked about data reliability and what other limitations they found in the
information provided by the available software. The table below offers an overview of the
issues mentioned for the various software platforms:
Figure 4.21: Overview of Software Problems
Figure 4.22: Overview of Software Problems
Negative Case Analysis
The negative case analysis will be conducted on the interviewees that did not identify
any problems or limitations with the software currently available. The reasoning behind
this is the consideration that partly unreliable data is better than no data. The scouts
and data analysts at football clubs realize that data is an important component of talent
identification and that the existence of data is in itself an advantage over traditional in
situ observation of in-game events and player performance. As pointed out by Interviewee
10: ”it’s better to have those data with some limitations than nothing”.
Moreover, there is the consideration that the error margin of software manufacturers’ data
is acceptable and the methods for collecting data have become sufficiently improved to
produce good, relatively accurate data on top of which analysis can be conducted and
conclusions drawn. As Interviewee 13 adds: ”the quality is fairly good. It’s um, computer,
how do you say, I mean, the computer pre­tags and then the human, a qualified, trained
human is going over it. So the quality is surely higher than if, as an example, if I send a
scout to a match”.
Finally, there is also the case for the context of utilization of the data. The interviewees
who were satisfied with software, namely Interviewees 7, 11 and 13, collaborated with
European clubs in professional senior leagues. This gave them fairly complete
coverage and reliable data to work with, making it possible to avoid problems present
in other geographic locations or outside the most income-generating leagues.
Potentially Useful Information
The last question of the interview asked the interviewees to comment on which information
or data they would find relevant for the software products to have. This question
aimed at finding which functionalities were absent from the software platforms available. The
table below offers an overview of the answers given by the interviewees on this topic.
Figure 4.23: Overview of Potentially Useful Information
No negative case analysis was performed for this categorization as it does not present
conflicting findings, merely the enumeration of different potentially useful information to
include in software as expressed by interviewees.
This chapter will reflect upon interpretations and meanings that may be derived from the
analysis described earlier. It will start by answering the research question and will later
explore several conclusions that pose as answers to the sub­research questions and are
therefore complementary to the research question.
The research question asked which functionalities were missing
in football scouting software for it to be mass adopted by clubs. On the one hand, the
results derived from the one-on-one interviews point to several types of data missing in
scouting software. On the other hand, the results also show examples of functionalities
and even software programs considered to be relevant for the future of football scouting.
The types of data touched on by interviewees were Personality data on players (as evi­
denced by the sub­category of the same name), Normalized Performance Indicators and
a larger range of Data Coverage.
As can be seen in Figure 4.23, the most frequent sub-category mentioned was Personality.
This sub-category refers to information regarding psychological factors of football
players: namely personality traits, behavior on the pitch and relationship with teammates
and coaches. Since football is a collective sport, the interaction between players and
other active parts in the game is relevant to team performance. To describe this type of
data and how it is observed, Interviewee 6 states ” uh, behavior, you know, uh, to watch,
to, to have all the data of what he looks like when he misses a shot when he’s talking to
the referee”.
A team where the players have positive interactions and there is a good environment is
more cooperative and resilient to problems in the dressing room. Therefore, assessing
a player’s personality and finding out if he is a positive influence in the team is key for
recruitment purposes. Additionally, professional football players often work outside their
home country, so it is also relevant to understand how a player copes with being away from
home and family. To illustrate the importance of personality in football, one of the
interviewees stated ”We all know very talented players who didn’t make it great because they
didn’t have the emotional capacity (...)” [73]. Additionally, the same respondent described
the information scouts look for in a player’s psychological factors: ”You need to establish, is
this a boy that can go away from home, get along with a landlord or landlady, cope with
loneliness, deal with disappointments.” [73]. As of now, there is no data gathered and made
available in software on a player’s personality or behavior.
Complementing software with Personality Data is congruent with the tasks performed
by scouts, who not only observe players’ performance but also maintain personal contact
with them and their families to assess psychosocial factors of the athlete [7]. Currently,
insights on players’ personality traits are obtained by going to the pitch and talking to
locals and coaches [73]. Moreover, Interviewee 11 highlights that personal contact grants
information that data cannot provide, such as ”type of player he was, was he a gambler,
was he a drinker, did he train well, did he have a good attitude towards his teammates,
was he well liked within the group.”
In addition, Physical Data was mentioned as potentially useful. This sub-category
represents the need for better accuracy in data regarding players’ physicality. As mentioned
in the literature review, the physical output [44] of athletes is collected for talent
identification. In particular, scouts mentioned that the availability of physical data is
insufficient and that its integration in video analysis is key. This integration would save time
for video analysts, who have to check the physical characteristics of a player in different
platforms. As mentioned by one interviewee: ”some of the clubs send us requirements that they
need height and speed and this is what we have to check all the time” [74].
Furthermore, there was also mention of the sub­category of Normalized Performance In­
dicators. This sub­category aims at pointing out the absence of standardized performance
indicators with these being usually considered as isolated measures. Interviewee 4 men­
tions that when watching a video of a player’s actions in a game, there is no mention of
how well the player performed in that game. This may influence (positively or negatively)
the assessment of the player’s ability which is done solely according to the performance
observed by the scout on a specific occasion.
Interviewee 13 suggests metrics aimed at evaluating the value added by a player’s actions
to his team: ”a plus minus (...) on-ball value system.” Plus-minus (PM) systems are
used ”to distribute credit for the performance of a team as a whole onto the individual
players appearing for the team.” [75]. Essentially, PM measures an individual player’s
contribution to a team’s success by the differential in a certain target statistic (like goals
scored) and does not take into account event data and actions made by the player [76].
Plus-minus value systems have been applied to several sports, for example American football,
in order to estimate the value of the player’s in-game performance [77], and hockey, to
recognize key players [78]. Plus-minus models have not been widely adopted
in football [76]; therefore, Interviewee 13’s suggestion is reflective of the limited amount of
research done on the topic of plus-minus systems in football. The same interviewee also
points out the need for research into which indicators contribute to successful transfers,
such as ”playing time, team level and age” [79].
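To illustrate the basic idea behind a plus-minus rating, the sketch below credits each player with the goal difference recorded while he was on the pitch. It is only the simplest, unadjusted variant and not the models discussed in [75] or [76], which are more elaborate; the match segments, player labels and function name are hypothetical.

```python
# Hypothetical match segments: the players of one team on the pitch and the
# goal difference (goals for minus goals against) recorded during the segment.
segments = [
    ({"P1", "P2", "P3"}, +1),
    ({"P1", "P2", "P4"}, -1),
    ({"P1", "P3", "P4"}, +2),
]


def raw_plus_minus(segments):
    """Sum, for each player, the goal difference of the segments he played in."""
    totals = {}
    for players, goal_diff in segments:
        for player in players:
            totals[player] = totals.get(player, 0) + goal_diff
    return totals


scores = raw_plus_minus(segments)
for player in sorted(scores):
    print(player, scores[player])
```

As the text notes, such a rating uses only the target statistic and who was on the pitch: no event data or individual actions enter the calculation, so a player's score also reflects the quality of his teammates and opponents.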
In addition, the sub-category of Coverage corresponds to the need identified by Interviewee
12 for further data availability and better accuracy of data (more specifically video) from
less prominent geographical locations in the context of football, as is the case of ”the
continent of Africa.” [80] This is coherent with the findings of Akindes et al., who state there
are very few talent development structures in Africa [81]. Wider Data Coverage would
provide access to talent in parts of the world that are ”peripheral” [81] when it comes to
football and would be cost-effective for clubs, who would not need to send scouts to
physically observe players in those parts of the world. Additionally, one long-term alternative
for wider data collection in Africa resides in the Talent Management policy of building
football academies that would allow football clubs to closely observe African talent and
monitor its development from a young age.
Turning to the functionalities desired by interviewees, there was mention of the need for a
centralized platform that allows data from several platforms to be consulted (here
represented by the sub-category Centralized Platform). This was made evident by Interviewee
10, who gives the example of ”key passes”. The respondent adds that this indicator is
calculated differently by distinct data providers and that a centralized platform
aggregating the same information from different providers would contribute to a greater
”precision of information.” [82] Centralized Platforms would not only allow information to be
concentrated in one single place but would also integrate every scouting professional
involved in the recruitment process. Thus, centralization allows for a better flow of
communication between departments and easier transfer of player statistics, videos and
other data along the scouting process.
In addition to centralization, results show that Predictive Models are desired. This
sub-category concerns Artificial Intelligence models aimed at forecasting athletes’
performance and evolution. These predictions apply to several moments in a player’s career,
ranging from how a young player will evolve to when a player’s performance will decline. As
Interviewee 9 mentions: ”through artificial intelligence, to envisage what is the level that
the player can reach and what is the player’s selling point, from when is the player going to
start declining his performance? And so on.” The same interviewee adds that some companies
are already introducing machine learning algorithms and that these models are expected to
become widespread in the future [70]. This finding is in line with the recommendation of
scholars for more studies regarding predictive modelling in football [35].
According to Interviewee 5, clubs with big ”buying power” [72] will ”develop the models
internally rather than buying a model”. The same respondent adds that, at the moment, the
wrong clubs are the ones developing extensive data technology departments. Giving the
example of ”Manchester City, Barcelona and Real Madrid” [72], Interviewee 5 argues that these
clubs are looking for a small number of players worldwide who can strengthen their already
very strong squads. As Interviewee 5 puts it: ”it doesn’t make a lot of sense for them to use
a lot of data scouting because it’s quite easy for, for them to cover those 20 players to see
who can actually make a difference in our squad.” By contrast, smaller clubs with lower
purchasing power are the ones that should be using data scouting the most, given that they
are competing with numerous other clubs for the same shortlist of players that ”stand out in
data” [72]. Interviewee 5 adds that so many clubs use the same statistical benchmarks (and so
identify the same shortlist of players who perform well on those benchmarks) because ”the
game is also following certain standards. It’s very rare that you see teams that are playing
completely different from everyone else” [72]. A suggestion offered by Interviewee 5 is the
development of predictive models, accessible to smaller clubs, that offer ”a comprehensive
customization of data” [72]. Data customization would prevent many clubs that look at the
same statistical indicators from identifying the same shortlist of players. Customization
allows adaptation to the team’s playing style and focuses on metrics that are coherent with
it.
Results also point towards tasks in the scouting process deemed repetitive or
labor-intensive. This information may serve as a starting point for software companies to
create functionalities that match these tasks and optimize the work done by football scouts.
In this way, time-consuming and resource-wasting tasks could be reduced through the use of
software functionalities. Nonetheless, repetitive tasks are not synonymous with tasks that do
not add value to the process. Several interviewees noted that the repetitiveness of a task
means that it is more meticulous and detail-oriented.
As can be seen in Figures 4.19 and 4.20, the tasks that fall within the Laborious
sub-category are writing scouting reports, physically attending and watching football games,
and watching videos of games and/or training. The main tedious aspect identified in watching
videos was choosing a suitable video that matches the scout’s expectations.
As Interviewee 4 mentions: ”overall video providers, they’ll only code the starting position
for a player, but they will never code if a player moves position”. In this case, when the
viewer is considering a player for a certain position in his team, he searches for videos
where the player is playing that position. However, throughout the game players may
change position and the video settings do not reflect positional changes.
Therefore, the scout may end up watching a video that offers little insight into the
player’s performance in the intended position. This concern is shared by Interviewee 5, who
affirms: ”the biggest issue is the platforms that don’t have player tagging. So you’re forced
to sit and watch 90 minutes”. One solution suggested by Interviewee 4 consists of software
providers video-tagging positional changes and substitutions in game videos.
Another concern was the metrics and scales provided by software products for making scouting
reports. Interviewee 6 adopted his own metrics because, in his words, ”the scales that they,
uh, they give us to make reports are not good enough to me.” Next, the tasks that fit in the
Repetitive category were the writing of scouting reports and the watching of videos. One
tedious aspect mentioned is the method of reporting. Interviewee 5 described the process as
long and at times not comprehensive. As Interviewee 5 puts it: ”It’s not very quick, and you
can easily get caught in a circle of trying to explain stuff that you really need to see,
visualize.”
The non-comprehensive nature of reporting is even more significant considering that
reporting is the part of the scouting process where ”there is the biggest margin of error”
[72].
Although repetitive, both video watching and reporting are considered valuable tasks,
necessary to the scouting process. Particularly, viewing a video several times allows the
scout to observe aspects that had been previously missed or undervalued as Interviewee
9 states: ”We end up having always a more complete and detailed observation of the
Degree of Digitalisation of Football Clubs
Results show that football clubs are dependent on Spreadsheets and Excel files. Although
these data formats are regarded as digital, more technologically advanced options are
available, such as SQL databases. In fact, the tendency is for football clubs to transition
from spreadsheets to SQL databases integrated with software tools, allowing for more
practical access to data visualization and reporting tools.
Nonetheless, as can be observed in Figure 4.18, the most common data format for storage is
Excel Sheets, followed by Database. The Database sub-category encompasses not only SQL
databases but also cloud storage tools and the Scout7 storage feature, depending on the
organization. Scout7 is a web application that ”offers three different services to not only
help with scouting but also improve the video databases for the clubs as well as provide
tools for training and player development.” [83]
SQL stands for Structured Query Language; it was developed by IBM and ”is the set of
statements with which all programs and users access data in a database”. [84] Essentially,
it is the standard language for relational database access and management.[84]
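As an illustration of what moving scouting data from a spreadsheet into an SQL database could
look like, the sketch below uses Python’s built-in sqlite3 module. The table name, the column
names and the player records are hypothetical assumptions and are not taken from any club or
data provider mentioned in this thesis.

```python
import sqlite3

# In-memory database; a club would use a persistent file or a database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scouting_reports (
        player   TEXT NOT NULL,
        position TEXT NOT NULL,
        age      INTEGER NOT NULL,
        rating   INTEGER NOT NULL  -- hypothetical 1-5 scout rating
    )
    """
)

# Invented example rows, standing in for what would otherwise live in a spreadsheet.
rows = [
    ("Player A", "CB", 19, 4),
    ("Player B", "CB", 27, 3),
    ("Player C", "ST", 21, 5),
]
conn.executemany("INSERT INTO scouting_reports VALUES (?, ?, ?, ?)", rows)

# A query that would require manual filtering and sorting in a spreadsheet.
shortlist = conn.execute(
    """
    SELECT player, rating
    FROM scouting_reports
    WHERE age <= 21 AND rating >= 4
    ORDER BY rating DESC
    """
).fetchall()
print(shortlist)
conn.close()
```

The same query can be reused unchanged as new reports are inserted, which is one practical
difference from filtering a spreadsheet by hand.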
Moreover, Databases are combined with other formats such as Excel and PDF, as Interviewee 7
mentions: ”databases are then converted to PDF.” Python Scripts and R [85] appear as storage
formats complemented by data exported to Excel, as mentioned by Interviewee 3: ”to download
it in Excel and after that we want to use it in another apps such as Google Collab where we
use Python Code or R.” [66].
Similarly, PDF and Word formats seem to be complemented by the prior use of Excel.
Excel is used to ”convert all the information (...) and send the database” [74] while ”the
scouting reports and everything else is in PDF and Word document.” [74].
This is because Word and, more specifically, PDF files are universally compatible, allow for
smaller file sizes and are easier to read. To illustrate the readability of these formats,
Interviewee 1 affirms that Word and PDF are ”for the other guys who are not IT perfect”.[74]
Videos store footage of players together with ”statistical data and the visualizations” [82],
while PowerPoint functions as a reporting tool [70].
Therefore, the dependency on Spreadsheets has to do with several factors: the level of
proficiency of scouting professionals in database management, the integration solutions
offered by software platforms that allow Spreadsheets to be uploaded, and the compatibility
of Excel and PDF formats. Although this dependency is observable, the tendency is towards the
adoption of other formats such as SQL.
The Market of Football Scouting Software
Results show which competitors exist in the football scouting software market and which ones
are the most common among the respondents analyzed. As observed in Figure 4.15, the most
commonly used software products were Wyscout [86] and InStat [87], both data providers. The
categorization of software made it possible to distinguish between several types of
software: Data Providers, which are the most commonly used; Video Editing tools, mainly used
by video analysts; Data Visualization tools such as PowerBI [88] and Tableau [89], which are
not specifically made for football scouting; and Internal platforms, tailor-made for clubs
and organizations that perform talent scouting in football.
User Perception of Software functionalities
Several takeaways were drawn from the one-on-one interviews. Among these, important findings
concern the user perception of the software functionalities currently on the market. The
average usefulness was computed for each functionality. As shown in Figure 4.10, the most
useful functionalities were Database, Scouting Reports, Video Analysis and Shadow Teams.
According to interviewees, these functionalities are central and the most important to
scouts in their daily tasks [82], hence their high rating.
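The averaging step can be sketched as follows. The ratings below are invented, with None
standing for an interviewee who did not use a functionality and therefore gave no rating.
Skipping such missing answers is an assumption made for this illustration, and the function
name `average_usefulness` is hypothetical.

```python
from statistics import mean


def average_usefulness(ratings):
    """Average the 1-5 ratings per functionality, skipping missing answers (None)."""
    averages = {}
    for functionality, values in ratings.items():
        given = [v for v in values if v is not None]
        averages[functionality] = mean(given) if given else None
    return averages


# Hypothetical ratings from four interviewees (invented data).
ratings = {
    "Database": [5, 5, 4, 5],
    "Scouting Reports": [4, 5, None, 5],
    "Team Management": [3, 2, None, None],
}

averages = average_usefulness(ratings)
print(averages)
```

Because unanswered items are skipped, each average may rest on a different number of
respondents, which is worth keeping in mind when comparing functionalities.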
The functionalities considered the least useful to scouts in their daily tasks were Team
Management and AI Generated Statistics. The Team Management functionality is available in
several software applications. Nonetheless, it drifts away from the usual tasks involved in
talent identification and falls within the responsibilities of other professionals such as
the chief scout [80]. One explanation for the low rating of AI generated statistics is that
these indicators are decontextualized from the team’s playing style. An interviewee stated:
”data will always be an effect of what the player does and what the player does and what the
player can do is two different things.”[80]. This illustrates that certain statistics will be
favoured by the team’s philosophy. For example, ”a dominant team in a second division playing
possession based football, (...) always want to build from the back (...) the center backs
will always have high, uh, passing metrics because they will play a lot of passes.” [80]. In
this case, some indicators will be inflated and will stand out among the statistics. However,
these indicators do not portray the player’s ability; they are illustrative of a team’s
playing style and are not good metrics for evaluating the player. This contrasts with the
statement made by Interviewee 5, who pointed out the benefits of implementing customized
data analysis for scouting purposes [72].
It was found that Match Analysis was a common consequent among the most significant
association rules. Furthermore, the combinations of functionalities that had a positive
influence on this feature (antecedents) were (Opponent Analysis, Database and Player
Comparison); (Scouting Reports, Team Management and Shadow Teams); and (Opponent
Analysis, Player Comparison). This can be consulted in Figure 4.8.
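For readers less familiar with association rules, the sketch below shows how support,
confidence and lift could be computed for a rule of the form antecedent -> consequent. The
software products and their functionality sets are invented for illustration and do not
correspond to the dataset of this thesis; the helper name `rule_metrics` is likewise
hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent, consequent = set(antecedent), set(consequent)
    # Count the products containing the antecedent, the consequent, and both.
    n_ante = sum(antecedent <= t for t in transactions)
    n_cons = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift


# Each set lists the functionalities of one hypothetical software product.
products = [
    {"Opponent Analysis", "Player Comparison", "Match Analysis"},
    {"Opponent Analysis", "Player Comparison", "Match Analysis", "Database"},
    {"Scouting Reports", "Database"},
    {"Opponent Analysis", "Match Analysis"},
    {"Scouting Reports", "Player Comparison"},
]

support, confidence, lift = rule_metrics(
    products, {"Opponent Analysis", "Player Comparison"}, {"Match Analysis"}
)
print(support, confidence, lift)
```

A lift above 1 indicates that the consequent appears together with the antecedent more often
than would be expected if the two were independent.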
Despite having reasonably positive usefulness ratings when considered separately, the
antecedent groups of functionalities were not considered very useful by the majority of
users, as can be seen in Figure 4.11. This reveals that the relationships established between
functionalities by software manufacturers, when mixing functionalities in their products,
are not regarded as very useful by the end user. Therefore, certain patterns of
functionalities (marked by the association rules in Figure 4.8) were identified but not
regarded as very useful by interviewees.
Nonetheless, when looking at the most frequent groups of functionalities across available
software, such bundles are regarded as highly useful by a significant number of
interviewees, as shown in Figure 4.13.
Moreover, analysis of the one-on-one interviews sheds light on the problems of current
software, as can be seen in Figures 4.21 and 4.22. The Reliability sub-category reflects the
most frequent concern, the flawed reliability of the data available, which was voiced by 6
out of 13 interviewees. The cause of unreliable data resides in the form of collection. For
example, video-tagging done by humans has an error margin inherent to mistakes made by the
video-tagger. Similarly, video-tagging done by automated models has an error margin inherent
to the model. According to Interviewee 13, the error margin decreases the better trained the
data collector is or the more automated the method is [79]. A scout sent to a game can
register ”90 %” [79] of player actions, while if a ”computer goes over it, they may get 95 %”
[79].
Moreover, software data providers are aware of the error margin of the data they supply. The
entities behind leagues and competitions collect data of their own. For example, the Spanish
La Liga has an ”ID Department possessing cameras around the pitch, they track their own
data.” [66] Providers benchmark their data against the data gathered by official leagues and
institutions regulating football. As mentioned by Interviewee 3: ”They used to compare their
margin of data with the official leagues and most of items are 2 %, 3 %.”
According to Interviewee 5, there is a connection between age groups in football and the
degree of reliability of the data provided. This interviewee gives the example of the Danish
youth national team: ”I had a trial at one point, uh, on InStat for 14 days and all of the
information they had on the Danish youth national team was incorrect.” [72].
Furthermore, reliability varies among data providers. As Interviewee 6 points out: ”I always
thought that InStat and Wyscout were among the least consistent data providers (...) I see
big differences between their expected models of Wyscout and InStat. And let’s say Opta or
StatsBomb”. The difference in reliability between data providers is reflected in the price of
each service. More accurate data costs more and is not within the budget of every club, as
Interviewee 12 mentions: ”I, I go to sleep, uh, dreaming about using StatsBomb data at times.
Uh, but we cannot afford this. This is, uh, something for the big, big clubs.”
The second most frequent concern relates to the coverage of the data supplied. The
difference between the data collection offered in the most popular leagues (with the highest
revenue-generating clubs) and in less popular leagues is reflected in the availability of
statistics for games played in those leagues. According to the Deloitte Sports Business
Group, the top 10 revenue-generating clubs in the World were located in Europe [90]. On the
one hand, since the highest revenue-generating clubs are in Europe, software manufacturers
focus their data collection on the leagues in which these clubs compete. On the other hand,
this leaves clubs in other regions of the World where football is not a high-revenue
generating sport without data coverage, as is the case of Africa. As Interviewee 12 states:
”there’s a huge potential for, for better video quality material in, um, in, especially in
the continent of Africa.”
Lastly, ease of use was also pointed out as a limitation of the software. This was
considered less significant than the previous two sub-categories due to its lower frequency.
The main problem pointed out in this area was the repetitiveness of a particular software
functionality that allows PDF files to be exported from the online platform [91].
Barriers to Software Adoption
Results show various barriers to the adoption of software. These factors act as obstacles to
the mass adoption of football scouting software, slowing the trend of digitalisation in
football and, more specifically, in the player recruitment process. Findings on Barriers to
Adoption may be found in Figure 4.17.
Cost was the most frequently mentioned factor deterring clubs from using scouting software.
This is a recurring problem for smaller clubs, as opposed to clubs in top leagues, which
generate considerably more revenue. According to a report commissioned by UEFA, 47 teams
hold 60 % of the revenue produced by the 720 clubs from a total of 55 nations analyzed [92].
Tight budgets impose more constraints on spending and therefore scouting software becomes
less of a priority. As Interviewee 4 mentioned: ”(...) what’s more useful to you as a small
club, like in, in the national league, for example, to have a nice bit of software or a few
more people who can watch a game for you and actually give an opinion.” [91].
Lack of Professionals was mentioned on a single occasion. However, it does not concern the
most common scouting data providers, such as InStat and Wyscout, whose most basic
functionalities involve searching and filtering players and game data and do not demand deep
knowledge.
The absence of professionals refers to more cutting-edge software involving data
visualization tools and machine learning algorithms. These tools demand deeper knowledge of
data analysis and data science. Traditionally, football scouts have a background in sports
science and football, and many of them are former players [7], so additional professionals
are required to work with data science and analysis methodologies.
Data Coverage also appears as a barrier. According to Interviewee 5, the issue of missing or
incomplete data is more problematic in youth football. When talking about data providers, the
interviewee refers to ”a software that can actually help me out in the markets that I work
in. Uh, so, so the Danish youth leagues are covered by video, but it’s not on Wyscout. You
can not buy access to it and France is simply not covered.” [72]. Yet France is a top
producer of footballers.
According to Poli et al., France makes the third largest quantitative contribution in the
World to the production of professional footballers, with 1740 players across 132 leagues
worldwide [83]. At first sight, it therefore seems unjustified that scouting software does
not provide data for the youth sector of such a country.
The reason lies in the contractual context of the young players, as said by Interviewee 5:
”especially when we talk to younger age groups, is that the clubs have typically not secured
the players on contracts.” [72]. The precarious situation of young players who are not bound
to the team by a contract makes clubs hesitant to share videos, which explains the absence
of data for youth teams in leagues where young players do not get professional contracts
early on. Consequently, scouts have to be creative, relying on YouTube channels and local
press sites for video data and statistics, as pointed out by one interviewee [72].
As seen before, data coverage affects talent scouting not only in youth teams but also in
parts of the World where football infrastructure is lacking. Thus, Data Coverage or
Inaccurate Data appears as a problem and a barrier specifically for scouts looking for talent
in youth football and in regions of the World like Africa. The lack of data coverage for
youth football scouting is consistent with existing literature [13].
The last barrier pointed out was the Generational Gap. Traditionally, scouting was done
through physical observation of games and the writing of paper reports. The Generational Gap
is still a barrier to the use of software by older scouts. Moreover, it could be argued that
it reflects a common mindset among decision-makers in football ”who only have one way of
seeing the game and do not wish to see their power or opinions challenged”[56]. Although the
employment of new technologies in football is happening at a fast pace, as hinted by the
interviewees, reluctance may be found among the earlier generation of scouts. This finding is
consistent with the findings of Radicchi et al., who stated that, in the context of Italian
football, the older age of scouts was a deterrent to software usage [4]. However, the results
show barriers to the adoption of digital scouting methodologies that go far beyond age.
Several limitations arise from the research and the methods employed. On the one hand, a
limitation arises in the analysis performed on available software functionalities because
the dataset is small, which results in a variety of functionalities repeating themselves
across the software products analyzed. Moreover, the dataset was constructed by consulting
company websites and gathering the functionalities from the information provided there.
Similarly, a number of limitations in the interview process and the analysis of answers
arise from the nature of the dataset. The first one is the size of the dataset. The sample
comprised 13 interviewees, which is significant for open-ended interviews. However, only 12
of these used software in their daily tasks, so the dataset built from the ratings of
functionalities contains a relatively small number of data points. The small size of the
dataset is compounded by the fact that some interviewees could not rate certain
functionalities because they did not use all of the listed functionalities. Moreover, there
is also the possibility of habituation bias [93]. This happens when the repetitiveness of
questions or similar wording leads respondents to answer questions uniformly. In this case,
it would manifest in consistently giving the same usefulness value to different
functionalities (for example, consistently rating repeated functionalities with a 5).
The research question aimed at finding the functionalities and data that were missing in
football scouting software for it to be mass adopted by clubs. Qualitative and quantitative
research show that data on Personality and Physicality are important factors to include in
future additions to software, and therefore new methods of collecting data on these factors
are important. Furthermore, wider and more reliable data coverage, standardized performance
indicators and predictive models were pointed out as relevant additions to software. In the
case of predictive models, customization is key to making them accessible to most clubs and
not only to the ones with high purchasing power. Moreover, centralized platforms that
aggregate data from different providers (companies) were concluded to be a missing
functionality.
The research aimed to identify the main factors impeding football clubs from using football
scouting software. Based on qualitative and quantitative research, it was concluded that
high cost, lack of professionals proficient in data science and analytics, insufficient data
collection/coverage in some regions of the World and older age of scouting professionals
are the main factors preventing adoption of scouting software by football clubs.
Similarly, several shortcomings of current software were identified, some of which also act
as barriers to adoption. The main shortcomings identified are the reliability of the data and
its coverage. It was found that the best reliability comes at a higher price and is thus
generally unattainable for most scouting professionals interviewed. In addition, data
coverage is a recurring issue when considering youth tiers of football and the continent of
Africa.
Despite a trend towards adoption of more advanced technologies in Big Data Manage­
ment such as SQL databases, scouts and clubs alike still use PDF and Excel as the pre­
dominant formats to store and manage data due to the small size and wide compatibility
of these formats and respective files.
On a general note, it was found that data providers were the predominant form of used
software. Moreover, it was found that most scouting professionals regard the most com­
mon functionalities (Scouting Reports and Player Databases) and bundles of functionali­
ties found in scouting software as highly useful.
Furthermore, it was found that Player Comparison, Opponent Analysis, Shadow Teams,
Scouting Reports and Player Database (when mixed in different groups) have a positive
effect on the presence of Match Analysis across software. Despite this fact, a majority
of interviewees does not regard the bundles formed by these features as highly useful.
Therefore, it can be argued that attention should be given to the mix of functionalities
included in software when building such products, in favour of the most useful and common
bundles identified.
When looking at inefficiencies in the scouting process, the repetitive and labor-intensive
tasks usually performed by football scouts included watching videos and reporting. Although
repetitive and at times exhausting, these tasks were not regarded as lacking value. On the
contrary, the value of these tasks resided in their repetitiveness and the attention to
detail necessary to perform them.
The one­on­one interview analysis was effective at answering the research question. The
open and honest conversation with scouting professionals revealed numerous valuable,
eye-opening factors inherent to the work they do and to the context of each league, and gave
them an opportunity to express their concerns with existing software. Although the research
answered which functionalities are missing in scouting software, it raises a number of
questions for further study. The first question concerns the type of price differentiation
practised by software manufacturers. Studies on this matter could open the way for more
affordable software and thus wider and faster adoption of digital methodologies by football
clubs.
Moreover, another question that arises is how the lack of data coverage in some parts of the
World affects the talent that remains undiscovered; possible solutions besides sending
scouts to these geographies could be researched to extend the reach of data to these
locations. Additionally, the effects of combining traditional scouting methodologies with
data, as opposed to traditional scouting alone, on talent identification and performance
prediction should be studied in order to understand to what extent the non-existent
implementation of data scouting in youth football impacts the predictive power of talent
scouts. Finally, the need for clubs to employ more professionals proficient in data science
and analytics applied to sports, who can make the most of the functionalities present in
software, deserves further study.
The association rule mining algorithm applied to software functionalities proved effective
at identifying patterns and relations among the features, but the output proved hard to
interpret. Besides, the number of software products found was small, which limited the range
of the association rule mining. Furthermore, the patterns in software functionalities and
the way users perceive them suggest that further studies could address how the product mix
in software is built so as to maximize the value added for scouting professionals and,
consequently, the financial returns for manufacturers.
All in all, the research conducted clearly identified which functionalities are missing in
software, while also underlining the problems, barriers and satisfaction of software users.
Besides, the research fills a gap in the literature regarding why digital methodologies have
not yet been adopted in the world of football and how these methodologies fit the scouting
process. The future seems bright for football scouting software: mass usage seems
inevitable, although it will most likely never fully replace physical observation and an eye
for talent. Lastly, there should be improvements, new additions and particularly a focus on
the customer so that the transition from traditional to digital may be smoother.
[1] Richard Pollard. “Charles Reep (1904­2002): pioneer of notational and performance
analysis in football”. In: Journal of Sports Sciences 20 (Jan. 2002), pp. 853–855.
DOI: 10.1080/026404102320675684.
[2] Justin Harper. “Data experts are becoming football’s best signings”. In: BBC News
(Mar. 2021). URL: https://www.bbc.com/news/business-56164159.
[3] Bruce Schoenfeld. “Liverpool’s data­driven transfer model at Melwood hailed as a
Premier League leader”. In: New York Times Magazine (May 2019). URL: https :
[4] Elena Radicchi and Mozzachiodi Michele. “Social Talent Scouting: A New Oppor­
tunity for the Identification of Football Players?” In: Physical Culture and Sport :
Studies and Research 70 (June 2016). DOI: 10.1515/pcssr-2016-0012.
[5] Snežana Lazarević, Jelena Lukić, and Vladimir Mirkovic. “Role of football scouts in
player transformation process: From talented to elite athlete”. In: 10 (Aug. 2020),
pp. 65–79. DOI: 10.5937/snp2001065L.
[6] Philip Moore. “Scouting an Anthropology of Sport”. In: Anthropologica 46.1 (2004),
Interview Transcripts
Interview 1
• Respondent: Interviewee 1
• Location: Zoom
• Date: April, 7th 2022
Interviewer: What is your department (formal name)?
Respondent: “Player Scouting and Data Analysis.”
Interviewer: What is your role (formal name)?
Respondent: “Player Scouting and Data Analysis and in some cases if a Hungarian or
other clubs ask us, we make some data analysis tools to bring some players that fit well
to that club philosophy or the coach gives us some requirement of what they need and
we start to check it. First of all, from a data point of view and after that the eye test.”
Interviewer: Which league does your team compete in?
Not applicable.
Interviewer: Do you use any software that aids in the scouting process/talent identifica­
tion? If yes, which?
Respondent: “What we use is Wyscout and InStat. We check the main metrics of what
we want to find and the requirements and after that we use those platforms to check the
video and highlights. In the mean time, I am the responsible for this and start to create
some algorithm to find players. We don’t buy it, we make it.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scout­
ing software? If yes, which?
Respondent: Not asked.
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “Basically, for me it’s a difficult question. From my point of view, I store it in
SQL database or other data formats but for the other guys who is not IT perfect, I convert
all the information to Excel sheets and send the database in Excel and the scouting reports
and everything else is in PDF and Word document.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: Not answered.
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: “I think we are safe, this is not what we think about.”
Interviewer: In the study made of available software, we identified several groups of functionalities.
Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 4
• Player Database - 5
• Match Analysis - 4
• Opponent Analysis - 3
• Player Comparison - 4 “Some platforms need to work on their UI”
• Team Management - 3
• Shadow Teams - 4
• AI generated statistics - “Most of the time, it’s a company secret and it is a black
box, we don’t see what is behind, it’s a 3. That’s why we create our own metrics,
we know what is behind those numbers.”
• Video Analysis - 5
• Player Registration - Doesn’t use it.
• AI generated Player performance prediction - “This is also a black box, but from what
I see it’s a 4.”
Interviewer: What areas of the scouting process do you find repetitive or that do not add
Respondent: “It’s a difficult question because I like it and can’t say it’s some point of
negative or not. (…) Sometimes, when we receive it it’s not so easy to filter the players,
we have to fine tune the filters and so on and it’s much better if we can create some
sheets or something like that that can be better for our department. We can receive
informations that is much more clear and closer to the reality. I explain you what I mean:
If we receive players from other agencies, of course we receive the best video highlights, if
he shoots two good in the whole season we only receive the good ones. But if something
can measure that ok this is only the good performance highlights and we receive some
indicator or index which show there is some good highlights and some bad highlights it’s
better for us and could help.”
Interviewer: “This information that you receive is from the clubs?”
Respondent: “Yeah, this is the difference between two worlds. If we receive videos from
the clubs, we know they give us all of the information, the negative also. They don’t hide
this informations. Of course some of the agencies send only some of the good things.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “Sometimes, what for us is more informative is if we receive the player’s
speed, this is what we double check on the video. Ok he made 6 shots and many crosses
but we don’t find the informations that the rating that the speed of this player is 9.5 from
1 to 10. If we receive this information it’s helpful because some of the clubs send us
requirements that they need height and speed and this is what we have to check all the
time: he is fast but how fast?”
Interview 2
• Respondent: Interviewee 2
• Location: Zoom
• Date: April 8th, 2022
Interviewer: What is your department (formal name)?
Respondent: “First Team Scouting”
Interviewer: What is your role (formal name)?
Respondent: “International Scout, I will go through games from England.”
Interviewer: Which league does your team compete in?
Respondent: “Eredivisie”
Interviewer: Do you use any software that aids in the scouting process/talent identification?
If yes, which?
Respondent: “InStat and Wyscout”
Interviewer: Do you think there are any barriers today that prevent clubs from using scouting
software? If yes, which?
Respondent: “I wouldn’t say so, anyone can access these.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “It’s in Excel, that doesn’t go through me though.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: Not answered.
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: “Probably not, it depends how confidential it is supposed to be. If a transfer
was supposed to go through, I wouldn’t feel comfortable, no.”
Interviewer: In the study made of available software, we identified several groups of functionalities.
Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 2
• Player Database - 5
• Match Analysis - 4
• Opponent Analysis - 5
• Player Comparison - 3
• Team Management - 2
• Shadow Teams - 2
• AI generated statistics - 1
• Video Analysis - 5
• Player Registration - 3
• AI generated Player performance prediction - 1
Interviewer: What areas of the scouting process do you find repetitive or that do not add
Respondent: “That’s probably a me thing, but just watching the games, find out how to
differentiate the players, I use the same terminology quite a bit but I suppose that isn’t the
process more me. The Excel stuff is more boring. Just like general graphs, comparing
players, the stats I don’t really like. Stats of the last few games.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “I don’t know how they do it but I would like to see something about the
player’s sociability, so like after game how they get along with the teams and referee. I
would like to see that incorporated in some software but I don’t know how they would do
Interview 3
• Respondent: Interviewee 3
• Location: Zoom
• Date: April 8th, 2022
Interviewer: What is your department (formal name)?
Respondent: “My department is the Head Analytics department, it’s relatively new in the
company, we analyze data and fit it to our work.”
Interviewer: What is your role (formal name)?
Respondent: “Head Data Analyst. My role is not only try to define the areas where we will
use data, how we are gonna use it, the kind of service we give to our players but also to
our employees in terms of, let’s call the employees the scouting people and agents. So
we try to see what are their necessities, we try to make reports, market analysis.”
Interviewer: Which league does your team compete in?
Respondent: Not applicable.
Interviewer: Do you use any software that aids in the scouting process/talent identification?
If yes, which?
Respondent: “We use Wyscout: video analysis and they also collect data, not the best
data quality but they also collect data and recently we have signed with another platform
called SciSports. Inside that partnership, we have another partner which is InStat so we
are working with those three platforms so they give us the APIs to connect our environment
to their data. After that, we also have different apps that are internal to the employees in
which we upload information, leave our advices.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scouting
software? If yes, which?
Respondent: Not asked.
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “Basically we have two types of data, first we connect our data with their
APIs, we can say have it connected to environments in a way that if they upload new data
from the new weekend games, the data changes and if new players appear on the market,
new players also appear in our platform. After that, we want to work with that data from
our PC, we have to download it in Excel and after that we want to use it in other apps
such as Google Colab where we use Python code or R. Also we use it and obviously for
that we pass that Excel DataFrame to CSV.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: “Look, let’s say that the official data providers maybe there are, for example
the Spanish league have their special ID department where they with their own cameras
around the pitch they track their own data. That’s the official data. At one point, all those
data providers, companies that also collect data, some of them use the data source of the
leagues and some others collect data with AI where just with a video they extract data
from there. When you ask a provider, how can I be convinced that your data quality is a
good one? They used to compare their margin of data with the official leagues and most
of items they 2 % , 3 %. For example, if this provider tells me this player gives out 20
passes per match and I go to another provider they tell me the player gives out 21 or
18, they are really close. Maybe I can trust this one because their process is better. All
the data is valid but for me this one is better because it gives this detail better than the
others. But it’s a way, that comparison in between the official and the way the providers
collect and after that the margin of error that’s the way for me to be convinced they have a
good quality. Or for example, Wyscout is a good provider, you have the chance to see in
video where they are taking you in data. For example, if I can see this player gives out 20
passes per match and I click to see those 20 passes, maybe I see that in those records
there are not 20 passes, maybe I see they give the ball just in terms of defending. At that
point, you see that’s not the best one but there are cases and cases between the data
providers. Some are best and some are bit worse.”
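The comparison Interviewee 3 describes, checking each provider's figures against the official league data and looking at the margin of error, could be sketched roughly as follows. This is an illustrative Python sketch only, not a method attributed to any interviewee or provider: the function names, the provider names and the shots figures are hypothetical, and only the 20, 21 and 18 passes per match come from the answer above.

```python
def relative_error(provider_value: float, official_value: float) -> float:
    """Absolute deviation from the official figure, as a fraction of that figure."""
    if official_value == 0:
        raise ValueError("official value must be non-zero")
    return abs(provider_value - official_value) / abs(official_value)


def rank_providers(official: dict, providers: dict) -> list:
    """Return (provider, mean relative error) pairs, smallest error first."""
    ranking = []
    for name, metrics in providers.items():
        # Only compare metrics that both the league and the provider report.
        errors = [
            relative_error(metrics[metric], official[metric])
            for metric in official
            if metric in metrics
        ]
        if errors:
            ranking.append((name, sum(errors) / len(errors)))
    return sorted(ranking, key=lambda pair: pair[1])


# Hypothetical per-match figures for one player.
official = {"passes_per_match": 20.0, "shots_per_match": 2.0}
providers = {
    "provider_a": {"passes_per_match": 21.0, "shots_per_match": 2.0},
    "provider_b": {"passes_per_match": 18.0, "shots_per_match": 2.5},
}

ranking = rank_providers(official, providers)
for name, error in ranking:
    print(f"{name}: mean deviation {error:.1%}")
```

Averaging the relative errors over several metrics is one possible design choice; a club could equally weight the metrics it cares about most, or inspect each metric separately.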
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: “In the company we have an internal app, people ask you software engineers
and they try to work for us in that point, it’s something that is difficult. But that is
not, at any point some hackers might… but I feel secure, haven’t had any problem.”
Interviewer: In the study made of available software, we identified several groups of functionalities.
Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 4
• Player Database - 5
• Match Analysis - 4
• Opponent Analysis - 5
• Player Comparison - 5
• Team Management - 5
• Shadow Teams - 4
• AI generated statistics - 4 “basically because in football this area is new and they
need to explore more, have more info in contrast, see if the conclusions are valid”
• Video Analysis - 4
• Player Registration - 5
• AI generated Player performance prediction - 5
Interviewer: What areas of the scouting process do you find repetitive or that do not add
Respondent: “If you are in that process that you see a lot of games and videos it is
repetitive but if you don’t do that you cannot go forward in for example select a list of 20,
30 players that fit with your team. This is the most repetitive but all are important, it is like
a machine that starts here and ends here. You can’t say I’m not gonna see 20 games per
month and see the data. In my opinion, you can’t go for a player just for data or just go
for the matches, you need to contrast with data so it is the way that mix.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “Here are some interesting points. It’s true that we have one company that’s
called SkillCorner, I think it’s British, they give out eventing data but they are specialized
in tracking data. Right now, companies need to figure out the way to collect that data, it’s
really important, I mean when you go to 90 % of the data providers in the market, they
can give eventing data, when you go for tracking data maybe 1 give that data with quality.
Nowadays, it’s important point in scouting, there is a lot of demand, football today is very
physical, where physical is highly valued, highly rated. When you consider a player, you
have the tactical point that you can see in the software and video, you have the technical
part that you can see in video and data but the physical you don’t have the intel. Also,
try to measure the psychological part. It’s really difficult but I think we have tactical, we
have technical, are trying to cover physical and those are 3 points most important to rate
a player. But there is always a psychological component that for example: a player has
a really strong mind or is a player that gives a lot of problems. It’s difficult but in the end
in my mind, I think I am not a specialist in this or whatever but feelings of players are
byproducts of our internal organism. There is a biological process that makes that you
have that feeling and in the end that is measurable. It’s not that simple, you have to find
a way to measure it but it will be another plan that the scouting software can improve in
that part.”
Interview 4
• Respondent: Interviewee 4
• Location: Zoom
• Date: April 11th, 2022
Interviewer: What is your department (formal name)?
Respondent: “Uh, so I work the recruitment department, um, at oxygenated. Yeah. Yeah.
So recruitment, recruitment, scouting, same thing, but yeah, that’s basically”
Interviewer: What is your role (formal name)?
Respondent: “So I’m a recruitment analyst, so, um, I’m recruitment analyst slash scout
kind of. So I do video data and live games.”
Interviewer: Which league does your team compete in?
Respondent: “So, yeah, when we’re in League One, so it’s the Premier League, Championship,
League One. So the third division in England.”
Interviewer: Do you use any software that aids in the scouting process/talent identification?
If yes, which?
Respondent: “Yeah, yeah, a lot. So we use, um, well it tells us like software, so we use like
Sportscode to clip a video, um, for strike, may kind of highlight packages for our manager.
Um, that obviously they’re the data providers, you know, like Wyscout and InStat, we
use them, uh, to get information and to get video on players, but then probably for like
reporting purposes, we use Scout7. So Scout7 just allows us to kind of use templates,
um, that we’ve developed for specific positions. And then all of our Scouts have a login
who can then access that and fill out the template and write their thoughts and plans kind
of good way of tracking. It allows us to kind of go back, have a far we’re part of Scout7
account and kind of see every single report we’ve ever done on, on a player.”
Interviewer: So it’s a platform that allows you to make reports and store them.
Respondent: “Yes. Yeah. So, so, so, so on Scout7, I think it’s, I think, I think Bob Terminix
check up to earn it and they, they basically, yeah, you just kind of go in and select a player,
um, you know, Lionel Messi, and you say, okay, cool. I saw him play as a wide player
against this club. And then though, when he said like wide platform, it will send us a
template that’s for that issue. So then you then just fill in, so that certain traits we look for
for a right­back, but we don’t necessarily look for in a, in a center forward. So we’ll kind
of give them a score based on that. And then for each of those traits and then write a
conclusion of where we kind of think they are in terms of their ability, if they’re here or if
they’re at a higher level.”
Interviewer: And you also give them a certain, a certain score in each criteria.
Respondent: “Yeah. So like, so I will give him a score, his pace, a score for his strength,
a score for his passing range, for example. Um, and then would eventually give him like
a, the top level we think he could play at now. So like where we think he is today, but also
where we think he’ll be in a few years’ time.”
Interviewer: And, uh, that, that, that potential, uh, that measurement is done in, uh, how
in an index of ABC.
Respondent: “Uh, yeah, so we kind of, kind of, so, so that the find the potential and we
basically just have it benchmarked the leagues that we could realistically be in. So at our
level, you know, we could in theory go to the Premier League, but we also in theory could
go the other way. So we kind of just, we benchmark them on the middle level of a league,
the top level of the league all the way up to the Championship. And then we say, if they’re
above the top level Championship, we say that’s a Premier League player. So,
so it goes Premier League, top Champ, middle Champ, top League One, middle League One,
top League Two, middle League Two, National League, then below. But then, like we just say
like below where we, we would have a look. So that’s kind of how, how we do it.”
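The benchmark ladder Interviewee 4 describes is an ordered scale, so a current and a potential level can be compared directly. The following is a minimal Python sketch of that idea, assuming the order given in the answer above; the class names, the `headroom` helper and the `meets_benchmark` rule are hypothetical illustrations and do not describe how Scout7 or the interviewee's club actually stores these grades.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Benchmark ladder from the interview, lowest to highest."""
    BELOW = 0
    NATIONAL_LEAGUE = 1
    MIDDLE_LEAGUE_TWO = 2
    TOP_LEAGUE_TWO = 3
    MIDDLE_LEAGUE_ONE = 4
    TOP_LEAGUE_ONE = 5
    MIDDLE_CHAMPIONSHIP = 6
    TOP_CHAMPIONSHIP = 7
    PREMIER_LEAGUE = 8


@dataclass(frozen=True)
class Assessment:
    player: str
    current: Level    # where the scout thinks the player is today
    potential: Level  # where the scout thinks the player will be in a few years

    def headroom(self) -> int:
        """Number of ladder steps between current and potential level."""
        return self.potential - self.current


def meets_benchmark(assessment: Assessment, club_level: Level) -> bool:
    # Hypothetical rule: a player is of interest if his potential
    # reaches at least the level the club is benchmarking against.
    return assessment.potential >= club_level


example = Assessment("Player A", Level.MIDDLE_LEAGUE_TWO, Level.TOP_LEAGUE_ONE)
print(example.headroom(), meets_benchmark(example, Level.MIDDLE_LEAGUE_ONE))
```

Using an ordered enumeration rather than free text keeps reports from different scouts comparable over time, which is the property the interviewee values in the stored report history.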
Interviewer: Do you think there are any barriers today that prevent clubs from using scouting
software? If yes, which?
Respondent: “Um, I guess cost would be the big one. So like, you know, Scout7 is really
useful, but I don’t know how much exactly it is, but I know it’s expensive. So that even
a club, a little bit smaller than us, but still professional in might not be the expense that
they choose to go for. They might choose to go down another route for the expense.
But I think, I couldn’t imagine not having the history because I think the history is really
important when, when you know, over some, uh, over one transfer window, you can see,
we watched a player two years ago. What, how they develop now, if we maybe go back
in for them, at least you have that history. But I think if you’re doing just like, cause some
clubs, they might just email a Word document or a report, it’s really hard to store that in
a logical way, and in a way that’s easy to access, but you know, I joined the club two years
ago, but I can go see a scouting report from five years ago so I can see, I can see the
progression in a player. I can see how many times we looked at the player. Um, so I think
cost would be the big one. Um, I think they are like ways you can try, like, you know, make
a low budget version of these sort of things. But I think, yeah, cost cost is 100 % going to
be the biggest barrier to access. Um, I think if every club hadn’t had City’s money, every
tool, but, but, but for the reality for most clubs is, I don’t know how much this is again, but
like if you take something like, um, like a data provider, like StatsBomb, the cost of that
for the year, it could be the wage you pay for a few more part-time staff to watch videos.
So what’s more useful to you as a small club, like in, in the national league, for example,
to have a nice bit of software or a few more people who can watch a game for you and
actually give an opinion. So,”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “It’s in Excel, that doesn’t go through me though.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: Not answered.
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: “Um, no, but I I’m sure you’ve, you know, if you, if you ask me these questions,
I’m sure you’ve seen it. The Scout7 was the software that a few years ago, Manchester
City and Liverpool had an argued how to court case over was because a scout had
left. I think scout had left Man City to go to Liverpool, but still had his Man City login.
So he, he logged in using his Man City login and could see what City were watching
or that will strike out. I can’t know the full story, but I know it’s one club, another club.
Um, so other than that, not really because, um, it would, what, what I think if the club
manages that profile well, and just like, let’s say, if I leave the club, if the club is, is, is,
is, does the right thing, which is to remove my profile, then there’s no worry, there’s no
concern. But if the club is, you know, uh, sloppy and they don’t, it leaves them open that
sort of thing. But in terms of just like another additional way of accessing it, I’m not that
concerned. I think that’s a good software. They, they take it down for maintenance quite
a lot, which I assume a lot of security updates. Um, but I don’t think there’s too much of a
concern of people like hacking the account to see who was scouting because everyone
was, everyone’s got similar players.”
Interviewer: In the study made of available software, I identified several groups of functionalities.
Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 5
• Player Database - 5
• Match Analysis - 5
• Opponent Analysis - Not answered.
• Player Comparison - 4
• Team Management - 3
• Shadow Teams - 4
• AI generated statistics - 4
• Video Analysis - 5
• Player Registration - 4
• AI generated Player performance prediction - 3
Interviewer: What areas of the scouting process do you find repetitive or that do not add
Respondent: “Um, repetitive. I would say it’s, there’s a lot of, um, there’s a lot of time
spent trying to figure out which is a good match to watch of a player. Yeah. So, you know,
for example, like, like overall video providers, they’ll only code the starting position for a
player, but they will never code if you move. So I literally just got off watching a game
where the guy started in center-mid, went to right-wing and then went to left back. So
going into that as a video, I didn’t know. I thought he was playing center mid, so I was
ready to watch him as a center mid and then suddenly he’s right-wing and then left back. So
I like it that tells me something about that, but it’s not what I wanted out of rather just
watch the game, play center mid throughout. So I think there’s a lot of wasted time trying
to find a good game and a lot of it’s because of position. So if I, if, if someone tagged
any changes, like maybe after every substitution they tagged what positional changes
happened, it would save me a lot of time because I’d know like, okay. Center mid. And
he went to right-wing. Okay. I can watch that. But if he’s changed positions three times,
it’s kind of hard to like, say he’s a good left back, but a bad center mid. And it’s yeah. It’s
been messy.”
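The tagging of in-match positional changes that Interviewee 4 asks for could be represented as a list of position spells per match, which then makes it easy to filter for games where a player stayed in one role. The sketch below is a hypothetical Python illustration under that assumption; the `PositionSpell` structure, the position codes and the 80 % threshold are invented for the example and are not features of any existing video provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionSpell:
    position: str      # e.g. "CM", "RW", "LB" (hypothetical codes)
    start_minute: int
    end_minute: int


def minutes_by_position(spells: list) -> dict:
    """Total minutes a player spent in each tagged position."""
    totals = {}
    for spell in spells:
        minutes = spell.end_minute - spell.start_minute
        totals[spell.position] = totals.get(spell.position, 0) + minutes
    return totals


def is_good_match_for(spells: list, position: str, min_share: float = 0.8) -> bool:
    """True if the player spent at least `min_share` of his minutes in `position`."""
    totals = minutes_by_position(spells)
    played = sum(totals.values())
    return played > 0 and totals.get(position, 0) / played >= min_share


# The game described in the interview: center mid, then right wing, then left back.
messy_match = [
    PositionSpell("CM", 0, 30),
    PositionSpell("RW", 30, 60),
    PositionSpell("LB", 60, 90),
]
steady_match = [PositionSpell("CM", 0, 90)]

print(is_good_match_for(messy_match, "CM"), is_good_match_for(steady_match, "CM"))
```

With spells tagged at every substitution or tactical change, a scout could shortlist the matches worth watching for a given role before opening any video.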
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “Um, so I think a similar thing, I think trying like getting a metric of how good
a performance or how bad a performance you watch compared to the average. So you
watch a player. If you go to a game, for example, it’s the first time you’ve watched a player.
You never know if that’s his best performance or his worst performance. You hope after
you watch them a few times, you can find the middle ground, but it would be quite useful
to know if I went to watch a player this last weekend and he had a great game, but if it’s
his best game he’s ever had in his career, that I should maybe think, okay, he might never
hit that again. He might be slightly below that or normal. So a way of kind of seeing what
his average performance is, I don’t know what metric you choose, but a metric that kind
of shows his mid-level performance. Cause that’s what you really are looking for. You’re
not looking for his best or his worst. You’re trying to find out what he is on average.”
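One way to express the idea above, placing a single observed performance against a player's typical level, is to compare the observed match rating with the median and spread of his previous ratings. The Python sketch below is only an assumption-laden illustration: the interviewee explicitly leaves the choice of metric open, so the 1 to 10 rating scale, the function name and the sample ratings are all hypothetical.

```python
from statistics import mean, median, stdev
from typing import NamedTuple


class PerformanceContext(NamedTuple):
    typical: float            # median of past ratings, the "mid-level" performance
    z_score: float            # how many standard deviations the observed game is from the mean
    share_at_or_below: float  # fraction of past games rated no higher than the observed one


def performance_context(history: list, observed: float) -> PerformanceContext:
    """Place one observed match rating within a player's rating history."""
    if len(history) < 2:
        raise ValueError("need at least two past ratings to estimate spread")
    spread = stdev(history)
    z_score = (observed - mean(history)) / spread if spread > 0 else 0.0
    share = sum(rating <= observed for rating in history) / len(history)
    return PerformanceContext(median(history), z_score, share)


# Hypothetical ratings from earlier matches and the game the scout attended.
past_ratings = [6.0, 6.5, 7.0, 6.5, 6.0, 7.0, 6.5]
context = performance_context(past_ratings, observed=8.5)
print(context)
```

A high z-score together with a share close to 1.0 would flag the observed game as an outlier, a performance the player "might never hit again", rather than his average level.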
Interview 5
• Respondent: Interviewee 5
• Location: Zoom
• Date: April 21st, 2022
Interviewer: What is your department (formal name)?
Respondent: “So, um, I am working within the youth scouting department of Ajax Amsterdam.
Uh, we have an international department, uh, where we are roughly eight people
working. And my focus areas is Denmark and France. Um, so I’m what you would call a
satellite. So my job is to report back from these markets on players, aged 12 until 18, 19
years old. And then when I see something that I think is relevant for us, my job is to make
sure that we are aware and we will send additional Scouts from Holland to come in and
see. And then of course also to be the first point of contact within the market for agents
or whatever, uh, is, is relevant.”
Interviewer: What is your role (formal name)?
Respondent: Previously answered.
Interviewer: Which league does your team compete in?
Respondent: “Eredivisie”
Interviewer: Do you use any software that aids in the scouting process/talent identification?
If yes, which?
Respondent: “Um, currently I don’t use any, uh, like software, but I will be using software. Uh,
the issue, um, right now is that there doesn’t exist a software that can actually help
me out in the markets that I work in. Uh, so, so the Danish youth leagues are covered
by video, but it’s not on Wyscout. You can not buy access to it and France is simply not
covered. So the way I use video is actually I have roughly 20 sources, which is everything
from agents to YouTube pages, to local press sites that, uh, somewhat cover specific
and strategic important regions for us in, in France with video and in Denmark, uh, I am
able to cover everything live, uh, but right now I’m actually in the process of, uh, potentially
acquiring a piece of software, a completely new company called Eyeball, who are covering
France from a central source by video, and potentially also Scandinavia and Spain and
other countries. But we don’t use it yet, but it’s, uh, a youth-centric, uh, video platform
purely based on scouting. So no analytics or anything it’s purely based on scouting, uh,
which is really nice, uh, to, to be honest.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scouting
software? If yes, which?
Respondent: “I think one of the biggest issues is that, uh, especially when we talk to
younger age groups, is that the clubs have typically not secured the players on contracts.
When we talk players above the age of 15, 16, it’s not an issue because they would
typically have secured the top players on our contract. So they don’t have really anything
to lose, but especially in a market like France, it’s clear to see that the professional clubs,
the professional academies are to share the video, they use video internally, but they don’t
share it. So there’s definitely, uh, a barrier there that even if the video is recorded and
covered, it’s not shared, uh, by, by any, uh, internal partner. So what you would say or
from people inside the business. So, so in, in France, you have to have a different
strategy. And, and then Denmark, as I said, it’s, it’s really covered, but it’s covered in a
way where you’re not, it’s impossible to gain access through ice-cold unless you have a
player’s account or you have a coaching account role within a Danish licensed, uh, club.
Um, so I think that the biggest, the biggest barrier for me is 100 % the coverage. Uh,
and then I would also say that in, in the industry right now, there are two products that
everybody uses. They use Wyscout, they use InStat. Yeah. So it, it can be a bit of a, a bit
of a case to, to try and improve why you should buy an additional piece of software. Uh,
the issue though for me, is that both InStat and Wyscout are very incomplete on youth,
uh, and they don’t really focus on youth and it makes sense budget wise because the big
budgets are on first team scouting, the big investments on first team scouting. Uh, so,
so it makes perfect sense, but that’s also why I, I would say that youth scouts are
typically the more creative in the scouting world, because we know we can’t work on the
same central filtering, uh, as, as the first team can do. So we have to be much more out
there. We have to talk to a lot of people, we have to be structured in the way we analyze
the market. Um, and, and, and the things,”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “We use, uh, we use a product called pro, uh, ProSoccerData, PSD, and it’s
really an industry standard, um, within the club. Um, I have worked both at FC Copenhagen
and now at, at Ajax and at FC Copenhagen, we mainly used it for keeping track of like
training sessions and matches and who attended what and blah, blah, blah, where at I
actually, we actually do the full reporting within PSD and that it doesn’t matter if you’re
first team scout or you a volunteer youth scout. We report in the same format. We use
the same scoring system, and it goes into the same database. So that’s actually
quite nice, that also means that I never hand in a handwritten report. I always use the
same format. And then we have a video format. We have a live format and we have an
individual specific scouting format, depending on what type of report we are doing.
Everything is online, everything is digital. And I also create my own schedule within PSD.
So I would typically create my monthly schedule, which games am I attending. And then
afterwards, I go in and report on the same match and it’s handled centrally, uh, through
experts that know nothing about football, but they know a lot about the IT infrastructure of
PSD.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: “Yes. And that’s actually what I mean when I say they’re incomplete. Uh,
I remember I had a trial at one point, uh, on InStat for 14 days and all of the information
they had on the Danish youth national team was incorrect. And, uh, basically they had
players that might’ve played U17 national team football, but it’s three years ago and
they are our first team players and louder. So it’s very incomplete. And if you don’t know
anything about the market, you could easily get fooled. It’s the same in France. France is
a big market. France is a very unique market in terms of youth scouting. So if you don’t
know the system, then you don’t know how to actually report on how to cover the market.
So looking at Wyscout and the way they do it, it’s incomplete. And a lot of the data is
actually false. The, they cover a tournament called the (inaudible) and they listed as a
United team, but it’s actually you in tournament. So some of these small things that just
make me not really trust that they know what they’re doing, and when I don’t trust them,
then I don’t trust the data they provide. Uh, so it, so when, when, when it’s like that, it’s not
a scouting tool for me, it’s more of a library. So I already know what I’m looking for when
I enter Wyscout, whereas what I would like to do is actually be able to use a piece of
software to identify new players. And you can only do that if you know that the coverage
and the information, the data, the stats and things on point, and when it’s men, it’s not
complete, you can’t really use it for anything. Also an example of France, I think they cover
three clubs. I think they have three clubs in U17 where they get video from, uh,
but there are 86, which means if you do a, a player search and you you’re looking for
left backs or wingers, or center backs, you get three, and that’s nowhere near the complete
picture. So it was useful. It’s useless, basically. You can’t get any information from it and
you will always have to, to do 85 % of the work outside of the platform. Um, so that’s why
I’m quite, I’m kind of excited about Eyeball, Eyeball. It was able to develop my personal
friends, but, but, but developed by people that have worked in the youth scouting industry
and are trying to solve the issue with scouting platforms for youth, uh, because we also,
we don’t need 200 data points for, for youth player. It’s not necessary because the risk is
not as big, but what we need is some other type of data and they are building a platform
completely tailored to that.”
Interviewer: Would you feel safe trusting scouting data with a third­party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 4
• Player Database - 5
• Match Analysis - 3
• Opponent Analysis - 4
• Player Comparison - 3
• Team Management - 2
• Shadow Teams - 5
• AI generated statistics - 4
• Video Analysis - 3
• Player Registration - 5
• AI generated Player performance prediction - 3
Interviewer: What areas of the scouting process do you find repetitive or that do not add value?
Respondent: “Um, I think the reporting, the way we report is good, but in general,
in the scouting, uh, industry, and the way we reported at FC Copenhagen, it was not very good.
For instance, it can be kind of, kind of comprehensive. It’s not very quick, and you can
easily get caught in a circle of trying to explain stuff that you really need to see visualized.
So what I actually ended up doing at FC Copenhagen was a lot of my scout reports
were actually visual. It was a presentation with video clips, whereas the way we normally did
it was basically an essay, four or five pages on different attributes of the player. The way
we do it in, in, in Ajax right now is we have a match report where we grade the player on
the performance and the potential, and then a note, super, super simple. And then if we,
we think this is a player we’ve seen three, four times that we believe
has the potential, then we can go into a more comprehensive model, which is the
specific scouting where we just describe the player in more detail in all aspects. Uh, but
really for me, it’s the reporting that is comprehensive. It can be, it can be a long process.
Um, uh, and it’s probably also where there is the biggest margin of error. Uh, I, I would
say because it’s not, it doesn’t matter how good of a scout you are when you’re watching
things live. It’s difficult to keep track of 11 players. So you will always have a top five from
each team that you’re following closely, that you’re able to actually do a scouting report on,
but then you will have six to eight players per team that you are not actually able to, to
report back on. Uh, and it, um, uh, but typically the, you are required to do so. Uh, so,
so that’s, that’s a, that’s a big issue, but of course, if you’re watching video on a platform
it’s different because you can go back and forth. Um, and yeah, I would say when we
talk video platforms, the biggest issue is the platforms that don’t have play tagging. So
you’re forced to sit and watch 90 minutes, uh, as a scout. And I’ve worked as a scout for
four years now, but already within the first year, it takes you five to 10 minutes. Then you
already know which players you want to keep track of in the game. And then it’s kind of,
it’s quite nice that you can then just watch the game with all actions related to the player.
So instead of watching 90 minutes, you watch 15 minutes of super relevant actions of the
player you already identified within the first 10 minutes that you wanted to see something
extra on. Um, so that, that’s probably what I would say in terms of video, when you’re
forced to sit and watch 90 minutes, that can be, uh, that could be an issue. Yeah. But of
course you also get paid to do it. So of course,”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “Hmm. I think, uh, what you were mentioning about the prediction prediction
of, uh, of performance is something that I hear. It’s not something I hear a lot in the
industry, but it’s something I hear among the younger Scouts, especially the data Scouts,
the technical Scouts, and what they are, however, concerned about is actually an
issue that is happening now in the scouting industry. So five years ago, you were very
modern if you used data to scout. Today, everybody, even the smallest clubs, have some
sort of functionality or some sort of intern that is able to use data to scout players. That
means that right now, you are not really having an edge if you’re data scouting. What
you end up with is most of the teams looking for the exact same player, looking at
the exact same data sheets. So they are all standing out in the, in the data. A good case
is the striker from Sparta Prague. Right now, he has a 2022 market value of 20 million
euros. And the only reason he has that marketability right now is because he’s standing
out in every single data sheet, no matter who is providing the data he’s standing out and
he’s a good player, but his market value is also a reflection of the clubs that are interested
in him and everybody in the world is interested in him because he’s standing out in the
data. So now people are actually talking about it, it’s a big issue that we all have
access to the same data. So what I think will happen is that you will have a platform where
you will be able to use your internal, uh, internal, uh, research department or whatever,
to work on your own model, based on the data. That’s also what’s happening at City and
Barcelona. These big clubs actually have their own research departments that are
really not football minds, but they are just data minds that are sitting and trying to work out
their own models. Um, and I would say, imagine you having an AI model that is predicting
that this player is going to be the next big thing from the Danish league. I promise you if
everybody has access to that, then everybody is going for that player. Of course. Yeah.
And that, so, so it’s, it’s a good thing. If you were the first or part of the top or the first 10
clubs that start to work with this new thing, but right now everybody’s using data. So
I actually think if you want to find some interesting, hidden gem, then you need to look at
players that are maybe not performing on the data side, but they have something else.
Uh, so now you have to reverse it again. And I think that will happen continuously for
the next five, 10 years that you go back and forth. Data is the valuable thing, live scouting
is too, and I’m a data guy. I’m an engineer, I’m from, from DTU. But, but I can just see
how it doesn’t matter if I’m talking to boy-girl, I’m talking to Anderlecht, I’m talking to FC
Copenhagen or whatever, they are all talking about the same three players, because they’re
standing out in data. Um, and that’s maybe not ideal if you have 20 clubs looking for the
same one player and then in the, in the data sheets. Um, so that’s what I, what I would
say is I can see AI, I can really see these performance predictors almost like you look at
the stock market as well, how you’re able to use different, uh, analytics to, to predict the
value of a player based on historic data and also performance data up until then. But I
think, I think you will also see that more and more clubs, especially the clubs with buying
power. They will develop the models internally rather than buying a model that is available
out there that everybody can, can tap into. Um, yeah, that’s at least my, my perspective
on, on things. Um, well, right now it’s actually the wrong clubs that are doing that because
Man City and Barcelona, Real Madrid, these clubs that have big money that are building
these big departments for data tech, realistically, they are looking at 20 players in the
whole world that can actually strengthen them. That’s what, so it doesn’t make a lot of
sense for them to use a lot of data scouting because it’s quite easy for, for them to cover
those 20 players to see who can actually make a difference in our squad, whereas a club like
FC Copenhagen needs to use data for everything because they are competing with 2000
other clubs in the world, uh, on, on, on the players. So they need to be able to go in, in,
uh, in cheaper markets. So if you’re able to make a model that reverses that, uh, so, so
it becomes available for the smaller clubs, uh, even smaller than a Copenhagen as well.
Then I think there is something interesting because they don’t have the money to build it
on their, on their own. Um, but you will end up in a, in a problem where everybody will
be looking for the same top 10, if you don’t have any customization or a comprehensive
customization of filtering, uh, uh, opportunity,”
Interviewer: There is no personalized approach to data.
Respondent: “Today. There is, you can, of course you can extract the data, but everybody’s
working on the same dataset, and if you’re looking good in data, then you’re looking
good in data. So of course there are different things. Uh, you, you can, you can monitor
on, but when we’re talking about, uh, if we’re talking about a Striker and we are talking,
everybody’s looking at big strikers at the moment, target men, strikers that can play front
face and back faced. Um, it’s really limited. Maybe there’s 15 to 20 key points that you
will be looking for. Uh, but you have 2000 clubs sitting and looking for the same point.
So if you build a top 10 and you want, this is how I want the player to play, you build your
top 10. You can be 100 % sure that there are these 50 other clubs that have roughly
the same top 10 because of the way the game, the game is, the game is also following
certain standards. It’s very rare that you see teams that are playing completely different
from everyone else. And if they do have success, like Leipzig had a few years
ago, then you’ll see teams, uh, taking up this very forward, uh, high pressure, uh, counter
football, um, Leipzig. They actually, that was also based on data. They, they realized
that it’s not only about where you win the ball. It’s also about how much power you win the
ball with. That’s actually a determinant of, uh, how, uh, how, how great the goal chances are.
That’s why they play so smart, so much for seek. So they discovered this in data. Then
it became publicly known that this is the thing. And then everybody started to play with
that. Now you’ll see mid table sides in Denmark trying to play like RB Leipzig. It’s, that’s
just the way it is. And then you look for the same type of players.”
Interview 6
• Respondent: Interviewee 6
• Location: Zoom
• Date: April, 13th 2022
Interviewer: What is your department (formal name)?
Respondent: “Well, I work, um, I have, uh, I work, uh, free, uh, freelance, but, uh, I work
for two clubs, one in Poland and one in the French League Two, as, uh, um, uh, video
scout. So, uh, for the first team of the two clubs, uh, I watch, uh, games and, uh, I make
reports on, on players, uh, to, uh, for the scouting team of, of those clubs.”
Interviewer: What is your role (formal name)?
Respondent: “International Scout, I will go through games from England.”
Interviewer: Which league does your team compete in?
Respondent: ”First league in Poland, and a second league in France.”
Interviewer: Do you use any software that aids in the scouting process/talent identifica­
tion? If yes, which?
Respondent: “Yeah, the software I use most is Wyscout. And I also use StatsBomb.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scout­
ing software? If yes, which?
Respondent: “Uh, well, a few years ago it was still not used by everyone, but, uh, I think
that now, uh, everyone is using it and, uh, we saw that the clubs who, uh, which were
using, uh, that software were, um, uh,
very competitive. So I think that now, uh, no one, uh, can say that they are not using it.
I don’t think there are any barriers now.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “Uh, for my part, I use Excel.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: “Uh, sometimes, uh, you can see that, where you, you think you’re gonna
watch, uh, someone’s, uh, shots, uh, there are a few, uh, well, not in the right part, so it
could be, uh, sometimes a little bit of a problem in the system, but, uh, when you use it, uh,
very often, uh, I think that, uh, the, that, uh, on one game, it could have one or
two mistakes, but if you do, if you watch, uh, the data for all the season, to me, that looks
pretty, pretty fine. Uh, I have no problem, uh, with the software, I trust it, I trust the
software and, uh, and that, uh, I feel confident when I, uh, watch, uh, the data on it.”
Interviewer: Would you feel safe trusting scouting data with a third­party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 3
• Player Database - 5
• Match Analysis - 4
• Opponent Analysis - 4
• Player Comparison - 2
• Team Management - 3
• Shadow Teams - 3
• AI generated statistics - 2
• Video Analysis - 5
• Player Registration - 3
• AI generated Player performance prediction - 4
Interviewer: What areas of the scouting process do you find repetitive or that do not add value?
Respondent: “I’m not using, I don’t, uh, not using the player report, um, on the
software. I’m the, I’m doing it myself because I don’t, uh, they don’t look, uh, uh, well, uh,
good enough, we don’t have any, uh, the, the scales that they, uh, they give us to make
reports are not good enough to me. So I don’t know if it’s repetitive, but, uh, uh, I’m not
using their, uh, their report, uh, program. I’m doing it myself. Uh, I have my own, uh, report
scales and I’m not making the report on the software. I don’t know if it’s good enough, but
I got my own scales and I feel that I’m going quickly with that. And I’m not using
the, uh, the report scales of the software. I don’t know if it’s a good answer
for you.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “Uh, I would like to have, um, the focus on everything he is doing, uh, on his,
uh, behavior, you know, uh, to watch, to, to have, uh, all the data of what he looks like when
he misses a shot, when he’s talking to the referee and, uh, because, uh, that is not, uh, we,
we don’t get that on the software, uh, for now you have to go to the, on the pitch to see it.
So to know his behavior on the pitch, uh, all his reactions, uh, maybe, uh, for now,
uh, it’s hard to, uh, to get that behavior. Uh, I know that, uh, it’s hard to have this
functionality, but, uh, it’s really the, the missing point, I think.”
Interview 7
• Respondent: Interviewee 7
• Location: Zoom
• Date: April, 11th 2022
Interviewer: What is your department (formal name)?
Respondent: “I work in the scouting department.”
Interviewer: What is your role (formal name)?
Respondent: “I am. I am responsible right now for international scouting of youngsters
up to U23.”
Interviewer: Which league does your team compete in?
Respondent: ”First League in Portugal.”
Interviewer: Do you use any software that aids in the scouting process/talent identifica­
tion? If yes, which?
Respondent: “We have a platform created for our club. I think the provider or the person in
charge is Brandt, who is who made the software. There is only the one
software in our, in our database, which then consults three other software, mainly
scouting, Wyscout and InStat. Those, that kind of software.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scout­
ing software? If yes, which?
Respondent: “Apart from the financial aspect, which some clubs may, may not manage,
especially the smaller ones, I don’t see any other barrier. Nowadays
any club already uses several scouting platforms.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: ”In databases, databases that can then be reconverted to PDF without.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: ”Yes, I trust the data, but I think it’s more important to have people who know
how to interpret it.”
Interviewer: Would you feel safe trusting scouting data with a third­party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 4
• Player Database - 4
• Match Analysis - 4
• Opponent Analysis - 5
• Player Comparison - 3
• Team Management - 3
• Shadow Teams - 4
• AI generated statistics - 3
• Video Analysis - 4
• Player Registration - 3
• AI generated Player performance prediction - 3
Interviewer: What areas of the scouting process do you find repetitive or that do not add value?
Respondent: “I don’t see what the repetitive stuff always ends up being too much. Even
if we see the same game, we will see different things and I think there is nothing very
repetitive in the area of scouting. I don’t know if it’s answered, but. But I think there are
few things that are repetitive. The opponents are different and the contexts are different.
Everything is different always.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “The information on the personality of the players: how is it possible, or why
can’t we put that information on the platform? It’s that part, the personality
and the mentality of the players. I don’t think it’s possible to have it in the databases
with the knowledge we have. I don’t think we’re going to get it right.
But it is fundamental. Or rather, it is important for us, for clubs, when they
want to recruit athletes, to understand a little bit also the personality traits. But this is a
difficult thing to measure. That is why I find it difficult to decide; we rely on conversations,
on research of other things, to get some information to draw a profile, a personality profile.”
Interview 8
• Respondent: Interviewee 8
• Location: Zoom
• Date: April, 19th 2022
Interviewer: What is your department (formal name)?
Respondent: “As I’m working for an agency with only three other people, I mean, I’m just
responsible for scouting. So there is no formal name because there is no department.
Um, formerly when I was at <inaudible>, um, it was the, uh, the recruitment department.”
Interviewer: What is your role (formal name)?
Respondent: “I am responsible for scouting.”
Interviewer: Which league does your team compete in?
Respondent: Not applicable.
Interviewer: Do you use any software that aids in the scouting process/talent identifica­
tion? If yes, which?
Respondent: “Yes. Um, various things. Uh, we use Wyscout for video scouting and
some data analysis, and we also use InStat mainly for, uh, data for the under sixteens
and under eighteens in Switzerland. We use tools like, uh, DataWrapper to visualize, um,
because it’s a free tool and it’s easy to use because I’m not a computer scientist. I am not,
I’m a scout, but I use DataWrapper to visualize some stats. Um, formerly at Lausanne,
we did use SciSports as well. We also have, from time to time, a, a StatsBomb
account, which is for me still the, uh, uh, a, uh, the, uh, highest standard, uh, of, uh, data
analysis tools, at least, uh, in what, what kind of data they provide.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scout­
ing software? If yes, which?
Respondent: “Uh, I mean, I know some older Scouts who have difficulties using software,
but, uh, that’s the only thing. I mean, generally I think software should be a given, um, at
this point scouting apps should be a given. Um, I, I, I’m still one of the old school
ones, um, with a note pad and a pen at the games, but if there was, or if we had the money
to invest into an app solution, I would be absolutely for it. Um, I think scouting is, is
a bit behind the times, at least on the level I am scouting. I can imagine that, let’s say, a big
club, like Liverpool, or, uh, like, uh, Manchester United, they do have app solutions, but,
um, Switzerland is basically note pad with maybe one or two exceptions.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “Uh, I mean now, at the, at the agency, I just use Google Docs. Um,
our, all our administration is, uh, within Google Docs. It’s easy to use for everybody,
accessible for everybody. Um, back in Lausanne we were using Excel files, so we didn’t
really have a, uh, like a scouting tool, use the public database to, uh, to store, uh,
scouting reports and they were in, uh, Excel and then converted to PDF.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: “Yes. I have seen on many occasions that data is wrong, um, simply by
comparing it between other, uh, distributors of data when I had the chance. And I always
thought that InStat and Wyscout were among the least consistent data providers because
they had some mistakes within their data sets, um, in general. I think there are much
better, uh, providers for data, but they are just the most accessible ones for Switzerland,
for example, because you can get good data, um, for the big, big five leagues, you can
use FBRef or whatever, whatever there is. Um, but for Switzerland, it is, they are the
most accessible ones, uh, in terms of data and for the case of Wyscout and InStat, they
also have the data for the under sixteens and under eighteens, which is the only provider
that has them, which is important to us, but yet there are traps not to fall into. That’s why I
don’t use, uh, cross, uh, uh, don’t cross the data across two platforms. I only use Wyscout
or InStat, not both at the same time.”
Interviewer: And by traps, uh, you mean, uh, mistakes?
Respondent: ”Yeah. I mean like wrong data simply, um, inaccurate data. Um, it also
depends on let’s take expected goals. For example, it depends on the model and those
are not always transparent from the provider, but, uh, I see big differences between the
expected goals models of Wyscout and InStat and, let’s say, Opta or StatsBomb.”
Interviewer: Would you feel safe trusting scouting data with a third­party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 4
• Player Database - 5
• Match Analysis - 4
• Opponent Analysis - 4
• Player Comparison - 5
• Team Management - 2
• Shadow Teams - 5
• AI generated statistics - 5
• Video Analysis - 5
• Player Registration - 3
• AI generated Player performance prediction - 5
Interviewer: What areas of the scouting process do you find repetitive or that do not add value?
Respondent: “Repetitive? I find the writing of reports. Um, but they do add value, obviously.
Um, uh, what does not add value? I mean, you can, you can see value in everything.
I don’t have a specific answer, but repetitive, certainly writing reports, but it’s
essential to do it. So that’s, that’s the biggest grind. Everything else is, uh, basically, uh,”
Interviewer: And when you are writing the reports, are they very long? Are they compli­
cated? There’s no. Are they, there are no templates and then you have to do them all
from scratch?
Respondent: ”Yeah, I do them from scratch. We don’t have a template. We might work
on this, um, for, uh, for later when I have more time. But, uh, they, I think they are about
three to four Word pages long on, on a game. If it’s on a specific player, I
try to keep it within one page because nobody reads more than one page.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “I mean, you, you brought up the, uh, AI generated potential indicators.
That would be very interesting. Um, other than that, um, can’t come up with anything at
the moment. Um, I think that AI algorithms will only get more and more important with
time. So I think that’s one thing I would like to see improved and introduced into.”
Interview 9
• Respondent: Interviewee 9
• Location: Zoom
• Date: April, 22nd 2022
Interviewer: What is your department (formal name)?
Respondent: “I am here in two areas here at Vilafranquense, as a scout or team manager.
On one hand, in the part of the scouting department and, on the other hand, in the part of
the sports management of the day to day of the team and the players.”
Interviewer: What is your role (formal name)?
Respondent: Answered in the previous answer.
Interviewer: Which league does your team compete in?
Respondent: ”Portuguese 2nd League”
Interviewer: Do you use any software that aids in the scouting process/talent identifica­
tion? If yes, which?
Respondent: “Yes, we do. Right now I have access. The club has access to Wyscout
and we do the analysis through Wyscout platform. Or that is, we do live observation and
then we do the observation through Wyscout of other markets that we don’t observe in live
video and other complementary markets, so that the process is complete, so to speak.”
Interviewer: So the software is mainly for editing and visualization of video?
Respondent: “It’s not for video editing, it’s more for viewing. In other words, we watch
a live game. Then, for any doubt, we use Wyscout, or for other more complementary markets,
for example, France, Spain and Italy. We end up not sending a scout there. Or rather, we
can go there, but with a more, much more specific view, when we are already sure that we
like the player. But when we make a first observation, we end up using Wyscout to watch
the videos.”
Interviewer: Do you think there are any barriers today that prevent clubs from using scout­
ing software? If yes, which?
Respondent: “So, the barriers are not in the use of scouting software
by clubs, but in the commitment to and investment in a strong scouting sector. Although this
reality has been changing recently, because they are starting to look at scouting as an
investment and not as a cost. Now, as for the software, it is true that they are
expensive software for clubs, even in a professional context, and maybe clubs don’t
have that sensibility to look at that software as something fundamental. Now it is
also up to the scouts at the club to demonstrate that video observation is important, i.e.,
live observation, watching games in person, is important, but it also has to be complemented by video
observation. Even what we see live we can not capture everything and it’s always good to
review the game to remove some doubts and this can be done through these platforms.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “Yes, in our case, we end up saving everything in a drive; the format depends.
For example, we work with PowerPoint, with Excel files, with Word files. We end up saving
the information, be it our complete reports in PowerPoint or our database in Excel, and
we end up with a copy of all the information in a drive.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: “Yes, because we have access to all the markets in Wyscout and these
are Opta data. The data is completely reliable and is worked data. The data we
have here is the data that teams like Real Madrid, Manchester City and so on receive. It
is the data that is available on the platform and it is reliable data.”
Interviewer: Would you feel safe trusting scouting data with a third­party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 3
• Player Database - 5
• Match Analysis - 4
• Opponent Analysis - 3 - “Yes, three because four is not so much in context, but
certainly for the coaching staff who also have access to the scout and end up to use
that more in a tactical work aspect and much more. The value they give is more
important than anything, of course, but in the case of three.”
• Player Comparison - 4
• Team Management - 4
• Shadow Teams - 5
• AI generated statistics - 4
• Video Analysis - 2
• Player Registration - 1
• AI generated Player performance prediction - 4
Interviewer: What areas of the scouting process do you find repetitive or that do not add value?
Respondent: ”Well, it depends on how the scouting process was idealized. In our case,
here at Vilafranquense, we obviously have the repetition of the games. But I
don’t think that doesn’t create value. Quite the opposite. We end up having... We always
end up having a more complete and detailed observation of the players, and we end up
removing doubts. We also end up seeing other players when we are observing these
players, so it is repetitive, yes, but in my opinion it always ends up creating some value.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “I think that in this environment we are moving towards what I was talking
about earlier, the machine learning models of the reality of artificial intelligence, that is,
predictive models for the evolution of players. And I know that there are clubs that already
use and have internally developed these models to look at the player and get, obviously
through artificial intelligence, to what is the level that the player can reach and what is the
selling point of the player, and from when is the player going to start declining in his
performance? And so on. And I think that scouting will move very much towards that. It will,
of course, continue to have that element of sensitivity, that human touch, let’s say. But
from the point of view of what is the entrance of new technologies in practically all areas,
this one will also be no exception. And I think we will... And I think we are going to move
towards predictive models, to be able to help this area of artificial intelligence, to be able
to help or at least give tools for the scouts to have a greater perception. And there are
already companies that do that. Spanish companies are able to compare the competitive
level of the championships and players, compare players with other players of the same
position in the championship and in other leagues, and to be able to foresee the player’s
evolution up to a certain point. Already software companies do that now. I think this is
going to start to be more massively in the market and I think the future is right around the
corner in this sense.”
Interviewer: And regarding the data captured by vendors like Wyscout, do you think
there is any part of statistics that is not yet covered by this type of data?
Respondent: ”Not from the kind of information you get. Both Wyscout and Opta, and that
kind of operators that end up collecting the data, work excellently and are able to cover a
great deal of information, as far as I am aware. I think there are no statistics that are not
collected, at least the ones that we are used to working with nowadays.”
Interview 10
• Respondent: Interviewee 10
• Location: Zoom
• Date: April, 13th 2022
Interviewer: What is your department (formal name)?
Respondent: “Observation or analysis department is scouting.”
Interviewer: What is your role (formal name)?
Respondent: “My colleagues and I, who are usually video analysts, cover several national
teams. We are usually allocated to one and then we do others as needed. In my case,
I’m with the Under 21s and then I go on to do others. Just now I came from the women’s
U23 to do the women’s U15 next week, and I’m doing other functions. It can vary a bit
between teams. If I give the example of the U21s, I do the opponent reports, video and
data reports with the collective and then individual characteristics, also to share with the
players. I also follow up on our players who are eligible to be called up, those who are
under 21 years of age, follow-up in terms of playing time, for example. And then I also do
video reports. I also do the scouting work on eligible players. Of course, not those who are
in the Premier League, because those will already be known. But in the Second League, if
there are foreign players, I make them known to the coaching staff in a very general way.
And then more specific things will come up in the moment. But those are the general
functions.”
Interviewer: Which league does your team compete in?
Respondent: Not applicable
Interviewer: Do you use any software that aids in the scouting process/talent
identification? If yes, which?
Respondent: “We have two major data providers, Wyscout and InStat. I use Wyscout more
than InStat because of its metrics, some of which I’m more interested in and also more
used to using. Those are for collecting statistical data and videos. Then, to work on the
videos and create the player’s video, Hudl SportsCode, and to work on the data, usually
Power BI to do the visualizations.”
Interviewer: Do you think there are any barriers today that prevent clubs from using
scouting software? If yes, which?
Respondent: ”The value, probably the price. It’s possible that it’s a barrier for clubs. I
don’t know the financial situation of the first and second league clubs. I know that many
have these platforms, and I am not talking only about top clubs, but beyond that I don’t
know. It would have to do with the investment choices and the budgets of each club. But I
think that may be the barrier. I could also say the lack of professionals able to deal with it,
but it is not something so complex as that. So I think it will really be more the budget
issues.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: ”Generally, in my case, it’s all in video, because I include in the video
everything: the statistical data, the visualizations and the footage. It is all stored by date
of use of the video and per player. The rest of the information is placed in Excel, where I
have other information such as game times in different competitions, whether the video
has already been made or not, etc. So it will be video and Excel.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: ”The quality is almost like a spectrum. They are not 100 % reliable. They
never will be, and the platforms acknowledge that. There is always a margin of error, and
then each, I don’t know what they call them, but each person who collects the data will
also see some things better than others. And then there are also metrics and game
situations that are difficult to analyze, and sometimes even I have a hard time figuring out
if it’s one thing or the other. So I assume that whoever collects the data also has these
difficulties. Those limitations are assumed. Still, I don’t think it’s something that should
prevent us from using them, because it is better to have this data with some limitations
than to have nothing, because we are talking about relatively small margins of error. It’s
not from the top down, no.”
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: “Yes, although we can never be sure about that, because it is not easy to
have that assurance. We do have some concern about the storage of our data. We use an
online platform that is subject to risk, like all platforms, but we restrict sharing to that
platform so that the information is not spread to more places. Now, of course, that’s
always very relative. Truth be told, if that information got out, it wouldn’t matter much
either. I don’t think it’s that relevant, especially the information that I gather. It’s nothing
out of this world. We still take some caution, but in terms of concerns, knowing that there’s
always that risk, honestly, I don’t even think much about it.”
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 5
• Player Database - 5
• Match Analysis - 5
• Opponent Analysis - 5
• Player Comparison - 4
• Team Management - 1 - ”That’s one area I will answer, because we’ve been there
and not. In other words, I would even give a higher value. But it’s hard to be able
to create something that meets everyone’s needs, so it has some utility, yes. I don’t
think it’s vital. Personally, I don’t think it’s more important than all the others. It can
also be, okay, I may even be biased by my function, which doesn’t go there so
much. It goes more by the others.”
• Shadow Teams - 4
• AI generated statistics - 3
• Video Analysis - 5 - ”Because we use it for that purpose. Not to download games,
no, but to work on the games, to code the games, to make highlights from them, etc.
We use SportsCode, which is one of the easiest tools to use and also one of the most
used in the soccer world in general. And I haven’t seen any better yet. I know Wyscout
has this function, especially an editing function. I don’t know if it does, and if it does,
I don’t know about it. But I have used, for example, Metric. It is also very good, but
it is not as good as SportsCode. And so, I don’t do video editing the way Wyscout
does. At most, I cut one clip or another, loose things. But usually I prefer to take the
whole video and use it on another platform. So it is important, and as for Wyscout,
they have that and I don’t use it, so I can’t give it importance. So it has
to do with that.”
• Player Registration - 2
• AI generated Player performance prediction - 5
Interviewer: What areas of the scouting process do you find repetitive or that do not add
value?
Respondent: “Yes. I don’t know, in terms of the process, all of it is time-consuming. In my
specific case, something that I don’t do is see the players live, because too much time is
wasted when I can have the video of the player. Even though I do miss many things I
could pull out from seeing them live, the relationship with colleagues, etc., it doesn’t seem
to make up for it. So I prefer to watch on video, because I can make a report on five or ten
players in the time it would otherwise take for one. I could also talk about data collection,
of statistical data, which I don’t do because other people do it for me. But other than that,
unless I misunderstood the question, no, I’m not seeing anything else.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “The information we already use is a lot. Of course there would always be
another metric that we would like to see. One example is metrics that can be collected, but
that in my case I can’t get yet because I don’t have access to them: positional data on the
field, speed, etc. Injury data, which is still a very taboo thing in soccer. It is not easy to
have reliable information in relation to this. These are things that are missing and would
help a lot. I don’t see what other things I could say. What I do already seems so easy,
honestly, that I don’t know if I could add to it. There’s also, say, Skype. In my case, it will be
30 of my work. At the end of the day, greater focus will always go to opponent analysis.
But I don’t know. The only thing, really, would be to have a site, or better yet, to have the
possibility of cross-referencing information from various providers to get a value that is
maybe more reliable. There are situations in which it is not possible, because, say, the
InStat key passes are not the same as the key passes of Wyscout. That is information that
could never be crossed. But I’m talking about, for example, the height of the players.
Sometimes the height of a player does not match across the different sites where we
search, and it is difficult to settle on a value. That is the biggest difficulty, and that was the
point about greater precision of the information. At the end of the day it was very much
that way. In terms of software, what do we use? That is, there is not one built just for
scouting, for that purpose. I don’t know if we could do something better than the
providers. The problem is that one has one very good thing and another has another very
good thing, and then we can’t cross the information from both because they are different
data. That is, perhaps, the greatest difficulty: not having a place where everything is
centralized, a place where you could have a lot of information. Because, for example, the
scouting for lower leagues in Portugal doesn’t work. You don’t have time to use InStat for
professional leagues. I prefer InStat, so that is the biggest difficulty. And then, of course,
even on those providers, if we had many of the options that you yourself mentioned
before, comparing one player to another, having metric views chosen by us. For example,
it would be great to be able to customize the metrics we want to see for each position of
the player. It would be great to be able to filter player metrics by the position they play on
the field. These are things that don’t exist yet. In other words, I have no knowledge of
them, and it may be my ignorance. But that’s where I think it could help a lot.”
Interview 11
• Respondent: Interviewee 11
• Location: Zoom
• Date: April, 27th 2022
Interviewer: What is your department (formal name)?
Respondent: “At the moment, I am not working. I was in scouting for 37 years. (…) I have
been out of work for 18 months.”
Interviewer: What is your role (formal name)?
Respondent: “ (…) go to the ground early, talk to the people, find as much as I could,
intelligence on the local team and then watch the game. And in a lot of ways that was
a lot of valuable information because you could find out things that you couldn’t find on
a database, you could find out what type of player he was, was he a gambler, was he a
drinker, did he train well, did he have a good attitude towards his teammates, was he well
liked within the group and that was very very important.”
Interviewer: Which league does your team compete in?
Respondent: Not applicable.
Interviewer: Do you use any software that aids in the scouting process/talent
identification? If yes, which?
Respondent: “We used in the latter years Scout7 but when I started scouting it was just
paper reports. What I had was a team of scouts in Ireland, they would send the reports
to me and I would send them to the club.”
Interviewer: Do you think there are any barriers today that prevent clubs from using
scouting software? If yes, which?
Respondent: ”I don’t think there is any barriers but I don’t think we should use scouting
software exclusively. Probably, the best scout I ever met, an older gentleman, didn’t
scribble anything on paper even, he had the information in his head. If he lost his memory,
it would have been chaotic. (…) We need the personal touch still. (…) The situation
with Billy Beane in Moneyball went well… You need to do the due diligence, find out the
background of the player, character, things like that, statistics alone is not enough to go
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: ”I’m not sure of that, I know my only platform was Scout7. I didn’t have any
experience with any other database.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: ”It certainly has quality and is authenticated and so on. It’s how you access
the information in the first place. (…) I think information has got to be relevant, has got to
be streamlined but I think that originally you wrote 3 or 4 lines per player and there is 22
plus the substitutes and that’s difficult because you are making notes most of the time. In
the latter days, you just observed the 3 best players in a game which made it a lot easier.”
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
Respondent: Not applicable, respondent did not use software ever.
Interviewer: What areas of the scouting process do you find repetitive or that do not add
value?
Respondent: “I think it is inefficient and clubs are not liaising with their scouts on a regular
basis. A lot of scouts feel isolated and left alone. I think all scouts like to be valued,
that gesture from your manager makes you feel acted upon. It’s frustrating that it is not
followed up by the club, that happens quite a lot. A common complaint is that the clubs
don’t check up on scouts enough. You rarely speak to a manager directly. Managers will
send players that they know personally.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “You need to invest time on the ground and get to know the player. We all
know very talented players who didn’t make it great because they didn’t have the
emotional capacity for that. (…) I think clubs need to take a more holistic approach.
Emotional intelligence: You need to establish, is this a boy that can go away from home,
get along with a landlord or landlady, cope with loneliness, deal with disappointments.”
Interview 12
• Respondent: Interviewee 12
• Location: Zoom
• Date: May, 6th 2022
Interviewer: What is your department (formal name)?
Respondent: “Um, I work, um, at Royal Antwerp, um, a Belgian football club, uh, in the
top flight, um, of Belgium. Uh, I work under our chief scout in the scouting department.
So we are, uh, one chief scout, uh, named <inaudible> and then we are, um, I think we
are eight full-time scouts at the moment, um, working under him and I am one of them.”
Interviewer: What is your role (formal name)?
Respondent: Answered previously.
Interviewer: Which league does your team compete in?
Respondent: ”Uh, we play in the highest, uh, league of Belgium. So it’s the first division
of Belgium football, probably.”
Interviewer: Do you use any software that aids in the scouting process/talent
identification? If yes, which?
Respondent: “Yeah. Um, actually our, just our club has, um, um, we have further
increased the, the, the overall scouting budget, uh, since I arrived. So, uh, we’re
using multiple sources of information, uh, mainly, I’m not sure if you’re aware of it, but
Wyscout is the ordinary provider of video scouting. And we do a lot of video scouting
because we, um, we scout players all over the world. So I would say the, um, the central
platform we use is Wyscout and we, uh, are mainly given the tasks of analyzing players.
And if we, um, also go to live games, we are always, um, um, collecting the information we
saw and doing the reports on Wyscout. But apart from that, we are using other software
as well. Um, we are using some data providers, but that is, um, first of all, it is, um, I’m not
sure if I’m, um, I can tell you which exactly, uh, because it’s kind of discreet, which data
providers we use, but, um, we’re using, um, multiple, uh, data providers on players, but
also in Wyscout, you have the data. So if, if, um, I do my work first as a scout, I will always
collect as much info as I can on the player, sometimes I go watch live, but then I will
also take the Wyscout data and, and, and make my own opinion. But then when it gets to
the high point, to the chief scout, he will often go to another company with, maybe, more
clear data, data presentation, and then, uh, um, so he will take the info from me, and then
he will go to another platform. Also, I must say now that, uh, we have a new technical
director, um, who wants to rebuild some things in the club and he wants to bring more, uh,
more speed into the club and he wants a more, uh, proactive and modern style of play. So
you need the dynamic and speed in players. And then we are at the moment looking into,
um, signing a deal with a company who uses algorithms to, um, to evaluate faster players
because we have tracking data on players, but there’s only tracking data at the moment in
every league. So there’s not a, there’s not a, you don’t have a platform where you can
watch, uh, tracking data from, from, uh, all over the world. So for example, if you want to
sign a player from the Jupiler Pro League, yeah, we’re in Belgium, so we already have that
information, but if we look at, uh, a player in Brazil, it takes a lot of time to get tracking
data from there. And that is, um, that is something that we further want to, uh, to improve.”
Interviewer: Do you think there are any barriers today that prevent clubs from using
scouting software? If yes, which?
Respondent: ”Um, yes, there is. I think that at the moment, uh, there is, um, I maybe would
not say a revolution, but, um, the scouting, um, the way of scouting has been almost the
same since, uh, since football, uh, started in the late 1800s. Um, but,
uh, since, uh, since Wyscout came in, maybe I think it was 2005, uh, obviously that was
the first one where you were able to, to collect information on so many more players all
over the world. And I think that was, uh, the first thing which changed the whole industry. I
think that was groundbreaking in the way that, that you could now watch players all over
the world. But I think that the biggest thing at the moment is the data and, uh, um, that is,
um, that is something that I think, because I’m young, I’m 23, and I think that is something
I have grown up, um, studying, um, collecting, and hence it has always been a
valuable source for me. And it’s been, uh, it’s always been, um, close to me to use that
kind of information. Um, but I think that all the guys I work with, they have never done that
in their career. So there’s, it’s, it’s harder for them to, um, to get that into their, uh, the, the
workflow or their working process. I think that is the barrier at the moment. I would not
say it’s, it’s using Wyscout, because Wyscout is, is, is video providing. And there is no
barrier to using video because everyone wants to watch players, but I think that the barrier
is for, for the buy-in of, of, of data.”
Interviewer: Uh, so do you think there’s a generational gap between scouts?
Respondent: ”Yes. Okay. For sure.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: ”Um, that is, um, that varies from club to club. Um, I know a lot of people, uh,
who, um, in, in clubs who do it differently. Um, we, if I take only Royal Antwerp, we, um,
we are only doing it in, in Wyscout, because Wyscout has, uh, uh, on, on the
platform, they have their own area called the scouting area, which is, um, I think I kind of
like it, it’s, it’s optimized well for, for all the scouts. So when everyone does a report,
the, the, the report will come up in the scouting area and the, and the chief scout will get,
uh, all the reports in the same area. And then, uh, then for example, if, if there are multiple
people watching the same player, they will get this, uh, kind of average rating on him. Um,
but when we do it, uh, in, in Wyscout, the reports we make are often based on, on
video scouting, and if we go to a live game, they will do more of a rat match report thing,
but it will still be there. So, um.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: ”Um, yes, I do. I was, um, I’m, I’m from Sweden and I started out, uh, a
couple of years ago, uh, working for, uh, a mid-table club in the highest, uh, in the top flight
in Sweden. And, uh, we don’t, we don’t have the budgets to scout the same players as
<inaudible>. So I was often doing scouting in lower leagues in Sweden, but also lower
leagues in, in the whole of Scandinavia. Um, that was my, um, my area of scouting. And I
always felt that, um, the, the, the biggest, the, the matches are, are, they are tagged by,
um, not tagged by AI, they’re humanly tagged. And there, I think that, at least it was in, in
my, uh, personal experience, there were, um, there were more tagging errors in, in the
lower leagues than there were when I, for example, compare to watching top flight football
in Brazil at the moment. So, um, yes, I could find that, it was not a huge issue, but, but
sure, I will. So I had to have that in mind also when I, when I watched the data from
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: “Probably not, it depends how confidential it is supposed to be. If a transfer
was supposed to go through, I wouldn’t feel comfortable, no.”
Interviewer: In the study made of available software, we identified several groups of func­
tionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 5
• Player Database - 5
• Match Analysis - 3
• Opponent Analysis - 1
• Player Comparison - 5 - ”Um, um, then I would say, um, that I would have to,
um, uh, split that question in two, because I think there’s an absolute, um,
very important CEO of, of doing optimization of, of the work we do. Um, but I would
not, I would highly say yes, because if you have the scouting workflow and we have
analyzed player B, uh, and we have said that, okay, these are his statistics and we
like him this much when we watch video, so maybe we’ll give player B a C rating, and
then we have player A, whom we also have watched on data and on video
and we like more, so we give him a B maybe in rating, that is highly key,
but if you are only talking about statistics, it is not as important. So, um, so, um, only
in statistics, it’s another question. So do you mean the, the, the, the holistic view of
the player, or you only mean the, the comparing the statistics, the statistical profile?”
• Team Management - 3
• Shadow Teams - 3 - ”Um, first of all, shadow teams are hugely important, having a
shadow team is five. Um, but I would, I would maybe say that, um, having it on the
platform is, is, is maybe not the most important thing at the moment. For example,
we, we are using Wyscout and there’s the possibility of having the shadow team
there, but we’re using it in an Excel file. Um, so, um, I’m not sure how you’ll interpret
my, uh, my answer, but the, the having a shadow team is, is five. But having it on the
platform, I would say that I value that, but I would maybe say a three again. Um,
there is an eye that is one of the reasons why we do it offline in an Excel file, and
then we send it, uh, in our own, uh, secure email, um, environment. And that is also,
uh, one of that, one of those cases. So yes, I would say so.”
• AI generated statistics - 2 - ”Hmm. And this is also something that I think I have to
further elaborate on because, um, statistically, um, I think there is, um, uh, um, you
cannot, uh, football is not something that you can optimize at the moment, at least.
Um, and I think that, um, for example, if you take a club like ours, we have the, we
have the, the, the budget to, uh, to have all the intel. We have, we have the budget to
have statistical people who are skilled statistically and, and data-wise. And we have
players, or, um, staff that are skilled, uh, at analyzing player profiles on video
or live scouting. So it is all about holistically watching the players, both
from video and, uh, the data analysis. And it is the job of my chief scout to collect
all the information. But I must say that, though I am from a generation which uses
data quite a lot more than, than earlier generations, there is a sense of, at least
since I’m from Sweden, I know some other clubs that don’t have the money to have
the video scouting and the data scouting. So the data scouting is, is, um, is, um,
taking over sometimes because it can optimize and it is, uh, it is not as, um, uh, as
big a cost as having a real person. So these clubs, they are using platforms that are
make players, but there’s, you have to be clear that data is, will always be, uh, it will
always be, um, it will always be an effect of what the player does, and what the player
does and what the player can do are two different things. So for example, you can
have a, a dominant team in a second division playing possession-based football, uh,
always wanting to build from the back and then having 70 % of possession. Then the
center backs will always have high, uh, passing metrics because they will play a lot
of passes. And then their, um, the, yeah, their, uh, buildup, um,
metrics will always be really high. And then you have another guy playing in the,
in, uh, lower down the table in a, in a team that mainly plays long balls. Uh, then
his statistics will not be as good. And, uh, but, but they can still have the
same qualities. And this is a problem, I feel, lower down the food chain of European
football, that, let’s say in, in, in Scandinavia, and at least in Sweden, you have
clubs that will mainly sign players more and more from data. And you, uh, first of all,
you tend to only sign the players. Uh, you, you miss a lot of, of, of opportunities
of signing other players. Um, so I, I’m not sure if I went, um, too long on that, but I
would say that the most important thing is having data. Yeah. The automated data,
I don’t feel it, it is not, uh, it is not that important. So I would say two, but obviously
data is five, but the optimization of data is not as important.”
• Video Analysis - 2 - ”Um, the, the video tagging for us is, is, uh, is not as important.
So I would be quite low on that, but there have to be, if, if, if I elaborate on it a bit
more as well, I think that, uh, there are still leagues that are not tagged on Wyscout.
So for example, you have in Algeria, for example, uh, there’s a club named
Paradou, which has, um, brought forward a lot of talents over the years, but, um,
the, that league is not tagged, which, uh, which is a problem and we have not done
it, but I know for a fact that other clubs with a lot of budget, they tend to, to,
um, collect their own, uh, data on the player then so they can tag it. But, um, for
example, I’m not sure if you’re aware of it, but, uh, Riyad Mahrez, uh, now
playing at City. Uh, I know that, uh, when Leicester City wanted to sign him, he was
playing in the second division in France. And this was, um, earlier, uh, maybe around
2010 or something. And there was not enough data on him, so they collected their
own data, but that was a Premier League club. So the, the, um, the usefulness of,
of, of tagging for sure can be something additional, but I would still say two, but, uh,
because, um, yeah.”
• Player Registration - 4 - ”Um, okay. So then I maybe have to take the position of a
chief scout, um, because this is not really my area, but I would, I, for a fact know, um,
that obviously a chief scout wants to erase as much, uh, um, work, uh, as possible,
uh, to be able to focus on the football side, but I am, I am, I would be skeptical
about using that kind of, of, um, of platform which would provide, um,
information on, on maybe, um, contract length or, or the, the, the visa, uh, and, and,
and so on, because, uh, uh, you would always, probably at least need to, to double
check and so on. How, how often would this information be updated? Because I
know for a fact, you have Transfermarkt, which we still use, uh, and,
and Wyscout are collecting the contract data from there, or the contract
information, into, into their own platform. But it’s, it’s, it’s, it’s not updated, uh, as,
as, as often as we want. And also the visa thing is, is something that could, um,
um, could also develop, uh, things can happen real quick. Uh, and then I think, so
this is something for sure that we would, if, if I were the chief scout, I would want
this information in my, in my own, uh, optimization and my own, uh, platform, but I
would not want to, to have that data. I would want to collect it myself, put it in my
own platform for my other, um, colleagues. I would, I would not want to collect it
from, from maybe, um, an external source. Yeah.”
• AI generated Player performance prediction - 3 - ”Yeah. This is, uh, something that
is, um, on the rise. And this is for sure something, something interesting. I think
there is, um, absolutely the, the, where I see the potential storage, a small, which
is working with younger players. Um, for sure. I know that I have a, I have friends
in the industry who are more skilled than I am at using data, and they
are already doing this on their own. Um, and it’s still there. It will never be, uh, uh,
um, uh, it would not be the, the, what we would only base our, um, our opinion on a
player on, and so on. But I would say maybe a three, because it would still, it’s, it’s still
useful, but it would not be decisive.”
Interviewer: What areas of the scouting process do you find repetitive or that do not add
Respondent: “Um, the, um, scouting work is repetitive. That is for sure. I think that if
you want to work with, uh, with recruitment in football, you have to, this is the first thing
you have to, um, to be aware of that, uh, that it’s, it’s from the repetitiveness that, uh,
that the value comes from. So for sure, I think it can be repetitive sitting and watching,
because I do mostly wise, uh, video scouting, um, and that can be repetitive, but it still
adds value. So, um, I think that, um, if you want something, if I should say something that
may be, it’s taking too much time from my vowel workflow, it’s maybe when we, uh, when
we have meetings and discuss players, I think is not something that is only working with the
recruitment, the football, but that meetings takes too long time, but people want to, um,
express their things and discuss their opinions on players. And everyone are, if you work
with, uh, with scouting and football, you are probably very passionate about football. So
it’s sitting and discussing football with the other colleagues who are as passionate, that that
is for sure something positive, but it’s also something negative. So we have a very long
meetings discussing all types of football. Um, so if, uh, if I think still we could cut, uh, cut
corners on that to, uh, to be more efficient.”
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “Um, this is, um, I will, first of all, say more, um, concrete and better data.
So, uh, I would, I, I go to sleep, uh, dreaming about using StatsBomb data at times. Uh,
but we cannot afford this. This is, uh, something for the big, big clubs. Um, but, uh, they
are doing really, really interesting work. They have done their new, um, new, or they have
released a new kind of data set called 360, which, uh, values, uh, players action. And
I think that the data will only continue to be more, uh, more accurate and precise. So
for sure, uh, this is more and better data for sure. And then I would also say that I,
although it is probably quite hard to do, we still, we still lack a lot of good video material
on some leagues around the world. A lot of African, there’s a lot of talent in Africa, but at
the moment, I think that there’s not too much good video quality, um, to find players. So at
the moment, the players that are signed from there, it’s mainly scouts, uh, traveling there
and seeing the players live. But I think there’s a huge potential for, for better video quality
material in, um, in, especially in the continent of Africa.”
Interview 13
• Respondent: Interviewee 13
• Location: Zoom
• Date: May, 10th 2022
Interviewer: What is your department (formal name)?
Respondent: “Yeah, um, the department is scouting and recruitment.”
Interviewer: What is your role (formal name)?
Respondent: “I’m the head of the department.”
Interviewer: Which league does your team compete in?
Respondent: “My role started, um, for a footballing group, which had two teams, one
plays in the Austrian Bundesliga, and the other plays in the third division in Germany.”
Interviewer: Do you use any software that aids in the scouting process/talent
identification? If yes, which?
Respondent: “Um, we use, um, SciSports, um, because SciSports also has the API
where you can get the data downloaded. Um, that part of the data, um, we use, um,
with a model I’ve, um, developed myself. So we create an own map model out of it with, uh,
help with, um, in collaboration, with a data scientist from America, we’ve developed an
own model. Um, I also used the data for Tableau to, um, look at more specific profiles
and also to create radars according to our needs. So, so that’s also the main point why
we’ve chosen SciSports, um, because of cost and, um, what you get all for it it’s, um, in
our opinion, the best on the market because of the API APIs with the data and, um, by
doing all this, um, we have a very comprehensive look at, um, players because SciSports
is a plus minus model, which is a, um, top-down approach. Then, uh, our own model
created is an on-ball value system. And then the radars and the Tableau, um, graphs are
looking at playing styles and profiles, specific profiles. And then we capture all what you
can capture at the moment with data, or we have no tracking data from other leagues. So
something like pitch control we cannot do.”
Interviewer: Do you think there are any barriers today that prevent clubs from using
scouting software? If yes, which?
Respondent: ”Um, budget? Yeah, I’m on the only budget because this is, um, we have, it
costs you 18,000 per year. If I go with other providers, the price is even higher, but then
I don’t get the data with it. Yeah. If you look at the other, I mean, if I would want API
APIs with Wyscout, just the data, but not a scouting tool with it, I think, let me sign for
10 leagues. I’m 50,000 here. Um, if you go with Opta, it’s even more, if you go with that
StatsBomb, which is the most granular and, um, data and the highest quality data, um,
you’re also, I mean, PI leaks cost you 30 cells per year, so yes, budget, but should I sync
and also knowledge within the club is a barrier and, um, yeah, open-mindedness, people
are still very, yeah, old school in football.”
Interviewer: So you believe that there is a generational gap, so older Scouts that as a
Respondent: ”Yes. And, and, um, it usually comes from the board level. Um, most boards
you have, um, successful business people. Yes. But as soon as they enter a football club,
um, they don’t make decisions based on evidence. They do it on, it was always done this
way, um, on emotions. Um, but not in a business sense.”
Interviewer: Which format is the data from the scouting process stored in? Is it stored in
sheets of paper, Word, Excel Sheets?
Respondent: “Uh, when we, then when we do the whole, um, yeah, I mean, I have
together with an American colleague, we’ve done an application, so on the internet, um,
where I do, um, download the visualization from it, the same I do from Tableau. Um, there
it’s a CSV file. Yes. And then in Tableau uploaded and the visualization, I do a screenshot
from it and all this together goes in a scouting report. Okay. So also the radars I create
that’s done in <inaudible> Where the data is uploaded as in an XML file.”
Interviewer: Do you feel that the data provided by the data providers have quality, relia­
Respondent: ”In terms of what you got? That’s a question. That’s one of the questions
we get from skeptics saying like, oh yeah, who’s counting it. Um, a child in Bangladesh
and then it’s not accurate. And if it’s not a hundred percent accurate, I don’t trust the data.
That’s a storyline basically. So, um, am I worried about the inaccuracy? Um, no, because
the quality is fairly good. It’s um, computer, how do you say, I mean, the computer
pre-tags and then the human, a qualified, trained human is going over it. So the quality is
surely higher than if, as an example, if I send a scout to match and, um, player has like a
hundred actions, He might get 90 % recorded in his mind and maybe even correctly, but
if a computer goes over it, he gets 95. And if a qualified and trained person goes over it,
he will get up to 98 or whatever. So the outcome will be better than the pure human on
the match. And we state you’re not covering one match of the player. You have like 10,
15, so you don’t get 90 data points, which one scout would get, you get a solves. And
then am I worried that they don’t get a hundred percent correct? No, because the human
eye would also not get even close to that.”
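The respondent's comparison can be made concrete with a small calculation. The sketch below is an editorial illustration and not part of the interview: it takes the quoted figures at face value (about 100 actions per match, roughly 90% captured by a scout watching live, up to 98% after a trained person reviews the computer pre-tags, and 10 to 15 matches covered by provider data). The function name `recorded_actions` is hypothetical.

```python
def recorded_actions(actions_per_match: int, capture_rate: float, matches: int) -> int:
    """Expected number of correctly recorded actions over a set of matches."""
    return round(actions_per_match * capture_rate * matches)

# One live scout at one match, using the respondent's 90% figure.
live_scout = recorded_actions(100, 0.90, 1)

# Provider data reviewed by a trained person (up to 98%), over 10 to 15 matches.
provider_low = recorded_actions(100, 0.98, 10)
provider_high = recorded_actions(100, 0.98, 15)

print(live_scout, provider_low, provider_high)
```

Under these assumed figures, the gain comes mostly from the number of matches covered rather than from the few percentage points of extra accuracy per match.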
Interviewer: Would you feel safe trusting scouting data with a third-party platform?
Respondent: Not asked.
Interviewer: In the study made of available software, we identified several groups of
functionalities. Rate the following functionalities on their usefulness using a scale of 1 to 5
with 1 being “Not useful at all.” and 5 being “Very useful.”
• Scouting Reports - 5
• Player Database - 5
• Match Analysis - 1
• Opponent Analysis - Not used.
• Player Comparison - 3
• Team Management - Not used.
• Shadow Teams - 5
• AI generated statistics - Not used.
• Video Analysis - Not used.
• Player Registration - Not used.
• AI generated Player performance prediction - Not used.
Interviewer: What areas of the scouting process do you find repetitive or that do not add
Respondent: “I cannot name anything, to be honest. I mean, what takes the most time is
to, um, check on tips from agents, which we get in. Yeah. Because that takes away from
your main work, but with these scouting tools, um, you’re much more efficient than in the
Interviewer: What type of functionalities/information do you think could exist or be used
to increase scouting accuracy/efficiency or player potential evaluation?
Respondent: “How we try to do it is basically to take different approaches into
consideration. And, um, like I said, like a plus minus, um, like an on-ball value system, like a
profiling, like radars, um, and hope that we, yes, um, get better accuracy. Um, we look
mainly as well at, um, team levels. So that’s, that’s a very important thing to consider:
how good is your team compared to the teams you recruit from? Playing time of players,
um, development curves of players. Um, I’ve done a research with the technical
university of Norway, where we looked into over 5,000 transfers. And these are the main
indicators for successful transfer: to look at the, um, development curves, to look at the
playing time and the team level you recruit from. So age and, um, language and culture
has no impact.”
Interviewer: Um, what about, uh, personality traits?
Respondent: “They might have, they might have an impact. Yes. Which you cannot get
with scouting tools at the moment. I think this is also well in our process. We do a lot
of pre-work with the data to identify the right players. Then we do something like an eye
test where we watch on video if the data reflects what we see on video, if it’s really well
captured. And if a player does well in the, yeah, eye test, then we do, um, detailed video
analyses over a couple of games of the player. And then if he’s still a target for us, we
try to go and see him live and try to get the background information. And this is, I think, where
you get the personal traits, when you, when you need a scout to capture
what a player shows, what the data cannot show.”
